
EDITORIAL BOARD

Elżbieta Gołembska, Danuta Krzemińska, Emil Panek, Wiesława Przybylska-Kapuścińska,
Jerzy Schroeder (secretary), Ryszard Zieliński, Maciej Żukowski (chairman)
REVIEWER
Adam Sagan
COVER DESIGN
Jacek Pietrzyński
PRODUCTION CONTROLLER
Magdalena Kraszewska, DRUK-BAD Usługi Wydawnicze

Copyright © by Uniwersytet Ekonomiczny w Poznaniu


Poznań 2015

ISBN 978-83-7417-

POZNAŃ UNIVERSITY OF ECONOMICS PRESS


ul. Powstańców Wielkopolskich 16, 61-895 Poznań, Poland
tel. +48 61 854 31 54, +48 61 854 31 55, fax +48 61 854 31 59
www.wydawnictwo-ue.pl, e-mail: wydawnictwo@ue.poznan.pl
postal address: al. Niepodległości 10, 61-875 Poznań, Poland
Printed and bound in Poland by:
Poznań University of Economics Print Shop
ul. Towarowa 53, 61-896 Poznań, Poland, tel. +48 61 854 38 06, +48 61 854 38 03

CONTENTS

Introduction .................................................................................................... 11


I. Theory of human values ............................................................................ 15
Basic theoretical assumptions underlying human values ..................................15
Typology of value contents by Rokeach and Schwartz's theory ...................17
Hierarchy of human values ...............................................................................21
Values comparison across selected disciplines ...................................................23
Axiology ...............................................................................................................23
Psychology and sociology ..................................................................................24
Cultural anthropology .....................................................................26
Economics ...........................................................................................................27
Human values: needs, motivation, attitudes and behaviors ............................28
Relationship between values and personal identity ...........................................32
Consumers' values in marketing ...........................................................34
Personal values in theory of value utility .........................................................34
Personal values vs. product-services values for consumers in marketing
research ........................................................................................35
Subjective information in market segmentation: psychographics and
personal values ...........................................................................41
Product planning and promotional strategy in context of personal values ....45

II. Measurement methodology .................................................................... 47


Selected notes on the history and notion of the measurement .........................47
History of measurement ....................................................................................47
General notion of measurement .......................................................................49
Explication and additional interpretation of the measurement theories ........53
Representational measurement theory ............................................................54


Operational measurement theory ....................................................................56


Classical theory of measurement ......................................................................58
Some measurement problems in context of the social sciences research ........61
The meaning of tests in classical test theory (CTT) ...........................................64
Conditions of CTT .............................................................................................68
Criticism of CTT ................................................................................................71
Item response theory (IRT) ...................................................................72
Notion and origins of IRT .................................................................................72
Item characteristic curve (ICC) ........................................................74
Classification of basic dichotomous data models in item response theory .76
Unidimensional logistic models .......................................................................79
Dichotomous vs. polytomous item response theory models ........................88
CTT and IRT: some differences ..........................................................91

III. Selected issues on scales classification and scaling .......................... 96


The link between measurement and scaling .......................................................96
Types of scales according to measurement levels ...............................................98
Admissible transformations on scales ..................................................................101
Criticism of Stevens's scales in relation to statistical data analysis ............102
Attitudes and preferences scales: underlying classification ............................105
Scaling on summated, cumulative and comparative scales ...............................107
Summated scales .................................................................................................107
Cumulative scales ...............................................................................................111
Comparative scales .............................................................................................114
Rokeach's, Schwartz's and Kahle's scales for values measurement ....................115
The Rokeach value survey .................................................................................115
Schwartz's value survey ......................................................................117
LOV: list of values ............................................................117
A brief review of other measures and scales applied for values analysis ................118
Rating-ranking scales controversy in values measurement ..............................132

IV. Principles of items and scale development ......................................... 139


Single-item vs. multi-item scales in measurement of constructs .......................139
Process of scale development in a view of classical test theory ........................141
Formative and reflective indicators measuring theoretical construct .............145
Items development for the measured construct and respective scale ..............150
Items identification .............................................................................................150
Items construction ..............................................................................................151
Dichotomous and Thurstone item formats .....................................................153


Items review ........................................................................................................155


Preliminary pilot tests of items .........................................................................156
Rossiter's C-OAR-SE: a new concept for scale development and its
criticism ...........................................................................................157

V. Reliability and validity in a view of Classical Test Theory (CTT) ... 161
Principles and meaning of reliability and validity ..............................................161
Reliability estimation ..............................................................................................165
Homogeneity or heterogeneity of the group ...................................................173
Standard error of measurement and estimation .............................................175
Reliability estimation for unidimensional reflective indicators ........................178
Selected methods of reliability estimation ...........................................................181
Test-retest method ..............................................................................................181
Parallel-test, alternate forms ..............................................................................184
Internal consistency reliability methods ..........................................................185
Alpha factor analysis and principal components reliability estimation ..........197
Types of validity ......................................................................................................200
Content validity ..................................................................................................200
Pragmatic-criterion validity ..............................................................................202
Construct validity ...............................................................................................204
Validation methods ................................................................................................205
Group differences analysis and measuring change in scores with time
lapse ..............................................................................................206
Correlation between a measure of the construct and designated ................206
Factorial validity and multi-trait-multi-method ............................................207
Items analysis in reference to difficulty and discrimination indices ................209

VI. Exploratory (EFA) and confirmatory (CFA) factor analysis for scale
development .................................................................................................... 213
Relationship between factor analysis and classical test theory (CTT) ............213
Underlying aims of factor analysis in the field of statistics ...............................216
Differences between EFA and CFA ......................................................................218
Common factor analysis model (CFAM) ............................................225
Variance decomposition, matrix of correlation and factor loadings ...............230
Principal component analysis (PCA) vs. common factor analysis ..............234
Methods of factor loadings estimation and factors extraction .....................239
Selected approaches to communality estimation ...........................................247
Number of factors ...............................................................................................250


Geometrical identification and techniques of factors rotation .....................256


Factor scores analysis .........................................................................................267
Sample size and soundness of observed variables ..........................................271
Interpretation of factors and factor indeterminacy .......................................273
Confirmatory factor analysis model (CFA) ........................................276
Model of CFA ......................................................................................................276
CFA unstandardized vs. standardized solution and covariance-mean
structure ......................................................................................280
Scaling latent variable (factor) ..........................................................281
CFA model identification ..................................................................................282
CFA fitting function and methods of estimation ...........................................293
CFA model evaluation: selected fit indices ...................................297
Statistical power and significance of the CFA model's parameter
estimates ......................................................................................312
Alternative models in CFA ................................................................................316
Respecification of CFA model ...........................................................................317
Sample size and distributional properties of observed variables inCFA ....319
Reporting practices and final remarks about the process of CFA model
construction ................................................................................................322
Measurement invariance and multi-group confirmatory factor analysis
(MGCFA) ............................................................................................325

VII. Scale development for hedonic-consumerism values .................... 332


Scale construction: underlying assumptions ....................................................332
Hedonism: construct definition .....................................................332
Hedonism in the context of consumerism ......................................................334
Relationship between consumption, hedonism and utilitarianism ...................335
Hedonic vs. utilitarian benefits and the sense of guilt ...................................337
Some other influences of hedonic values on consumers' behavior ...............339
Research methodology ...........................................................................................341
The rationale for the choice of young consumers as a sample for personal values
analysis .........................................................................................342
Exploratory interviews and pilot study in reference to initial pool of
items .............................................................................................................344
Sample: selected characteristics from data collection .................................347
Empirical analysis ...................................................................................................349
Items and data preliminary screening .............................................................349
Items correlation and their adequacy for factorial model .............................351
Exploration of dimensionality and number of components (factors) .........354


Comparison of extraction methods and rotations in exploratory factor
analysis (EFA) .............................................................................361
Hierarchical exploratory factor analysis (HEFA) ...........................................375
Scale reliability in reference to exploratory factor analysis ...........................378
Confirmatory factor analysis (CFA) ..............................................384
Structures and types of CFA models ................................................................386
CFA Model 1: fit evaluation and rejection ....................................................393
CFA Model 2 with further split into Models 2a and 2b: comparison ..........402
CFA models' (2a and 2b) parameter estimates .............................................408
Ultimate CFA model in the context of reliability and validity ......................416
Hedonic-consumerism values (HCV) ultimate scale and its implications
for marketing ..................................................................................................422
Measurement and scales development in the marketing field: conclusions ........424

Appendix: questionnaire ............................................................ 428


References ........................................................................................................ 431
List of figures ................................................................................................... 470
List of tables .................................................................................................... 472
Subject and author index .............................................................................. 476

To my wife and children

INTRODUCTION

Contemporarily, many scientific and marketing research projects are subject to serious and complex methodological problems related to measurement. These problems cast a shadow on the ways scales are developed. They arise indirectly from many environmental conditions, reflecting, for example, the high dynamics of growth in society or advanced technology development.
On both sides (science and marketing practice), regardless of the adopted goals and acceptable methods of activity (e.g. aimed at increasing the level of knowledge about the external environment), the quality of information plays a significant role. In fact, our comprehension of this world depends largely on what and how we can measure and retain. It may also be fair to say that the information determining our perception of the world and the knowledge about the environment (in the broad light of Darwin's theory) helps us not only in terms of expansion but, above all, in survival. Hence, the quality of information either facilitates or impedes our existence. We can also conclude that the way we approach the data and identify their sources, and the methods which we select for defining, classifying and sorting them (based on good patterns of measurement), predetermines the quality of information. Therefore, the measurement issues and the measurement instruments are of huge importance.
These reasons inspired the author to write a book dedicated to the construction of scales. This work had three main objectives. The first objective was to provide a solid theoretical background for all those interested in the development of measurement scales. The measurement applications and their implications


for people working in the sphere of marketing practice were also considered. The second objective related to teaching, which the author wished to accomplish indirectly through this monograph. Finally, the last objective was based entirely on the empirical analysis of consumers' personal values. In particular, attention was paid to the scale developed for measuring hedonism and consumerism. This scale will be important for those readers interested in the exploration of consumers' values who would like to transform them into possible marketing applications undertaken in conjunction with: 1) market segmentation, 2) identification and selection of the most profitable consumers, or 3) selection of appropriate marketing-mix instruments.
The structure of the book is as follows. It consists of seven chapters. The first chapter focuses on the types and hierarchy of values in general across different social science disciplines. This description is further supplemented with human needs, motivation, personal identity and the sequence taking place between values, attitudes and behaviors.
The second chapter describes measurement issues such as: the nature and origin of measurement, logical foundations of measurement, measurement problems in the context of social sciences research, the meaning of tests, and classical test theory vs. item response theory. Finally, the differences occurring between measurement theories are considered.
The third chapter presents selected approaches to scale classification (e.g. according to the classical Stevens configuration). It also addresses the links between measurement and scaling. A comparison of the scaling types, e.g. summated, cumulative and comparative scales, is provided as well. More importantly, the author discusses different types of scales based on values. In consequence, Rokeach's, Schwartz's and Kahle's contributions to personal values measurement are presented. The discussion then continues on the subject of the advantages and disadvantages of rating and ranking scales for values measurement.
The fourth chapter pertains to broader aspects associated with the design and selection of appropriate items for scale development. Some preferences over the choice of single- or multi-item scales are considered. The author also describes the differences between formative and reflective indicators measuring a theoretical construct. At last, attention is paid to Rossiter's C-OAR-SE, that is, a new concept for scale development.
The fifth chapter presents issues on reliability and validity in a view of classical test theory. In particular, the following issues are discussed in the


context of: the homogeneity of the sample, reliability, the standard error of measurement, correction for attenuation in reliability, methods of reliability estimation and methods of validity assessment.
The sixth chapter refers to exploratory (EFA) and confirmatory (CFA) factor analysis, responsible for developing and finalizing scales. The following aspects are taken into account: the relationship between factor analysis and classical test theory, differences between EFA and CFA models, methods of factor loadings estimation and factors extraction, selected approaches to communality estimation, rotation techniques, sample size and soundness of observed variables, CFA model identification, evaluation and respecification, as well as issues concerning measurement invariance and multi-group confirmatory factor analysis.
The final, seventh chapter is based on the empirical research findings and selectively presents applications of the previously discussed theoretical issues in relation to the measurement of values. The principal methods of scale development for consumers' personal values have included exploratory and confirmatory factor analysis, as well as reliability and validity analysis. The author strove to offer the reader an in-depth description and practical hints concerning the analysis, and thus to encourage the reader to try different solutions in order to find the best match from different perspectives. For the author, it was important to stress that several passes (as with different types of factor models) through different analytical procedures need to be made until the results yield the necessary and satisfactory information for the investigated problem. The effects of the empirical analysis and data quality were also discussed.
The list of all issues raised in this book, according to the author's point of view, certainly does not exhaust the subject of investigation. However, the author hopes that this work will throw, to some extent, a new light on the perspective of measurement and scales development, especially pertaining to consumers' personal values. The work will be useful for a wide range of researchers and academic workers (involved in the measurement and methodology of scales construction) who want to analyze personal values, or those who want to use them in their own research. It will also be an ideal resource for marketing professionals familiar with classical testing principles and for participants of graduate-level courses in business, consumer psychology, sociology and other social sciences, or as a supplement for courses on applied statistics, multivariate statistics and research design.
Finally, the author would like to express his sincere thanks to those people who gave him many insightful comments at particular stages of the book


preparation. They are: dr Cyprian Kozyra, dr Paweł Żuraw and prof. Jerzy Niemczyk from Wrocław University of Economics, dr Michał Podgórski from Analyx, dr Beata Pachnowska from IMAS International and dr Robert Skikiewicz from Poznań University of Economics.
The author would also like, with all his heart, to thank prof. Adam Sagan (i.e. the reviewer from Cracow University of Economics). Without his in-depth comments, this book would probably have never been so well developed.

I. THEORY OF HUMAN VALUES

Basic theoretical assumptions underlying human values


Perhaps the most influential definition of values can be traced back to Kluckhohn [1951, p. 395], who explained that a value is "a conception, explicit or implicit, distinctive of an individual or characteristic of a group, which influences the selection from available modes, means, and ends of action". This definition was influential in the behaviorist era because of its focus on the potential, both for action and reward, and because it covered individual and group aspects. Lesthaeghe and Moors [2000] argued that Kluckhohn took a more deterministic view, according to which values were treated as cultural imperatives that necessarily lead to certain actions. They contrasted this view with the other common definition of values given by Rokeach [1973, p. 5], who described them in the context of enduring beliefs, where a specific mode of conduct is personally or socially preferable to an opposite or converse mode of conduct or end-state of existence. According to Rokeach's point of view, a value system is an enduring organization of beliefs concerning preferred modes of conduct or end-states along an importance continuum. Basically, he conceived a personality as a system of values¹.
In his research efforts, Rokeach investigated mainly the role of values in public opinion research. He constructed a model which posits that beliefs, attitudes, and values are all organized together into a functionally integrated cognitive system [Rokeach 1968, 1973]. Within this system, beliefs represent the most basic element and may be considered simple propositions,
¹ Kluckhohn emphasized more action, and Rokeach saw values as giving meaning to action.


conscious or unconscious, and may also be inferred from what a person says or does. As he claimed [Rokeach 1968, pp. 87-88], the content of a belief may describe the object of that belief as true or false, correct or incorrect, evaluate it as good or bad, or advocate a certain course of action or a certain existence as desirable or undesirable. Thus, the first kind of belief may be called a descriptive or existential belief (for example, "I believe that the sun rises in the east"). The second kind may be called an evaluative belief ("I believe that this ice cream is good"). The third kind may be called a prescriptive belief ("I believe it is desirable that children should obey their parents").
In Rokeachs model, value is viewed as a belief which guides actions and
judgments across specific situations and beyond immediate goals to more
ultimate end-states of existence. The distinction between preferable modes
of behavior and preferable end-states of existence implies a differentiation
between means and ends or what Rokeach called instrumental and terminal values. Instrumental values relate to modes of conduct and include such
characteristics as ambition, independence, and responsibility. On the other
hand, terminal values describe the individual's desired end-state of existence
and include such conditions as an exciting life, family security, and salvation.
Schwartz and Bilsky presented an additional view on the characteristics of values. As they explained, a value is: 1) a belief, 2) pertaining to desirable end states or modes of conduct, that 3) transcends specific situations, 4) guides selection or evaluation of behavior, people and events, and 5) is ordered by importance relative to other values to form a system of value priorities [Schwartz and Bilsky 1987, p. 551; see also Schwartz and Bilsky 1990; Schwartz 1992]. These are formal characteristics that, in their opinion, distinguish values from such related concepts as needs and attitudes (further discussed in the text). They concluded, for example, that security and independence are values, whereas thirst and preference (e.g. for blue ties) are not.
However, the crucial content aspect that distinguishes values is the type of motivational goal they express. Such being the case, different contents of values reflect different universal requirements of human existence:
biologically based organism needs,
social interactional requirements for interpersonal coordination,
social institutional demands for group welfare and survival.
Groups and individuals represent these requirements cognitively (as specific
values about which they communicate), in order to explain, coordinate and
rationalize their behavior.
What else are values? In short, values are evaluative beliefs that synthesize affective and cognitive elements, in order to orient people to the world


in which they live [Marini 2000, p. 2828]. Values are derived in part from, but also influence, ideologies [Maio et al. 2003].
Moreover, values are not treated merely as static mental structures; the emphasis is on their place within action. Values do not act only as internalized schemata: they play an important role in human action. Values, commonly conceived of as ideal ends within an action situation, need to incorporate the means through which they will be reached. Much empirical psychology artificially separated ends and means when conceptualizing and studying human values and action. Pragmatists hold this to be a mistake [Dewey 1939]².
In Verplanken and Holland's point of view, one can link values and action through four sequential processes. First, values must be activated. Second, values are motivational and lead toward the privileging of certain actions over others. The third process is the influence of values on attention, perception, and interpretation within situations. Finally, values (when activated) influence the planning of action. Values motivate behavior, but compete with normative pressures [Verplanken and Holland 2002].

Typology of value contents by Rokeach and Schwartz's theory


Rokeach [1973] formed an approach, which he never elaborated, to classify values according to the societal institutions that specialize in maintaining, enhancing, and transmitting them (e.g. "family values"). Lacking a theory of value types from which values could be sampled systematically to build a value survey, Rokeach sought comprehensive coverage instead. He did this by reducing the vast number of values (gathered through interviews with Michigan samples and implied by personality-trait words) to a smaller set of values that were maximally different conceptually and minimally intercorrelated empirically. Based on these empirical analyses, he then concluded that it is unlikely that 36 values can be effectively reduced to some
² Although, as Dewey held, values take root in us and are generally the basis for our goals [Joas 2000]. Values operate as guiding mechanisms, and the valuing process occurs necessarily as we encounter the world [Mandler 1993]. This active process might look as Feather [1995] demonstrated, highlighting the fact that judging objects according to one's value standards is often an effortless process. Feather holds that we appraise objects, actions, situations, and people in relation to our values without engaging a great deal of cognitive effort. Values serve as latent guides for evaluations of the social world without themselves requiring much reflection. Values operate much as the pragmatists described them, rarely consciously applied to an action.


smaller number of factors [Rokeach 1973, p.44]. Nonetheless, Rokeach did


not abandon the idea of value types where he distinguished personal from
social values, moral from competence values and so on.
Later on, Schwartz tried to resolve the issue of classifying value contents. Somewhat modifying earlier definitions of values, he defined values as desirable transsituational goals, varying in importance, that serve as guiding principles in the life of a person or other social entity. He viewed values as goals which: 1) serve the interests of some social entity, 2) can motivate action, giving it direction and emotional intensity, 3) function as standards for judging and justifying action, and 4) are acquired both through socialization to dominant group values and through the unique learning experiences of individuals. However, these characteristics tell us nothing about the substantive content of values, i.e. what different types of values there are.
In the end, Schwartz [1992, 1994] empirically developed a schematic representation of an almost universal structure of human values. He constructed ten motivationally distinct types of values3. For example, conformity was derived from the prerequisite of smooth interaction and group survival, which prescribes that individuals restrain impulses and inhibit actions that might hurt others, while the motivational type self-direction was derived from organic needs for mastery and from the interaction requirements of autonomy and independence [Schwartz and Bilsky 1987, 1990; Schwartz 1992].
The ten Schwartz value types are listed in the first column of Table 1, where each is defined in terms of its central goal. The second column lists exemplary specific values that primarily represent each type. When people act in ways that express these specific values or lead to their attainment, they promote the central goal of the value type. Column three lists the universal requirements of human existence from which each value type was derived.
Several researchers have also formed typologies of value contents empirically [Feather and Peay 1975; Mahoney and Katz 1976; Hofstede 1980;
Braithwaite and Law 1985; Crosby, Bitner and Gill 1990]. However, they
3 However, there is an almost infinite number of specific values one could study. There are significant theoretical and practical advantages to identifying a limited set of value types that are recognized in various human groups and used to form priorities. The formal features of values are also silent regarding the structure of relationships among different types of values, i.e. which values are compatible and which are likely to come into conflict with one another. By identifying a structure in the relationships among these types of values, we can advance from studying associations of particular single values with other variables to studying associations with the whole system of values.


Table 1. Motivational types of values

Power: Social status and prestige, control or predominance over people and resources
  Exemplary values: social power, authority, wealth
  Sources: interaction, group

Achievement: Personal success through demonstrating competence according to social standards
  Exemplary values: successful, capable, ambitious
  Sources: interaction, group

Hedonism: Pleasure and sensuous gratification for oneself
  Exemplary values: pleasure, enjoying life
  Sources: organism

Stimulation: Excitement, novelty, and challenge in life
  Exemplary values: daring, a varied life, an exciting life
  Sources: organism

Self-direction: Independent thought and action – choosing, creating, exploring
  Exemplary values: creativity, curious, freedom
  Sources: organism, interaction

Universalism: Understanding, appreciation, tolerance, and protection for the welfare of all people and for nature
  Exemplary values: broad-minded, social justice, equality, protecting the environment
  Sources: organism, group

Benevolence: Preservation and enhancement of the welfare of people with whom one is in frequent personal contact
  Exemplary values: helpful, honest, forgiving
  Sources: organism, interaction, group

Tradition: Respect, commitment, and acceptance of the customs and ideas that traditional culture or religion provide
  Exemplary values: humble, devout, accepting my portion in life
  Sources: group

Conformity: Restraint of actions, inclinations, and impulses likely to upset or harm others and violate social expectations or norms
  Exemplary values: politeness, obedient, honoring parents and elders
  Sources: interaction, group

Security: Safety, harmony, and stability of society, of relationships, and of self
  Exemplary values: national security, social order, clean
  Sources: organism, interaction, group

Source: Schwartz 1994, p. 22.

have not followed Rokeach's intuition that at least some of them might be
interdependent and that they might stand in opposition to one another (e.g.
moral vs. competence, personal vs. social). Instead, they have treated these
different dimensions of values as independent. Consequently, they have
not suggested ways to conceptualize value systems as coherent structures
[Schwartz 1994].
The structure of value relations is based on the assumption that actions taken in the pursuit of a given value have psychological, practical, and social consequences which may conflict or be compatible with the pursuit of other


value types. Analyses of such conflicts and compatibilities (that are likely to arise when people pursue values) simultaneously yield hypotheses about potentially universal relations among value priorities4. This pattern of relations of conflict and compatibility among value priorities (postulated to structure value systems) is shown in Figure 1, where competing value types emanate in opposing directions from the center and compatible types sit in close proximity around the circle. For example, the location of tradition outside conformity implies that these two value types share a single motivational goal. Although the theory discriminates among value types, it postulates that, at a more basic level, values form a continuum of related motivations. It is this continuum that gives rise to the circular structure.
The nature of the continuum is clarified by noting the shared motivational emphases of adjacent value types, and the partitioning of single values into value types represents conceptually convenient decisions about where one fuzzy set ends and another begins in the circular structure.

[Figure 1 arranges the ten value types in a circle: self-direction, stimulation, and hedonism on the openness-to-change side; achievement and power on the self-enhancement side; security, conformity, and tradition on the conservation side; and universalism and benevolence on the self-transcendence side.]

Figure 1. Theoretical model of relations among motivational types of values, higher order values, and bipolar value dimensions
Source: Schwartz 1992, p. 13.

4 For example, the pursuit of achievement values may conflict with the pursuit of benevolence values: seeking personal success for oneself is likely to obstruct actions aimed at enhancing the welfare of others who need one's help. In a similar manner, the pursuit of tradition values conflicts with the pursuit of stimulation values: accepting cultural and religious customs and ideas handed down from the past is likely to inhibit seeking novelty, challenge, and excitement.

The motivational differences between value types are continuous rather than discrete,
with more overlap in meaning near the boundaries of adjacent value types.
Consequently, in empirical studies, values from adjacent types may intermix
rather than emerge in clearly distinct regions. In contrast, values and value
types that express opposing motivations should be discriminated clearly
from one another. The oppositions between competing value types can be
summarized by viewing values as organized along two bipolar dimensions. As Figure 1 shows, one dimension contrasts the higher order value types of openness to change and conservation. This dimension opposes values emphasizing one's own independent thought and action and favoring change (self-direction and stimulation) to those emphasizing submissive self-restriction, preservation of traditional practices, and protection of stability (security, conformity, and tradition). The other dimension contrasts the higher order value types of self-enhancement and self-transcendence. It opposes values emphasizing acceptance of others as equals and concern for their welfare (universalism and benevolence) to those emphasizing the pursuit of one's own relative success and dominance over others (power and achievement). Hedonism is related both to openness to change and to self-enhancement [Schwartz 1994].
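As an illustration (not from the source), the circular structure can be treated as a simple data structure in which compatibility falls off with angular distance around the circle; the ordering below follows Figure 1, with tradition sharing conformity's position:

```python
# A minimal sketch of Schwartz's circular value structure: adjacent
# types are motivationally compatible, opposite types conflict.

# Order around the circle; tradition shares conformity's position
# (it lies outside conformity on the same radius).
CIRCLE = ["self-direction", "stimulation", "hedonism", "achievement",
          "power", "security", "conformity", "benevolence", "universalism"]
POSITION = {v: i for i, v in enumerate(CIRCLE)}
POSITION["tradition"] = POSITION["conformity"]

def circular_distance(a: str, b: str) -> int:
    """Number of steps between two value types around the circle."""
    n = len(CIRCLE)
    d = abs(POSITION[a] - POSITION[b])
    return min(d, n - d)

def relation(a: str, b: str) -> str:
    """Classify a pair as compatible (adjacent) or conflicting (opposed)."""
    d = circular_distance(a, b)
    if d <= 1:
        return "compatible"
    if d >= len(CIRCLE) // 2:
        return "conflicting"
    return "intermediate"

print(relation("tradition", "conformity"))     # -> compatible
print(relation("achievement", "benevolence"))  # -> conflicting
```

The thresholds are illustrative choices, but the pairs they classify match the examples in the text: tradition and conformity share a motivational goal, while achievement opposes benevolence and tradition opposes stimulation.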

Hierarchy of human values


People may distinguish different types of values according to their own preferences5 in ascending or descending order, which means they let them exist
5 The term preference is used in a variety of related, but not identical, ways in the scientific literature. In psychology, a preference can be conceived of as an individual's attitude towards a set of objects, typically reflected in an explicit decision-making process [Lichtenstein and Slovic 2006]. Alternatively, one could interpret the term to mean an evaluative judgment in the sense of liking or disliking an object [Scherer 2005], which is the most typical definition employed in psychology. This does not mean, however, that a preference is necessarily stable over time. Preferences can be notably modified by decision-making processes, such as choices [Brehm 1956; Sharot, De Martino and Dolan 2009], even in an unconscious way [Coppin et al. 2010]. In economics, preference refers to the set of assumptions related to ordering alternatives based on the degree of happiness, satisfaction, gratification, enjoyment, or utility they provide, a process which results in an optimal choice (whether real or imagined). Although economists are usually not interested in choices or preferences in themselves, they are interested in the theory of choice because it serves as a background for empirical demand analysis.


in a sort of internal hierarchy. In fact, this rule was already known to our ancestors, who spoke of leading "global" values (such as truth, goodness, and beauty) that stand at the top of the human hierarchy and lay the foundations for lower-level values. Such a hierarchy remains objective and exists regardless of human nature. Moreover, it is strongly grounded in criteria constituted by the stability (constancy) of values and the profundity of the satisfaction they provide [Szymański 1998]. Hartmann supported this view in his treatment of the hierarchy of human values, dividing them into higher and lower dimensions. People's awareness always requires some comparison in which certain kinds of values are placed in higher ranks and others in lower ranks. Hartmann did not, however, treat this ordering as fixed: lower-ranked values may occasionally move upwards and higher ones may, on the other hand, move downwards. It is also recognized that some higher or major values build on lower values. This point of view is quite similar to Maslow's hierarchy of needs [Maslow 1954; Galewicz 1987].
Hartmann believed that values should be considered in relation to the degree and extent of other important values. Similarly, Rokeach (as well as Schwartz [1992]) contended that values exist in a hierarchical and interconnected structure. That is, while all values are important and linked together, some values are more important than others. This notion is intuitively appealing: for the average 22-year-old, for example, the value "an exciting life" will usually be rated as more important than "salvation". This thought was also confirmed by the studies of Vinson, Scott and Lamont [1977], who added that the hierarchy of values exists as a sort of central-peripheral dimension in human life, within which values range from the most centrally held to the least centrally held. However, if young persons were asked to rank-order the Rokeach Value Survey, many would probably complain that "certain values clump together and these value clumps take on differential importance" [Williams 1968, p. 287]. Still, differentiation among values is important, and superior values can be distinguished from inferior ones on the basis of:
– the values' relation to people, which expresses the intensity of particular values and their classes according to the rule by which people order values in their life,
– the axiological relation, which leads to an expression of higher and lower (in importance) values defined within a system of values,
– the numerical relation, enabling quantification, arrangement, and the assignment of numbers to values [Najder 1971].


Borowicz [1991] also proposed a way of setting values in a hierarchy, which may take the form of:
– a vertical (linear) configuration, where some values prevail over others,
– a horizontal (terrace) configuration, with no relation of superiority or inferiority between values and their types,
– a mixed vertical/horizontal configuration, characterized by both arrangements.

Values comparison across selected disciplines


Interest in values is spread across many disciplines, including philosophy, economics, sociology, psychology, cultural anthropology, political science, and history. In all these disciplines values are considered the most abstract type of cognition. Yet in many scientific areas the meaning and definition of value keeps widening: the more often it is studied, the more differentiated and blurred the term becomes in the literature. The term "values" is therefore ambiguous and difficult to interpret. According to the Polish researcher Tatarkiewicz [1985], framing value in terms of real-life meaning becomes almost impossible, and thus it is impossible to integrate it across the many disciplines that pursue their own research interests within the area of values. Below, a description of values according to selected fields of science is presented.

Axiology
According to axiology6, or the theory of value [Hilliard 1950; Taylor 1961; Hartman 1967], we can understand a given type of value by considering its relationship to other values. In other words, we can understand one type of value if we compare it with other types to which it is closely related. We can also grasp the meaning of values when people themselves ascribe meaning to each value. However, people (owing to subjective cognition) classify values diversely. For example, Ingarden [1966, p. 64] explained that values are a state of being that arouses positive emotions, towards which a human
6 Axiology in reference to the term "value" appeared between the 19th and 20th centuries in the works of the following authors: D. Hume, I. Kant, R.H. Lotze, F. Nietzsche, F. Brentano, and A. Meinong. The term "axiology" itself was used by P. Lapie in his publication Logique de la volonté [see: Sagan 2011].


being expresses and directs its own desires and aspirations. In philosophy, Nietzsche questioned the existence of objective values, pointing to their subjective and biological level, dependent on situational predisposition. In his conviction, values such as stateliness, dignity, firmness, mental and physical fitness, and self-confidence belong to superior and dominant people, whereas soft and weak-minded humans prefer pity, tenderness, love, altruism, and ambivalence [Nietzsche 2004].
Axiology classifies value from various points of view, such as:
– the source of value – things and subjects possess value in themselves, or value is conferred on things and subjects by humans (this point of view is based on the objective and subjective contexts of understanding value),
– the dependence of value – things and subjects may have stable value, or their value may depend on the respective circumstances,
– the historical changeability of value – value changes over time and under the influence of historical circumstances,
– conviction about value – allowing an opinion to be expressed with complete or partial certainty about the validity of a value,
– the cognitive meaning of value – explaining in what way, and to what extent, value can be learnt by a human being,
– the problem of axiological order in values – explaining the possibilities of establishing a hierarchy of values [Białynicka-Birula 2011, pp. 11–12].

Psychology and sociology


Defining values in psychology and sociology appears to be quite a difficult task. The description of values and their meaning, in the context of both psychology and sociology, lies against the background of human desires and the efforts undertaken by individuals or societies who strive to attain certain ideas and bring about certain behaviors or states. Values are also framed in terms of the possession of objects in the surrounding environment. Elements such as desired behaviors, ideas, and objects, and finally predicted effects and results, make up a value.
In sociology, values are characterized through behavior, social relations, and ideas. Usually, values adhere to the objects of desire and direct the behavior and beliefs of the members of a society. They bring cohesiveness into society and give its members hints for appropriate and commonly acceptable behavior. A value is an attribute of a particular object, towards which people take a stance and which affects their personal behavior. It is also perceived as a potential and dynamic dimension, challenging and very demanding, at once blatant and latent. It can take a relativistic and obvious form, composed into a transcendental, higher, ideal state that members of society respect and fully support [Kluckhohn 1962, p. 74; Doniec 1991, p. 59; Mendras 1997, p. 81].
In psychology, in contrast, a value is the object, thing, action, or relationship which fulfills a desire or a certain need felt by a human being. Values belong to the conscious, cognitive conception of the desires and the most attractive needs that a human being wants to fulfill [Rumiński 1996, p. 72]. In other words, a value is the object of an unfulfilled need and remains attractive as long as it is not realized. Skinner [1971] thought that, in the longer run, values directly affect human existence, and at the same time they constitute relative components of human personality and measure the intensity of desires for the possession of objects, for the accomplishment of desired states of mind, and for making and developing friendships with other people. Simply said, the more an object is desired and the more emotionally loaded it is, the more value it has [Mysłakowski 1965, p. 63; Grzegorczyk 1970; Mariański 1989].
Values described by psychologists and sociologists take individual, social, and absolute forms. Values are mainly perceived as a highly subjective category, dependent on:
– the individual's characteristics,
– the way reality is perceived,
– the availability of things in a certain place and time,
– the position of the individual in the social hierarchy,
– the way the individual interacts with the reference groups in which he or she must coexist.
Thus, social values are subject to change and depend on historical, cultural, and local conditions. Some of them may be considered unconditional (absolute), because they were, and still are, commonly accepted in all ages and human systems. They establish principles in which individual and social life objectives are integrated [Brzozowski 1989].
In recent years, research in the sphere of values has focused mainly on personality traits [Brandstaetter 1996], time preference [Loewenstein and Prelec 1992], and the ability to defer rewards [Roman and Kaplan 1995; Lunt and Livingstone 1991]. These elements make up value in the context of human biological strivings combined with cultural and social determinants. Attempts are also being made to combine empirical research results (based on psychological and socio-economic information, e.g. income, level of education, age, attitudes, habits) in order to create a coherent model of behavior in this area [Furnham and Argyle 1998].


Cultural anthropology
As far as the cultural anthropology7 context of values is concerned, values are defined as objects commonly desired in society, with reference to their symbolic and non-symbolic characteristics. They are also part of the commonly acceptable standards of existence, widespread beliefs, and behaviors [Ossowski 1967; Misztal 1980]. They can also be viewed as a conception [Kluckhohn 1951], as generalized meanings and unconscious assumptions [Homans 1951], as relations of interest (Radcliffe-Brown), or even from the side of ethos (Kroeber) [Graeber 2001].
Symbols and meanings are defined as certain ideas that affect the choice among different patterns of action and that determine not only what individuals do desire but also what they should desire from the point of view of the groups to which they belong or aspire. These ideas are compared by means of value orientations, which set out the framework for human existence. In cultural anthropology, values are objective: they are shared ideas represented by signs and symbols, giving individuals a suitable sense of the costs borne and the benefits obtained. Values are shared by individuals and objectified in repeated acts of action in certain situations. In this approach, values must be distinguished from needs (which result from the lack of something, are not legitimized by means of signs and symbols, and do not form a coherent system), from motives and impulses (which are a function only of the biological mechanisms of individuals), from satisfaction (which, unlike value, is a psychological state of individuals), and from utility or quality (which are attributes of material objects or of the relationship between perception and expectations regarding these objects or states) [Sagan 2011].
In the anthropological sense, the primary function of values is to make the actions of individuals understandable; values are the key to understanding the meaning of these actions. Firth [1953] stresses that research in the field of values should help people better understand the importance of their actions. With reference to values we can build a theory of stability and
7 Culture is defined as a set of orientations towards values, treated as generalized and organized ideas influencing human behavior, nature and man's place in it, and desirable and undesirable relationships such as human–human and human–nature [Henry 1976]. Kluckhohn and Strodtbeck present it in the form of a value-orientation matrix operating on four dimensions: 1) the human–nature relationship, 2) the perception of time, 3) the type of activity and personal aspirations, 4) relationships between people. These dimensions also refer to the alternative orientations towards values, which are: 1) compliance, 2) harmony, 3) reign [Kluckhohn and Strodtbeck 1961].


observe change in social action. Understanding behaviors provides a way of understanding the values to which these behaviors are addressed. Hence the role of values in the cultural system boils down to the validation of these activities. Values are reflected as models, standards, or criteria for the behavior of individuals, and their heterogeneous nature is often a source of conflict in the cultural system.

Economics
Finally, in economics, values are expressed in both a subjective and an objective dimension. In objective theory, economists perceived value mainly in the frame of the costs incurred to produce a given good (e.g. Smith, Ricardo, Carey, Marx). Here, values are underpinned by the costs of production or labor. In objective theory, manufacturers have the right to determine the values of certain goods on the assumption of a certain level of production costs. Therefore economists for a long time focused their attention mainly on the supply side of the market. They did not take into account factors related to market demand, omitting the impact of buyers' values on the market and the buyer's role in shaping the values of goods. A revolutionary change in the understanding of value came with the advent of subjective theory in economics in the 1870s. The creators of this line of thinking were Jevons (1871), Menger (1871), and Walras (1874) [Białynicka-Birula 2011].
Representatives of the subjective position have perceived values through the lens of the benefits which certain goods can deliver to consumers. According to them, the source of value and price is strongly related to human needs rather than to the costs, effort, or labor necessary to produce the goods. They focused mainly on market demand and often described the sources of value in terms of consumers' feelings. These concepts tied value to the perspective of the individual and his subjective feeling, strongly associated with a particular good in the process of consumption. In subjective theory, man seeks to maximize pleasure and minimize pain. Value is identified on the basis of the general theory of utility (Condillac), the theory of the final degree of utility (Jevons, Menger, Gossen, Walras, Pareto, Wicksell, Cassel), and theories based on the marginal utility of value (Wieser, Böhm-Bawerk) [Białynicka-Birula 2011].
Subjective theory is thus closer to the general theory of utility in economics. In objective theory, by contrast, values are analyzed in the context of their real or proprietary assets. The first perspective epitomizes a sort of personal human experience, in which values are given names according to their diverse and wide extent, in line with human experience and consciousness. In the latter, values are treated independently of human experience and consciousness; hence they are not relativistic and do not change under the influence of people's subjective experiences and feelings [Buczyńska-Garewicz 1975, pp. 11–12].
An attempt to connect the objective and subjective understandings of the concept of value was undertaken by Marshall, who formulated the principles of a synthetic theory of value. He pointed out that value and price are affected from both the supply and the demand perspective: a given good has value both because it embodies the cost of production incurred by the manufacturer and because it is useful to the consumer [Sagan 2011].
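As a toy illustration of this synthesis (curves and numbers invented for the example), price can be read off where a rising supply schedule meets a falling demand schedule:

```python
# Hypothetical numeric sketch of Marshall's synthetic view: price and value
# emerge where the supply side (producers' costs) meets the demand side
# (consumers' utility). Linear schedules chosen purely for illustration.

def demand(p):   # quantity consumers buy at price p (falls as p rises)
    return 100 - 4 * p

def supply(p):   # quantity producers offer at price p (rises with p)
    return 6 * p

# Market-clearing price solves demand(p) == supply(p):
# 100 - 4p = 6p  ->  p = 10, q = 60
price = 100 / (4 + 6)
assert demand(price) == supply(price)
print(price, demand(price))   # -> 10.0 60.0
```

Neither the cost side nor the utility side alone fixes the price; the example makes Marshall's point that both blades of the scissors cut.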

Human values – needs, motivation, attitudes and behaviors


Subjective theory also perceives values through the lens of human needs8 and motivation9. Needs are mostly related to biological influences, whereas values capture a distinguishing feature of social life. Values serve as socially acceptable and culturally defined ways to articulate needs. The expression and satisfaction of biological needs can be reflected through culturally prescribed values, but these values are not the needs themselves.
On the other hand, values in the context of motivation (according to the theories of Schwartz [1994] and Feather [1995]) are not simply abstract desirable conceptions, but are motivational. They help to express basic human
8 A need is something that is necessary for an organism to live a healthy life. Needs are distinguished from wants in that a deficiency would cause a clear negative outcome, such as dysfunction or death. Needs can be objective and physical, such as food, or subjective and psychological, such as the need for self-esteem. On a social level, needs are sometimes controversial. Understanding needs and wants is an issue in the fields of politics, social science, and philosophy.
9 Motivation is a psychological feature that arouses an organism to act towards a desired goal and elicits, controls, and sustains certain goal-directed behaviors. It can be considered a driving force: a psychological one that compels or reinforces an action toward a desired goal [Schacter 2011, p. 325]. Motivation has its roots mainly in the physiological, behavioral, cognitive, and social areas. It may be rooted in a basic impulse to optimize well-being, to minimize physical pain and maximize pleasure. It can also originate from specific physical needs such as eating, sleeping or resting, and sex. Motivation is an inner drive to behave or act in a certain manner [Pritchard and Ashwood 2008, p. 6; Piers and König 2006].


needs [Rokeach 1973; Schwartz 1992], and these needs by definition, motivate social behavior. Here, values are more backed up by their affective components than cognitive components. Values are socialized into human world
throughout the process of teaching moral absolutes. They are representations of emotions, often employed in the support of our affective reactions
[Maio and Olson 1998]. However, Williams [1979] argued that values are
not motivational in the emotional sense, but they are rather part of cognitive
structure, which is coupled with emotion and which leads to action. Thus,
values are certainly not the sole motivational factor behind action. Values act
in concert with other motives [Staub 1989]10.
Values are also related to attitudes and can even be applied to concrete social objects11. Some researchers [Bem 1970] hold that values represent
10 Values may be motivationally related, for example, to volunteer work. Batson [1989] hypothesized a three-path model of the relationship between personal values and prosocial motivation in which values operate through two egoistic motives (hedonism and arousal reduction) and one empathetic, altruistic motive. Benevolence values appear to be the most strongly endorsed motivation for volunteer work [Omoto and Snyder 1995] and are highly correlated with most other measures of volunteer activity.
Yet another example links adult work values (a specific domain of one's value structure) with social stratification outcomes. That is, valuing entrepreneurial versus bureaucratic job properties (the former emphasizing self-sufficiency and control at work, the latter focusing on stability and security) strongly mediates the effects of social origins on adult occupational achievement [Halaby 2003]. Specifically, gender, years of schooling, and cognitive ability have strong influences on job values, which in turn influence occupational attainment. Johnson [2002] suggested a similar process through which adolescent work values influence job selection and highlighted the reciprocal effects of employment opportunities on those very work values.
11 An attitude is an expression of favor or disfavor toward a person, place, thing, or event (the attitude object). The prominent psychologist Allport once described attitudes as "the most distinctive and indispensable concept in contemporary social psychology" [Allport 1935, pp. 789–844]. An attitude can be formed from a person's past and present [Eagly and Chaiken 1998]. Attitudes are also measurable and changeable, and they influence a person's emotions and behavior. An attitude can be defined as a positive or negative evaluation of people, objects, events, activities, ideas, or just about anything in the environment, but there is debate about precise definitions. Eagly and Chaiken, for example, define an attitude as "a psychological tendency that is expressed by evaluating a particular entity with some degree of favor or disfavor" [Ajzen 2001, pp. 27–58]. Though it is sometimes common to define an attitude as affect toward an object, affect (i.e. discrete emotions or overall arousal) is generally understood to be distinct from attitude as a measure of favorability [Wood 2000]. This definition of attitude allows one's evaluation of an attitude object to vary from extremely negative to extremely positive, but also admits that people can be conflicted or ambivalent toward an object, meaning that they might at different times express both positive and negative attitudes toward the same object. This has led to some discussion of whether an individual can hold multiple attitudes toward the same object [Breckler and Wiggins 1992].


the special kind of attitude object, albeit a more ephemeral kind, that can
be assimilated into attitude research [Schuman 1995]. Katz [1960] includes
value expression as one of the four functions of attitudes, referring to attitudes that express values central to the self-concept. Attitudes that are value-expressive lead to stronger relations between values and attitudes
than do other types of attitudes [Maio and Olson 1994]. Kristiansen and
Zanna [1991] suggested that attitudes can either express values or influence
the perception of values (what they term "halo effects"). On the other hand, Maio and Olson [2000] found an empirical link between values and attitudes, mediated by what they termed goal-expressive attitudes that
express an underlying motivational value structure. Other research links
values with issues such as group prejudice [Biernat et al. 1996] or attitudes
toward high achievers [Feather 1995]. The general consensus is that values
hold a higher place in ones internal evaluative hierarchy than attitudes.
Compared with attitudes, values are more central to issues of personhood
[Smith 1991; Erickson 1995; Hitlin 2003] and are less directly implicated in
behavior [Schwartz 1996].
Values are similar to attitudes in the sense that both are adaptation abstractions that emerge continuously from the assimilation, accommodation, organization and integration of environmental information in order to promote interchanges with the environment favorable to the preservation of optimal functioning [Kahle 1983]. Because values are the most abstract type of social cognition, they reflect the most basic characteristics of adaptation. These abstractions often serve as prototypes from which attitudes and behaviors are manufactured. Cognitions and values also guide individuals as to which situations to enter and what to do in those situations [Kahle 1980]. Within a given situation, the influence should theoretically flow from abstract values to mid-range attitudes and then to specific behaviors. This sequence is as follows: value → attitude → behavior [Homer and Kahle 1988].
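The value → attitude → behavior sequence can be illustrated with a minimal numeric sketch. The linear form and the path coefficients below are purely hypothetical assumptions for illustration, not figures from Homer and Kahle [1988]:

```python
# A toy sketch (hypothetical coefficients, not the cited authors' model) of
# the value -> attitude -> behavior hierarchy: an abstract value shapes a
# mid-range attitude, which in turn drives a specific behavior, so the
# value's influence on behavior is indirect.

def attitude_from_value(value_importance: float) -> float:
    """Mid-range attitude formed from a value (assumed linear link)."""
    return 0.6 * value_importance  # assumed value -> attitude path

def behavior_from_attitude(attitude: float) -> float:
    """Specific behavior driven by the attitude (assumed linear link)."""
    return 0.5 * attitude          # assumed attitude -> behavior path

value = 0.9  # hypothetical importance of a value such as "security"
behavior = behavior_from_attitude(attitude_from_value(value))
print(round(behavior, 2))  # indirect effect: 0.6 * 0.5 * 0.9
```

The point of the sketch is purely structural: the value affects the behavior only through the mediating attitude, mirroring the value → attitude → behavior flow described above.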
As the theoretical arguments suggest, values have a causal influence on subsequent behaviors12. Williams contended that explicit and fully conceptualized values become criteria for judgment, preference, and choice. Even when values are implicit, they function as if they were grounds for behavioral decisions. Moreover, actual selections of behavior result from concrete motivation in specific situations, which are partly determined by prior beliefs and values [Williams 1979, p. 20]13.

12 For instance, Carman [1977] developed a model proposing a causal relationship between terminal and instrumental values and consumption behaviors. In this model, values influenced behaviors such as shopping and media exposure patterns, both directly and indirectly through intervening attitudinal variables (measured as activities, interests and opinions).
Behaviors may be influenced by more than one value [Bardi and Schwartz 2003]. Wojciszke [1989] proposed three preconditions for a value structure to influence behavior. The structure must be: 1) a well-established entity in a person's cognitive system, 2) activated from long-term memory, and 3) accepted by the person as relevant and proper for conceiving of the current situation.
Finally, we need to mention the empirical evidence explaining how behavioral changes can be brought about by changing core values [Rokeach 1973]. For instance, Rokeach's self-confrontation method makes individuals aware of their particular value hierarchy and gives information about how such a value constellation situates them with respect to positive and negative reference groups. In Rokeach's study, when individuals found that their value rankings diverged from positive reference groups and were closer to negative reference groups, they were more likely to shift both values and behavior [Rokeach 1973]. Being confronted with such negative feedback led to self-dissatisfaction and possible changes in values [Sanders and Atwood 1979]. In consequence, values that are central to the self, when energized, influence behaviors related to those values, suggesting that behavioral changes can occur through cognitively activating important values [Verplanken and Holland 2002].
However, not all theories in the literature support the significance of the values – attitudes – behaviors relationship. Skinner [1971], for example, believed values are epiphenomena and, because they are merely words, cannot guide behavior or attitudes. This is not to say that values are completely unrelated to behavior, but merely that situational forces can overwhelm values [Maio et al. 2001]. In Maio and Olson's view [1998], one reason for this less-than-perfect fit between attitudes and behaviors is that individuals often hold values without strong cognitive support. This approach holds that values are truisms, believed in without well-articulated defenses, rendering them susceptible to arguments or social comparisons that challenge individuals' values.
13 Williams's theory, however, excluded attitudes.
Relationship between values and personal identity


Personal identity represents a sense of self built up over time as the person embarks on and pursues projects or goals that are thought of not as those of a community, but as the property of the person. Thus, personal identity emphasizes a sense of individual autonomy rather than communal involvement [Hewitt 1997, p. 93]. It is experienced by individuals as core or unique to themselves in ways that group- and role-identities are not. It is often discussed as a set of idiosyncratic attributes that differentiate the person from others [Tajfel and Turner 1986; Thoits and Virshup 1997].
Individuals' values, deeply personal but socially patterned and communicated, are essential for understanding personal identity. They offer us the ability to identify empirical links between self and social structure. For example, Gecas [2000] attempted to draw a connection between identity and values, what he termed the value-identities relationship. A focus on value-identities, as Gecas argues, allows researchers to concentrate on the role of culture in the maintenance and development of various social identities. In focusing on culture, however, one may overlook patterned structural effects on the individual's value-structure. The concept of value-identities is useful because it links values to identity theory [Gecas 2000, p. 96].
Hitlin [2003] argues more strongly than Gecas that values are the primary phenomenon in the experience of personal identity. Values lead to the development of reflexive value-identities but are not reducible to identities. Nor do value-identities, phenomena that situate us with respect to others in the social world, tell us much about our relationship with ourselves and our other values. Our values lead to experiences of personal identity, which in turn lead to reflexive constructions of various role-, group-, and value-identities. These latter identities are tied more directly into behaviors than are values. Conceiving oneself, for example, as open to change (a reflexive use of a value to define a value-identity) is not the same as seeing oneself as shy14. Such self-descriptions situate individuals in relation to others in social space. Thus, in Hitlin's concept, a value-identity focuses on the individual's relationship to the wider social and symbolic sphere rather than only to oneself and one's other values. Values cause people to possess a sense of unified, transsituational personal identity. Such values are enacted and articulated through the intermediate development of various
role-, group-, and value-identities. The behaviors people enact as a result of their identities can cause them to reflect on their values and, over time, to find different values the most compelling. In consequence, people experience shifts in their personal identity (their sense of who they are).

14 Example used by Thoits and Virshup [1997] in linking identities with phenomena other than roles or groups.
By conceptualizing the core of personal identity as composed of particular value-structures, researchers may advance their work on the self in at least three directions.
Firstly, one can apply this understanding of the self to the disparate, rather fragmented knowledge on values. "Value" is often a term for whatever phenomenon various researchers are examining. By using, for example, the validated Schwartz measures, we can begin to pull together the various strands of research on values. In reconceptualizing the work on values as the core of the self, we can address the fragmented nature of knowledge on the self [Prentice 2001].
A second advantage of incorporating values addresses what Deaux [1996] and Thoits and Virshup [1997] regarded as an over-cognitive approach to the social identification process. People do not simply form cognitive attachments to particular group- and role-identities. Indeed, the term "attachment" signifies something deeper. People feel as though important aspects of the self reflect who they "really are", a topic about which most individuals do not feel neutral. Values deal intrinsically with issues both of cognition and of feeling. Values are emotion-laden conceptions of the desirable that underlie value-identities, which themselves are developed around affective meanings appropriated to the self [Hitlin 2003].
The third and last advance permitted by such an understanding of personal identity is primarily sociological. Various studies, notably Kohn's works [1959, 1969, 1976], Kohn and Schooler [1982], as well as Inglehart [1997] and Inglehart and Baker [2000], demonstrate links between a number of sociological variables and values. Gender, race, ethnicity, social class, nation of origin, and education are all found, through a variety of different and sometimes incommensurate measures, to shape people's values. By linking the self to values through personal identity, we can begin to systematically connect these structural positions to the self in a way championed by the pioneers of self-theories. One's place in the social structure has discrete effects on many outcomes; values are both an important intermediate force and an interesting outcome in their own right. By understanding values as the core of the self, sociologists can understand patterns of perception, self-conception, and action across members of particular groups while also allowing room for individual agency and action. That is, all members of a particular social group may place priority on

particular values, but also may possess important individual differences in the constellation of values and identities that make up their sense of self. Social class, race, and gender do not strictly determine value structures, yet the key values privileged in the transmission of social-group membership indicate commonalities among members of such groups [Hitlin 2003].

Consumers' values in marketing


Personal values in the theory of value utility
As already mentioned, values are strongly associated with economic principles, where one can refer to concepts such as utility value. This concept is inextricably linked to the category of goods and is one of their essential characteristics. The utility value of goods encompasses all (even physical) characteristics which enable consumers to satisfy particular need(s). In other words, utility value means the ability to satisfy human needs through the consumption of a product or service.
Utility theory is related both to the concept of utility value and to that of exchange value [Galbraith 1991, p. 16]. According to Smith, utility value depends on subjective factors (i.e. how an individual uses a certain product and to what extent it can meet his or her needs). In turn, exchange value is determined by the amount of labor needed to produce the product or good. Ricardo also understood the difference between these two concepts. He believed that utility value cannot be the basis of exchange value, although it is a necessary condition for it. Only those goods that have utility value for consumers participate in market exchange. Hence, if a product has no utility value, it also has no exchange value.
Subjectivists moved away completely from the theory of value based on human labor. They argued simply that labor bears no relation to the assessment attributed to a particular good by its user. Menger stated that the theory of value is grounded mainly in utility (i.e. the ability of goods to satisfy the subjective needs of individuals). He also stated that the magnitude of value is decided by marginal utility [see: Zagóra-Jonszta 2004, p. 34]. For a particular consumer, all units of the consumed good are actually useful. Hence the total utility (which the consumer derives from the consumption of all units) will be much greater than the utility of the last consumed unit [Sagan 2011].
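The marginal-utility argument can be made concrete with a small numeric sketch. The diminishing schedule used below (utility of the n-th unit equal to 10/n) is our own illustrative assumption, not a schedule from the cited works:

```python
# A minimal numeric sketch (assumed 10/n schedule) of the marginal-utility
# point above: each successive unit of a good yields less extra utility, yet
# total utility -- the sum over all consumed units -- far exceeds the utility
# of the last unit, which alone determines marginal value.

def marginal_utility(unit: int) -> float:
    """Hypothetical diminishing schedule: utility of the n-th unit (n >= 1)."""
    return 10.0 / unit  # 10, 5, 3.33, 2.5, 2, ...

def total_utility(units: int) -> float:
    """Total utility from consuming `units` units of the good."""
    return sum(marginal_utility(n) for n in range(1, units + 1))

units = 5
print(total_utility(units))     # sum of utility over all five units
print(marginal_utility(units))  # utility of the last unit alone
```

With five units the total utility is about 22.8 while the last unit contributes only 2.0, illustrating why the sum over all units greatly exceeds the utility of the marginal unit.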

In short, the theory of utility stresses that consumers purchase products to obtain certain benefits from consumption. However, in order to derive these benefits, one needs to act rationally when making decisions that are optimally adjusted to actual, and quite often limited, market resources. As a result, such decisions often reflect the internal system of personal values serving as guidelines.
Personal values suggest to people which products bring them the most benefits from consumption. In consequence, the personal values accumulated by people throughout their whole lives provide them with a reference point, a principle and a clear view through which they can adjust their own market choices. A similar situation applies to the subjectively perceived level of satisfaction with each product. That is why consumers, when they scan the variety of products offered in the market (among all available alternatives to be chosen), virtually scan their personal values, even though they are not aware of them.
Preferences with regard to a product, and simultaneously their underlying background such as values, can be quantified by a respective value utility function, which allows each variant (as is the case with values) to be assigned a numerical characteristic. This can be expressed in the following way [Bąk and Walesiak 2000]:

Ui = f(z1, …, zk),

where f represents the association between the variables, making up a point of reference in defining the value preference structure along all considered variants of values described by the variables Z = {z1, …, zk}, from the perspective of the i-th consumer.
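A minimal sketch of such a value utility function can be written under the simplifying assumption that f is additive, with importance weights expressing how strongly each variable matters to a given consumer. The weights, the attribute profile and the consumer labels below are hypothetical illustrations, not data from the cited source:

```python
# A minimal sketch (our assumed additive form, not the authors' specification)
# of the value utility function U_i = f(z_1, ..., z_k): each consumer i weighs
# the same variables z_j differently, according to his or her personal values.

def utility(weights, attributes):
    """Additive value utility: U_i = sum over j of w_j * z_j."""
    if len(weights) != len(attributes):
        raise ValueError("one weight per variable z_j is required")
    return sum(w * z for w, z in zip(weights, attributes))

# Hypothetical product profile: (quality, price attractiveness, eco-friendliness)
profile = [0.8, 0.5, 0.9]
frugal = utility([0.2, 0.7, 0.1], profile)  # price-driven consumer
green = utility([0.3, 0.1, 0.6], profile)   # ecology-driven consumer
print(frugal, green)
```

The same product profile yields different numerical utilities for the two consumers, which is exactly the role the function plays: it maps each considered variant onto a number from the perspective of the i-th consumer's values.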

Personal values vs. product/service values for consumers in marketing research
Marketers have long acknowledged the importance of attitudes, needs, etc. in marketing, or more precisely in consumer research, but the role of personal values has received relatively little attention so far. This is unlike studies dedicated to the product (service) values offered to consumers, which play a different role and have a different character as compared to personal values.
In the beginning, research studies (solely dedicated to the measurement of values in reference to marketing activities) explored mainly the following aspects [Kanter 1978; Carman 1977; Vinson, Scott and Lamont 1977]:

– differential product preferences,
– cross-cultural consumption patterns,
– market segmentation potential,
– consumer dissatisfaction,
– life style,
– cognitive structure.
Most of these studies have important implications for marketing practitioners, especially in the area of influences on consumer behavior [Scott and Lamont 1974; Vinson 1977; Vinson, Scott and Lamont 1977]. Quite interestingly, in almost all these studies the data collected represented the Value Survey, i.e. Rokeach's terminal values.
Perhaps a good starting point for differentiating between consumers' personal values and the product/service values offered to consumers is the hierarchical model of values proposed by Vinson, Scott and Lamont [1977, pp. 45-46], which contains three mutually dependent and consistent levels of value abstraction. These levels are arranged in the following hierarchical network:
– global values (or personal values),
– domain-specific values,
– values based on the evaluation of product attributes.
Among global terminal values such as an exciting life, equality, love, or forgiveness, beliefs exist as the most elementary units within the system of values. These very centrally held and enduring beliefs guide the actions and judgments of consumers across specific market situations. Global values are more abstract and generalizable than less centrally held beliefs. They consist of closely held personal values which are of high importance in evaluations, decisions or choices.
The second level of values, domain-specific values (e.g. durable products, health-promoting products, products easy to repair), reflects the belief that people acquire values through experiences in specific situations or domains of activity, and that behavior cannot be understood or efficiently predicted except in the context of a specific environment. Thus, individuals obtain values specific to economic transactions through economic exchange and consumption, and values at the social level through peer-group interaction. These intermediate values construct a bridge between the traditional conception of closely held, but very general, global values and the less closely held descriptive and evaluative beliefs about product attributes.
Finally, the third category of values is less abstract and consists of descriptive and evaluative beliefs. While such beliefs may be important, they are less centrally held. Among the many kinds of beliefs in this category are evaluative beliefs about the desirable attributes of product classes as well as specific brands.
In consequence of the above distinction, values for consumers cannot be mistaken for consumers' personal values: in the former approach, value is primarily defined in reference to objects such as a product/service, while the latter accounts for values which come from the internal system of the human being (they are part of one's existence, style of living, culture, formed social relations, experienced feelings, emotions, etc.).
In the context of values for consumers, value is supplied to consumers through the agency of certain goods (products or services) and is therefore associated with the total product value less the cost of obtaining the product. In other words, consumers have to bear a certain level of cost in order to get it. In return, they may also expect benefits from the product (for example, a lower price, ease of use, good availability, high quality, longer durability, etc.) for as long as they consume it. The longer such a process lasts, the more benefits they derive (or the more costs they bear) from consumption of the product. If we follow Kotler's point of view, this process is concerned with the consummation of exchanges: a transaction between two parties (usually a company and a consumer) in which each party gives up something of value in return for something of greater value [Kotler 1991]. The consumer obtains benefits in the context of personal, social or practical effects and simultaneously bears either financial costs (e.g. price, cost of searching for bargains in the market, cost of lost bargains, costs of distribution, costs of product utilization, costs of product use) or nonfinancial costs (relationships with other people, psychological consequences, physical effort, and lost time) [Woodall 2003, p. 14].
The concept of value for the consumer was further developed on the basis of a classification of values for consumers according to their needs [Park, Jaworski and MacInnis 1986], such as:
– elementary needs (related to human existence), which motivate consumers to search for products fulfilling their expectations in consumption,
– symbolic needs, concerning self-realization, the consumer's social role, his/her membership in a group, and perception,
– cognitive needs, associated with sensual pleasure, the diversity of the offer, or cognitive stimulation.
In yet another attempt at values conceptualization, a classification was made containing five types of values in products [Sheth, Newman and Gross 1991]:
– functional values, representing the perceived usefulness of products in the physical, utilitarian and functional category, based on attributes and characteristics,
– social values, referring to the image and symbolism of a product connected with demographic, socio-economic or cultural reference groups,
– emotional values, associated with the product's capability to stimulate emotions and affective states, such as a feeling of comfort, safety, excitement, love, fear, or guilt,
– cognitive values, resulting from the product's capability to arouse the curiosity of the customer, delivering him/her novel solutions and enriching his/her knowledge,
– cultural values, referring to a specific situation or physical and social context, for example to living conditions and the conditions of decision making.
The turning point in scholars' works highlighting the importance of consumers' personal values, and of the values brought by products, for marketing activities came partially through the agency of the works of Holbrook and Corfman. Values were then perceived through [Holbrook 1984; Holbrook and Corfman 1985]:
– Interaction, where consumers' values entail an interaction between consumers and objects (products). Essentially, this interactionism perspective maintains that values depend on the characteristics of some object but cannot occur without the involvement of a consumer who appreciates those characteristics according to his/her own set of values. In marketing, this viewpoint explains why consumer orientation assumes that a product has value only when it pleases its consumer from a personal point of view. In other words, consumers, and no one else, are the final arbiters of value.
– Preference, where the general concept of preference embraces a wide variety of values related to terms such as: affect (pleasing vs. displeasing), attitude (like vs. dislike), evaluation (good vs. bad), predisposition (favorable vs. unfavorable), opinion (pro vs. con), or response tendency (approach vs. avoid) [Lamont 1955; Brandt 1967].
– Experience, where consumers' values reside neither in the product being purchased, nor in the brand being chosen, but rather in the personal experience [Holbrook and Hirschman 1982; Woodruff and Gardial 1996]. In essence, this argument explains that all products provide services in their capacity to create need-satisfying personal experiences that simultaneously engage personal values [Morris 1956].
Holbrook [1984] also categorized values along three dimensions, as follows:
Extrinsic versus intrinsic values
Extrinsic values pertain to a means-end relationship, wherein, for example, consumption (based on a value) is prized for its functional, utilitarian instrumentality in serving as a means to accomplishing some further goal [Parsons 1937; Hilliard 1950; Lamont 1955; Diesing 1962; Bond 1983]. In short, most of us respect money primarily as a means to the accomplishment of particular goals, e.g. when buying a newspaper, paying for a meal, or purchasing an automobile.
By contrast, intrinsic values occur when some consumption experience is appreciated as an end in itself, for its own sake, as self-justifying, ludic, or autotelic. Hence, only a consumption experience can confer intrinsic value [Frankena 1967; Bond 1983].
Self-oriented versus other-oriented values
Values are self-oriented ("for myself") when we consider some aspect of market consumption selfishly or prudently for our own sake, for how we react to it, or for the effect it has on us. For example, my sweater has personal value for me at least partly because it keeps me warm or provides me with a good social feeling.
Conversely, other-oriented values look beyond the self to someone or something else, where one's consumption experience is valued for the sake of someone else.
Active versus reactive values
Values are active when they entail a physical or mental manipulation of some tangible or intangible object; they involve things to be done by a consumer to or with a product as part of some consumption experience. Active consumer value can involve the physical manipulation of a tangible object (e.g. driving a nice car makes me proud of it because of its value) or the mental manipulation of an intangible object (e.g. solving a crossword puzzle makes me proud because it makes me feel a valuable person).
Conversely, consumers' values are reactive when they result from apprehending, appreciating, admiring, or otherwise responding to some object, that is, when they involve things to be done by a product to or with a consumer as part of some consumption experience.

On the basis of this classification of values, Holbrook constructed a typology of selected consumers' values containing eight elements. In Table 2, each cell represents a logically distinct type of value in the consumption experience.
Table 2. Typology of selected consumers' values

Consumers' values                 Extrinsic                           Intrinsic
Self-oriented      active         efficiency (convenience)            play (fun)
                   reactive       excellence (quality)                aesthetics (beauty)
Other-oriented     active         status (success, impression         ethics (virtue, justice,
                                  management)                         morality)
                   reactive       esteem (reputation, materialism,    spirituality (faith, ecstasy,
                                  possessions)                        sacredness, magic)

Source: Holbrook 1984.

In yet another study, Sweeney and Soutar [2001] identified values which might determine the attitudes and buying behaviors of consumers. They extracted four dimensions:
– emotional, offered by the product in reference to categories such as joy, relaxation, a sense of well-being, and pleasure,
– social, appealing to feelings of approval by other members of society, and also to the improvement of one's own image or making an impression on others,
– functional, referring to product quality, i.e. workmanship according to the accepted standard of quality, based on stability and long-term usage without malfunctions,
– functional, associated with the price and value of the product, that is, a product offering good value at a reasonable price relative to the money spent by the consumer.
Woodall [2003, p. 14] also distinguished five types of value for consumers: 1) net value, 2) derivative value, in which the accent on personal values is crossed with purchase experience, 3) rational value, 4) sale value and 5) marketing value.
All the above values arise mainly in consequence of:
1) information about the product handed over by the company to consumers, e.g. through advertisements, public relations actions, or packaging,
2) development of products adapted under the influence of information provided by consumers and market research departments,
3) systematic interactions taking place between consumers and the company,

4) influence through purchase conditions, e.g. the use of appropriate illumination in the shop, music, displays, or car parking.
Finally, Smith and Colgate [2007] distinguished:
– functional-instrumental values, perceived as the degree to which the product has qualities desired by the consumer and is useful or fulfills specific functions,
– hedonic-cognitive values, perceived as the degree to which the product creates suitable experiences, emotions and feelings for the consumer,
– symbolic-expressive values, meaning the degree to which the consumer finds psychological significance in the product,
– values perceived in categories of costs and benefits as the result of different marketing strategies used by companies.
Hedonic-cognitive values and symbolic-expressive values deserve special attention. The former are perceived as the degree to which the product creates appropriate experiences, emotions and feelings for consumers. Classified in this category are: emotional values (e.g. pleasure, amusement, excitement, adventure, humor), social-relational values (e.g. personal interaction, responsibility, confidence, commitment) and cognitive values (e.g. curiosity, knowledge, fantasy, novel solutions).
The latter relate to luxury products, which cause an increased sense of self-worth (for example, as a result of possessing a car or giving gifts such as jewelry to other people). In turn, some of these values bring back memories of past events or of persons particularly important to individuals (e.g. songs, tourist resorts, restaurants). They allow for expressing oneself (through clothing or cosmetics) or simply refer to customs and traditions.

Subjective information in market segmentation: psychographics and personal values
Undoubtedly, segmentation differentiates a company from many other competitors in the market. One could even say that the management and adaptation of marketing activities (e.g. the configuration of marketing-mix elements such as product, price, distribution and promotion) to specific groups of consumers, assuming they represent different groups, is more effective in comparison to analysis of the aggregated market (i.e. where groups do not differ or cannot be identified). In market theory, consumers and products, as well as their respective market relations, are not grounded in a homogeneous area alone. As Pociecha [1996, pp. 32-33] states, these assumptions give rise to the need to extract groups from the diversity of many individuals (people) and objects (products/services) in the market, which we call market segments. Operations which involve the process of creating diverse groups are called market segmentation15. Thus, the concept of market segmentation refers to the division of the potential market into segments and the selection of target segments. Various forms of segmentation are shown in Figure 2.
[Figure: four panels ranging from a completely homogeneous market (from the point of view of the economist), through a market segmented by disaggregation and segmentation resulting from the development of technology, telecommunications and logistics, to a completely atomized market]

Figure 2. Market segmentation from various perspectives

The process of market segmentation relies on specific criteria that need to be taken into account, given the presence in the market of a homogeneous versus a heterogeneous population [Duliniec 1994]. These effective criteria (according to Prymon's view [2009, pp. 115-118]) are:
– homogeneity of the segment,
– potential sensitivity of the segment to marketing activities,
– measurability of the segment, the corresponding volume of the segment, and its stability over time.
15 According to Garbarski, Rutkowski and Wrzosek [2001, p. 120], market segmentation means a division of the market, according to specific criteria, into consumer groups (segments) that define for the company an area of market operations and provide it with a point of reference for the formulation of specific marketing programs of action. A market segment, in turn, is defined as a homogeneous group of buyers, separated from the general group of buyers of a certain product (brand) or service on the basis of socio-economic, demographic, behavioral, psychological, cultural and other criteria.

On the other hand, Mazurek-Łopacińska [2002] added:
– sufficient absorptive capacity of the segment,
– the segment's compliance with the advantages of the company (competitive advantage),
– lack of competitors,
– the extent and availability of the segment.
The process of market segmentation depends largely on the choice of segmentation model, which can be descriptive or predictive, a priori or post hoc16. In psychographics-based segmentation the a priori model is rarely used; post hoc segmentation is practiced more often. In a priori models, information is determined in advance and the variables assume relationships between intentions, preferences, behaviors, etc. In post hoc models, by contrast, the rules of segmentation are set in the course of the process, for example during the reduction of a large number of variables in factor analysis. The basis of segmentation is usually consumers' opinions regarding their interests and activities in the highlighted areas, as well as their beliefs, personal traits and, more importantly, personal values [Rószkiewicz 2011].
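A post hoc segmentation of the kind described above can be sketched in code. The sketch below is only an illustration under assumed, simulated survey data (all variable names and numbers are hypothetical, not from the source): ratings on value items are reduced to two factors via principal components, and respondents are then clustered on their factor scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 60 respondents rate 6 value items on a 1-7 scale.
# Two latent segments are simulated: "self-expression" vs "security" oriented.
seg_a = rng.normal([6, 6, 5, 2, 2, 3], 0.5, size=(30, 6))
seg_b = rng.normal([2, 3, 2, 6, 6, 5], 0.5, size=(30, 6))
ratings = np.clip(np.vstack([seg_a, seg_b]), 1, 7)

# Step 1: reduce the many observed variables to a few factors (PCA via SVD),
# mirroring the factor-analytic reduction used in post hoc segmentation.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)
scores = z @ vt[:2].T          # respondents' scores on the first two factors

# Step 2: cluster respondents on their factor scores (plain k-means, k = 2).
centers = scores[[0, -1]].copy()   # crude initialisation at two respondents
for _ in range(20):
    labels = np.argmin(((scores[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([scores[labels == k].mean(axis=0) for k in range(2)])

# With clearly separated simulated segments, the clusters recover them.
print(np.bincount(labels))
```

In practice the factor scores would come from survey data and the number of segments would be chosen empirically; the fixed k = 2 here simply matches the simulation.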
In segmentation we use objective (descriptive) or subjective (e.g. psychographic) information, which may be general or specific. General and objective factors include (among others) demographic, socio-economic and geographic information. Descriptive criteria make up the main characteristics of consumers. Their use in segmentation rests on the mere fact that individuals belong to formally defined subgroups in the population of consumers, where the categories of these subgroups often result from the natural organization of the social and economic life of humans in a given region, country, etc.
Descriptive factors owe their popularity in research to the greater legibility and availability of the data and to the ease with which they can be applied to identifying individuals. They are important in segmentation; however, in the contemporary realities of the market they do not play a decisive role in strategic market segmentation [Bass, Tigert and Lonsdale 1968; Johnson 1971]. Due to this fact, the dynamics of market segmentation studies rest largely on subjective factors, e.g. psychographics [Yankelovich 1964].
Subjective descriptive information pertains to the individual's perception and evaluation of market reality, whereas subjective behavioral information relates to individuals' subjective motives and the choices they make. Subjective (descriptive) characteristics include personal values, but also personality traits, life aspirations, motivations, activities and opinions, which provide in-depth information, i.e. a more complete picture of individuals' lifestyles. These characteristics are part of the examination of consumer psychographics, which combines psychographic characteristics (in the course of data analysis) with other selected descriptive characteristics [Yankelovich and Meer 2006]17.
16 More information on this topic can be found in the works of Sagan [2004] and Rószkiewicz [2011].
Thus, segmentation conducted on the personal values espoused by consumers (which contain information expressed in a more subjective manner) is more complex than segmentation based on demographic, economic or geographic data (Table 3). Here, personal values refer to consumers' more thoroughly latent sphere of activities, as well as their expressed interests, opinions, etc. In other words, personal values are logically linked to consumers' ways of life, where the basic objective of identification is psychographic typologies of consumers by their lifestyles [Plummer 1974]. These typologies can be formed either through the AIO method (Activities-Interests-Opinions) or VALS (Values, Attitudes and Lifestyles) [Vyncke 2002]. VALS studies rest on the assumption that a social community or market segment can be identified on the basis of different states of life, which are coupled to individual beliefs, behaviors and needs. Thus, based on identified states of life covering a wide range of consumer typologies according to their maintained lifestyles, we can distinguish respective categories of consumers [Myers 1996], e.g. survivors, sustainers, belongers, achievers.
Segmentation based on personal values helps to expand knowledge well beyond simple socio-demographic properties. If prospective market segments are to be identified, the marketing strategist cannot omit information derived from value profiles, which maximally enhance the company's chances of precise segmentation. These profiles, in comparison with more traditional information, will be useful in more advanced considerations concerning marketing research [Vinson and Munson 1976, p. 505].
17 When we use psychographics for consumer segmentation, we often refer to the theoretical relationship, proposed by Lazer, between one's personal lifestyle and the choices one makes (i.e. on the market). Research on psychographics is closely linked with the VALS research program at Stanford University in 1978, under the direction of Mitchell. The results of empirical studies were assimilated within a short period of time by advertising agencies. The VALS research team worked out measurement instruments on the grounds formed by the sociologist Riesman of Harvard University (author of The Lonely Crowd) and the psychologist Maslow [1954], creator of the best-known hierarchy of needs theory. A long-running discussion in the literature [Lembkin et al. 2001] concerns the validity of the VALS method in consumer segmentation.

Table 3. Consumers segmentation information

Type of information | Type of data | Detailed type
Objective | demographic | age, sex, family status, profession, education, anthropological characteristics
Objective | geographical | home, climate
Objective | socio-economic | income, social groups
Objective | situational | the place where the consumer is temporarily located
Subjective | psychographic | motives, interests, personality, opinions, activities, personal values
Subjective | behavioral* | benefits expected; frequency and willingness to use products; scale of use of the product; degree of knowledge of the product; confidence in the product

* Behavioral information should be considered much more effective in the identification of market segments than descriptive information. In segmentation based on behavioral criteria, the most often applied consumer typologies relate to: phase of standby purchase, type of user, frequency of use, and confidence in the brand.

Product planning and promotional strategy in the context of personal values
Careful analysis of consumers' personal values (through the segmentation process) and observation of emerging value trends allow for the identification of new product opportunities and the repositioning of existing products. Noticing changes in the importance of global personal values such as pleasure, an exciting life or a comfortable life may signal the need for changes in product brands, e.g. their names, package design, etc. For example, a furniture manufacturer might connect a change in consumers' values with an increasing demand for contemporary styled furniture and design a new line with bright colors, unique materials of construction and unusual comfort features [Vinson, Scott and Lamont 1977]. Thus, the appearance of values-based information, and simultaneously of value-based segments, suggests that certain products can be successfully positioned in the market if we appropriately choose and design their attributes in accordance with the values expressed by consumers. For instance, a market segment of consumers who respect values such as imagination, an exciting life or independence suggests that we are dealing with a group of consumers oriented more towards individuality and self-expression.
Since personal values appear to be connected to the importance of attributes and the appeal of different product classes, they will also be useful in promotional strategy design when we create and reinforce messages directed at consumers. Messages about products can be developed not only with reference to desirable attributes but also in order to enhance the personal values associated with the product attributes. Additionally, appealing to closely held personal values may make consumers aware of a product attribute which previously may not have been considered salient, or of which no awareness may have existed. For example, a department store (knowing that consumers hold consumption values strongly) may care more about the needs of individual consumers and provide prompt service on complaints. Simultaneously, this store can use personal values such as politeness and cheerfulness to initiate an advertising campaign emphasizing courteous, helpful personnel and the store as a pleasant, cheerful place to shop.

II. MEASUREMENT METHODOLOGY

Selected notes on the history and notion of measurement

History of measurement
Today, as throughout much of history, measurement is still considered by some scientists and practitioners a mystery. However, there is an applied mathematical science that is slowly chipping away at a portion of the mystery. This subfield, usually called measurement theory, focuses on how numbers should enter into science. One part of this field searches for rules (axioms) that allow numbers to be assigned to entities in such a way as to capture their empirical relations numerically. Another part attempts to use such qualitative axioms to understand, to some degree, the nature and form of the variety of empirical relations among various dimensions. Such relations, when expressed numerically, are commonly called laws [Wright 1997].
Many leading mathematicians, philosophers, physicists, statisticians, economists, sociologists and psychologists have contributed to the progress of measurement. Their works have resulted in the detailed mathematical development of new structures and provided greater understanding of the range of mathematical structures which people are likely to encounter and use in life and science [Narens and Luce 1986, p. 1]. The notion that measurement is crucial to science or the market seems a commonplace and unexceptional observation.
Historically, measurement has been more of an abstract, almost ritualistic concern than an integral and central aspect of scientific activity. Early applications of measurement in human life can be traced to a number of historical events which laid the primary foundations of the theory of measurement. One can notice, for instance, biblical evidence of humans' earliest evaluation of objects, possessions and other references concerned with measurement. Aristotle, too, referred to officials charged with checking weights and measures. Another example could be historical France, where the revolution took place partly owing to peasants' poverty caused by unfair measurement practices.
Now, as far as specific fields of science are concerned, the considerable development of statistical methods also brought large-scale progress to the theory of measurement. It began with Darwin's work on evolution and his observation and measurement of systematic variation across species. Later, Darwin's cousin Sir F. Galton extended the systematic observation of differences to humans; a chief concern of Galton's was the inheritance of anatomical and intellectual traits. Pearson (regarded as the founder of statistics, and a junior colleague of Galton's) developed mathematical tools, including the product-moment correlation coefficient (which bears his name to this day), which we may use to examine relationships among variables systematically [Kelly 1958].
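Pearson's product-moment coefficient mentioned above can be computed in a few lines; the following is a minimal sketch on made-up data (the numbers are illustrative, not from the source):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linear data yields r close to 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```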
The theory of measurement also owes much to psychology and psychological measurement tests1. For example, in France, Binet developed the first individual tests of intelligence as part of his work on the study of individual differences. The German scientist Stern developed the intelligence quotient (IQ), which he defined as the ratio of mental (measured) age to chronological (actual) age. In Great Britain, Spearman followed in the footsteps of Galton and Pearson, and his work led to the modern concepts of test reliability and factor analysis. Further development of measurement was continued by Stevens, who made the first classification of scales, dividing them into four types: nominal, ordinal, interval and ratio. A great development in mathematical factor analysis was grounded by Thurstone, who drew on Stevens's earlier work on the application of psychophysical methods to the scaling of social stimuli [DuBois 1970].
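Stern's ratio definition of IQ is simple arithmetic. The sketch below adds the conventional scaling by 100 (the scaling is an assumption of later testing practice, not stated in the source, which defines only the raw ratio):

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age over chronological age, scaled by 100."""
    return 100.0 * mental_age / chronological_age

# A 10-year-old performing at the level of a typical 12-year-old.
print(ratio_iq(12, 10))
```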
According to DuBois, measurement theory as a discipline began to blossom in the 1930s. In 1935 the journal Psychometrika was founded, and Educational and Psychological Measurement followed in 1941. The British Journal of Statistical Psychology began in 1947, and Multivariate Behavioral Research in 1966. The first textbook in measurement theory was Thorndike's An introduction to the theory of mental and social measurements, which appeared in 1904. Many other books followed, by Guilford [1950] and Nunnally [1967, 1978], as well as the article by Churchill [1979]. Although most of the foundation for present-day measurement theory was completed by the 1950s, research into methods of psychological measurement, or psychometrics2, continues, as can be observed in popular works such as Handbook of psychological testing by Kline [2000] and Psychological constructs edited by Embretson [2010].
1 Most of the early psychological tests were designed for administration to only one individual at a time. Although work had begun on tests that could be given to many examinees at once, group-administered tests did not become widely used or accepted until after their introduction by the United States in World War I. The success of military testing led to the widespread development and use of group tests in schools and industry.
Modern measurement theories currently pursue alternative models and techniques [DuBois 1970]. One example is C-OAR-SE, which is first and foremost a theory and only secondly a procedure. It is a rational theory, testable only by the evidence of logical argument; it is not testable by any empirical means. Rossiter claimed that psychometric theory, by its reliance on statistics for validation, has produced and continues to produce many erroneous empirical results and therefore leads researchers to wrongly accept and reject hypotheses and entire theories [Rossiter 2011].

General notion of measurement


Considering the essence of scientific research, it reflects the attempt to construct a model of reality, which is then tested [Coombs, Dawes and Tversky 1977]. In measuring phenomena we use a mathematical model, which contains a number of parameters. Hence, its complete identification requires effective estimation of the model parameters, which can be done, for example, by a number of measurements of the modeled phenomenon. A measurement (as opposed to the theory of measure)3 is generally perceived as the assignment of numbers to objects in accordance with predefined rules [Choynowski 1971; Stobiecka 2010].
Measurement has turned towards more practical applications. This simple description of measurement approximates, to some degree, Stevens's [1951, p. 5] definition of measurement (as far as measurement in the social sciences is concerned). Stevens said that measurement is the assignment of numbers to objects or events according to specific rules. This concept, unfortunately, lacked an appropriate level of accuracy in interpretation. Duncan [1984, p. 12] characterized measurement (as opposed to Stevens's alternative) more precisely, adding that measurement is the assignment of numerals in such a way as to correspond to different degrees of a quality or property of some object or events. More broadly stated, it depends on assigning a number according to the size, value or some other characteristic of a tangible or intangible object4. In this respect, it is appropriate to distinguish the theory of measurement from scaling theory, which refers to measurement issues. According to Nowakowska [1975, p. 198], the theory of measurement is primarily grounded in the analysis of the formal conditions to be fulfilled by the investigated trait. Measurement therefore differs from the theory of scaling (discussed further in the text), which is closely related to the design of methods and techniques for scaling, i.e. assigning numbers to objects in such a way that they reflect the property of the measured trait.
2 In brief, psychometrics has been defined as the integration of psychological theory with formal statistical models [Millsap 1994].
3 In contrast to the theory of measurement, the theory of measure mainly considers definitions and properties of various measures, apart from the usual problems of their practical use. From the general theory of measure we can learn, for example, what the measure of a set is, or what a probabilistic measure or a distance measure is. We do not know, however, how the distance measure is used, i.e. how to measure the distance between two points [Ostasiewicz 2003, p. 16].
4 Furthermore, in Duncan's point of view, all measurement is social measurement. This could be a reference to the earliest formal social measurement processes, such as voting or census-taking. He notes that their origins seem to represent early attempts to meet everyday human needs, not merely experiments undertaken to satisfy scientific curiosity. He continues, saying that similar processes can be traced in the history of physics, where measurement of length or distance, area, volume, weight and time was conducted by ancient people in the course of solving practical and social problems [Duncan 1984].

[Figure 3 depicts the measurement process after Kriz: an empirical relational system is mapped (measurement) to a formal relational system; statistics and mathematics yield a numerical score, whose interpretation against the base of understanding gives an empirically significant score.]

Figure 3. Measurement process by Kriz
Source: Zuse 1994, p. 137, followed by Stobiecka 2010.

Generally, measurement can be considered on the basis of abstract mathematical systems (which possess some formal properties) or empirical systems of objects, along with their observable properties and the relations between them. When we carry out measurement, we need to find some kind of transformation allowing for representation of the empirical relational system through an appropriate numerical relational system (see Figure 3). Substantive meaning attaches only to those relationships between the numerical results of the measurement which are clearly represented or defined on the basis of relationships between real objects [Stobiecka 2010]. In consequence, one of the above systems represents a system of objects (that we want to measure), and the other represents a system of numbers. In both cases we talk about a system rather than a set, because the elements of both systems have a specific structure. One of the simplest structures is the ordinal structure (a term which assumes that a set of objects with specific relationships is based on their order in this set). For example, let us assume that the population of objects to be measured is marked as X, and the order relationship as ≻. The pair (X, ≻) denotes the empirical system, and (R, >) represents the theoretical system. Thus, measurement will be a function [Ostasiewicz 2003]:

f : X → R, (2.0)

which satisfies the following condition:

X ≻ Y ⇒ f(X) > f(Y). (2.1)

This type of measurement (2.1) is called an intensive measurement. If concatenation can be performed on the elements of the set X, the function f must also satisfy a second condition:

f(X ∘ Y) = f(X) + f(Y), (2.2)

where the symbol ∘ denotes the operation of concatenation.
The second type of measurement (2.2) represents an extensive measurement, where the function f is called a scale.
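Conditions of this kind can be checked mechanically for a concrete empirical system. The sketch below uses rod lengths as a stand-in empirical system (a standard textbook illustration, not an example from the source): laying rods end to end is the concatenation, and total length is the measurement function.

```python
def concat(x, y):
    """Empirical concatenation: lay two rods (tuples of segments) end to end."""
    return x + y

def f(rod):
    """Measurement function: total length of the rod's segments."""
    return sum(rod)

a, b = (2.0, 1.5), (4.0,)

# Intensive condition: the empirical order "longer than" maps to ">".
assert f(b) > f(a)                      # 4.0 > 3.5

# Extensive condition: concatenation is represented by addition.
assert f(concat(a, b)) == f(a) + f(b)   # 7.5 == 3.5 + 4.0

print("both measurement conditions hold for length")
```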


In addition to intensive and extensive measurement, we also distinguish fundamental5 and derivative measurement. These concepts were introduced by Campbell; in his view, in fundamental measurement the quantity of the measured object's property is determined by a simple and direct comparison with an object that has a standard level of this property. A necessary condition (which has to be fulfilled in fundamental measurement) relies on the significance of structural similarities between the mathematical system (which is applied to the obtained measures) and the empirical system, an area of reality which includes the measured objects.
By contrast, derivative measurement requires logical and mathematical rules with regard to fundamental measures. In Choynowski's view [1971], if a measurement is neither derivative nor fundamental, then it is not a measurement at all.
Campbell's work had its impact on the development of other measurement types, namely: arbitrary measurement (relying on presumed relationships between the observations and the measured properties) and measurement on indicators (depending on the assignment of numbers to objects on the basis of direct readings from a scale calibrated in an appropriate way to correspond to scores derived from fundamental or derivative measurement)6.
According to Choynowski [1971] and Stobiecka [2010], the theory of measurement puts the greatest emphasis on issues related to:
1) the existence of a numerical representation of the empirical relational system (the problem of the measurability of variables),
2) the unambiguity of the numerical representation (the problem of scope in the design of the measurement scale),
3) the importance of the numerical representation (the problem of interpretation of the measurement results),
4) scaling, i.e. the construction of a numerical representation of the system (the problem of scaling rules for scale construction and the estimation of measurement errors).
Finally, in the literature one can also note a further characteristic of measurement in general: a measurement may be relative or standardized [Kaydos 1999] (see Table 4).
5 An example of a measurement which does not require previous measurements is density.
6 According to Stobiecka [2010, p. 62], in measuring psychological traits researchers commonly use the indicator-based approach (e.g. tests, questionnaires), although the obtained results/scores do not match any of the known fundamental or derivative measurements. Nevertheless, such measurement allows much more useful data to be obtained.

Table 4. General classification of the measurement approaches by reference and method

Reference | Direct method | Indirect method
Standardized | Measures of physical parameters and countable items | Determining physical measures by their effects
Relative | Measures derived from countable items: complaints/sales; defects/cars | Measures of qualities and abstract attributes: satisfaction, helpfulness, kindness, honesty

Source: own construction based on Kaydos 1999.

A relative measurement that is not referenced to something else has no meaning. For example, in marketing research one might assume that a sales representative's performance is measured by the percentage of prospects from a given group that make a purchase each month. If we further assume that the sales representative achieves a 53% score, we could ask: is this good or bad performance? On the other hand, if the reference for comparison in measurement is set and recognized as an international standard (such as grams or seconds), then the measurement will be standardized. Countable items such as dollars, defects or late deliveries in services should be considered standardized, because everyone agrees what they represent. So if there are no accepted standards, the measurement will rather be relative.
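The 53% example can be made concrete in a few lines: the raw conversion rate is a relative measurement until some reference is fixed. The benchmark value below is a purely hypothetical assumption introduced for illustration.

```python
def conversion_rate(purchases: int, prospects: int) -> float:
    """Share of prospects in a group who made a purchase this month."""
    return purchases / prospects

rate = conversion_rate(53, 100)   # the 53% score from the text

# Without a reference the number alone has no meaning; against an
# assumed benchmark (hypothetical value) it becomes interpretable.
BENCHMARK = 0.40
verdict = "good" if rate > BENCHMARK else "bad"
print(rate, verdict)
```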

Explication and additional interpretation of the measurement theories
According to Hand [1996, p. 1], as there are different interpretations of probability, leading to different kinds of inferential statements and different conclusions about statistical models and questions, so there are different theories of measurement, which in turn may lead to different kinds of statistical model and possibly different conclusions. This has led to much confusion and a long-running debate about when different classes of statistical methods may legitimately be applied7.
7 See for example the works of Stevens [1946, 1951], Lord [1953], Adams, Fagot and Robinson [1965], Gaito [1980], Townsend and Ashby [1984], Michell [1986], and Stine [1989].


Here we will distinguish three basic ways of understanding theories of measurement. They are as follows [Hand 1996]: 1) representational measurement theory8, 2) operational measurement theory9, 3) classical and other theories of measurement.

Representational measurement theory


In representational measurement theory one begins with a set of objects, each of which has one or more common attributes, each of which in turn can be divided into mutually exclusive and exhaustive equivalence classes. At the level of a single attribute, each object is uniquely allocated to a single equivalence class according to the value of its attribute. The objects and the relationships between them (induced by the relationships between the equivalence classes for the attribute) then constitute an empirical relational system. In parallel with this, one can construct a numerical relational system comprising numbers (typically the real numbers, though they need not be) and the relationships between them [Hand 1996].
Representational measurement theory is concerned with establishing a sort of mapping from the objects, via the equivalence classes to which they belong, to the number system, in such a way that the relationships between objects are matched by relationships between numbers. These numbers form the values of a variable. In particular, representational measurement theory presents axioms which the objects must satisfy to permit such numerical representation10. In consequence, statistical operations can be carried out on the numbers, and the aim is that conclusions reached about relationships between the numbers will reflect corresponding relationships between the objects.
This theory is closely related to the isomorphism established between the equivalence classes and the positive real numbers, and to the homomorphism between the objects and the positive real numbers. In mathematical terms, a homomorphism is established from the empirical relational system denoted as [A, ≻], where A represents the set of objects, to the numerical relational system denoted by [R+, >]. Thus, for a set of objects A, if the attribute has relationships R1, R2, …, Rn, then we seek to establish a homomorphism from the empirical relational system [A, R1, R2, …, Rn] to the numerical relational system [R, r1, r2, …, rn], where the ri are relationships between numbers. In this case, different relationships Ri will be represented by different ri.
8 The magnum opus of representational measurement theory is the three-volume work Foundations of measurement [Krantz et al. 1971; Suppes et al. 1989; Luce et al. 1990].
9 Operationalism was developed by Bridgman [1927] and adopted by Dingle [1950].
10 For example, axiom systems for extensive structures are given by Suppes [1951], Suppes and Zinnes [1963], and Narens and Luce [1986]. They have been generalized in various ways (e.g. Roberts and Luce [1968], Narens [1974]).
The homomorphism from a given empirical relational system to a particular numerical relational system is not unique. There is typically more than one set of numbers which models the empirical relationships, so that the Ri may be accurately represented by the ri for more than one numerical assignment11. The fact that a given relationship between objects can be represented by the particular numerical relational system in more than one way induces a taxonomy on the representations and hence leads to different types of scale. The set of homomorphisms leading to numerical representations of the empirical relational system which are related by a given type of transformation fall into one class; those related by another type of transformation fall into another class, and so on. This is also the essence of Stevens's [1946, 1951] differentiation of scale types. In fact, the modern classification is produced by noting that there is a one-to-one correspondence between the set of homomorphisms of an empirical relational system into the numerical relational system and the group of automorphisms of the empirical system, and then classifying the automorphism groups. The latter is done in terms of the degree of homogeneity (k) and the degree of uniqueness (l) of the empirical relational system [Narens 1981a; Narens and Luce 1986]. These tell us the size of the structures which are preserved by the automorphisms12.
11 For example, given an acceptable assignment of numbers to the lengths of objects, an arbitrary rescaling of the lengths (changing inches to centimeters) will also produce an acceptable assignment: the ordering and the end-to-end concatenation operation will be properly represented by > and + respectively, in both numerical assignments. More generally, the structure of a model must be invariant to changes in the numerical assignment. This is what lies at the heart of dimensional analysis in physics, so that, for example, changing the units in which length is measured leads to balancing changes on both sides of a model formula: the dimensions of length must be balanced. Finney [1977] pointed out that dimensional analysis is at least as applicable to statistical models, presenting a series of examples which show how the method can be used to detect model inadequacies.
12 For example, using the pair (k, l) to classify scales, we find that ratio scales are of type (1, 1), interval scales are of type (2, 2) and ordinal scales are of type (∞, ∞). Moreover, various results have been established about the possible scale types that can arise, so helping to explain why so few scale types are used in the sciences. Details were given by Narens [1981a, 1981b] and Alper [1984, 1985, 1987].
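The non-uniqueness of admissible numerical assignments can be demonstrated concretely. The sketch below is a generic illustration (not drawn from the source): two length assignments related by a similarity transformation x → 2.54x, as in the inches-to-centimeters case, both represent the same empirical order and concatenation, which is what makes length a ratio scale.

```python
# Rods are tuples of segment lengths, originally in inches.
rods = [(1.0,), (2.0,), (1.0, 2.0)]

def f_inches(rod):
    """One admissible assignment: total length in inches."""
    return sum(rod)

def f_cm(rod):
    """A second admissible assignment: the same lengths rescaled to cm."""
    return 2.54 * sum(rod)

# Both assignments represent the empirical order identically...
for x in rods:
    for y in rods:
        assert (f_inches(x) > f_inches(y)) == (f_cm(x) > f_cm(y))

# ...and both represent end-to-end concatenation by "+".
x, y = (1.0,), (2.0,)
assert f_inches(x + y) == f_inches(x) + f_inches(y)
assert abs(f_cm(x + y) - (f_cm(x) + f_cm(y))) < 1e-9

print("both assignments are admissible homomorphisms")
```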


According to Hand [1996], although concatenation operations, yielding extensive measurement, played a fundamental role in the early development of formal measurement theory and are central to the physical sciences, they are of little use in the social and behavioral sciences, where concatenation operations are typically unavailable. This absence has been the source and stimulus of much of the work on measurement theory. It stimulated thought about alternative theories and led to the development of alternative axiomatic structures which have subsequently also become important. These include models for forming weighted means by von Neumann and Morgenstern [1947] and conjoint measurement.

Operational measurement theory


In operational measurement theory one defines scientific concepts in terms of the operations used to identify or measure them. It avoids assuming an underlying reality and so is fundamentally different from representationalism, which is based on a mapping from an assumed underlying reality. In operationalism, things begin with the measurement procedure: the operationally based approach to measurement depends on giving meaning to operational abstracts according to previously defined rules and principles of measurement [Hand 1996].
In this theory an attribute is defined by its measuring procedure, no more and no less, and has no real existence beyond that. In operationalism the attribute and the variable are one and the same. This approach thus defines a measurement as any precisely specified operation that yields a number [Dingle 1950, p. 1]. It follows that the numerical assignment procedure has to be well defined; arbitrariness in the procedure will reflect itself in ambiguity in the results. This is one reason why problems arise in the social and behavioral sciences, where, inevitably, measuring procedures are complex. A complete specification of the procedure is often difficult or impossible, and different researchers may use the same name for variables that actually have subtly different definitions, leading to different conclusions. Since the definition of the concept lies in the measurement procedure, it is not a cause for concern that different procedures lead to different conclusions, but rather an indication that a more refined theory needs to be developed.
Explication and additional interpretation of the measurement theories

Niederée [1994, p. 568] said that the operational approach removes ambiguity by defining a phenomenon in terms of a specified measurement procedure: "this outspoken conventionalist procedure appears suitable for people, say, who just want to establish plausible formal decision rules, or for practical situations where it doesn't really matter, or in scientific contexts where some vaguely formulated theory is to be rendered plausible with the help of generally accepted statistical methods. But in many scientific or practical contexts, this strategy usually just begs the question." It must therefore be admitted that in many applications such an approach may not be suitable. However, in others it may be. Firstly, if everyone uses the same conventions to discuss some phenomenon, then useful discussions can take place. Secondly, operational measurements in which the measurement sits properly and effectively in a theoretical network of relationships with other variables (i.e. those which yield effective predictions) are useful. Non-useful measurements presumably are not used, at least in good science.
Techniques for constructing operational measurements fall generally into two classes [Hand 1996]: (1) those that focus on single variables and (2) those that define a variable in terms of others. Examples of the former are paired comparisons and rating scales. Examples of the latter are Guttman scaling and unfolding methods (see, for example, van der Ven [1980]). Multidimensional scaling [Cox and Cox 1994] is also an example of the latter, though until recently the intrinsic non-linearity of many of the methods made it difficult to give a meaning to the dimensions of the lower-dimensional representation space in terms of the contributing variables. This difficulty has, however, been overcome [Gower and Hand 1996]. Optimal scaling methods, such as correspondence analysis and the more general methods described by Gifi [1990], should also be mentioned in this context. They identify a numerical coding of the raw variables which optimizes some additional criterion, typically some relationship between variables13.
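To make the idea of optimal scaling concrete, the following sketch (with invented data; a brute-force toy, not the Abelson and Tukey procedure) chooses monotone numeric scores for the categories of an ordinal variable so as to maximize its Pearson correlation with a second, already-scored variable:

```python
# Toy illustration of optimal scaling: choose strictly increasing numeric
# scores for the categories of an ordinal variable so that its Pearson
# correlation with a second, already-scored variable is maximized.
from itertools import product
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Ordinal categories low < mid < high, paired with a numeric covariate.
categories = ["low", "mid", "high"]
data = [("low", 1.0), ("low", 2.0), ("mid", 2.5), ("mid", 4.0),
        ("high", 4.5), ("high", 6.0)]

best = None
grid = [0, 1, 2, 3, 4, 5]
for s in product(grid, repeat=3):
    # Keep only strictly increasing triples: the assignment must
    # respect the ordinal constraint.
    if not (s[0] < s[1] < s[2]):
        continue
    scores = dict(zip(categories, s))
    x = [scores[c] for c, _ in data]
    y = [v for _, v in data]
    r = pearson(x, y)
    if best is None or r > best[0]:
        best = (r, s)

print("best monotone assignment:", best[1], "r =", round(best[0], 3))
```

Real optimal scaling methods solve this assignment analytically rather than by grid search, but the objective, a criterion optimized over admissible codings, is the same.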
All the techniques mentioned above identify a particular mapping from the objects to numbers, i.e. they identify a unique measuring instrument. This leads us to a fundamental point. In representational theory, the number assigned to an object is not unique. It could be any number from a set of numbers. However, for a particular object, the number chosen depends on the numbers chosen for the other objects, and this dependence arises via the empirical relationships between the objects. In contrast, in operational theory the number assigned to an object is unique. It emerges from the measuring instrument. Of course, for a particular statistical statement, other numerical assignments might yield the same truth values [Hand 1996].

13 For example, we might find the particular numerical assignment for two ordinal scales which optimizes the Pearson correlation coefficient between them, subject to fixed means and variances. Or we might find the particular numerical assignment which maximizes the minimum possible correlation between the assigned numbers and all possible patterns of numbers satisfying constraints such as ordinality (as was explored by Abelson and Tukey [1959, 1963]).
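The non-uniqueness of representational assignments can be illustrated with a small sketch (objects and scores invented for illustration): any strictly increasing rescaling of an ordinal assignment preserves all order relations, and hence the truth values of order-based statements, while statements that go beyond order are not invariant:

```python
# Sketch: in representational (ordinal) measurement the numbers are not
# unique -- any strictly increasing rescaling preserves the empirical
# order relations, so order-based statements keep their truth values.
objects = ["a", "b", "c", "d"]
assignment1 = {"a": 1, "b": 2, "c": 3, "d": 4}
# A second admissible assignment: a monotone transform (squaring is
# strictly increasing on these positive values).
assignment2 = {k: v ** 2 for k, v in assignment1.items()}

def order_relations(scores):
    """All pairwise order comparisons implied by a numerical assignment."""
    return {(x, y): scores[x] < scores[y]
            for x in objects for y in objects if x != y}

# Both assignments represent the same empirical ordering...
assert order_relations(assignment1) == order_relations(assignment2)
# ...but a statement that uses more than order (e.g. equal spacing of
# values) is NOT invariant, so it is not meaningful on an ordinal scale.
gaps1 = assignment1["b"] - assignment1["a"] == assignment1["c"] - assignment1["b"]
gaps2 = assignment2["b"] - assignment2["a"] == assignment2["c"] - assignment2["b"]
print(gaps1, gaps2)  # prints: True False
```

An operational instrument, by contrast, would output one fixed number per object, with no admissible alternatives.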

Classical theory of measurement


The third theory represents a further alternative to the representational and operational theories of measurement. As Kyburg [1984, p. 253] stated: "most approaches to measurement that have been suggested have taken the process of measurement to be the assignment of numbers to objects and events. I have suggested that the value (or interval of values) assigned to an object or event by measurement is a magnitude (or interval of magnitudes), rather than a number." This author also drew attention to the central role of error in measurement. Measurement error represents yet another link between the two concepts of measurement and probability [Hand 1996]14.
Michell [1986] described the classical theory of measurement, which he contrasted with the representational and operational theories. He called it "classical" because traces of it may be found in the works of Aristotle. According to this theory, measurement addresses the question of how much of a particular attribute an object has, and thus only refers to attributes which are quantitative. A quantitative attribute is an attribute whose values satisfy ordinal and additive relationships. Michell distinguished this approach from the representational theory by stressing that it is the attribute which has these properties and not the objects. The behavior of a set of objects may or may not reflect the quantitative nature of the attribute in question. The behavior of objects is a function of their other properties as well as of the attribute in question. A physical concatenation operation between objects certainly provides evidence for the quantitative nature of an attribute, but the lack of such an operation does not mean that the attribute is not quantitative15.

14 Any mismatch between theoretical predictions and observations can be explained in two ways: either the theory is inadequate, or there is measurement error, or (probably more usually) both. Implicit in representational measurement is a theory about the objects: that they are related, in terms of some kind of behavior (the attribute), in certain ways that form the relationships of the empirical relational system. So, for example, one might assume that the objects are ordered and satisfy some concatenation relationship, despite the fact that one can establish this only for a finite number of (sets of) objects. The question of establishing relationships for the empirical relational system is tightly bound up with the problem of induction, the core of statistical inference itself. If the assumed relationships do not hold, or hold only approximately, then we should expect the predictions and inferences drawn from our numerical calculations not to hold, or to hold only approximately. But such approximations will also appear if the measurements are not perfectly reliable. Measurement error manifests itself most clearly in the fact that repeated measurements of the same attribute of the same object (using the same measuring instrument) can yield different values. It is a ubiquitous aspect of measurement in all scientific investigations. As such, one might argue, it should be integrated into the theoretical structure describing measurement [Hand 1996].
In the classical theory of measurement, the hypothesis that an attribute is quantitative is a scientific hypothesis just like any other. Measurement then involves the discovery of the relationship between different quantities of the given attribute. The key word here is discovery. Whereas the representational theory assigns numbers to objects to model their relationships, and the operational theory assigns numbers according to some consistent measurement procedure, the classical theory discovers pre-existing relationships. By definition, any quantitative attribute has an associated variable. Developing a measurement procedure according to the classical theory requires relating the hypothesized quantitative attributes to observable quantities within some theoretical framework. The hypothesized quantitative attributes can then be measured by virtue of their relationships. Here the hypothesized attributes, as well as their quantitative nature, are all part of the theory being studied. For example, Rasch's [1977] notion of specific objectivity might be regarded as fitting naturally into this framework16. Rasch gave an example in which observed scores on a test are described by a Poisson model. The parameter of the Poisson model can be viewed as an underlying measure of ability for a given test. Rasch then showed that this model permits the parameters to be separated into a set describing comparisons between the abilities of individuals (independent of which test is used) and a set describing comparisons between test difficulties (independent of which subject is assessed). It is this separation which leads one to believe that, for example, ability is an intrinsic property of the individual. The link between the observed scores and the parameters is stochastic and non-linear.
15 Evidence for the assertion that an attribute is quantitative may be found in other ways. Michell [1986] cited the example of temperature: objects that have temperature do not satisfy a concatenation relationship, and yet this attribute is generally regarded as quantitative.

16 The property called specific objectivity (sometimes described as subject- or item-invariant measurement) is crucially important in comparing subjects in different groups (e.g. across groups of subjects or over time) [Andrich 1988]. The Rasch model has become very important in fine-tuned testing situations, especially in educational testing, because it allows the equating of tests that are meant to measure the same concept at different but overlapping levels, and it allows (computerized) adaptive testing, which gives the same measurement precision with a reduced set of items.
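Rasch's separation property can be illustrated numerically (the parameter values below are invented for illustration): if the expected score of a person on a test is taken as the ratio of a person parameter to a test parameter, then comparisons between tests do not depend on which person is used, and comparisons between persons do not depend on which test is used:

```python
# A numerical sketch of "specific objectivity" in a multiplicative
# Poisson-type model: expected score of person v on test i is
# ability_v / difficulty_i. All parameter values are invented.
abilities = {"Anna": 8.0, "Bo": 4.0, "Cid": 2.0}
difficulties = {"easy_test": 0.5, "hard_test": 2.0}

def expected_score(person, test):
    return abilities[person] / difficulties[test]

# Comparing two tests via any single person gives the same ratio:
# the test comparison is independent of which person is used.
ratios = [expected_score(p, "hard_test") / expected_score(p, "easy_test")
          for p in abilities]
print(ratios)  # prints: [0.25, 0.25, 0.25]

# Symmetrically, comparing two persons via any single test gives the
# same ratio: the person comparison is independent of the test.
person_ratios = [expected_score("Anna", t) / expected_score("Bo", t)
                 for t in difficulties]
print(person_ratios)  # prints: [2.0, 2.0]
```

In Rasch's actual model the observed scores are Poisson-distributed around these expectations, so the separation holds for the parameters, not for any single noisy observation.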


In classical theory, measurements are always real numbers. If we have been able to measure them, the numbers which result satisfy all the properties required for arithmetic manipulation, so that we can manipulate them using any statistical operation. This is as true for latent variable scores (measures of a hypothesized underlying quantitative attribute) as it is for straightforward observables such as length or weight. It is also true for measures such as preference scores. They are held to be measurements of a quantitative preference attribute, though with measurement error and possible bias, and may indeed be non-linearly related to the attribute's value. Such bias, non-linearity and measurement error can be investigated by refining the theory in which the preference scale is embedded, by relating the scores to other variables and by using subtle statistical methods.
Finally, both representational and classical views are realist, and both produce mechanistic models (which is not to say that they cannot be used to produce descriptive models). However, the representational theory maps from
an assumed underlying reality and chooses numbers to produce a model of
the observed relationships between values of the attribute. So, only those
relationships observed to exist between objects (and hence between values
of the attribute) are modelled in the representational measurement system.
In contrast, in classical theory the numbers are a fundamental part of the
reality. Relationships not directly observed between objects may also appear
in the numerical system. There might, for example, be indirect evidence for
such relationships.
In classical theory the underlying attribute is assumed to be quantitative
and relationships between objects can be described in terms of it (perhaps
via simple concatenation operations or perhaps via something more subtle).
Take the case of a score on a scale measuring preference for one of two alternatives. Representational measurement theory will assign people to positions
(and hence numbers) on the scale and will assert that only ordinality applies and can apply. In contrast, classical theory may assert that there is an
underlying quantitative variable, but that the scale is but a poor measure of
it (no doubt, with an unknown non-linear relationship to it). Operational
theory will define this particular type of preference as being the number
that emerges from the exercise. To take yet another example, representational theory may assign numbers so that the ratio between the two numbers
assigned to different attribute values is preserved by different numerical assignments. In contrast, classical theory will assert that the numerical value
of the ratio is an empirical property of the attribute, not something assigned
[Michell 1990].


Some measurement problems in the context of the social sciences research
In the social sciences, measurement, and with it scale development, can pose serious problems, because many of the phenomena studied in relation to people remain unobservable and are often too abstract to be adequately characterized17. Opinions of various authors on the possibility of measuring traits (especially psychological ones) in the social sciences are divided. Some even doubt that the measurement of latent human traits, which are often the subject of research in, for example, psychology or marketing, is possible at all, arguing that both fields deal with phenomena which by their nature cannot be captured precisely at all levels of the measurement scales distinguished by Stevens. Although the problems of measurement are essentially analogous in all scientific disciplines, in the social sciences they are much more difficult than in the natural and technical sciences, where physical measurements are based on formal (from a mathematical perspective) measurement procedures [Choynowski 1971].
The logic of the appropriate choice of measurement model in the social sciences is strongly related to theoretical assumptions, which play a key role in the conceptualization of problems before the process of measurement begins. In this context, it is worth mentioning that scientists tend to rely on numerous theoretical models that concern rather narrowly circumscribed phenomena. However, the measurement of elusive and intangible phenomena is very often derived from multiple, quickly evolving theories. As a result, when one considers measurement, it is important to be mindful not only of measurement procedures, recognizing their strengths and shortcomings, but also to know profoundly the theoretical background of the explored phenomena, as well as the abstract relationships that exist between hypothetical constructs and the quantitative tools available. The more one knows, the better equipped one will be in the process of developing reliable and valid measurement instruments. As Sagan explained, to carry out the measurement process properly, it is essential to accumulate knowledge of the theoretical issues underlying the field and the specificity of the explored problem. Therefore, it is not just a matter of familiarity with the technical rules applied in the empirical process, including conceptualization and operationalization, but also a matter of knowing the subject of measurement from its theoretical background, especially in the field in question [Sagan 2000]. The combination of theoretical and technical knowledge helps the researcher to clarify and formulate the appropriate content of the various empirical indicators, then analyze concepts and finally generate theoretical constructs on their basis. A set of indicators and the theoretical constructs corresponding to them may be further refined with respect to reliability and validity.

17 One of the difficulties in the social sciences is that many constructs are theoretical abstractions, with no known objective reality. Such theoretical constructs may be unobservable cognitive states, either individual or shared (e.g. cultural values). These constructs may exist more in the minds of scientists than in the minds of examinees.
In the social sciences there is also the major issue of choosing an appropriate measurement strategy and appropriate observable variables, on the basis of which unobservable (latent) traits such as personal values, emotions, feelings, or personality are constructed. Measurement of latent variables is most often carried out by indirect methods18 and is much more difficult than measuring physical characteristics or demographics19. The question of how, and to what extent, a measurement instrument can be used for measuring, for example, intelligence or personal values is usually difficult to answer. Such variables require a greater effort in the reconstruction, interpretation, judgment, comparison or evaluation of less accessible sources of information [Walesiak 1996]. In the analysis of these variables we use subjective criteria, which represent arbitrarily established patterns of collected information. In essence, these criteria are individualized and therefore not always comparable. One rule that makes them comparable, however, relies on the assumption that they belong to the category of attitudes, so their measurement should be carried out in accordance with the methodology of attitude research20. In this methodology, an attitude is unobservable and hence must be identified by a set of indicators, i.e. observable variables. Indicators of a specific attitude may be formulated as statements, to which the examinee responds in the course of the research, most often using an ordinal scale [Ostasiewicz 2003].

18 Demographic traits can be classified as objective criteria; these criteria also include geographic and economic ones. In that sense they are objective, because real patterns exist to which the observed human traits can be referred.

19 Age and gender, for example, are relevant to many theories but rarely require a multiple-item scale for accurate measurement. People usually know their age and gender, so they can retrieve this information quite easily, in contrast to unobservable (latent) traits such as intelligence or personal values.

20 In the identification of consumer attitudes it is assumed that for a defined set of consumers there is an established range of consumer attitudes, common to all. However, individuals may differ in their ability to integrate with particular attitudes [Rószkiewicz 2011, pp. 26–27].
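As a minimal sketch of this indicator-based approach (item names, keying and response codes are invented for illustration), a summated attitude score can be formed from ordinal Likert-type responses, with negatively worded statements reverse-keyed:

```python
# Minimal sketch of a summated attitude score built from ordinal
# indicators (Likert-type items); items and keying are invented for
# illustration. Responses are coded 1..5 (strongly disagree ..
# strongly agree); negatively worded items are reverse-keyed.
REVERSED = {"item3"}  # hypothetical negatively worded statement

def attitude_score(responses):
    """Sum the item codes, reversing negatively keyed items (1..5 scale)."""
    total = 0
    for item, code in responses.items():
        assert 1 <= code <= 5, "ordinal codes must lie in 1..5"
        total += (6 - code) if item in REVERSED else code
    return total

respondent = {"item1": 4, "item2": 5, "item3": 2}  # disagrees with item3
print(attitude_score(respondent))  # 4 + 5 + (6 - 2) = 13
```

Strictly speaking, summing ordinal codes already treats the indicators as if they were interval-scaled, which is exactly the kind of assumption the measurement theories discussed above make explicit.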
Yet another type of measurement in the social sciences relies on direct methods. Goude [1962] based his analysis of psychological traits on these methods and stated that fundamental measurement in psychology is plausible at the level of the ratio scale, stressing that this possibility involves the human being both as subject and as instrument of the measurement. In this context, the methods of fundamental measurement of psychological variables are precisely the direct methods. And because different methods give consistent results, they can be regarded as different ways of measuring the same variables. Direct methods were also a subject of interest for Stevens and Ekman [Stevens 1946, 1951, 1966]. Their rich experimental material included psychophysical scales, aesthetic values of drawings, values of musical pieces, and writing style21.
In sum, in the social sciences (especially in the practice of psychological or marketing research) problems arise mainly over what is to be measured [Crocker and Algina 2008]. As Guion [2005, p. 278] explained, the problem of measurement appears wherever a concept is not fully operationalized, and it becomes a problem everywhere if the experiment does not confirm the theoretical predictions. In such cases, the researcher must ask whether the failure arose from misleading axiomatic theoretical assumptions and the false, non-existent relationships they posit, or whether the conclusions derived from the measurements were invalid and unfounded.
Another problem is that no single, unified approach to the measurement of any theoretical construct is universally accepted. Because measurements of psychological constructs are mainly indirect, e.g. based on behaviors that are perceived as relevant to the construct under study, there is always a possibility that two theorists who talk about the same construct may select very different types of behavior to define that construct operationally. Moreover, psychological measurements are usually based on limited samples of observations, are usually taken at only one point in time, and the measurement obtained is always subject to error.
21 Their empirical results indicated the possibility of applying fundamental measurement in psychology.


The lack of well-defined units on measurement scales poses another problem. Does the fact that examinees can answer none of the items on a long-division test indicate that they have zero mastery of this skill? Defining the properties of the measurement scale, labeling the units, and interpreting the derived values are complex issues which must also be considered whenever an instrument is developed and a scoring system devised.
Finally, constructs in the social sciences cannot be defined only in terms of operational definitions but must also demonstrate relationships to other
constructs or observable phenomena. This provides a basis for interpreting the measurements obtained. If such relationships cannot be empirically
demonstrated, the measurements obtained are of no value. Obtaining evidence of how a set of measurements relates to measures of other constructs
or events in the real world is the ultimate challenge in test and scale development [Lord and Novick 1968].

The meaning of tests in classical test theory (CTT)


The Polish pioneer of psychometrics, Choynowski, adopted the following definition of a test: tests are "instruments, sets of questions or situations designed to study psychological properties of individuals or groups of people, by causing in these individuals certain observable verbal or non-verbal reactions, which as far as possible should be representative" [Choynowski 1971, p. 66]22. He also added that tests must meet various criteria defined by theory and practice, such as reliability, validity, specificity, discriminative power, objectivity, and standardization.
Aranowska [2005] assumed that a test should be based on a specific diagnostic procedure. It may be represented by a specific set of questions which, under standard conditions, should elicit certain types of behavior and provide results with the desired psychometric properties, that is, a desirable level of estimated reliability and validity. In her opinion, the most important criteria are test validity (defined as evidence confirmed throughout the empirical process of testing) and test reliability (i.e. the accuracy obtained after the application of the test, which enables estimation of the error that the researcher may commit). In short, a useful test measures some property, attitude or behavior of the human being accurately and effectively. A test is reliable if it measures precisely whatever it measures, and valid if it measures what it should measure23.

22 The term test is used interchangeably with scale, which is popular in social research and attitude scaling [Mokken 1997]. A scale can be regarded as an integral component of a test, or as an equivalent form of it. A test can be an instrument already available to the researcher at a given time, or a newly designed one. Examples of such tests are Rokeach's and Schwartz's scales measuring personal values.
Standardization of a test (a uniform way of using it) has the task of minimizing the dependence of test results on side effects such as the behavior of the person administering the test or the conditions under which it is conducted. Hence, a well-standardized test comes with instructions24 and a key, i.e. a set of rules according to which we ascribe exact meaning to the particular items of the test and then interpret the results [Thorndike and Hagen 1969, pp. 204–207].
The last two important criteria in test construction are the objectivity and the normalization of the test. The former means that two different people compiling the test results and carrying out the further analysis will reach the same results. The latter means that the result of the test can be referred to the general population. For this, the researcher must have an appropriate reference system with which to compare the obtained results; such a system is achieved by testing a representative sample.
If we carry out a test or construct a measurement instrument, we generally rely on the rules developed in classical test theory (CTT). CTT originates from Spearman's early works [Spearman 1904a, 1904b, 1907, 1910]25, explaining that measurement scores are inaccurate measures of psychological traits26. This thought was further developed by Guilford [1950] and Gulliksen [1950].

23 These issues are discussed in Chapter 5.

24 For example, the user should find an explanation of how to respond to the test items. It should be clear whether or not the items must be answered in sequence.

25 Spearman borrowed a perspective from the fledgling statistical approach of the time and posited that an observed total score on the instrument X is composed of the sum of a true score T and an error E. The introduction of an error term allowed for a quantification of inconsistency in observed scores, which was part of the solution to the problem with Guttman scales. Guttman scaling issues were focused on the meaningfulness of the results from the instrument (e.g. its validity), whereas in CTT models the statistical nature of the scores focused attention on the consistency of the results from the measurement instrument (e.g. its reliability). There has been a long history of attempts to reconcile these two approaches. One notable early approach was that of Thurstone [1947], who clearly saw the need for a measurement model that would combine the virtues of both.
Gulliksen introduced to the research community a measurement model previously used in physics. He claimed that everyone, at any given moment of time, possesses one true value of a psychological trait (as measured by a specific test). Both the true value (denoted here as the true score) and the measurement error are unobservable; the only observable quantity is the score obtained from the test. To estimate the true score and the error, one must accept certain assumptions that make up the axiomatic theory. One of these assumptions is that the score of the examinee is the realization of a certain trait, whose cumulative distribution function represents a propensity distribution of the examinee towards the respective instrument, test, or even a single item. The true score reflects a point on the intensity continuum of the variable27. According to Gulliksen, it is the limit towards which the average score of the j-th person over a given number of tests drifts, as the number of parallel tests k increases without bound. As k goes to infinity, the scores can be scattered over a certain range of the scale (with a uniform distribution) or concentrate around several points of the scale, forming bimodal or multimodal distributions.
The distribution of the trait X, i.e. the examinee's propensity distribution towards the measurement instrument (test), is a continuous function whose precise form is unknown28:

F(x) = P(X ≤ x),  −∞ < x < ∞. (2.3)

26 Indeed, there are potentially large errors in psychological measurement [Lord and Novick 1968].
27 Magnusson [1981] argues that in CTT, instruments lead to the reproduction of the measurement in the form of numerical values. Variables are thus measurable, and psychological traits are reflected on a latent trait continuum. However, there is an important distinction between the real, observed data and the theoretical latent continuum, which displays the distribution of the examinees according to the studied trait. While making the measurement we are interested in what type of latent trait underlies the observed scores and how these scores are determined. If we assume the presence of a latent continuum, we can make different assumptions, in part by taking into account the relationship between the items and their position on that latent continuum, and also by considering the distribution of the empirical data, which results from the distribution of the examinees' scores on the latent continuum. Finally, we may assume that the distribution of the trait across all examinees has a normal shape and that there is a relation between the test score and the position of the examinee on the continuum. However, when measuring attitudes, depending on the type of attitude measured, we have to reckon with a latent continuum on which the distribution of scores may have a shape other than normal.
28 In general, this is a normal distribution.


This distribution often remains unobservable, due to the lack of an infinite number of independent observations of the value x of variable X for one person [Nowakowska 1975]. Even a finite number of observations may be unattainable, because in social studies the scores are obtained from strongly invasive tests: a person who solves the same test again is effectively a different person owing to certain side effects. For example, that person may already have knowledge of the test.
According to Gulliksen's works, we can assume that the true score of an examinee has a specific location on the continuum and that this score is the same in all parallel tests. Thus, if scores are obtained by the j-th examinee, then:

Xj = Tj + Ej, (2.4)

where:
Xj – the observed total test score of the j-th examinee,
Tj – the examinee's true score,
Ej – the examinee's error on the testing occasion.
More generally, the CTT model (for the population) is expressed by the equation:

X = T + E, (2.5)

where:
X – observed score,
T – true score,
E – error score (error of measurement).
In other words, Eq. (2.4) states that the observed score X of the j-th examinee is the sum of two parts: the true score T and the error of measurement E; the true score and the error combine additively to produce the observed score. Equation (2.5) describes the influence of measurement error on the observed and true scores more generally, that is, at the population level. In this approach, true and error scores are unobservable constructs. The true score represents the average score taken over repeated independent tests. It should, however, be emphasized that the assumption in Eq. (2.5) is very strong and does not always stem directly from the assumption of repeatability. Although current research findings in many cases undermine the meaning of this assumption, classical test theory in its old garment persists to this day.


Conditions of CTT
There are a few important conditions to be met in CTT theory. These conditions are as follows [Allen and Yen 1979]:
1. E(X) = T
The first condition states that the expected value (population mean) of X is T.
This assumption follows from the definition of T: T represents the mean of the
theoretical distribution of X scores that would be found in repeated independent
tests of the same examinee with the same test. Gulliksen, when he formulated
the concept of error assessment, assumed that all errors cancel out when a given
examinee is tested infinitely many times, using the same test.
2. E(e) = 0
The second condition states that the expected value (population mean) of
the error scores for any examinee is 0. This means that the measurement
instrument should be unbiased and that the error score is only a random error.
3. ρ(E, T) = 0
The next condition refers to the correlation between the true score and the
error score. Since the error score is purely random, there is no correlation
between the true score and the error score.
4. ρ(E1, E2) = 0
The fourth condition assumes there is no correlation between errors, where
E1 is the error score for test number one, and E2 is the error score for test
number two.
5. ρ(E1, T2) = 0
Finally, condition no. 5 states that there is no correlation between the error
score for test number one and the true score for test number two.
For the definition of T in the assumptions underlying condition no. 1,
we assume that the tests are independent, that is, each test has no influence
on any subsequent test. Because the lack of contamination among tests is
impossible in practice, and an infinite number of tests is not available, T must
remain a theoretical construct. From condition no. 2, we see that T is
defined in terms of expected test scores rather than in terms of any real
trait of the examinee.

The meaning of tests in classical test theory CTT


As far as conditions no. 2 and 3 are concerned, the error scores and true scores
obtained by a population of examinees on a test are uncorrelated, the
measurement instrument should be unbiased, and the error score is only
a random error. This implies that examinees with high true scores do not have
systematically more positive or negative measurement errors than examinees
with low true scores. However, this assumption would be violated if, for
example, one group of examinees with low true scores somehow copied
the answers from another group with high true scores. This situation would
create a negative correlation between true scores and error scores.
Condition no. 4 states that the error scores on two different tests are
uncorrelated; that is, if an examinee obtains a positive error score on test
number one, he or she is no more likely to have a positive error score on test
number two. This assumption is not reasonable if the test scores are greatly
affected by factors such as fatigue, practice effects, the examinee's mood, or
effects of the environment²⁹. For example, suppose that two tests are usually
administered alone, but on occasion they are given as the last tests in a long
battery of tests. Some of the examinees may have become fatigued during the
testing, and consequently perform unusually poorly on the last two tests,
resulting in negative errors of measurement on those two tests. Other examinees
may have benefited from the practice provided by the earlier tests and
therefore perform unusually well on the last two tests, resulting in positive
errors of measurement on both those tests. A situation such as this would
produce a positive correlation between errors of measurement on the two tests.
If we knew what an examinee's error of measurement was for one of the
tests, we could predict the error of measurement for the other test.
Condition no. 5 states that error scores on one test, E1, are uncorrelated
with the true scores on another test, T2. This assumption would be violated
if test two measured a trait that influences errors on test number one. It
would also be violated under the same conditions that lead to violations of
condition no. 3.
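Conditions 3 through 5 can be checked numerically. The sketch below draws hypothetical true scores and two sets of independent error scores (all distributions are assumptions for illustration only) and confirms that the sample correlations the conditions require are close to zero.

```python
import random

random.seed(1)

def corr(u, v):
    """Pearson correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    su = (sum((a - mu) ** 2 for a in u) / n) ** 0.5
    sv = (sum((b - mv) ** 2 for b in v) / n) ** 0.5
    return cov / (su * sv)

n = 50_000
T = [random.gauss(50, 10) for _ in range(n)]   # true scores (T1 = T2 here)
E1 = [random.gauss(0, 3) for _ in range(n)]    # random errors on test one
E2 = [random.gauss(0, 3) for _ in range(n)]    # random errors on test two

# Conditions 3-5: all sample correlations should be near zero.
print(abs(corr(E1, T)) < 0.02)    # condition 3 -> True
print(abs(corr(E1, E2)) < 0.02)   # condition 4 -> True
print(abs(corr(E2, T)) < 0.02)    # condition 5 (T2 = T1 in this sketch) -> True
```

A fatigue effect of the kind described above could be mimicked by adding a shared negative shock to E1 and E2 for a subset of examinees, which would drive corr(E1, E2) above zero.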
Finally, on the basis of all of the above five conditions, we can say that in CTT:
– If two tests have observed scores X1 and X2 that satisfy the conditions
1 through 5, and if, for every population of examinees, T1 = T2 and
σ²(E1) = σ²(E2), then the tests are parallel. Here X1 is the observed score
for one test, T1 is its true score, and σ²(E1) is its error variance. Error
variance is the variance of error scores for that test among the examinees
in a particular population. On the other side, X2, T2 and σ²(E2) are the
observed score, the true score, and the error variance, respectively, for
the second test.
– If two tests have observed scores X1 and X2 that satisfy conditions
1 through 5, and if, for every population of examinees, T1 = T2 + ci,
where ci is a constant, then the tests are called essentially tau-equivalent.
Tau (τ) represents the true score, T; for tau-equivalent tests, T1 = T2.

²⁹ If we want to apply classical test theory to tests that are greatly influenced by practice
effects, fatigue, or environmental conditions, then we need to undertake an attempt to ensure
that testing conditions are as homogeneous as possible for all examinees on all tests over all
testing occasions. This control should reduce the size of the errors of measurement on each
test as well as the correlations of errors of measurement between tests.
Intuitively, the content of the above concepts is a formal expression of the
requirements which need to be met for two tests to measure the same
thing. Using the definition of the true score as the expected value of the
propensity distribution, it seems natural to assume that two tests measure the
same thing if the expected results are equal (i.e. the expected values of their
propensity distributions are equal for each examinee).
The adoption of linear experimental independence of measurements
(a specific assumption of test theory concerning repeated measurements)
introduces another condition, which assumes that tests are tau-equivalent,
or essentially tau-equivalent, provided they have true scores that are the same
except for an additive constant ci. Formally speaking, two tests are
tau-equivalent if they are linearly independent and, for each examinee, the true
scores are equal. In other words, the expected value of the score of each
examinee on one test does not depend on the second test score, and the
expected values for the examinee on both tests are equal.
The above definition can be strengthened by imposing additional requirements,
so that, for each examinee, the variances of the propensity distributions should
be equal too. Thus, two tests will be tau-equivalent if they are linearly
independent and, for each examinee, the propensity distributions have the
same, or at least approximately the same, means and variances. On the other
hand, we can weaken the definition of tau-equivalence, assuming that the true
test scores may be different for each examinee. In this understanding, we can
formulate the definition of essential tau-equivalence, which says that two
tests are essentially tau-equivalent if they are linearly independent and there
exists a certain constant such that each examinee's true scores differ by
this constant [Nowakowska 1975]. For example, on one test four examinees
might have true scores of 10, 11, 13 and 18. If this test and a second test were
essentially tau-equivalent with c1 = 3, the examinees would have true scores
of 13, 14, 16 and 21 on the second test.
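The constant-shift definition can be verified mechanically. The following sketch uses the four illustrative true scores from the example above and checks whether the two tests qualify as essentially tau-equivalent.

```python
# True scores of four examinees on two tests (illustrative numbers from the text).
T1 = [10, 11, 13, 18]
T2 = [13, 14, 16, 21]

# Essentially tau-equivalent: true scores differ by one shared constant c.
differences = [t2 - t1 for t1, t2 in zip(T1, T2)]
is_essentially_tau_equivalent = len(set(differences)) == 1

print(is_essentially_tau_equivalent, differences[0])  # True 3
```

If any single examinee's shift differed from the others, the set of differences would contain more than one value and the check would fail, as the definition requires.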


Criticism of CTT
Criticism of measurement models based on CTT referred mainly to the
assumptions of the objectivity and independence of measurement [Nowakowska
1975]. Just to remind, according to Guilford's opinion [1950], the true score is
the average score of a given examinee, obtained on the basis of an infinite
number of independent studies using the same test. Although technically such
investigations are not possible, this potential experimental approach provided
a basis for the formulation of a quasi-operational definition of the true score.
This point of view was also supported by Lord and Novick [1968], who
suggested that the true score is the finite expected value of the probability
distribution of the examinee's propensity. However, with such an understanding
of the true score as a theoretical concept, it may not have a reasonable
interpretation and may not always have empirical sense.
The advent of modern theories, the so-called generalizability theory and item
response theory, changed the perspective of measurement³⁰. In the former,
developed by Cronbach et al. [1972] as an attempt to increase the accuracy of
test interpretation, different systematic sources of variance in measurements
are explicitly considered, together with ways of estimating the amount of
variance contributed by these sources. Generalizability theory perceives
classical test theory as oversimplified and ambiguous. Observations are seen
as samples drawn from a universe of admissible observations, where the
universe describes the conditions, under which examinees can be observed or
tested, that produce results which are equivalent to some specified degree. The
examinee's universe score is defined to be the expected value of his or her
observed scores over all admissible observations, and the universe score is
directly analogous to the true score in CTT. Generalizability theory emphasizes
that different universes exist and it is the test constructor's responsibility to
define his or her universe carefully. This definition is done in terms of facets.
For example, particular facets could be the size of the testing group, the types
of training received by the examinees, the test form, the occasion for testing,
and so on. Usually, generalizability theory involves conducting two types of
research studies: a generalizability study and a decision study³¹ [Allen and
Yen 1979].
³⁰ Discussion of item response theory is conducted in the next section.
³¹ A generalizability study is done as part of the development of the measurement instrument.
Its main goal is to specify the degree to which test results are equivalent when obtained under
different testing conditions. Such a study involves collecting data from examinees tested under
specific conditions, estimating variance components due to facets and their interactions using
analysis of variance, and producing coefficients of generalizability. A coefficient is the ratio of
universe-score variance to observed-score variance and is the counterpart of the reliability
coefficient used in CTT. A decision study is one in which the measurement instrument produces
data to be used in making decisions or reaching conclusions, such as admitting to programs
people who display certain skills. The information from this study is used in interpreting the
results of the generalizability study and in reaching sound conclusions.
In general, the critical arguments pointed at CTT were associated with the
concept of true scores and with the reduction of issues concerning possible
replications of measurement to the equally abstract idea of parallel
measurements. In practice, however, the utility of both concepts appears in
many research cases to be very weak. The true score is a kind of mathematical
abstraction; it is not directly measurable. Similarly, parallel measurements are
in practice hard to obtain. As Embretson and Reise [2000] argued, true scores
apply to items on a specific test or to items on a test with equivalent item
properties. That is, since no provision for possibly varying item parameters is
included in the CTT model, they must be regarded as fixed on a particular test.
If more than one set of items may reasonably measure the same trait (latent
variable), the generality of the true score depends on test parallelism or test
equating. Secondly, the omission of item properties from the model requires
that they be justified outside the mathematical model of CTT. Hence, using
item difficulty and discrimination to select items is justified by their impact on
various test statistics, such as variances and reliabilities. Finally, the
independent variables are not separable for an individual score. Instead, the
model is used to justify estimates of population statistics. When combined with
other assumptions (error distributions), Eq. (2.5) provides a rationale for
estimating true variance and error variance. Although true and error
components may be further decomposed under extensions of CTT, multiple
observations under varying conditions are required [Cronbach et al. 1972].
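The variance decomposition mentioned above can be illustrated with a short simulation (all variances are hypothetical choices): under Eq. (2.5), with T and E uncorrelated, the observed-score variance splits into true variance plus error variance, and their ratio estimates reliability.

```python
import random

random.seed(2)

n = 100_000
T = [random.gauss(100, 15) for _ in range(n)]   # hypothetical true scores
X = [t + random.gauss(0, 5) for t in T]         # observed = true + error

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# Reliability as the ratio of true-score variance to observed-score variance;
# theoretically 15**2 / (15**2 + 5**2) = 0.9 for these assumed values.
reliability = var(T) / var(X)
print(round(reliability, 2))  # 0.9
```

In real data, of course, T is unobservable, which is precisely the criticism: such a decomposition can only be estimated indirectly, e.g. through parallel forms or repeated observations.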

Item response theory IRT


Notion and origins of IRT
Item response theory (IRT) represents an alternative approach to latent trait
measurement compared with classical test theory (CTT). In particular, IRT
is useful when we develop a scale with items measured on dichotomous

data³². This theory originates from the early works of Lazarsfeld, Green, Lord
and Torgerson, produced in the 1950s in opposition to the deterministic
Guttman perfect scale model³³ [Aranowska 2005].
These new developments can be divided into two major approaches, depending
on whether they regard imperfect data deviations from a perfect Guttman
scale as systematic (such that they must have a substantive interpretation) or
random (such that they should be treated in a probabilistic manner).
Advocates of the first approach include, for example, Bart and Krus [1973],
Dayton and MacReady [1980], and Ganter and Wille [1999], who tried to
systematically represent deviations from a perfect Guttman scale by invoking
one or more separate additional dimensions. Advocates of the second approach
include Rasch [1960] and Mokken [1971, 1997]. Their approach was at first
known as modern test theory, but it is now better known as item response
theory [van Schuur 2011].
At the heart of IRT one can find a mathematical model which explains how
examinees at different ability levels for the trait should respond to an item.
The term "ability" refers to an unobservable, latent trait. Although such
a variable is easily described and knowledgeable examinees can list its
attributes, it cannot be measured directly (as compared with height or weight),
since the variable is a concept rather than a physical dimension.
The approach taken to measure ability is to develop a scale consisting of
a number of items (questions). Each of these items measures some facet of
the particular ability of interest. From a purely technical point of view, such
items are free-response items for which the examinee can write any response
that seems appropriate. The examiner must then decide whether the response
is correct or not. When the item response is determined to be correct,
the examinee receives a score of one. An incorrect answer receives a score
of zero. So, under item response theory, the primary interest is whether the
examinee got each individual item correct or not. This is because the basic
concepts of this theory rest upon the individual items of a test rather than
upon some aggregate form of the item responses, such as a test score in CTT
[Baker 2001].

³² In many marketing research projects, mostly Likert scales are used. In consequence,
nominal scales with regard to dichotomous data are largely neglected [Likert 1932; Bearden
and Netemeyer 1999; Salzberger 1999].
³³ However, IRT has its roots in the psychological measurement work of Binet, Simon and
Terman as far back as 1916 [for more information, see the work of Bock 1997]. The formal
basis of IRT as an item-based test theory is generally attributed to the work of Lawley [1943;
see Baker 1992; Weiss 1983]. His pioneering work was, in turn, expanded significantly by
Lord [1952], who formalized IRT's role as an extension of the classical theory [Baker 1992].

Item characteristic curve ICC


A reasonable assumption in IRT lies in the fact that each examinee, responding
to an item on a test, possesses some amount of the underlying ability. Thus,
one can consider that each examinee has a numerical score that places him or
her somewhere on the ability scale. This ability score is denoted by the Greek
letter theta, θ. At each ability level there is a certain probability, denoted P(θ),
of a correct response to the item. For a typical test item, this probability will
be small for examinees with low ability and large for examinees with high
ability. If we plot P(θ) as a function of ability, the result will be a smooth
S-shaped curve (known as the ICC curve³⁴). The probability of a correct
response is near zero at the lowest levels of ability and increases until it
approaches one at the highest levels of ability. The S-shaped curve describes
the relationship between the probability of a correct response to an item and
the ability in the context of a hypothetical scale [Baker 2001]. In other words,
the curve reflects the probability of selecting a positive (correct or keyed)
response to an item. It can be thought of more generally as reflecting the
probability associated with moving from one response category to the next
along the entire trait continuum. Thus, this function depicts the probability
of making the transition from responding in one category to responding in the
next, across the boundary between categories that the function represents.
Mathematically speaking, in a deterministic model an examinee whose ability
θ is at least equal to the difficulty level of the i-th item under investigation
gives the correct answer with a probability of 1; when the difficulty level
exceeds θ, the probability of a correct answer is 0. In short, IRT assumes
instead a probabilistic model wherein the likelihood that an examinee will
respond in a particular manner to an item is proportional to that examinee's
position on the latent trait or continuum (see Figure 4).

³⁴ Instead of the term "trace line" used by Lazarsfeld, Thurstone or Mokken, nowadays we
use another term: item characteristic curve (ICC). This curve plots the probability of responding
correctly to an item as a function of the latent trait, denoted θ, underlying performance on the
items of the test. The ICC typically describes how changes in latent trait level relate to changes
in the probability of the specified response. For dichotomous items, in which a specified response
is considered correct or in agreement with an item, the ICC regresses the probability of item
success on trait level.

[Figure 4. The relationship between ability and item response on the item
characteristic curve ICC. The vertical axis shows Pi(θ) from 0 to 1; the
horizontal axis shows the latent trait/ability.]
Source: based on Rosenbaum 1987, Raju 1988, Embretson and Reise 2000.
The item characteristic curve permits us to see how the probability of
answering correctly depends on the latent trait. Lord [1980] mentioned two
acceptable approaches to interpreting the probability of responding correctly
to an item:
– At first, we must conceptualize a subpopulation of examinees at each
point on the latent trait. The defining characteristic of each subpopulation is
that its members all have the same latent trait score. Members of such
a subpopulation will be described as homogeneous with respect to the latent
trait, and a subpopulation of such examinees will be called a homogeneous
subpopulation. The probability of responding correctly is then interpreted as
the probability that a randomly chosen member of a homogeneous
subpopulation will respond correctly to an item³⁵.
– The second acceptable interpretation refers to a subpopulation of items,
all of which have the same ICC. The probability of responding correctly is
then interpreted as the probability that a specific examinee will correctly
answer an item randomly chosen from the subpopulation of items.

³⁵ Consider, for example, an ICC which indicates that for θ = 2 the probability of responding
correctly is 0.87. This can be interpreted to mean that the probability that a randomly chosen
examinee with θ = 2 will respond correctly is 0.87. An equivalent interpretation is that, of the
examinees with θ = 2, the proportion who can answer the item correctly is 0.87.
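Lord's first interpretation can be mimicked by simulation. Assuming, as in the footnote example, that the ICC gives P(θ) = 0.87 at θ = 2 (an illustrative value), the proportion of correct answers in a large homogeneous subpopulation should approach that probability.

```python
import random

random.seed(3)

p_theta = 0.87   # assumed ICC value at theta = 2 (footnote example)
n = 100_000      # size of the simulated homogeneous subpopulation

# Each member answers correctly with the same probability p_theta.
correct = sum(random.random() < p_theta for _ in range(n))
proportion = correct / n

print(round(proportion, 2))  # 0.87
```

The two interpretations coincide here: the probability for a randomly chosen member equals the long-run proportion of correct answers in the subpopulation.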
[Figure 5. A step function ICC. The vertical axis shows Pi(θ) from 0 to 1; the
horizontal axis shows the latent trait/ability.]
Source: based on Crocker and Algina 2008.

Although the S-shaped ICC is widely used in test development and scale
construction for dichotomous items, there is another type of ICC shape (see
Figure 5): the step function. This function implies that there is a minimum
latent trait score, denoted by θ*, below which examinees cannot answer the
item correctly. However, any examinee with an ability level equal to or greater
than θ* will respond correctly to the item. Such step functions are useful in
introducing several important concepts of item response theory. However,
step-function ICCs are less commonly used in scale construction than
S-shaped ICCs, because actual test data are generally more consistent with
the S-shaped curve [Crocker and Algina 2008].

Classification of basic dichotomous data models in item response theory
Item response theory represents a family of models rather than a theory
which specifies a single set of procedures. One important way in which
alternative IRT models differ is the number of item parameters with which
they are concerned [Andrich 1988]. However, the mechanics of IRT can be
presented most easily in terms of a dichotomous model, that is, a model
for items with only two response alternatives. Typically, such items require
responses that are either correct or incorrect³⁶.
The general classification of IRT models for binary (dichotomous)³⁷ data
includes:
– unidimensional models, which measure a single latent trait (e.g. the
one-parameter logistic Rasch model, the two-parameter logistic model, the
three-parameter logistic model),
– multidimensional models, measuring two or more latent traits.
Additionally, multidimensional models can be classified into two categories:
– exploratory multidimensional models, such as the multidimensional
Rasch model and multidimensional extensions of the two-parameter and
three-parameter logistic models³⁸,
– confirmatory multidimensional models, which include models for
noncompensatory dimensions, models for learning and change, models with
specified trait level structures, and models for distinct classes of examinees³⁹.
In unidimensional IRT models, a single trait is deemed sufficient to
characterize examinee differences; hence such models are appropriate for data
in which a single common factor underlies item responses (see Figure 6).
However, unidimensional IRT models can also be suitable for items with two
or more underlying factors. For example, a unidimensional IRT model will fit
the data when all items involve the same combination of each factor.
Unidimensional IRT models are not appropriate for data in which: 1) two or
more latent traits have differing impact on the items, and 2) examinees differ

³⁶ In dichotomous IRT models, the item category that represents a positive response (coded
as 1) is described as indicating a correct response to an item (the alternative category, coded
as 0, indicates incorrect responses).
³⁷ On the other hand, factor analysis of binary items has become increasingly similar to
multidimensional IRT models. In fact, under certain assumptions, it can even be proved that
they are the same [Tanaka and de Leeuw 1987]. McDonald's [1967] non-linear factor analysis
can be considered a unifying foundation for factor analysis, classical test theory and item
response theory. Based on that theory, we assume that a person's potential on an item is
a weighted combination of their standing on the underlying trait dimensions.
³⁸ Their full description, as well as complete procedures, can be found in the works of the
following authors: Reckase and McKinley [1982], Stegelmann [1983], Reckase [1997].
³⁹ For a comprehensive review, please see the works of Whitely [1980], Embretson [1984,
1991, 1997], Wilson [1985], DiBello, Stout and Roussos [1995], Rost [1990], and Wang,
Wilson and Adams [1997].


[Figure 6. The IRT model for measurement of a latent variable. A path diagram
in which a single latent variable points to the responses on Item 1, Item 2, ...,
Item I, each with associated response probabilities P(Xij = 1) and P(Xij = 0).]
Legend: The observed variables are responses (e.g. 0, 1 or 3) to specific items.
The latent variable influences the probabilities of the responses to the items.
The number of response options varies across items.
Source: based on Embretson and Reise 2000, p. 42.

systematically in the strategies, knowledge structures, or interpretations that
they apply to the items. In these cases, multidimensional IRT models, which
contain two or more parameters to represent each examinee, should be applied
instead. Multiple dimensions provide increased fit to item response data when
examinees differ systematically in which items are hard or easy. In many
multidimensional models, multiple item discrimination parameters represent
the impact of the dimensions on specific items [Embretson and Reise 2000].
Multidimensional models (whether exploratory or confirmatory) differ in some
respects. For example, analogously to common factor analysis (as in the case
of CTT), exploratory multidimensional IRT models involve estimating item
parameters on more than one dimension to improve the fit of the model to the
data. Here, theories about the substantive nature of the factors determine
neither the estimation process nor the required number of factors. In contrast,
confirmatory multidimensional IRT models involve estimating parameters for
specified dimensions. Confirmatory analysis thus involves the relationship of
the items to the dimensions.


Unidimensional logistic models

In this section we focus on a description of the unidimensional logistic models.
Starting with the early Rasch model (named after its inventor, the Danish
statistician Georg Rasch), known as the one-parameter logistic model (1PL),
we can say that for the simple Rasch model the dependent variable represents
the dichotomous response (success/failure or reject/accept) of a particular
examinee to a specified item, and the independent variables are the examinee's
trait score θ and the item's difficulty level bi. The independent variables are
combined additively: the item's difficulty is subtracted from the examinee's
ability. The relationship of this difference to item responses rests on how the
dependent variable is modeled, as log odds or as probabilities⁴⁰.
Odds are expressed as a ratio of the number of successes to the number of
failures [Rasch 1960]:

ln[Pi(θ) / (1 − Pi(θ))] = θ − bi, (2.6)

where bi is the parameter corresponding to the location of the i-th item, that
is, the item difficulty; when θ = bi, the probability of a correct answer to the
i-th item equals 0.5, so the answer is random.
For example, if the odds that an examinee passes an item are 4/1, then out
of five chances we expect four successes and one failure. Alternatively, odds
are the probability of success divided by the probability of failure, which
would be 0.80/0.20 in this case. If the trait level equals the item difficulty, then
the log odds of success will be zero. Taking the antilog of zero yields odds of
1.0 (or 0.50/0.50), which means that the respective examinee is as likely to
succeed as to fail on the particular item.
Solving Eq. (2.6), we obtain the second version of the Rasch model, which
reflects the principle of scaling a single metrical trait [Embretson and Reise
2000]:

Pi(θ) = e^(θ − bi) / (1 + e^(θ − bi)). (2.7)

⁴⁰ In the log odds version of the Rasch model, the dependent variable is the natural
logarithm of the odds of passing an item.


This version can also be applied to interval scale construction. Here, the
dependent variable is predicted as a probability rather than as log odds, due
to the exponential (e = exp) form used in predicting probabilities and the
inclusion of only one item parameter (difficulty) to represent item differences.
The dependent variable is the simple probability that an examinee passes
an item.
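Equations (2.6) and (2.7) can be sketched directly in code; the item difficulty and abilities below are illustrative values only.

```python
import math

def rasch_p(theta, b):
    """Probability of success under the Rasch model, Eq. (2.7)."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def log_odds(theta, b):
    """Log odds of success, Eq. (2.6): ln(P/(1-P)) = theta - b."""
    p = rasch_p(theta, b)
    return math.log(p / (1.0 - p))

b = 1.0  # hypothetical item difficulty

# When ability equals difficulty, P = 0.5 and the log odds are zero.
print(round(rasch_p(1.0, b), 2))    # 0.5
print(round(log_odds(1.0, b), 2))   # 0.0

# One unit above the difficulty: log odds = theta - b = 1.
print(round(log_odds(2.0, b), 2))   # 1.0
```

The round trip between the probability and log-odds forms confirms that the two versions of the model are equivalent, as the derivation from Eq. (2.6) to Eq. (2.7) asserts.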
In the course of time, the basic Rasch model of IRT was extended with
additional parameters. As a result, there appeared the two-parameter logistic
model (2PL) of Birnbaum [1968]⁴¹. In the 2PL model, there are two
parameters to represent item properties. Both item difficulty bi and item
discrimination ai are included in the exponential form of the logistic model:

Pi(θ) = e^(D·ai(θ − bi)) / (1 + e^(D·ai(θ − bi))), (2.8)

where:
ai – the item discrimination parameter, corresponding to the slope of the curve,
D – a scaling constant; when D = 1.7, Eq. (2.8) closely approximates the
normal ogive function.
Noteworthy is the item discrimination, which is a multiplier of the difference
between trait level and item difficulty. Item discriminations are related to the
biserial correlations between item responses and total scores. In the above
equation (the two-parameter logistic solution), the impact of the difference
between trait level and item difficulty depends on the discriminating power of
the item. Specifically, the difference between trait level and item difficulty has
a greater impact on the probabilities of highly discriminating items [Embretson
and Reise 2000].
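The role of ai can be seen numerically. In the sketch below (hypothetical parameter values, with D = 1.7 as in Eq. (2.8)), the same distance θ − bi = 1 yields a much higher success probability for the more discriminating item.

```python
import math

def p_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic model, Eq. (2.8)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Two hypothetical items with the same difficulty but different discrimination,
# evaluated at theta - b = 1.
low  = p_2pl(1.0, a=0.5, b=0.0)   # weakly discriminating item
high = p_2pl(1.0, a=2.0, b=0.0)   # highly discriminating item

print(round(low, 2), round(high, 2))  # 0.7 0.97
```

The steeper curve of the highly discriminating item separates examinees near bi much more sharply, which is exactly the property exploited when items are selected for a scale.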
In applying one- and two-parameter logistic models to data obtained from
multiple-choice or true-false items, a problem arises because these formats
permit correct responses from guessing⁴². For the one- and two-parameter
models, the value of Pi(θ) tends to approach zero as θ gets smaller. However,
one

⁴¹ In general, the 2PL model is appropriate for measures in which items are not equally
related to the latent trait or, from another perspective, in which the items are not equally
indicative of the examinee's standing on the latent trait.
⁴² Neither of the two previous models took the guessing phenomenon into consideration.
Birnbaum [1968] modified the two-parameter logistic model to include a parameter that
represents the contribution of guessing to the probability of correct response. Unfortunately,
in so doing, some of the nice mathematical properties of the logistic function were lost
[Baker 2001].

[Figure 7. Illustration of items with different discrimination and difficulty
parameters on the ICC. Two S-shaped curves are plotted with Pi(θ) (0 to 1)
on the vertical axis and the latent trait/ability on the horizontal axis, with
difficulty locations bi and bj marked.
Legend: Item j is much more difficult and simultaneously discriminates the
particular answer more.]
Source: based on Embretson and Reise 2000.

might suspect that even for examinees with very low abilities, the proportion
responding correctly will be greater than zero, because these examinees can
guess the correct answer. To accommodate this possibility, we can use the
three-parameter logistic model with a guessing parameter.
The three-parameter logistic model (3PL) is [Birnbaum 1968]:

Pi(θ) = ci + (1 − ci) · e^(D·ai(θ − bi)) / (1 + e^(D·ai(θ − bi)))
      = ci + (1 − ci) / (1 + e^(−D·ai(θ − bi))), for i = 1, ..., k, (2.9)

where k denotes the number of item positions.


In the three-parameter solution, one more parameter is added to represent an
item characteristic curve that does not fall to zero. For example, when an item
can be solved by guessing (as with multiple-choice items), the probability of
success is substantially greater than zero, even at low trait levels. Simply put,
this model accommodates guessing by adding a lower asymptote parameter ci,
which represents the probability of getting the item correct by guessing alone.
It is important to note that, by definition, the value of ci does not vary as
a function of the ability level. Thus, the lowest- and highest-ability examinees
have the same probability of getting the item correct by guessing. The
parameter ci has a theoretical range of 0 ≤ ci ≤ 1.0, but in practice values
above 0.35 are not considered acceptable [Baker 2001].
A side effect of using the guessing parameter c_i is that the definition of the difficulty parameter is changed. Under the two previously mentioned models (1PL and 2PL), b_i was the point at which the probability of a correct response was 0.5. But now, the lower limit of the item characteristic curve is the value of c_i rather than zero. The result is that the item difficulty parameter is the point on the ability scale where [Baker 2001]:
Pi () = ci + (1 ci ) 0.5 =

1 + ci
.
2

(2.10)

This probability is halfway between the value of c_i and 1.0. So what has changed here is that the parameter c_i has defined a floor to the lowest value of the probability of a correct response. Thus, the difficulty parameter defines the point on the ability scale where the probability of a correct response is halfway between this floor and 1.0. The discrimination parameter a_i can still be interpreted as being proportional to the slope of the item characteristic curve at the point θ = b_i. However, under the three-parameter model, the slope of the item characteristic curve at θ = b_i is actually a_i(1 − c_i)/4. While these changes in the definitions of parameters b_i and a_i seem slight, they are important when we need to interpret the results of analyses.
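These relationships can be checked numerically. A minimal Python sketch (with illustrative parameter values and the scaling constant D = 1.0) verifies that the 3PL curve passes through (1 + c_i)/2 at θ = b_i and that its slope there is a_i(1 − c_i)/4:

```python
import math

def p3pl(theta, a, b, c, D=1.0):
    """Three-parameter logistic ICC: c + (1 - c) / (1 + exp(-D*a*(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

a, b, c = 1.2, 0.5, 0.2   # illustrative item parameters

# At theta = b the probability is halfway between the floor c and 1.0.
p_at_b = p3pl(b, a, b, c)
print(round(p_at_b, 4))          # (1 + 0.2) / 2 = 0.6

# A central-difference slope at theta = b approximates a * (1 - c) / 4 (D = 1).
h = 1e-5
slope = (p3pl(b + h, a, b, c) - p3pl(b - h, a, b, c)) / (2 * h)
print(round(slope, 4))           # 1.2 * 0.8 / 4 = 0.24
```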
Finally, in order to select an appropriate model (1PL, 2PL or 3PL), we can follow criteria such as: 1) the weights of items for scoring (equal vs. unequal), 2) the desired scale properties for the measure, 3) fit to the data, and 4) the purpose for estimating the parameters43.
In the case of a simple IRT model (e.g. the Rasch model), the important information one needs to know is the difficulty level of the item. Hence, this model is often called a one-parameter model44. Of course, the more items one tries, the more accurate will be the estimate of the examinee's ability. More complex IRT models require other parameters, such as the discrimination of the items (two-parameter model) and the effect of guessing on the item (the three-parameter model).
43 For comparison of these criteria, see Embretson and Reise [2000].
44 A model based on the assumptions established by Rasch is useful in the construction of a unidimensional scale, where the items constructing such a scale must have trace lines of parallel form. On the other hand, assuming different (nonparallel and intersecting) traces of the items, the best solution would be the multidimensional models.


Some technical aspects of parameters estimation in IRT models


Some technical aspects of parameter estimation in IRT models are noteworthy. Formal nuances and in-depth information can be found in the following works: Hambleton, Swaminathan and Rogers [1991], Wright and Stone [1979], Lord [1980], Hulin, Drasgow and Parsons [1983], Baker [1992].
For instance, in order to estimate the parameters, one can determine their estimates by maximizing a previously constructed likelihood function [Lord 1980]. Let X_ij be a binary variable taking the value 1 when the correct answer to the i-th item is given by the j-th examinee, and the value 0 otherwise. Let X_j be the k × 1 response vector of the j-th examinee on the set of k items, and X = (x_1, x_2, …, x_n) the matrix of all answers of n people. The probability of the currently observed random sample is expressed by the formula [Aranowska 2005]:

P(X | θ, β) = ∏_{j=1…n} P(X_j | θ, β) = ∏_{j=1…n} ∏_{i=1…k} P_i(θ_j)^(x_ij) · Q_i(θ_j)^(1 − x_ij),  (2.11)

where Q_i(θ) = 1 − P_i(θ), and θ, β are vectors of the constrained, unknown model parameters.
That being the case, in the three-parameter logistic model, elements of the vector β reflect the difficulty, discrimination and guessing parameters of the k items. Formula (2.11) is interpreted as a likelihood function L(θ, β | X_j) for θ and β at the given X_j.
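A small numeric sketch of formula (2.11), with hypothetical item parameters and trait values, evaluates the joint likelihood of a binary response matrix under a 3PL curve (setting c = 0 reduces it to the 2PL):

```python
import numpy as np

def icc(theta, a, b, c=0.0, D=1.0):
    """3PL item characteristic curve; c = 0 reduces it to the 2PL."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def likelihood(X, theta, a, b, c):
    """Formula (2.11): product over examinees j and items i of
    P_i(theta_j)^x_ij * Q_i(theta_j)^(1 - x_ij)."""
    P = icc(theta[:, None], a[None, :], b[None, :], c[None, :])  # n x k
    return np.prod(np.where(X == 1, P, 1.0 - P))

# Hypothetical parameters for k = 3 items and n = 2 examinees.
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])
c = np.array([0.0, 0.0, 0.2])
theta = np.array([0.3, -1.0])
X = np.array([[1, 1, 0],
              [1, 0, 0]])

L = likelihood(X, theta, a, b, c)
print(0.0 < L < 1.0)   # a product of probabilities
```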
A special method of parameter estimation for items is called marginal maximum likelihood, and is expressed as follows:

L(β | X_j) = ∫ P(X_j | θ, β) f(θ) dθ,  (2.12)

where f(θ) is the ability distribution, which may be known or not. In practice it is assumed that this function is equal to the density of the standardized normal distribution N(μ, σ²) with mean μ = 0 and variance σ² = 1.
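The integral in (2.12) can be approximated by simple quadrature over a grid of θ values. The sketch below (hypothetical parameters) uses the standard normal density for f(θ); by symmetry, a single 2PL item with b = 0 has a marginal probability of exactly 0.5 for either response, which gives a convenient check:

```python
import numpy as np

def icc(theta, a, b, D=1.0):
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def marginal_likelihood(x, a, b, nodes=201):
    """Formula (2.12): approximate the integral of P(X_j | theta, beta) * f(theta)
    over theta with a Riemann sum, f(theta) being the standard normal density."""
    theta = np.linspace(-6.0, 6.0, nodes)
    step = theta[1] - theta[0]
    f = np.exp(-theta ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    P = icc(theta[:, None], a[None, :], b[None, :])        # nodes x k
    cond = np.prod(np.where(x == 1, P, 1.0 - P), axis=1)   # P(x | theta)
    return float(np.sum(cond * f) * step)

a = np.array([1.3])   # hypothetical discrimination
b = np.array([0.0])   # difficulty at the center of the trait scale

m1 = marginal_likelihood(np.array([1]), a, b)   # marginal P(X = 1)
m0 = marginal_likelihood(np.array([0]), a, b)   # marginal P(X = 0)
print(round(m1, 3), round(m0, 3))   # symmetry of N(0, 1) gives 0.5 and 0.5
```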
Rasch parametric vs. Mokken nonparametric model
Non-parametric Mokken models (stochastic models) also appear in the literature; they led to the development of further non-parametric IRT models that differ from Guttman's original cumulative model in the specification of their item response function. In Mokken's model, an examinee who gives a positive answer to a given item has a probability substantially greater than zero of giving positive answers to the less difficult items [Mokken 1971]. In short, Mokken's nonparametric model is an alternative to parametric ability models for responses based on dichotomous items [van Schuur 2011].
Mokken proposed two models [Mokken 1971; Mokken and Lewis 1982; Mokken 1997]: the model of monotone homogeneity MH, and the model of double monotonicity DM. The DM model has the property of ordinal specific objectivity or invariant item ordering, and can be interpreted as the ordinal version of the Rasch model. The MH model, in turn, can be compared to a variant of the Rasch model in which each item has a specific discrimination parameter, which replaces the constant one. In the IRT literature this model is also known as the two-parameter logistic model (2PL model), or the Birnbaum model [Birnbaum 1968]. In the 2PL model, or MH model, the item response functions of items with different item discrimination parameters will intersect. This implies that the order of difficulty of the items is not the same for all examinees, and item-invariant measurement is not possible (see Figure 8).
The Mokken scaling model of monotone homogeneity makes three fundamental assumptions [van Schuur 2011].
Firstly, there is a unidimensional latent trait (e.g. an ability or an attitude) on which examinees j ∈ J have a scale value θ_j and on which dichotomous items
[Figure: four intersecting item response functions P_i(θ) plotted against the latent trait/ability.]
Figure 8. Four item response functions of monotone homogenous items that conform to the 2PL model
Source: based on van Schuur 2011, p. 144.


i ∈ I have a scale value δ_i. Examinees can give a positive (X = 1) or a negative (X = 0) response to each item. If the scale values of examinee and item are identical, the probability of a positive response is p(X = 1 | θ_j = δ_i) = 0.50. If the scale value of the examinee is lower than that of the item, θ_j < δ_i, then 0.00 < p(X = 1) < 0.50. When it is higher, θ_j > δ_i, then 0.50 < p(X = 1) < 1.00.
Secondly, the item response function is monotonically non-decreasing, that is, the probability of a positive response to an item increases (or at least does not decrease) with increasing examinee value θ: for all items i ∈ I and for all values θ_j ≤ θ_s we therefore assume that p_i(θ_j) ≤ p_i(θ_s). If all members of a set of items measure the same latent trait, then the ordering of the examinees (by their probability of a positive response) should be the same for all items.
Thirdly, responses by the same examinee are locally stochastically independent, which means that responses to two or more items by the same examinee are influenced only by j, the scale value of the examinee on the
latent trait, and not by any other aspect of the examinee or the items. In
consequence we obtain:

P(X = x | θ) = ∏_{i=1…k} P(X_i = x_i | θ),  (2.13)

where X is the vector of responses to all k items, with x as its realization, and P(X_i = x_i | θ) as the conditional probability that a score of x_i has been obtained on item i. This assumption of local stochastic independence is basic to most probabilistic theories of measurement, and implies that all systematic variation in people's responses is due only to the examinees' locations on the latent trait. It follows from these assumptions that p_i|s, the proportion of correct answers to item i among examinees with the rest score s, is non-decreasing over increasing score groups s, where s is the unweighted sum score calculated on the basis of the remaining k − 1 items. This is a testable hypothesis, and it is used in testing Mokken's MH model.
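This testable hypothesis can be sketched directly: for each rest-score group, compute the proportion of positive responses to the item and check that it does not decrease. The data below are hypothetical:

```python
import numpy as np

def restscore_proportions(X, i):
    """Proportion of positive responses to item i within each rest-score group
    (rest score = unweighted sum over the remaining k - 1 items)."""
    rest = X.sum(axis=1) - X[:, i]
    return [float(X[rest == s, i].mean()) for s in np.unique(rest)]

# Hypothetical responses of 8 examinees to k = 3 items.
X = np.array([[0, 0, 0],
              [0, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]])

props = restscore_proportions(X, 0)
print(props)                                              # [0.0, 0.5, 0.75]
print(all(p1 <= p2 for p1, p2 in zip(props, props[1:])))  # nondecreasing: True
```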
If all these three assumptions hold, then as Mokken [1971] proved, all
pairs of items are non-negatively correlated for all subgroups of examinees
and all subsets of items.
In testing the model of monotone homogeneity (with reference to a Mokken scale with k items), we usually test whether the probability of a positive response to any given item increases with increasing scale values of the examinees. An examinee's scale value is based on the rest score: the examinee's sum score on the remaining k − 1 items. The k − 1 items give rise to k different possible scale values (0, 1, …, k − 1), so we can differentiate the examinees into k different groups, depending on their rest score.
For homogeneity analysis, we apply Loevinger's coefficient [Loevinger 1948], relating the number of model violations observed (denoted as the number of errors observed, E(obs)) to the number of violations that can be expected under the model of stochastic independence, denoted here as E(exp). The homogeneity of the entire scale, H, is defined as the ratio of the total sum of all errors observed versus expected, or alternatively, as the ratio of the sum of all pairwise covariances versus the sum of all pairwise maximal covariances [Mokken 1971]:
H = 1 − [ Σ_{j=1…k−1} Σ_{i=j+1…k} E(obs)_ij ] / [ Σ_{j=1…k−1} Σ_{i=j+1…k} E(exp)_ij ]
  = [ Σ_{j=1…k−1} Σ_{i=j+1…k} cov(X_i, X_j) ] / [ Σ_{j=1…k−1} Σ_{i=j+1…k} cov(X_i, X_j)_max ].  (2.14)

Item coefficients of homogeneity, H_i for item i, are similarly defined as [Mokken 1971]:
H_i = 1 − [ Σ_{j=1…k, j≠i} E(obs)_ij ] / [ Σ_{j=1…k, j≠i} E(exp)_ij ]
    = [ Σ_{j=1…k, j≠i} cov(X_i, X_j) ] / [ Σ_{j=1…k, j≠i} cov(X_i, X_j)_max ].  (2.15)
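The covariance form of Loevinger's H is straightforward to compute from a binary data matrix. In the sketch below (hypothetical data), cov_max is the covariance attained when the joint proportion reaches min(p_i, p_j) given the item marginals; a perfect Guttman pattern therefore yields H = 1:

```python
import numpy as np

def loevinger_h(X):
    """Scale homogeneity H as the ratio of summed pairwise covariances
    to summed maximal covariances given the item marginals."""
    n, k = X.shape
    p = X.mean(axis=0)
    num = den = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pij = np.mean(X[:, i] * X[:, j])          # observed joint proportion
            num += pij - p[i] * p[j]                  # cov(X_i, X_j)
            den += min(p[i], p[j]) - p[i] * p[j]      # cov_max given marginals
    return float(num / den)

# A perfect Guttman pattern: every examinee who passes a harder item
# also passes all easier items.
X = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]])
H = loevinger_h(X)
print(round(H, 4))   # 1.0
```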

Finally, in the second model proposed by Mokken, i.e. double monotonicity DM [Mokken 1971], a set of monotone homogeneous items I satisfies the condition of double monotonicity with respect to a set of examinees J if, for all pairs of items (i, k) ∈ I, it holds that if for some examinee with scale value θ_0, p_i(θ_0) < p_k(θ_0), then for all examinees, irrespective of their value θ on the latent continuum, p_i(θ) ≤ p_k(θ), where item i is assumed to be the more difficult of the two items. This is the ordinal variant of Rasch's requirement of specific objectivity, or item-independent subject measurement, a model property that increases the validity of comparisons of scale scores of examinees on the same scale in different data sets.


According to the DM model, the order of the manifest probabilities p_i reflects an ordering of the items according to their difficulty that is uniform across (sub)groups of examinees. The more general model of monotone homogeneity MH does not imply this, so the item response functions of any
two items may intersect because they may increase with different slopes. For
two groups of examinees (one with a lower and one with a higher value than
the scale value indicated by the intersection point) the manifest probabilities
will indicate different orders of difficulty of the items.
There are a few procedures for assessing whether a set of items conforms
to the requirement of double monotonicity. In the first procedure, one can test
whether the order of difficulty of the items is the same across subgroups of
examinees (e.g. men vs. women, age groups), using the order of the marginal
probabilities of the positive response for the total sample as a baseline. In
the case of discrepancies from the baseline, a binomial test is carried out
[Molenaar 1973] to determine whether the samples might have come from
populations in which the items have the same marginal probability. If this
hypothesis is rejected, then the model assumption of double monotonicity is
violated for this pair of items in these subgroups of examinees.
Two other procedures, similar to the test of monotone homogeneity,
which are available to compare the order of the probability of the positive
response to pairs of items for groups of examinees that are distinguished by
their sum scores on the remaining items are: the rest score group method and
the rest score splitting method [Sijtsma and Molenaar 2002].
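The first of these procedures can be sketched by comparing the ordering of item proportions across hypothetical subgroups (the binomial test for discrepancies is omitted here):

```python
import numpy as np

def difficulty_order(X):
    """Items ordered from easiest to hardest by proportion of positive responses."""
    return [int(i) for i in np.argsort(-X.mean(axis=0))]

# Hypothetical subgroups (e.g. men vs. women); under double monotonicity
# the difficulty ordering should agree across subgroups.
group_a = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]])
group_b = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]])

print(difficulty_order(group_a))                            # [0, 1, 2]
print(difficulty_order(group_b))                            # [0, 1, 2]
print(difficulty_order(group_a) == difficulty_order(group_b))  # True
```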
Now a major advantage of IRT (including Mokken as well as Rasch models) over the classical test theory models lies in the model parameters for items: it explicitly takes into account that the items might differ in difficulty45. The advantage of Mokken scale analysis over CTT models lies in the detailed emphasis on model fit. All coefficients (and therefore all pairwise correlations) must be positive, and each item must be sufficiently homogeneous with the others. These requirements lead to the development of scales that conform to higher standards of reliability and homogeneity than scales that have been inspected only in a standard reliability analysis [van Schuur 2011].
Another advantage of the Mokken models is the bottom-up hierarchical clustering search procedure that identifies a maximal subset of homogeneous items. Especially in exploratory research aimed at developing new measurement instruments, this procedure helps the researcher to detect new candidates for latent variables, even when only a limited number of items are available [van Schuur 2011].
45 Measurement models which are defined by CTT theory and which pertain to scale development assume that all items are more or less equal, i.e. have the same distribution.
Finally, Mokken's solution can successfully be used on small numbers of items. However, Molenaar [1997] showed that when the number of items is relatively small, a Mokken scale analysis and the more stringent Rasch scale analysis often lead to essentially the same results.

Dichotomous vs. polytomous item response theory models


Because measures with multiple response options also exist in item response theory, their use in research is becoming more prevalent. Such items include the ubiquitous Likert-type items, as well as ability test items that provide partial credit for partially correct answers, portfolio assessment test formats, and even multiple-choice items when each response option is scored separately.
Polytomous IRT models operate quite differently from dichotomous models. Here, knowledge of the characteristics of one of the response category functions does not determine the characteristics of the other category functions, and each category function therefore must be modeled explicitly. A consequence of the non-determinate nature of the category response functions is that they are no longer exclusively monotonic. Hence, in items with ordered categories, only the functions for the extreme negative and positive categories are monotonically decreasing and increasing, respectively.
[Figure: category boundary response curves plotted against the latent trait.]
Figure 9. Category boundary response functions for a five-category polytomous item
Source: Ostini and Nering 2006, p. 10.


As shown in Figure 9, the function for the second category rises as the probability of responding in the most negative category decreases, but only up to a point, at which it decreases as the probability of responding in the next category increases. Ultimately, the last category is the extreme positive category, which has a monotonically increasing function.
The presence of non-monotonic functions poses specific problems. Such functions can no longer be described in terms of a location and a slope parameter, and actually selecting the appropriate mathematical form and subsequently estimating parameters for such unimodal functions is a significant challenge. Fortunately, in the case of ordered polytomous items, a solution
to this problem has been found by treating polytomous items essentially
as concatenated dichotomous items. Multiple dichotomizations of item response data are combined in various ways to arrive at appropriate response
functions for each item category. In fact, the different ways in which the
initial dichotomizations can be made and different approaches to combine
dichotomizations result in a variety of possible polytomous IRT models. In
addition, different types of polytomous items require, or allow for different
features to be incorporated into an applicable model. The result is a range
of possible polytomous models that far outstrips the number of available
dichotomous models [Ostini and Nering 2006]. For example, polytomous
item formats may refer to the following types of models:
– the graded-response model,
– the modified graded-response model,
– the partial credit model,
– the generalized partial credit model,
– the rating scale model,
– the nominal response model.
We will not discuss all of them, but will focus briefly on polytomous models for items with ordered categories, where the response categories have an explicit rank ordering with respect to the trait of interest. A Likert-type attitude item is an example of ordered polytomous items. Responses to such items are referred to as graded responses in the literature. Here, polytomous items are categorical items which can be treated in the same way as dichotomous items; however, they have more than two possible response categories. Categorical data can be described effectively in terms of the number of categories into which the data can be placed. Ordered categories are defined by boundaries or thresholds that separate the categories. Logically, there is always one less boundary than there are categories. Thus, for example, a dichotomous item requires only one category boundary to separate the two possible response categories, and a 5-point Likert-type item requires four boundaries to separate the five possible response categories.
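The boundary idea can be sketched numerically: with m − 1 logistic boundary curves (a graded-response-style construction with hypothetical boundary locations), each category probability is the difference between adjacent cumulative curves, and the probabilities necessarily sum to 1:

```python
import math

def boundary(theta, a, b):
    """Cumulative probability of responding in or above the category
    whose lower boundary is b (a 2PL-shaped boundary curve)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probs(theta, a, bounds):
    """m ordered categories need m - 1 boundaries; each category's
    probability is the difference of adjacent boundary curves."""
    cum = [1.0] + [boundary(theta, a, b) for b in bounds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bounds) + 1)]

# A 5-point Likert-type item needs four boundaries (hypothetical values).
bounds = [-1.5, -0.5, 0.5, 1.5]
probs = category_probs(theta=0.2, a=1.0, bounds=bounds)
print(len(probs))               # 5 categories
print(round(sum(probs), 6))     # probabilities sum to 1.0
```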
In sum, polytomous items appear to be better because they measure a wider range of the trait continuum than do dichotomous items. This occurs simply by virtue of the fact that polytomous items contain more response categories than do dichotomous items46. So the advantage of polytomous items is that, by virtue of their greater number of response categories, they are able to provide more information over a wider range of the trait continuum than dichotomous items [Ostini and Nering 2006]. Kamakura and Balasubramanian [1989, p. 514] suggested that dichotomous distinctions are often less clear and that more subtle nuances, i.e. more complex response formats, are needed than dichotomous items. Similarly, Cox [1980] noted that items with two response alternatives are inadequate because they cannot transmit much information and they frustrate examinees.
Polytomous data have additional consequences for statistical analysis. Wainer [1982] points out that responses to polytomous items can be thought of as data distributions with short tails. This may affect statistical procedures, such as obtaining least squares estimators, which rely on assumptions of a Gaussian distribution. Rather than simply proceeding as though the data met the assumptions, a better approach, according to Wainer, is to use procedures designed specifically for this type of data, especially relevant IRT models.
Prior to IRT models, the two most common methods for dealing with polytomous data were Thurstone and Likert scaling. The former, Thurstone scaling, is similar to IRT in the sense that it scales items on an underlying trait using a standardized scale [Thurstone 1927, 1929, 1931]. However, to achieve Thurstone scaling, one must assume that the trait is normally distributed in the population of interest. On the other hand, Likert [1932] showed that a simple summated rating procedure would produce results equal to or even better than Thurstone's method. The simplicity of the approach meant that Likert scaling was widely adopted as the method of choice for rating data such as those used in attitude measures [Hulin, Drasgow and Parsons 1983].
46 Of course all items, whether dichotomous or polytomous, measure across the entire range of the trait continuum, from negative to positive infinity. The amount of measurement information provided by an item is peaked above the trait scale location of that item and then drops, often rapidly, at higher and lower trait levels. Paradoxically, the more information an item provides at its peak, the narrower the range of the trait continuum about which the item provides useful information.


CTT and IRT some differences


Classical test theory CTT (known as the old rules) differs in several aspects from IRT, which represents the new rules [Lord and Novick 1968; Mokken 1971, 1997; Reckase 1979; Lumsden 1976; Hulin, Drasgow and Parsons 1983; Guion and Ironson 1983; Harris and Sackett 1987; Embretson 1996; Tarka 2011]. They are discussed below and some of them are summed up in Table 5.
The first aspect relates to interchangeable test forms, that is, when examinees receive different test forms and some type of equating procedure is needed before their scores can be compared. In traditional CTT, equating means that the different test forms should be essentially equal. Gulliksen's [1950] classic text defined strict conditions in CTT, which included the equality of means, variances, and covariances across tested items and forms. In Gulliksen's view [1950], if two tests meet the statistical conditions for parallelism, then scores may be regarded as comparable across forms. Thus, substantial effort must be devoted to procedures for test equating. More recent extensions of CTT have considered the test form equating issue more liberally, as score equivalencies between forms. Several procedures have even been developed for equating tests with different item properties, such as linear equating and equipercentile equating47. These methods are used in conjunction with various empirical designs such as random groups or common anchor items (see Angoff [1982] for a summary of such methods).
In contrast, the IRT version of so-called equating follows directly from the IRT model assumptions, which implicitly control for item differences between test forms. The constructed measurement instrument (scale) is developed through the cumulative character of particular items. Finally, a better estimation of trait levels for all examinees is obtained from administering different test forms. More accurate estimation for each examinee means that score differences are more reliable. Hence, the new rule means that non-parallel test forms (that differ substantially and deliberately in difficulty level from other forms) yield better score comparisons.
In the next aspect, i.e. the standard error of measurement, the difference concerns whether the standard error of measurement is constant or variable
47 For a simplified example, suppose that both test forms could be given to the same group with no carry-over effects. A very simple linear equating would involve regressing scores from one test form onto the other. Score equivalencies between the test forms are established by using the regression equation to predict scores. This type of method can be applied to test forms that have different means, variances, and even reliabilities [Embretson 1996, p. 344].


among the scores in the same population. In CTT the standard error of measurement is assumed constant, whereas in IRT it is allowed to vary. Besides, measurement differs in whether the standard error is specific or general across populations: in CTT it is rather population-specific, whereas in IRT it is population-general. Moreover, when we estimate the standard error in IRT we assume that the relationship between trait score and raw score is nonlinear, and the confidence interval band becomes increasingly wide for extreme scores. Unlike CTT, neither the trait score estimates nor their corresponding standard errors depend on population distributions. In IRT, trait scores are estimated separately for each score or response pattern, controlling for the characteristics (e.g. difficulty) of the items that are administered. Standard errors are smallest when items are optimally appropriate for a particular trait score level and when item discriminations are high [Embretson 1996].
As far as test length and reliability are concerned, in IRT shorter tests can be more reliable than longer tests. In CTT (based, for example, on the Spearman-Brown formula), when a test is lengthened by a factor of n parallel parts, the true variance increases more rapidly than the error variance. Thus, in CTT, shorter tests generally imply increased error and more unreliable measurement.
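The CTT rule can be illustrated with the Spearman-Brown prophecy formula, which predicts reliability after lengthening (or shortening) a test by a factor n of parallel parts:

```python
def spearman_brown(r, n):
    """Predicted reliability after lengthening a test with reliability r
    by a factor n of parallel parts (Spearman-Brown prophecy formula)."""
    return n * r / (1 + (n - 1) * r)

# Doubling a test with reliability 0.5 raises the predicted reliability:
print(round(spearman_brown(0.5, 2), 3))     # 0.667
# Halving it (n = 0.5) lowers it, illustrating the CTT rule that
# shorter tests imply less reliable measurement:
print(round(spearman_brown(0.5, 0.5), 3))   # 0.333
```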
According to the next aspect, i.e. unbiased estimation of item properties, in CTT unbiased assessment of item properties depends largely on representative samples from the target population. Assessment of the classical item statistics of item difficulty (i.e. p values as the proportion passing) and item-total correlations (biserial correlations) yields non-comparable results if they are obtained from unrepresentative samples. In IRT, unbiased estimates of item properties may be obtained from non-representative samples.
Differences also appear in establishing meaningful scale scores. In CTT, meaningful scale scores are obtained from standard scores, and in IRT they are obtained from IRT trait score estimates. Embretson and DeBoeck [1994] noted that test score meaning depends on specifying an appropriate comparison. A comparison is defined by two features: the standard with which a score is compared and the numerical basis of the comparison (order, difference, ratio, etc.).
In CTT, score meaning is determined by a norm-referenced standard, and
the numerical basis is order. That is, scores have meaning when they are
compared with a relevant group of people for relative position. To facilitate
this comparison, raw scores are linearly transformed into standard scores
that have more direct meaning for relative position. However, an objection


that is often raised to norm-referenced meaning is that scores have no meaning for what the examinee actually can do48.
In IRT, a score is compared with items, e.g. examinees and items are calibrated on a common scale. The match between trait level and item difficulty
has direct meaning for expected item performance. The probability that an
examinee passes a particular item is derived from the match of item difficulty to trait level49.
Differences between CTT and IRT also appear in establishing scale properties, as in the case of the interval scale. In CTT, interval scale properties of measures are achieved by selecting items to achieve normal raw score distributions. In IRT, interval scale properties are achieved by justifiable measurement models.
Routine test development procedures for many social research problems
include selecting items to yield normal distributions in a target population.
Even if normal distributions are not achieved in the original raw score metric, scores may be transformed or normalized to yield a normal distribution.
These transformations change the relative distances between scores. In consequence, score distributions have implications for the level of measurement
that is achieved. Thus, only linear transformations preserve score intervals
as well as distribution shapes. If raw scores are normally distributed, then
a linear transformation (such as a standard score conversion) will preserve
score intervals to appropriately estimate true score. However, scale properties are tied to a specific population. So, if the measurement is applied to an
examinee from another population, can the interval scale properties still be
justified? If not, then scale properties are population-specific.
In IRT models, particularly the Rasch model, interval scale properties are achieved in a different way. The Rasch model has been linked to fundamental measurement because of the simple additivity of its parameters [Andrich 1988]. A basic tenet of fundamental measurement is
48 In some tests, IRT trait levels are also linked to norms. In this case, IRT scores are linearly transformed to standard scores. Thus, IRT trait levels also have norm-referenced meaning.
49 Just as a reminder, as in psychophysics, an item is at the examinee's threshold when the examinee is as likely to pass as to fail the item. When an item difficulty equals the examinee's trait level (e.g. in the Rasch model), then the examinee's probability of failing equals the probability of passing, or, stated another way, the odds are 50/50 for passing versus failing. Thus, analogous to psychophysics, the item falls at the examinee's threshold. If the examinee's trait level exceeds the item, then the examinee is more likely to pass the item. Conversely, if an examinee's trait level is lower than the item difficulty, then the odds are more favorable for failing the item.


Table 5. Rules of the measurement in classical theory of measurement and item response theory

Classical theory of measurement:
– The standard error of measurement applies to all scores in a particular population.
– Longer tests are more reliable than shorter tests.
– Comparing test scores across multiple forms depends on test parallelism, adequate equating.
– Unbiased assessment of item properties depends on representative samples from the population.
– Meaningful scale scores are obtained by comparisons of position in a score distribution.
– Interval scale properties are achieved by selecting items that yield normal raw score distributions.

Item response theory:
– The standard error of measurement differs across scores, but generalizes across populations.
– Shorter tests can be more reliable than longer tests.
– Comparing scores from multiple forms is optimal when test difficulty levels vary across examinees.
– Unbiased estimates of item properties may be obtained from unrepresentative samples.
– Meaningful scale scores are obtained by comparisons of distances from various items.
– Interval scale properties are achieved by justifiable measurement models, not score distributions.

Source: Embretson 1996.

additive decomposition [Michell 1990], in the sense that two parameters are additively related to a third variable. Hence, in the Rasch model, additive decomposition is achieved. In this decomposition, interval scale properties will hold if the laws of numbers are applied. Specifically, the same performance differences must be observed when trait scores have the same inter-score distances, regardless of their overall positions on the trait score continuum.
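This additivity is easy to see numerically: in the Rasch model the log-odds of success is simply θ − b, so equal distances between trait scores produce equal log-odds differences wherever they lie on the continuum (illustrative values below):

```python
import math

def rasch_logit(theta, b):
    """In the Rasch model the log-odds of success decompose additively:
    logit P(theta, b) = theta - b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return math.log(p / (1.0 - p))

# Equal trait-score distances yield equal log-odds differences,
# regardless of position on the continuum or of the item difficulty.
b = 0.7
d1 = rasch_logit(1.0, b) - rasch_logit(0.5, b)
d2 = rasch_logit(-2.0, b) - rasch_logit(-2.5, b)
print(round(d1, 6), round(d2, 6))   # both 0.5
```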
In sum, IRT allows the comparison of the performance of examinees who have taken different tests. It also permits us to apply the results of an item analysis to groups with different ability levels than the group used for the item analysis [Crocker and Algina 2008]. The IRT models overcome the fundamental limitations of the CTT model, namely the assumption of parallel item positions and the dependence of reliability on the characteristics of the sample. Thus, a scale remains more or less insensitive to the sample in


the course of assessing the discriminatory power of individual items. The scale is independent of the item positions when the examinees' characteristics are evaluated. In this situation we deal with the rule of specific objectivity of measurement [Tarka 2013a]50.
Finally, both classical test theory and item response theory use items (composing the respective scale) which may have a polytomous character. However, in contrast to CTT models, in IRT models the scale may be composed of items in both polytomous and dichotomous variants. The key distinction also refers to the parallelism of items, which plays a significant role in CTT models, and to the cumulative distribution of the items, which is important in the case of IRT models.
50 On the other side, IRT theory has some serious disadvantages too, mainly related to [Stobiecka 2010]:
– inconvenience in scale construction (in many cases, it is difficult to determine which items should be considered in line with the key, or the set of correct answers in the test),
– inconvenience of the scale for examinees, who must respond both to easy and difficult items,
– complexity of the mathematical calculations (despite the use of statistical packages), which can be a problem for many researchers,
– costs associated with IRT use.

III. SELECTED ISSUES ON SCALES CLASSIFICATION AND SCALING

The link between measurement and scaling


Measurement covers all activities related to the numerical representation of a variable using certain measures1. According to Ferguson and Takane [1999, p. 29], a variable specifies the property in terms of which the elements of a group differ from each other. Similarly, a variable is defined by Kerlinger [1964, p. 32], who stated that a variable is a property (trait) that takes on different values2. Brzeziński even compares a variable with the level of organism activation. He believes that a variable reflects some trait, or a property owned, to a varying degree, by each human; in other words, the activation level of the organism takes different values for different people [Brzeziński 1978].
Measurement is contained within a theoretical framework called scaling theory, which focuses on the rationales and mathematical techniques for determining the numbers used to represent different amounts of a property being measured. A scaling rule establishes a correspondence between the elements in a data system and the elements in the real-number system3.
1
Though this form of expressing the measurement is simplified here.
2
Additionally, Ostasiewicz [2003] drew attention to the fact that young researchers often confuse terms such as trait and variable, which means they simply mix up the meaning of the trait and the variable. He highlighted that the first term belongs to the world of things and the second is one of the fundamental concepts in mathematics and related sciences.
3
For example, the real-number system is comprised of zero and all possible integer and decimal values between negative and positive infinity. Such a system can be graphically presented by the real-number line, a single continuum which can be infinitely divided into smaller and smaller segments. Every value in the real-number system has a unique location on this continuum.

A system can be defined as a collection of elements or objects that share a common property. We use the term data system to refer to a collection of all possible observations of a given property for a set of objects.
Suppose, for example, we have a set of objects that differ observably in length. We can conceive of a variety of scaling rules which might be used to relate the observable lengths in the data system to values derived from the real-number system. For example, objects might be arranged in order, in a line, from shortest to longest, with a number assigned to each object based on its position. The shortest object would be assigned 1, the next shortest would be assigned 2, and so on. The objects could also be compared to a standard unit such as a foot ruler, with numbers assigned based on their lengths in feet. When a scaling rule is specified and a number has been assigned to each element of the data system, these numbers are called scale values.
Scaling techniques (i.e. procedures for assigning numbers to analyzed objects) were initiated, among others, by Fechner [1860], or actually even earlier, since some measurement issues (in particular, concerning investigations in psychology) were considered as early as the seventeenth century. The development of scaling techniques generally went in two directions [Nowakowska 1975]. The primary direction was begun in the twenties by Thurstone [1927], whose techniques were based on an ordering scheme. The second direction refers to the works of Stevens [1946, 1951, 1966], who introduced a method for the direct quantification of objects, replacing methods in which the tested object produced a stimulus corresponding to a predetermined value.
As Nowakowska argued [1975, pp. 246–247], each such technique is based on the assumption that there exists a specific type of scale, or on a number of specific assumptions which define the relationships between values of the trait and the observed empirical data. In other words, we postulate the presence of a scale and also make assumptions which describe how empirical data are generated in one form or another.
It is also noteworthy that the successful application of a scaling technique (i.e. as a consequence of using a particular procedure) logically does not constitute a sufficient argument that the presence of a specific scale is fully confirmed. In fact, attempts undertaken in searching for different methods to verify this assumption led to further development of measurement theory.


Types of scales according to measurement levels


A scale enables us to place the measured objects on a predefined calibration4. We usually accept four levels of scales, which were classified in the early 1940s by the Harvard psychologist Stevens. He coined the terms nominal, ordinal, interval and ratio to describe a hierarchy of measurement scales [Stevens 1951]. This classification made a tremendous contribution to measurement theory, was subsequently adopted by several important statistics textbooks, and has influenced the statistical reasoning of a generation5 [Gaito 1960].
The four types of scales, i.e. nominal, ordinal, interval and ratio, have the following characteristics:
– distinctiveness, where different numbers are assigned to e.g. examinees who have different values of the property being measured,
– ordering in magnitude, where larger numbers represent more of the property being measured than smaller numbers,
– equal intervals, where equivalent differences between measures represent the same amount of difference in the property being measured,
– absolute zero, where a measurement of zero represents an absence of the property being measured [Allen and Yen 1979].
Nominal measurement has only the characteristic of distinctiveness. It does not reflect ordering in magnitude, equal intervals, or an absolute zero. A nominal scale of objects can be obtained in a straightforward manner. Examinees can be asked to categorize or sort objects into mutually exclusive and exhaustive sets. Sets are mutually exclusive if each object can be sorted into only one set. For example, male and female are mutually exclusive categories. Sets are exhaustive if every object can be classified in some set. If we are classifying cars and the category Ford is left out, the sets would not form an exhaustive categorization of the cars.
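Both requirements can be checked mechanically; the sketch below uses hypothetical car data and category codes (an illustration, not part of the source):

```python
# Checking that a set of nominal categories is mutually exclusive
# and exhaustive for a collection of objects (hypothetical car data).

cars = ["Ford", "Toyota", "Fiat", "Ford"]
categories = {"Ford": 1, "Toyota": 2, "Fiat": 3}  # nominal scale values

# Exhaustive: every object can be classified into some category.
exhaustive = all(car in categories for car in cars)

# Mutually exclusive: each object maps to exactly one category,
# which a dictionary lookup guarantees by construction.
codes = [categories[car] for car in cars]

print(exhaustive)  # True
print(codes)       # [1, 2, 3, 1]
```

Dropping "Fiat" from the dictionary would make `exhaustive` False, reproducing the car-classification problem described above.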
In contrast, ordinal measurement assigns higher numbers to examinees who possess more of the property being measured. The most common ordinal measurement is rank order. Ordinal scales of objects can be obtained by having people rank order the objects in terms of some property. Those
4
In Latin, scalae means a ladder.
5
Some authors adopted these ideas [Blalock 1968; Siegel 1956], perhaps because they appear to provide simple guidance and protect naive data analysts from errors in applying statistics.


objects that are ranked higher are assigned higher numbers on the scale.
Similarly, people can be rank ordered by their total score on some task6.
The interval level of measurement has the characteristics of distinctiveness, ordering, and equal intervals. There are many methods to obtain interval scales. One of these is direct estimation, in which people are asked to assign numbers to stimuli, or to differences between stimuli, according to some specified property of the stimuli. For example, examinees may be given pairs of names of breakfast cereals and be asked to judge how many more calories cereal A has than cereal B. Scale values for the stimuli usually are taken to be the mean or median of the scores obtained when many examinees are tested.
Finally, the ratio level of measurement has all four characteristics. This type of scale can also be obtained using the method of direct estimation: people are asked to assign numbers to stimuli or to ratios of stimuli. The fit of a ratio-scaling model can be examined in a manner similar to that described for the construction of ordinal scales using direct estimation. This type of scale is the most powerful of all those listed above, as it eliminates the limitations that arise when interpreting results expressed on nominal, ordinal and interval scales.
These four scales are presented in Table 6.
Table 6. Types of measurement scales according to Stevens

Characteristic          nominal   ordinal   interval   ratio
Distinctiveness            *         *         *         *
Ordering in magnitude                *         *         *
Equal intervals                                *         *
Absolute zero                                            *

Cells marked with stars (*) describe the possible relationship between the characteristics and the levels of measurement.
Source: Allen and Yen 1979.

In the literature, we also find some other classifications of the scales. As Coombs, Dawes and Tversky [1977, pp. 38–39] explained, if types of scales are defined through a set of admissible transformations, there is an infinite number of types of scales corresponding to an infinite number of transformations. In this understanding, we obtain an absolute scale7 [Suppes and Zinnes 1963], or intermediate types of scales, such as the partially ordered scale and the ordered metric scale [Coombs 1950]. The partially ordered scale is placed between the nominal and ordinal scales, and the ordered metric scale (in terms of power) is located between the ordinal and interval scales.

6
Ordinal scales (according to their appropriate scaling rules) can also be produced by:
– sorting techniques, where people are given stimuli (such as occupational titles or pictures) and are asked to sort them into piles representing different levels along some specified dimension (such as prestige or attractiveness),
– paired comparisons, which involve asking people to choose which object in each of a series of pairs of objects has more of a particular characteristic,
– rating scales, which are used to produce ordinal scales and which typically involve having people indicate their opinions, beliefs, feelings, or attitudes in some manner,
– Guttman scalogram analysis, which produces an ordinal scale of items and examinees [Guttman 1944],
– Coombs' unfolding approach, where the observations being analyzed are people's rank orderings of their preferences for, or proximities to, a set of stimuli along one dimension. For example, people may be asked to rank order a set of personal values in terms of proximity or similarity to their own. If the scaling is successful, an analysis of the responses of all the people will produce a consistent scaling of stimuli and people.
Other quite interesting concepts of measurement levels (proposed by different authors) were summarized for comparison by Sagan [2003]; see Table 7.
Table 7. The concepts of the measurement levels

Authors and the stages or levels of measurement they distinguish:
– Stevens (levels of measurement): Nominal; Ordinal; Interval and Ratio
– Kingston: Idea; Observation; Comparison; Measurement; Relation
– Mynarski: Identification; Categorization; Sorting and comparison; Distances, proportions, clusters; Structural relations, models
– Buchler: Qualitative imagining; Qualitative thinking; Qualitative object; Quantitative index; Quantitative measure; Quantitative relations

Source: Sagan 2003, p. 24.


7
An absolute scale is a type of ratio scale which, in addition to the natural zero point, possesses natural rather than arbitrary units. Absolute measurement is based on the simplicity of counting, where numbers are assigned to objects which make up the respective sets.


Admissible transformations on scales


Scales, according to Stevens's assumptions, reflect a set of admissible transformations. There is no doubt that scales should not be changed in a way that would disrupt the representation of their scores, that is, their real values. Transformations associated with a particular type of scale that are admissible (i.e. transformations that maintain the correct representation in scale scores) are discussed by Allen and Yen [1979]8. They note, for example, that in a nominal scale unique numbers are assigned to distinct objects. These numbers can be changed in any way as long as the numbers assigned to the distinct objects remain different.
In an ordinal scale, by contrast, monotonic transformations are allowed, i.e. those which do not affect the relative order of the scale values (e.g. adding a constant or multiplying by a positive number). When a scale is assigned numbers, it can be transformed in any way as long as the correct ordering of the scale numbers is preserved. For example, if the scale numbers 1, 2, and 3 are assigned to a set of objects, the numbers 5, 10, 11 or 1, 22.5 and 1003 would produce the same order and would be admissible.
For the interval level of measurement, a linear transformation Y = aX + b is admissible, where a and b are constants, Y is the new scale value, and X is the original scale value. In order to preserve the original ordering of objects, the constant a must be greater than 0. A linear transformation within this type of scale doesn't alter the ratio of distances between scale numbers. For example, suppose three objects, L, M, and N, are measured on an interval scale and given the scale numbers 1, 2, and 4. The distance between the scale numbers for M and L is 2 − 1 = 1, and the distance from N to M is 4 − 2 = 2. The ratio of the distance from M to L to the distance from N to M is 1/2. If a new scale Y is created from the original scale X by the linear transformation Y = 2X + 10, then the new scale scores for the three objects are 12, 14, and 18, respectively. The ratio of the distance from M to L to the distance from N to M is (14 − 12)/(18 − 14) = 1/2: the linear transformation did not alter the ratio of the distances between the scale values.
Finally, in ratio scales the transformation Y = aX (with a greater than 0 to preserve the original ordering of the objects), i.e. multiplication by a constant, is admissible. For instance, if two objects are measured on the
8
Netemeyer, Bearden and Sharma [2003] argue that scales which preserve their meaning under a wide variety of transformations in some sense convey less information than those whose meaning is preserved by only a restricted class of transformations.


ratio scale and are given the numbers 3 and 9, the ratio of the scale values of the two objects is 3/9 = 1/3. If a new scale is formed by the transformation Y = 5X, the new scale scores are 15 and 45. Their ratio remains 1/3; the second object still has a scale score three times as large as the first object. Note that, for a ratio scale, a linear transformation involving an additive constant is not admissible. For example, the transformation Y = 5X + 5 produces scores of 20 and 50, which do not have the ratio 1/3.
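The numerical examples above can be verified directly; this sketch reuses the scale values from the text (objects L, M, N with values 1, 2, 4, and the ratio-scale pair 3 and 9):

```python
# Verifying the admissible-transformation examples from the text:
# an interval-scale linear transformation preserves ratios of distances,
# and a ratio-scale multiplication preserves ratios of scale values.

# Interval scale: objects L, M, N with scale values 1, 2, 4
L, M, N = 1, 2, 4
ratio_before = (M - L) / (N - M)              # 1/2

# Linear transformation Y = 2X + 10
L2, M2, N2 = (2 * x + 10 for x in (L, M, N))  # 12, 14, 18
ratio_after = (M2 - L2) / (N2 - M2)           # (14 - 12)/(18 - 14) = 1/2
print(ratio_before == ratio_after)            # True

# Ratio scale: values 3 and 9, transformation Y = 5X
a, b = 3, 9
a2, b2 = 5 * a, 5 * b                         # 15, 45
print(a / b == a2 / b2)                       # True: ratio 1/3 preserved

# An additive constant is NOT admissible on a ratio scale:
a3, b3 = 5 * a + 5, 5 * b + 5                 # 20, 50
print(a / b == a3 / b3)                       # False: ratio is no longer 1/3
```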

Criticism of Stevens's scales in relation to statistical data analysis
Stevens, in his article Mathematics, measurement and psychophysics [1951], classified not just simple operations, but also statistical procedures according to the types of scales for which they were permissible. To recall: analysis of nominal data should be limited to summary statistics such as the number of cases, the mode, and contingency correlation, which require only preserving the identity of the values. Permissible statistics for ordinal scales include all of the above, plus the median, percentiles, and ordinal correlations, that is, statistics whose meanings are preserved when monotone transformations are applied to the data. In the case of interval data we can also add means, standard deviations (although not all common statistics computed with standard deviations), and product moment correlations, because the interpretations of these statistics are unchanged when linear transformations are applied to the data. Finally, ratio data allow us to use all of these plus geometric means and coefficients of variation.
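Stevens's point can be illustrated with a small sketch (the ordinal codes are hypothetical): under a monotone transformation the median still picks out the same examinee, while the mean has no such stable interpretation:

```python
# Hypothetical ordinal scores and a monotone transformation.
# The median identifies the same (middle) examinee before and after
# the transformation; the mean changes without a stable meaning.
from statistics import mean, median

scores = [1, 2, 3, 4, 100]              # ordinal codes for five examinees
transformed = [x ** 2 for x in scores]  # monotone: order is preserved

print(median(scores), median(transformed))  # 3 and 9: same middle examinee
print(mean(scores), mean(transformed))      # 22 and 2006: no stable meaning
```

This is exactly why the median is listed as permissible for ordinal scales while the mean is reserved for interval and ratio data.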
There is no doubt that Stevens's classification of scales has great practical and theoretical value. However, its application has led to some misunderstandings regarding the use of various statistical techniques. In Stevens's approach, the choice of the statistical technique to be used in an analysis depends heavily on a set of measurement properties which are in fact related to the level of measurement. Most arithmetic computations can be made at the interval or ratio level of measurement, because they involve taking differences among scores or sums of scores. The results of arithmetic computations based on ordinal measurement should be interpreted with great care, and they should certainly be avoided at the nominal level of measurement [Allen and Yen 1979].


As Velleman and Wilkinson [1993, p. 1] stated, the use of Stevens's categories in selecting or recommending statistical analytical methods is sometimes inappropriate and can often be misleading: they do not describe the attributes of real data that are essential to good statistical analysis, nor do they provide a classification scheme appropriate for modern data analysis methods9.
Criticism of Stevens's work has focused on three points. Firstly, restricting the choice of statistical methods to those that exhibit the appropriate invariances for the scale type at hand is a dangerous practice for data analysis. Secondly, his taxonomy is too strict to apply to real-world data. Thirdly, Stevens's prescriptions often lead to degrading data, e.g. by rank ordering, and to unnecessarily resorting to nonparametric methods10. Lord [1953] attacked Stevens's arguments by showing that the choice of permissible statistical tests for a given set of data does not depend on the representation or uniqueness problems, but is concerned instead with meaningfulness. Lord argued that the meaningfulness of a statistical analysis depends on the question it is designed to answer. Baker, Hardyck and Petrinovich [1966] and Borgatta and Bohrnstedt [1980] also pointed out that Stevens's prescriptions often force researchers to rank order data and thereby forsake the efficiency of parametric tests11.
On the other hand, Guttman [1977] argued more generally that the statistical interpretation of data depends on the question asked of the data and
on the kind of evidence we would accept to inform us about that question.
He defined this evidence in terms of the loss function chosen to fit a model.
9
Some of these points were raised even at the time of Stevens's original work. Others have become clear with the development of new data analysis philosophies and methods.
10
For instance, Stevens's classification has led to an overemphasis on the utility of nonparametric techniques in social research (e.g. in psychology, sociology, or marketing). As an example of this tendency, Siegel [1956] maintained that nonparametric tests of significance should be used with subinterval type data. He also listed the requirements of an interval scale as one of the assumptions for the use of the analysis of variance. However, this assumption cannot be found if one looks to the mathematical bases of the assumptions [Eisenhart 1947]. The important consideration for the use of the analysis of variance is not that the data have certain scale properties but that the data can be related to the normal distribution, plus approximating the other assumptions of independence and homogeneity of errors.
11
Their arguments relied on the Central Limit Theorem and Monte Carlo simulations to show that for typical data, worrying about whether scales are ordinal or interval doesn't matter. Their arguments were somewhat ad hoc, and they unfortunately ended up recommending standard parametric procedures rather than dealing with robustness issues. Nevertheless, they highlighted deficiencies in Stevens's discussion of permissible arithmetic.


Tukey also considered Stevens's proposals dangerous to good statistical analysis. Like Lord and Guttman, Tukey noted the importance of the meaning of the data in determining both the scale and the appropriate analysis. Because Stevens's scale types are absolute, data that are not fully interval scale must be demoted to ordinal scale. He argued that it is a misuse of statistics to think that statistical methods must be similarly absolute [Tukey 1961, pp. 245–246].
Luce [1959, p. 84] even said that Stevens's scales place limitations upon the statistics one may sensibly employ. If the interpretation of a particular statistic or statistical test is altered when admissible scale transformations are applied, then our substantive conclusions will depend on which arbitrary representation we have used in making our calculations. Most scientists feel that they should shun such statistics and rely only upon those that exhibit the appropriate invariances for the scale type at hand. Both the geometric and the arithmetic means are legitimate in this sense for ratio scales; only the latter is legitimate for interval scales, and neither for ordinal scales.
Many discussions of scale types treat them as absolute categories: data are expected to fit into one or another of the categories, and a failure to attain one level of measurement is taken as a demotion to the next level. However, real data do not follow the requirements of many scale types. Tukey [1961] pointed out that when measurements that ought to be interval scale are made with systematic errors of calibration that depend upon the value measured (as can often happen), the resulting values are not truly on an interval scale. The difference of two measured values at one end of the scale will not be perfectly comparable to a difference of measurements at the other end of the scale. Yet when the errors are small relative to the measurements, we would sacrifice much of the information in the data if we were forced to demote them to ordinal scale.
Measurement theory is important to the interpretation of statistical analyses. However, the application of Stevens's typology to statistics raises many subtle problems. It would seem that the comments concerning the relation between the various scales and the appropriate statistical procedures should serve only as a guide. In some cases, empirical data may indicate that rigid adherence to such statistical methods is not required and is wasteful of the data. At present, it can be even more dangerous if statistical programs (based on Stevens's typology) suggest that doing statistics is simply a matter of declaring the scale type of the data and picking a model. Worse, these programs assert that the scale type is evident from the data, independent of the questions asked of the data. They thus restrict the questions that may be asked of the data, and such restrictions lead to bad data analysis.


Attitudes and preferences scales: underlying classification


Historically, the first techniques for constructing attitude measurement scales were developed by Thurstone and Likert [Thurstone and Chave 1929; Likert 1932]. The term attitude was defined on the basis of work conducted by psychologists and sociologists. It was first introduced to the literature by Thomas and Znaniecki [1918, p. 21] to describe processes of human consciousness and to determine either actual or projected reactions to the social world. Thomas and Znaniecki's work began a serious discussion of the possibility of measuring attitudes and, simultaneously, of their differentiation. For example, Nelson [1939] distinguished 23 ways of using or defining attitudes. Such theoretically varied ranges of attitudes were due to the complexity of the phenomena which pertained to attitude measurement. At last, a full clarification of attitudes was conducted by Smith [1991], who extracted three components: emotions, cognitions, and behavior12.
As far as attitude scales are concerned, they customarily consist of a collection of items to which an examinee provides responses that are indicative of the attitude. So, when we construct a scale, we need to choose a set of representative items making up a subset on the basis of which the attitude scale is constructed. The use of such a scale in the study of human attitudes is associated with the possibility of quantifying attitudes and of finding the location of the examinee's response on the latent continuum of the measured attitude.
In the analysis of attitudes, we may use rating scales (non-comparative or monadic scales)13. Theoretically, they are situated at the level of an ordinal scale; in practice, however, they are often treated as interval scales [Sagan 2004]. This troublesome question arises from many contradictions between different researchers and different methodological schools. For instance, Brzeziński [1978] argues that rating scales belong to the ordinal level, whereas Kowal [1998] claims that they are also part of interval measurement.
12
Katz and Scotland [1959] defined the concept of attitude as a tendency or predisposition to evaluate some object or symbol of an object in a certain way.
13
In the case of non-comparative scales (according to Stobiecka) which range, e.g., from 5 to 9 points, we can hardly claim that they should be treated as metric ones. Some researchers even ask themselves whether the difference between categories such as I extremely like it and I like it very much is similar to the difference between I like it very much and I like it on average, and whether statistical analysis based on parametric methods (in reference to these differences) is appropriate.


In the literature, the most popular classification of rating scales was proposed by Guilford, who distinguished numerical, graphical14 and summated scales. He also introduced itemized scales and comparative scales, as did Churchill.
Another, quite exhaustive, classification was proposed by Brzeziński, who classified rating scales according to three criteria: 1) the way the scale is presented (scales based on categories and graphical scales); 2) the way the categories are described (scales with only the marginal points described, and scales with all points described); 3) the character of the scale (unipolar and bipolar scales).
On the other side of attitude scales, we find scales that are based on preferences. These are closely related to measurement models of attitudes; in preference measurement, however, we do not put much emphasis on explaining the structure of attitudes. Preferences and their directions rather concern the objects under consideration, which are the focus. The goal of a preference scale is to organize and order the objects. Scales in this group are used to determine factors such as the importance of characteristics to examinees.
In the literature we can distinguish three general scales which measure preferences: 1) the ranking scale, 2) the constant sum scale, and 3) the comparative scale with an indicated representative point. In the constant sum scale, the examinee's task is to divide 100 points (or a percentage, 100%) between different items or sets of similarly grouped items. Ranking yields measurement on an ordinal scale, while the constant sum scale provides measurement on a ratio scale [Walesiak 1996].
In the third option, i.e. the comparative scale with an indicated representative point, the number of points allocated to the so-called perfect reference point is assigned by the researcher (for example: 1, 10 or 100 points). The examinee is then asked to identify the most important point, which becomes a role model [Stobiecka 2010].
Among the comparative types of scales used to measure preferences in social studies, including marketing science, are the ranking scale and the ordinary scale of pairwise comparisons. Only the constant sum scale (in both its comparative and ranking form) can hold the status of a metric scale. In contrast, the comparative scale with an indicated representative point holds only the status of an ordinal scale; in the latter case, this is due to the number of points used on the scale.
14
A numerical scale is defined by a set of categories describing various points on a continuum. Certain categories have a priori assigned numbers, from smallest to largest or vice versa. The distances between the categories are generally equal, although there are sometimes rare exceptions to this rule [Brzeziński 1978]. Graphical scales, in turn, are based on descriptive categories to be assigned.

Scaling on summated, cumulative and comparative scales


Because every human has unique personal properties and may react differently to external stimuli (e.g. to the measured items on the scale), reactions are a function of the traits of that human and of the stimuli which affect him or her. Consequently, we may assume that differences in reactions appear due to differences in personal traits, i.e. when examinees are scaled and placed on some numerical scale. Alternatively, we can assume that the variation in reactions is due to the differentiation of the stimuli, which are then scaled; this task is performed by the examinees. A third approach to scaling attributes the variation in the measured reactions of examinees both to individual traits and to the stimuli [Ostasiewicz 2003]. These three approaches correspond to three different types of scales: 1) summated scales, 2) cumulative scales, 3) comparative scales.

Summated scales
Summated scales are developed from a series of questions to which answers are given in verbal form and coded numerically. All the collected numeric codes of the verbal responses are then summed. The total (summated) score represents the sum of the partial scores, where points are assigned to particular positions (items) of the scale.
The most popular type of summated scale is the Likert scale. It consists of a number of parallel items that are indicators of the measured latent trait. Questions are formulated in the form of statements15 or declarative sentences with which the examinee can agree or disagree. They are used to determine particular aspects, e.g. the examinee's general attitude towards a given issue16. In preparing the statements, Likert suggested using for each statement either a clearly positive or a clearly negative form with respect to the construct of interest. Neutral statements should not be included [Likert 1932].
15
All statements included in the scale make up rather an approximation of the imperfect normal and interval distribution of the measured latent trait. Particular items compose a random subset of all possible items of the measured area [Givon and Shapira 1984; Weathers, Sharma and Niedrich 2005].
16
The whole set of statements is sometimes called the questionnaire [Ostasiewicz 2003].
When preparing items for a Likert scale, one can draw on the guidelines of Crocker and Algina [2008]. These authors suggested to: 1) put statements or questions in the present tense; 2) not use statements that are factual or capable of being interpreted as factual; 3) avoid statements that have more than one interpretation; 4) avoid statements that are likely to be endorsed by almost everyone or almost no one; 5) use statements that are short, rarely exceeding 20 words; 6) use statements that form proper grammatical sentences; 7) avoid statements containing universals such as all, always, none, and never; 8) avoid indefinite quantifiers such as only, just, merely, many, few, or seldom; 9) prepare statements as simple sentences rather than complex or compound ones; 10) avoid statements that contain if or because clauses; 11) use vocabulary that can be understood easily by the examinees; 12) avoid the use of negatives (e.g. not, none, never).
An example of statements composing a summated scale for personal values measurement is shown in Table 8.
Table 8. Summated scale with statements for measuring values on a 5-point scale

List of statements                                       Response categories
I strive to possess a large amount of money              1  2  3  4  5
I want to achieve a higher social status                 1  2  3  4  5
I aspire to something only for myself                    1  2  3  4  5
I want to be free in my activities and points of view    1  2  3  4  5
I am happy with the life I am leading                    1  2  3  4  5
I spend time nicely and have lots of fun                 1  2  3  4  5
I look for adventure and risk                            1  2  3  4  5
I make my look very attractive                           1  2  3  4  5

Instruction for examinee: Please rate the importance of each particular value on a 5-point scale, ranging from 1 = totally disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, to 5 = totally agree.

Possible variants which determine the examinee's attitude towards the measured area are composed of partitions on the scale, called scale degrees or response categories. In determining the optimal number of points on a scale, many authors [Lehmann and Hulbert 1972; Cox 1980; Tarka and Kaczmarek 2013a, 2013b] concluded that 5- or 7-point scales are better options than 9- or 11-point ones. More points on the scale lead to a greater dispersion of scores and increase the degree of differentiation of objects [Green and Rao 1970]. The optimal number of points, routinely practiced in most studies, is 5 or 7.
Usually, the response categories on a 5-point scale have the following form:
– totally disagree,
– disagree,
– no opinion, or neither disagree nor agree,
– agree,
– totally agree.
These categories are coded with numbers ranging from 1 to 5, where 5 represents a more positive attitude of the examinee and 1 a more negative one. Edwards [1957] replaced the neutral category no opinion with two additional alternative categories, somewhat agree and somewhat disagree, determining the minimum intensity of a positive or negative attitude towards the object.
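The coding and summing described above can be sketched in a few lines of code (an illustrative sketch only; the dictionary and function names are ours):

```python
# Numeric codes for the verbal response categories of a 5-point Likert item.
CODES = {
    "totally disagree": 1,
    "disagree": 2,
    "neither disagree nor agree": 3,
    "agree": 4,
    "totally agree": 5,
}

def summated_score(responses):
    """Total (summated) score of one examinee: the sum of the numeric
    codes assigned to his/her verbal answers across all scale items."""
    return sum(CODES[answer] for answer in responses)
```

For instance, the answers agree, totally agree and disagree to a three-item scale code as 4 + 5 + 2, giving a summated score of 11.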
Placing the neutral category on a Likert scale is sometimes a matter of choosing between two types of formats: the forced-choice scale or the non-forced-choice scale. It is the researcher's decision whether to give the examinee an opportunity to avoid the answer or to force him/her to provide an answer, no matter what he/she thinks about the measured phenomenon. If the researcher decides to leave a free choice, then the scale should include an additional category such as do not know, or a neutral point such as no opinion or neither disagree nor agree. These categories give respondents more latitude in answering the statements and do not enforce answers. On the other hand, a summated scale based on forced choice requires the examinees to present their judgment of the measured phenomenon clearly [Brzeziński 1978].
Other important issues concern balanced categories (an equal number of positive and negative categories) versus unbalanced categories (an unequal number of negative or positive categories), as well as the choice between an even number of alternatives with an equal number of positive and negative expressions (for example: unimportant value –4 –3 –2 –1 / +1 +2 +3 +4 important value) and an odd number of alternatives with a neutral point added to the positive and negative expressions (unimportant value –4 –3 –2 –1 (0) +1 +2 +3 +4 important value).


Finally, in constructing the summated scale it is important to determine the extent to which each item in the questionnaire measures the phenomenon. Likert [1932] proposed two kinds of such analysis:
– correlation analysis,
– analysis of internal consistency.
In the context of correlations, if all items measure the same latent trait, then they must be correlated with each other. With r̄ as the average value of the correlation coefficient between all k items of the summated scale, we specify the goodness of fit of the scale through Cronbach's alpha coefficient:

α = k·r̄ / (1 + r̄(k − 1)).  (3.0)

Correlating each item with the total scale score makes it possible to separate incorrect (bad) items from correct (good) ones, which might further affect the final version of the scale. The classification of items into the final scale depends on high correlation coefficients (used as the discriminatory power of items) or t statistics.
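Formula (3.0) can be sketched in code as follows (a sketch; the function names are ours, and alpha is computed here in its standardized form, from the mean inter-item correlation, exactly as in (3.0)):

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def standardized_alpha(items):
    """Cronbach's alpha as in (3.0); items is a list of k lists, each
    holding one item's scores across all examinees."""
    k = len(items)
    pairs = list(combinations(items, 2))
    r_bar = sum(pearson(a, b) for a, b in pairs) / len(pairs)  # average r
    return k * r_bar / (1 + r_bar * (k - 1))
```

Perfectly correlated items yield an alpha of 1, while weaker average inter-item correlations pull the coefficient down.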
The stages of summated scale development are as follows:
– Formulation of statements, which are further verified with regard to their linguistic expression. From the initial set of statements a preliminary version of the scale is constructed, which should contain more items than its final version; the larger number of items allows unnecessary items to be eliminated.
– Development of the response system for each item, where response categories should be evenly distributed along the continuum of the examinee's attitude towards the measured objects.
– Carrying out research in the respective population on a sample of examinees whose size should not be less than 100.
– Analysis of items in terms of their appropriateness, by means of statistical correlation techniques, t statistics and Cronbach's alpha [Brzeziński 1978, 2007].
In social sciences (marketing research), the summated scale (see Figure 10) is used mainly for three reasons [Spector 1992; Hair et al. 2010]. Firstly, it can produce a scale with good psychometric properties; that is, a well-developed scale will have good reliability and validity17. Secondly, it is relatively cheap and easy to develop in consumer research: item writing is straightforward, and the initial development of the scale requires only 100 to 200 examinees. Thirdly, as Spector [1992] explained, a well-devised scale is usually quick and easy for examinees to complete and typically does not induce complaints from them.
17 This scale provides a means of overcoming, to some extent, the measurement error inherent in all items. It reduces measurement error by using multiple indicators to lessen the reliance on a single response. In consequence, by using the average or typical response to a set of related variables, the measurement error that might occur in a single question will be reduced.

[Figure 10 shows items E1, E2, …, En linked to observed variables X1, X2, …, Xn, which together compose the summated scale.]
Legend: Dotted lines link the scale (i.e. latent variable or construct) with particular items (observed variables) which are indicators of the scale. Scores on items are theoretically driven by the latent variable/scale; that is, they are reflected by the latent variable.
Figure 10. Summated scale
Source: own construction based on Spector 1992.

Cumulative scales
Cumulative scaling is based on deterministic methods [Ostasiewicz 2003]. The best known example is Guttman's scalogram, in which responses are aggregated once they meet certain conditions, e.g. when the researcher measures a phenomenon that by its nature assumes growth, development, evolution or complexity.
Guttman's scalogram can be demonstrated on consumers' attitudes towards the materialism value, composed of the following hierarchically set items:
A. I strive to possess a large amount of money.
B. I work very hard to earn more money than other people.
C. Because I earn lots of money I buy many things.
D. I spend lots of money during shopping.
For each of these items the examinee can express approval or disagreement. Note, however, that these items were formulated in a special way: if the examinee approves a given item and responds consistently, he/she should also approve the items preceding it. These relationships have a nonlinear form, i.e. the items are set hierarchically, arranged in a logical order consistent with the AIDA scheme [van der Linden and Hambleton 1997]. In effect, the Guttman scale consists of items having a monotonic, cumulative and hierarchical nature18.
Response categories in the Guttman scale assume a binary system of answers (yes/no); they allow for defining the examinee's location on the continuum of the measured latent trait. The process of coding responses with binary numbers for four examinees is presented in Table 9.
Table 9. Binary codes for Guttman's scalogram

Examinee    Item 1    Item 2    Item 3    Sum
A              1         1         1       3
B              1         1         0       2
C              1         0         0       1
D              0         0         0       0

Source: Ostasiewicz 2003, p. 22.

In the ideal case, we should be able to reproduce the attitude of the next examinee (in the row) hypothetically, just by knowing the total score he or she earned. For example, let us assume that a fifth examinee (E) gave the following answers: 1, 0, 1, yielding a total score of 2, the same as examinee B. However, the attitude of examinee E cannot be sufficiently reproduced only on the basis of the collected scores, because they might be misleading. Thus we arrive at the reproducibility coefficient, which is expressed as follows:

r = 1 − b/(kn),  (3.1)

where:
k – number of items,
n – number of examinees,
b – number of errors (incorrect answers).
18 When the examinee answers Yes to statement C, it logically implies the answer Yes to statements A and B. As a result, on the basis of a collected answer Yes to statement D, we can predict the correct answers to the three previous statements A, B and C.
If the value of this coefficient is lower than 0.85, the items do not form a unidimensional latent trait19. Here scalability implies reproducibility, i.e. from an individual's score (assuming the order of items) we can reproduce the response to each item. In practice, a set of items is administered to a sample of examinees, and the hypothesis of scalability is then tested by the proportion of correctly reproducible responses, as indicated by the reproducibility coefficient.
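As a sketch of formula (3.1), the coefficient can be computed by predicting each examinee's response pattern from his or her total score and counting mismatches as errors (a simplified error count; the Cornell and Goodenough methods count errors somewhat differently, and the function name is ours):

```python
def reproducibility(responses):
    """Guttman reproducibility r = 1 - b/(k*n): responses is a list of
    binary answer patterns, items ordered from easiest to hardest."""
    k = len(responses[0])  # number of items
    n = len(responses)     # number of examinees
    b = 0                  # number of errors (incorrect answers)
    for answers in responses:
        score = sum(answers)
        ideal = [1] * score + [0] * (k - score)  # pattern implied by score
        b += sum(i != a for i, a in zip(ideal, answers))
    return 1 - b / (k * n)
```

A perfect scalogram yields r = 1; adding examinee E from the text, with answers 1, 0, 1, introduces two errors against the ideal pattern 1, 1, 0, so with five examinees and three items r = 1 − 2/15 ≈ 0.87.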
Since there can never be as many errors as responses, the reproducibility coefficient does not have a lower bound of zero. Therefore, the reproducibility coefficient alone is sometimes inadequate for assessing the scale, and two approaches have been taken to improve it. One approach involved calculating the number of errors that would be expected by chance. The scale's reproducibility is then compared with chance reproducibility, which is computed by substituting chance errors for actual errors20 according to the formula21:

r = 1 − b/(kn).  (3.2)

Unfortunately, significant reproducibility can be obtained even if only a few of the scale items satisfy the hypothesis of scalability, or if the items are multidimensional or clearly do not fit the Guttman model. Robinson suggested an item-analysis method for Guttman scales based on inter-item Yule's correlations, which avoided the entire problem of scale reproducibility [Robinson 1973].
19 An important objective in Guttman scaling is to maximize the reproducibility of response patterns from a single score. A good Guttman scale should have a coefficient of reproducibility (the percentage of original responses that could be reproduced by knowing the scale scores used to summarize them) above 0.85.
20 There are two popular methods of counting actual errors: the original Cornell method [Guttman 1947] and the Goodenough method [Goodenough 1944]. Both order the items according to the frequency of passing each item [Chevan 1972] and then predict response patterns from scale scores with reference to that order [Stouffer 1950, p. 100].
21 Borgatta [1955] suggested using the ratio of actual errors to chance errors, and some tests have been developed for assessing the significance of scale reproducibility compared to chance reproducibility [Chilton 1969].
The other approach to scale improvement involved calculating the maximum possible number of errors. However, coefficients employing this approach also had many limitations: they tended to have high values, and under some conditions (such as few items) a scale acceptable by these coefficients may not be acceptable by chance-error coefficients [Schooler 1968].

Comparative scales
The third group of scaling is represented by Thurstone's comparative scales, which are used for scaling stimuli and are generally applied in situations where we need to choose between something that is better and something that is close to the ideal. This kind of scaling helps us determine preference or similarity relationships, which are used to define so-called social space, for example the space of friendship, the availability of services, as well as consumers' personal values.
Comparative scaling, unlike the two previously discussed methods, is far more formalized. The basis of these methods is the hypothetical-deductive scientific paradigm, together with the basic premise of a normal distribution of the stimuli. Scaling itself depends on the possibility of comparing two stimuli, and the score of a comparison gives the relative frequency of similarity or preference. For example, we may assume that there exist hypothetical preferences between consumers' personal values. If 85% of consumers prefer a personal value such as love to hate, then it can be assumed that P(Ai ≻ Bj) = 85/100. Leaving aside the technical problems of justifying this conclusion, we can say that, as value Ai grows in comparison to value Bj, we can determine zij by the formula [Ostasiewicz 2003]:
pij = P(Ai ≻ Bj) = (1/√(2π)) ∫_−∞^zij e^(−x²/2) dx.  (3.3)

For example, if p12 = P(A1 ≻ B2) = 85/100, then using (3.3) or normal distribution tables, we find that z12 = 1.04. By comparing value Ai with every other value, we obtain the scores zi1, zi2, …, zik. The average of these scores, (zi1 + zi2 + … + zik)/k, specifies the location of the value under consideration on the constructed scale, which we assign to Ai.
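The whole procedure of inverting (3.3) to obtain zij from each observed proportion pij and then averaging can be sketched as follows (a sketch; the function name is ours, and the self-comparison pii is simply skipped rather than averaged in):

```python
from statistics import NormalDist

def thurstone_scale_values(p):
    """p[i][j] is the proportion of judges preferring stimulus i over j.
    Each z_ij is the standard normal quantile of p_ij (the inverse of
    formula (3.3)); a stimulus' scale value is the mean of its z scores."""
    nd = NormalDist()
    k = len(p)
    values = []
    for i in range(k):
        z = [nd.inv_cdf(p[i][j]) for j in range(k) if j != i]
        values.append(sum(z) / len(z))
    return values
```

With a preference proportion of 0.85 the corresponding z value is about 1.04, as in the example above.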


Rokeach's, Schwartz's and Kahle's scales for values measurement
The Rokeach value survey
Values, due to their latent nature, are mostly measured indirectly. Measuring values, like measuring many other social and psychological concepts, is still imperfect: there is a distinct lack of standardization across theoretical and empirical research [Hitlin and Piliavin 2004].
The early years of values exploration and the beginnings of value scale construction can be traced back to Rokeach's, Schwartz's and Kahle's measurement instruments. Rokeach laid solid foundations for values measurement. His pioneering contribution and conceptualization of values was a kind of testament left for future scientists pursuing research on human values and, simultaneously, developing diverse instruments [Robinson, Shaver and Wrightsman 1991]. For example, Rokeach influenced Schwartz's value survey. Both scholars conceptualized values similarly, with one notable exception: for Rokeach, the distinction between means (instrumental values) and ends (terminal values) was fundamental, whereas Schwartz found no empirical evidence for this distinction and questioned its utility, since the same values can express motivations for both means and ends. These two paradigmatic empirical scholars also took different approaches to values measurement: Rokeach advocated asking examinees to rank values, while Schwartz defended a rating, non-forced-choice approach22.
In his empirical study, Rokeach gave examinees two main tasks. First, he asked them to rank eighteen instrumental values, which were perceived in terms of their importance as guiding principles in people's lives [Rokeach 1973]23. Later on, the same values were used to form a system of values. In his value survey instrument (RVS, see Table 10), examinees evaluated the importance of selected goals (eighteen values) in their lives. The most popular forms of the measurement instrument were form D (with gummed labels) and form E (where rank orders were written alongside the values). In form D, a list of terminal values was presented alphabetically, each value printed on a removable gummed label. Examinees then rearranged these gummed labels to form a single rank order of all the values, with the most important at the top and the least important at the bottom. In form E of the value survey, gummed labels were not provided; instead, the values were ranked by placing number one next to the value considered the most important. Afterwards, the RVS data was analyzed through median scores, which were calculated for particular values within a group. As far as reliability analysis was concerned, there was no relevant option to measure it. However, test-retest analysis enabled
22 See description in a further section.
23 Just to remind here, values were regarded as part of a functionally integrated cognitive system in which the basic units of analysis are beliefs; these could be turned into clusters that formed attitudes pertaining to the value system. Rokeach simply thought that values are people's general beliefs, guiding their actions and attitudes. He also believed that values are hierarchically organized and thus can be ordered.
Table 10. The RVS scale

No.  List of personal values                          Ranks
1    A comfortable life (a prosperous life)
2    An exciting life (a stimulating, active life)
3    A sense of accomplishment (success)
4    A world at peace (free from war and conflict)
5    A world of beauty (beauty of nature)
6    Equality (brotherhood, equal opportunity)
7    Family (taking care of loved ones)
8    Freedom (independence, free choice)
9    Happiness (contentment)
10   Inner harmony (freedom from inner conflict)
11   Mature love (sexual and spiritual intimacy)
12   National security (protection from attack)
13   Pleasure (an enjoyable, leisurely life)
14   Salvation (being saved, eternal life)
15   Self-respect (self-esteem)
16   Social recognition (respect, admiration)
17   True friendship (close companionship)
18   Wisdom (a mature understanding of life)

Instruction for examinee: Please read the whole list of values and rank them all, starting from the value that is most important to you and finishing with the value that is least important to you. Please remember that the ranks you ascribe to particular values must be set in ascending order (from rank 1, 2, 3, 4, 5, 6, up to rank 18). For example: rank 1 = the value most important to you among the others, rank 2 = the value of secondary importance, rank 3 = the value of tertiary importance, and so on. Put the ranking numbers in the empty squares.

Source: Rokeach 1973.

calculation of the median for the distribution of rank correlations, applied to all eighteen values for each examinee in the sample.
The list of Rokeach values is presented in Table 10, together with an example of how examinees should be asked to rank their personal values.
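The test-retest rank correlations mentioned above can be illustrated with Spearman's coefficient between an examinee's two rankings (a sketch under our own naming; with no tied ranks the closed formula ρ = 1 − 6Σd²/(n(n² − 1)) applies):

```python
def spearman_rho(ranks_t1, ranks_t2):
    """Spearman's rank correlation between two complete rankings of the
    same n values (e.g. the eighteen RVS values at test and retest)."""
    n = len(ranks_t1)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_t1, ranks_t2))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

Identical rankings give ρ = 1 and a fully reversed ranking gives ρ = −1; the median of such coefficients across examinees summarizes test-retest stability.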

Schwartz's value survey

Schwartz explicitly drew on Rokeach's work and borrowed many of the items from the Rokeach value survey. However, the Schwartz value survey asked examinees to rate items on a 9-point scale, ranging from 7 (of supreme importance) through 3 (important) and 0 (not important) down to −1 (opposed to my values). This allowed individuals to rate different values as equally important to them and allowed for the possibility that a given value item would be evaluated negatively. As Schwartz argued, given the nature of values as conceptions of the desirable, and given the ubiquity of cultural discourses that serve to legitimate nearly all the values included in the various surveys, one cannot expect to find many individuals who feel that a particular item is against their own set of values.
Schwartz's original survey included 56 value items derived from a variety of cross-cultural studies, presented to the examinees on one page, with each item followed by a short explanatory phrase. Schwartz used smallest space analysis [Guttman 1968] to assess and confirm the organization of human values with repeated cross-national samples [Schwartz and Bilsky 1990; Schwartz 1992]24.

LOV – list of values

Yet another alternative was developed by researchers from the University of Michigan [Kahle 1983; Veroff, Douvan and Kulka 1981]. The empirical study was originally conducted by Kahle, who created a much shorter and more condensed list of values than Rokeach's. The scale was given the name LOV [Beatty et al. 1985]. In fact, the LOV scale was developed from the theoretical base of Feather's [1975], Maslow's [1954] and Rokeach's [1973] work on values. It was theoretically tied to social adaptation theory [Kahle 1983], in which individuals were conceptualized as adapting to various life roles, based in part upon value fulfillment.
24 Schwartz also developed the personal values questionnaire (PVQ), an instrument containing less abstract items that are more accessible to a wider population. Responses to this instrument replicate the basic model of value relations.
Kahle modified the RVS values into a smaller subset. In the initial administration of the LOV approach, he concentrated on gathering information about what people consider to be their first and second most important values from a list of nine, rather than requiring a full ranking of these values. Kahle [1983] and his associates derived a number of important conclusions about societal, role and psychological adaptation in a national survey by examining only these two values. Later on, the method was developed further: examinees were asked not only to identify the two values most important to them but were allowed to consider all nine values [Beatty and Kahle 1984]. These values were also evaluated through paired comparison [Reynolds and Jolly 1980] and rating approaches [Munson 1984]. In the latter option, LOV measured nine values scored on a 9-point scale, with answers ranging from very unimportant to very important (see Table 11).
Table 11. List of values – LOV scale

No.  List of personal values        Very unimportant . . . Very important
1.   Sense of belonging             1  2  3  4  5  6  7  8  9
2.   Warm relations with others     1  2  3  4  5  6  7  8  9
3.   Being well respected           1  2  3  4  5  6  7  8  9
4.   Security                       1  2  3  4  5  6  7  8  9
5.   Self-respect                   1  2  3  4  5  6  7  8  9
6.   A sense of accomplishment      1  2  3  4  5  6  7  8  9
7.   Excitement                     1  2  3  4  5  6  7  8  9
8.   Self-fulfillment               1  2  3  4  5  6  7  8  9
9.   Fun and enjoyment of life      1  2  3  4  5  6  7  8  9

Source: Kahle 1983.

A brief review of other measures and scales applied for values analysis

Rokeach's dominant position in values measurement and his considerable contribution to the value concept were not the only ones in the history of values studies; other scales offered useful alternatives as well. Among the other instruments for values measurement (along with Rokeach's work) are:
1) the study of values [Allport, Vernon and Lindzey 1960],
2) the value survey [Rokeach 1968],
3) the goal and mode values inventories [Braithwaite and Law 1985],
4) ways to live [Morris 1956],
5) revised ways to live [Dempsey and Dukes 1966],
6) value profiles [Bales and Couch 1969],
7) life role inventory [Fitzsimmons, Macnab and Casserly 1985],
8) conceptions of the desirable [Lorr, Suziedelis and Tonesk 1973],
9) empirically derived value constructions [Gorlow and Noll 1967],
10) the east-west questionnaire [Gilgen and Cho 1979],
11) value orientations [Kluckhohn and Strodtbeck 1961],
12) personal value scales [Scott 1965],
13) survey of interpersonal values [Gordon 1960],
14) the moral behavior scale [Crissman 1942; Rettig and Pasamanick 1959],
15) the morally debatable behaviors scales [Harding and Philips 1986].
Some of these scales demonstrate that values have been measured using abstract philosophical issues that transcend cultural boundaries (scales 4, 5, 10 and 11), by drawing upon a broad range of goals, ways of behaving and states of affairs valued in western societies (scales 1, 2, 3, 6, 7, 8 and 9), and by focusing more narrowly on personal, interpersonal or moral behavior held in high regard in western cultures (scales 12, 13, 14 and 15). As a result, the diversity in the dimensions used for item evaluation is quite considerable. For example, examinees may give their judgments in terms of preference (scales 1, 4), agreement (6, 10), importance (7, 13), goodness (14), justifiability (15), importance as guiding principles (2, 3) and consistent admiration (12). The semantic distinctiveness of these dimensions has been well documented, but less is understood about their empirical distinctiveness [Levitin 1968].
In the midst of this variability, one can detect some common threads. While few value researchers have empirically examined cross-instrument relationships, some recurring themes emerge in the dimensions identified through factor-analytic studies of these instruments. Related scales (with their respective numbers given in brackets) falling under each category are as follows [Robinson, Shaver and Wrightsman 1991, p. 667]:
– concern for the welfare of others: benevolence (13), kindness (12), social orientation (1, 7), equalitarianism (6), humanistic orientation (8), a positive orientation to others (3), and receptivity and concern (4),
– status desired or respected: recognition and leadership (13), status (12), personal achievement and development (7), acceptance of authority (6), status-security values (9), an authoritarian orientation (8), social standing (3),
– self-control: self-control (12), social restraint and self-control (4),
– unrestrained pleasure: self-indulgence (4), need-determined expression vs. value-determined restraint (6), hedonistic orientation (8),
– individualism: independence (13, 12, 7), withdrawal and self-sufficiency (4), individualism (6), the rugged individualist (9), the work ethic (8),
– social adeptness: social skills (12), conformity (13),
– religiosity: religious orientation (1), traditional religiosity (3), religiousness (12).
In making a reasonable choice among all fifteen types of scales, one should rely strongly on the research questions and the context in which values are to be assessed. For example, when we consider a criterion such as conceptual breadth, instruments 1 to 11 are based on a broad conceptualization of the value domain, while 12 and 13 are restricted in scope to interpersonal values, and 14 and 15 more narrowly to moral values.
Another criterion points to the representative sampling of items from the domain of inquiry. Relevance and comprehensiveness of items can be addressed with instruments 2, 3, 6 and 12. On the other hand, in the context of reliance on multi-item rather than single-item measurement, most of the above scales are strong on this criterion, but instruments 2, 4 and 5 adopt a single-item approach.
In the context of the sampling process, most instruments have been developed and used with college students, but there are some notable exceptions: instruments 1, 2, 7, 13 and 15 have been used extensively in non-student populations, and 2 and 15 with large probability samples from the general population. Finally, instruments 6, 8, 9, 10, 11, 14 and 15 appear to be weak on the criterion of availability of basic data on reliability and validity.
In addition to instruments 1–15, there have also been created:
– measures of specific values, such as altruism [Rushton, Chrisjohn and Fekken 1981], equality [Bell and Robinson 1978], and materialist-postmaterialist goals [Inglehart 1977],
– measures of moral judgment ability [Rest 1972],
– measures of broader concepts such as modernity [Kahl 1968],
– measures that focus on values in work, family or other specific contexts [Kohn 1969; England 1967; Hofstede 1980; Harding and Philips 1986],
– measures of children's values [Smart and Smart 1975; Lortie-Lussier, Fellers and Kleinplatz 1986],
– projective measures [Kilmann 1975; Rorer and Ziller 1982].
A noteworthy contribution to values measurement is also the method for comparing values across cultures developed by Triandis et al. [1972]. The authors of this instrument selected twenty abstract concepts that were expected to highlight cultural differences in values (e.g. anger, freedom, punishment, death, love). In the empirical study, examinees indicated what each of these concepts meant to them by selecting, from a list of five words, the one that identified the cause of the concept. Examinees were presented with six such lists, allowing them to choose six antecedents for each concept. After selecting antecedents, the same procedure was followed to identify the consequences, or results, of the concept.
An alternative cross-cultural methodology for values was proposed by Triandis and his co-workers in relation to the specific value orientation of individualism-collectivism [1986]. A large item pool was generated by cooperating researchers in nine countries, and only those items that half or more of the researchers found relevant and none found irrelevant were used. The resulting twenty-one items were then subjected to item and factor analysis within each culture. The analyses ensured relevance to the collectivism construct within each culture and comparable interrelationships among items across cultures. On this basis, the assumption could be made that the items were given similar meanings by the different cultural groups. Finally, Triandis et al. [1986] factor-analyzed the twenty-one items across all examinees from the nine cultures. This analysis helped them identify four factors for cross-cultural comparisons.
Another conceptual and methodological contribution was Hofstede's [1980]. This author constructed four value dimensions that were identified as basic problems of humanity with which every society has to cope:
– power distance (social inequality and the authority of one person over another),
– uncertainty avoidance (the way societies deal with the uncertainty of the future),
– individualism vs. collectivism (the individual's dependence on the group),
– masculinity vs. femininity (the endorsement of masculine (e.g. assertive) goals as opposed to feminine (e.g. nurturant) goals within groups).
These dimensions were not too dissimilar from the value orientations outlined by Kluckhohn and Strodtbeck [1961] and Bales and Couch [1969], and may still be modified for use outside the work context. Methodologically, Hofstede [1980] emphasized the importance of analyzing data at the ecological level as distinct from the individual level: he derived his value measures by analyzing the scores of forty countries rather than the scores of individuals.
The last interesting contribution to values measurement was Inglehart's work on social values [Inglehart 1979]. Inglehart focused on one major dimension representing desirable national goals. At one end he placed materialist values (which arise in response to needs for sustenance and physical security), and at the other pole he defined postmaterialist values (that is, values concerned with social and self-actualizing needs). Inglehart measured the materialist-postmaterialist value dimension through the ranking of twelve national policy objectives, half of which represent materialist values (e.g. fighting rising prices), the remainder non-materialist values (e.g. giving people more say in important government decisions). Inglehart's conceptualization was based on Maslow's need hierarchy [1954]: the materialist values, representing sustenance and safety needs, must be satisfied before the postmaterialist values, tapping belongingness, esteem, intellectual and aesthetic needs, are given priority.
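Inglehart's classification logic can be sketched in a few lines. The goal labels and the "top two choices" decision rule below are simplifying assumptions made for illustration; they are not Inglehart's exact twelve-item battery or his published scoring procedure.

```python
# Illustrative sketch of Inglehart-style typing: an examinee ranks national
# policy goals, and is classified by whether their top choices are drawn
# from the materialist or the postmaterialist pool. The goal labels and
# the top-two rule are assumptions for this example.
MATERIALIST = {"fighting rising prices", "maintaining order in the nation"}
POSTMATERIALIST = {"giving people more say in government decisions",
                   "protecting freedom of speech"}

def classify(ranked_goals):
    """ranked_goals: goals listed from most to least important."""
    top_two = set(ranked_goals[:2])
    if top_two <= MATERIALIST:
        return "materialist"
    if top_two <= POSTMATERIALIST:
        return "postmaterialist"
    return "mixed"
```

Note that the input is a ranking: the measure inherits the forced-choice character discussed later in this chapter.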
Most of the scales discussed above, invented and used by researchers over the history of values measurement, are listed in Table 12 together with their respective descriptors: name of the scale, author, dimension of item evaluation, single- or multi-item measurement, reliability, sample and possible marketing applications. They are all classified according to four main areas representing:
1) general values and conceptualization of the value domain,
2) typical personal values,
3) values related to environmentalism and socially responsible consumption,
4) values related to moral values, and materialism or possession/objects.

Table 12. Typology and characteristics of scales for values measurement

General values and conceptualization of value domain:
– The value survey: RVS (Rokeach 1968). Dimension: importance as guiding principles; single-item. Reliability: not relevant because of single-item measure. Sample: adult samples drawn from students and general populations in the U.S. (1968, n=140; 1971, n=1430). Marketing application: moderate.
– List of values: LOV (Kahle 1983). Dimension: importance; single-item. Reliability: not relevant because of single-item measure. Sample: probability sample of n=2264 Americans. Marketing application: strong.
– The study of values (Allport, Vernon and Lindzey 1960). Dimension: preference; multi-item. Reliability: split-half coefficients ranged from 0.84 for theoretical values to 0.95 for religious values. Sample: American college students and adults, male and female, n=8369. Marketing application: strong.
– Multiple-item measures of values: MILOV (Herche 1994). Dimension: agreement; multi-item. Reliability: alpha coefficients ranged from 0.67 to 0.81 across the nine value dimensions. Sample: student samples of n=333, n=416 and n=291; a sample of n=683 adult consumers was used to test MILOV properties. Marketing application: moderate.
– The goal and mode values inventories (Braithwaite and Law 1985). Dimension: importance as guiding principles; multi-item. Reliability: alphas for the two-item scales (social stimulation and getting ahead) were barely adequate at 0.53 and 0.66; the remaining scales ranged from 0.66 to 0.89. Sample: two city communities in Brisbane (n=483) and students in Queensland (n=480), Australia. Marketing application: strong.
– Ways to live (Morris 1956). Dimension: preference; single-item. Reliability: not relevant because of single-item measure. Sample: male (n=2015) and female (n=831) students in the U.S. Marketing application: strong.
– Revised ways to live (Dempsey and Dukes 1966). Dimension: preference; single-item. Reliability: not relevant because of single-item measure. Sample: 230 students in a psychology class. Marketing application: moderate.
– Value profile (Bales and Couch 1969). Dimension: agreement; multi-item. Reliability: no reliability data were encountered. Sample: questionnaire completed by 552 examinees (predominantly students).
– Conceptions of the desirable (Lorr, Suziedelis and Tonesk 1973). Dimension: importance; multi-item. Reliability: alpha for the instrument was 0.70. Sample: 105 examinees of varying backgrounds in the university community.
– Empirically derived value constructions (Gorlow and Noll 1967). Dimension: importance; multi-item. Reliability: no information on reliability was encountered. Sample: two samples of adult men (n=365) and women (n=300) varying in educational level and social class.
– The east-west questionnaire (Gilgen and Cho 1979). Dimension: agreement; multi-item. Reliability: data on internal consistency coefficients are provided. Sample: students in the U.S. (n=210), transpersonal psychologists (n=69) and businessmen (n=46).
– Life role inventory (Fitzsimmons, Macnab and Casserly 1985). Dimension: importance; multi-item. Reliability: alphas for each scale ranged from 0.67 (achievement) to 0.88 (altruism). Sample: English- and French-speaking samples of Canadian adult workers (n=6382) and high school students (n=3115), and English-speaking students (n=623).
– Value orientations (Kluckhohn and Strodtbeck 1961). Multi-item. Reliability: no conventional reliability coefficients have been encountered. Sample: 23 Spanish-Americans, 20 Texans, 20 Mormons, 22 off-reservation Navajo and 21 Zuni in the U.S.

Typical personal values:
– Personal value (Scott 1965). Dimension: consistent admiration; multi-item. Reliability: alphas for the short form ranged from 0.55 (independence) up to religiousness; for the long form from 0.80 (honesty) to 0.89 (physical development). Sample: college students, n=200.
– Survey of interpersonal values (Gordon 1960). Dimension: importance; multi-item. Reliability: coefficients ranged from 0.71 for recognition to 0.86 for conformity, independence and benevolence. Sample: large samples (n from 2667 to 3941) of American high school and college students.

Values related to environmentalism and socially responsible consumption:
– Environmentally responsible consumers: ECOSCALE (Stone, Barnes and Montgomery 1995). Dimension: agreement; multi-item. Reliability: alpha for the entire 31-item ECOSCALE was 0.93. Sample: n=238 undergraduate students and n=215 college students.
– Health conscious scale: HCS (Gould 1988). Dimension: preference; multi-item. Reliability: a 0.93 alpha estimate was reported for the nine-item HCS. Sample: one sample of n=343 adult consumers from the U.S.
– Subjective leisure scales: SLS (Unger and Kernan 1983). Dimension: agreement; multi-item. Reliability: reliability estimates were performed but not reported. Sample: two samples, 132 students and 160 nonstudent adults. Marketing application: strong.
– Social issues: anxiety with social issues (Sego and Stout 1994). Dimension: importance. Reliability: no estimates for internal consistency reliability were offered. Sample: undergraduate students (n=103).
– Socially responsible consumption behavior: SRCB (Antil and Bennett 1979). Dimension: agreement; multi-item. Reliability: Guttman's lambda was 0.93 and alpha 0.92. Sample: several samples used in scale development, in the following order: n=444, 321, 98 and 690. Marketing application: strong.
– Voluntary simplicity scale: VSS (Leonard-Barton 1981; Cowles and Crosby 1986). Dimension: agreement; multi-item. Reliability: estimates for the 9- and 19-item versions ranged from alpha 0.52 to 0.70. Sample: a number of samples were used in scale development and reliability estimation; the expanded 19-item version was tested on 423 California homeowners, and an 18-item version was administered to 812 California homeowners. Marketing application: moderate.

Values related to moral values, and materialism or possession/objects:
– Belief in material growth scale: BIMG (Taschian, Slama and Taschian 1984). Dimension: agreement; multi-item. Reliability: the overall alpha for the final 12-item BIMG was 0.82. Sample: a group of 25 student judges trimmed the original 50 items; the scale was then administered to 365 adults. Marketing application: strong.
– Materialism measure (Richins 1987). Dimension: agreement; multi-item. Reliability: alpha was 0.73 for personal materialism and 0.61 for general materialism. Sample: a quota sample of 252 adults. Marketing application: strong.
– Materialism-postmaterialism scale (Inglehart 1981). Dimension: importance; multi-item. Reliability: no estimates for internal consistency reliability were found. Sample: representative national samples from Britain, France, West Germany, Belgium, Italy, the Netherlands, Luxemburg, Ireland, Denmark, the U.S. and Japan.
– Materialism scales (Belk 1984). Dimension: agreement; multi-item. Reliability: in the first study, alpha estimates for possessiveness, nongenerosity and envy were 0.68, 0.72 and 0.80 respectively; the overall summed scale (24 items) had alphas of 0.73 (n=237) and 0.66 (n=338). Sample: a student sample of n=237 and a nonstudent sample of n=338.
– Materialistic attitudes: MMA (Moschis and Churchill 1978). Dimension: agreement; multi-item. Reliability: the alpha reliability of the scale was reported to be 0.60. Sample: 806 adolescents.
– Nostalgia scale (Holbrook 1993). Dimension: agreement; multi-item. Reliability: coefficient alpha for the nine items was 0.93. Sample: 331 college students. Marketing application: weak.
– Possessions: attachment to possessions (Ball and Tasaki 1992). Dimension: agreement; multi-item. Reliability: coefficient alpha of 0.78 for the summated scale in the first sample and 0.73 in the second. Sample: a first sample (n=167) of graduate business students and a second sample (n=156) of nonstudent adults. Marketing application: moderate.
– Money attitude scale: MAS (Yamauchi and Templer 1982). Dimension: agreement; multi-item. Reliability: coefficient alphas were 0.80 for power-prestige, 0.78 for retention-time, 0.73 for distrust and 0.69 for anxiety. Sample: two samples, 300 adults from California cities and 125 students. Marketing application: strong.
– The moral behavior scale (Crissman 1942; Rettig and Pasamanick 1959). Dimension: goodness; multi-item. Reliability: Kuder-Richardson coefficients across the total scale were 0.93 among students, 0.95 among alumni, 0.96 among blue-collar workers and 0.93 among white-collar workers. Sample: large samples of male and female college students, alumni, blue-collar and white-collar workers. Marketing application: weak.
– The morally debatable behaviors (Harding and Philips 1986). Dimension: justifiability; multi-item. Reliability: no internal consistency coefficients were encountered. Sample: large random and quota samples in 10 European countries. Marketing application: weak.

Source: own construction based on Robinson, Shaver and Wrightsman 1991; Bearden and Netemeyer 1999.

A brief review of other measures-scales applied for values analysis

Based on this typology (see Table 12), some of the scales can be adapted to marketing activities. The selected ones might be, for instance: the health consciousness scale HCS [Gould 1988]; the subjective leisure scales SLS [Unger and Kernan 1983]; the belief in material growth scale BIMG [Taschian, Slama and Taschian 1984]; the attachment to possessions scale [Ball and Tasaki 1992]; or the subjective discretionary income scale SDI [O'Guinn and Wells 1989]. These scales seem to be very useful for specific industries. In consequence:
– The health consciousness scale HCS will be important from the perspective of exploring consumers' health orientation and their lively interest in purchasing pharmaceutical products. The scale seems to tap an overall alertness, self-consciousness, involvement and self-monitoring of a person's health.
The HCS is composed of nine items scored on a 5-point scale ranging from 0 to 4 (Table 13).
Table 13. Health conscious scale: HCS
1. I reflect about my health a lot
2. I'm very self-conscious about my health
3. I'm generally attentive to my inner feelings about my health
4. I'm constantly examining my health
5. I'm alert to changes in my health
6. I'm usually aware of my health
7. I'm aware of the state of my health as I go through the day
8. I notice how I feel physically as I go through the day
9. I'm very involved with my health
Response categories for each statement: 0---1---2---3---4
Legend: 0 – statement does not describe you at all, 1 – statement describes you a little, 2 – statement describes you about fifty-fifty, 3 – statement describes you fairly well, 4 – statement describes you very well.
Source: Gould 1988.
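Scoring the HCS is a straightforward summation of the nine responses. The range check and the 0-36 total below are our own minimal sketch of such a procedure, not part of Gould's published instructions:

```python
def hcs_score(responses):
    """Sum the nine HCS item responses (each scored 0-4) into a 0-36 total,
    where higher totals indicate greater health consciousness."""
    if len(responses) != 9:
        raise ValueError("the HCS has exactly nine items")
    if any(not 0 <= r <= 4 for r in responses):
        raise ValueError("each item is scored from 0 to 4")
    return sum(responses)
```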

– The subjective leisure scales SLS measure consumers' activity in their leisure time, including free time, recreation and play. Everything from the entertainment industry to gastronomy, hotels and catering, among many others, should profit from them. Unger and Kernan proposed six determinants of leisure: intrinsic satisfaction, perceived freedom, involvement, arousal, mastery and spontaneity. These determinants were expressed in the following statements, rated on response categories of 1 – strongly agree, 2 – agree, 3 – somewhat agree/somewhat disagree, 4 – disagree, 5 – strongly disagree (Table 14).
– The belief in material growth scale BIMG describes the material orientation of consumers: it places a high value on material comforts and conveniences, explores economic effort, and views actions taken by consumers toward particular products or services. The BIMG scale was designed to measure people's beliefs in relation to energy consumption, but given its broad conception it can be applied to many areas of industry.
Table 14. Subjective leisure scale: SLS
1. It is its own reward
2. "Not because I have to but because I want to" would characterize it
3. I feel like I'm exploring new worlds
4. I feel I have been thoroughly tested
5. I could get so involved that I would forget everything else
6. I wouldn't know the day before that it was going to happen
7. I enjoy it for its own sake, not for what it will get me
8. I do not feel forced
9. There is novelty in it
10. I feel like I'm conquering the world
11. It helps me forget about the day's problems
12. It happens without warning or pre-thought
13. Pure enjoyment is the only thing in it for me
14. It is completely voluntary
15. It satisfies my sense of curiosity
16. I get a sense of adventure or risk
17. It totally absorbs me
18. It is a spontaneous occurrence
19. I do not feel obligated
20. It offers novel experiences
21. I feel like a real champion
22. It is like getting away from it all
23. It happens out of the blue
24. Others would not have to talk me into it
25. It makes me feel like I'm in another world
26. It is a spur-of-the-moment thing
Response categories for each statement: 1---2---3---4---5
Source: Unger and Kernan 1983.
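A hedged sketch of how SLS responses might be summated: because agreement is coded 1 and disagreement 5, the items are reverse-coded here so that higher totals describe an experience perceived as more leisure-like. This scoring direction is our assumption for illustration, not a rule reported by Unger and Kernan.

```python
def sls_score(responses):
    """Summate the 26 SLS items (1 = strongly agree ... 5 = strongly disagree).
    Each response is reverse-coded (6 - r) so that higher totals indicate an
    experience perceived as more leisure-like. Range: 26 (all 5s) to 130 (all 1s)."""
    if len(responses) != 26:
        raise ValueError("the SLS item pool has 26 statements")
    return sum(6 - r for r in responses)
```

In practice the items would more likely be averaged within the six determinants listed above; the item-to-determinant mapping is not given here, so the sketch stays at the total-score level.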

Scoring procedures involve judgment of the twelve statements on a 5- or 7-point scale ranging from strongly disagree to strongly agree (Table 15).
– The possessions: attachment to possessions scale measures the extent to which an object that is owned, expected to be owned, or previously owned by an individual is used by the consumer to maintain his or her self-concept. Given its content, this scale is strongly recommended for use

Table 15. Belief in material growth scale: BIMG
1. I always want to be the best
2. Material growth has an irresistible attraction for me
3. Material growth makes for happier living
4. Growth in material consumption helps raise the level of civilization
5. Ownership and consumption of material goods has a high value for me
6. More is better
7. Increase in the amount of goods and services produced is not essential to my well-being
8. I am reluctant to conserve material goods and services when it affects my daily life
9. I would rather be perfectly comfortable in my home (neither warm nor cold) than be slightly uncomfortable in order to conserve
10. I have worked hard to get where I am, and I'm entitled to the good things in life
11. People should heat and cool their homes to the most comfortable temperatures regardless of what the government says
12. The only way to let everyone know about my high status is to show it
Response categories for each statement: 1---2---3---4---5
Source: Taschian, Slama and Taschian 1984.
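A minimal sketch of a BIMG summated score, assuming the 5-point response format described above. Treating the negatively worded item 7 as reverse-keyed is our assumption, not part of the published scoring instructions:

```python
REVERSED = {7}  # item 7 is negatively worded; reverse-keying it is our assumption

def bimg_score(responses):
    """Summate the 12 BIMG items (1 = strongly disagree ... 5 = strongly agree).
    Higher totals indicate a stronger belief in material growth."""
    if len(responses) != 12:
        raise ValueError("the BIMG has 12 items")
    total = 0
    for item_no, r in enumerate(responses, start=1):
        total += (6 - r) if item_no in REVERSED else r
    return total
```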

by the automotive industry, especially in the process of car design, promotion, pricing and sales optimization.
The scale is composed of nine items scored on a 6-point Likert scale ranging from 1 – disagree to 6 – agree (Table 16).
– The subjective discretionary income scale SDI measures the perceived spending power of consumers; specifically, it is an estimate of how much money consumers would like to spend. The SDI is a three-item scale whose items are scored on 6-point scales ranging from definitely disagree to definitely agree. The items read:
1) no matter how fast our income goes up, we never seem to get ahead,
2) we have more to spend on extras than most of our neighbors,
3) our family income is high enough to satisfy nearly all of our important desires.
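A sketch of SDI scoring under stated assumptions: item 1 ("we never seem to get ahead") is negatively worded, so it is treated as reverse-keyed here; O'Guinn and Wells's actual scoring procedure may differ.

```python
def sdi_score(responses):
    """Score the three SDI items (1 = definitely disagree ... 6 = definitely agree).
    Item 1 is negatively worded and is reverse-keyed (7 - r) -- an assumption
    for this illustration. Higher totals mean higher perceived spending power.
    Range: 3 to 18."""
    if len(responses) != 3:
        raise ValueError("the SDI has three items")
    item1, item2, item3 = responses
    return (7 - item1) + item2 + item3
```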

Table 16. Attachment to possessions scale
1. Imagine for a moment someone making fun of your car. How much would you agree with the statement: "If someone ridiculed my car, I would feel irritated"
2. How much do you agree with the statement: "My car reminds me of who I am"
3. Picture yourself encountering someone who would like to get to know you. How much do you think you would agree with the statement: "If I were describing myself, my car would likely be something I mentioned"
4. Suppose someone managed to destroy your car. Think about how you would feel. How much do you agree with the statement: "If someone destroyed my car, I would feel a little bit personally attacked"
5. Imagine for a moment that you lost your car. Think of your feelings after such an event. How much do you agree with the statement: "If I lost my car, I would feel like I had lost a little bit of myself"
6. How much do you agree with the statement: "I don't really have too many feelings about my car"
7. Imagine for a moment someone admiring your car. How much would you agree with the statement: "If someone praised my car, I would feel somewhat praised myself"
8. Think for a moment about whether or not people who know you might think of your car when they think of you. How much do you agree with the statement: "Probably people who know me might sometimes think of my car when they think of me"
9. Imagine for a moment that you have lost your car. Think about going through your daily activities knowing that it is gone. How much do you agree with the statement: "If I didn't have my car, I would feel a little bit less like myself"
Source: Ball and Tasaki 1992.
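A hedged sketch of a summated attachment score for the nine items above; treating the negatively worded item 6 as reverse-keyed is our assumption, not a documented part of Ball and Tasaki's procedure.

```python
def attachment_score(responses):
    """Summate the nine attachment items (1 = disagree ... 6 = agree).
    Item 6 ("I don't really have too many feelings about my car") is
    negatively worded and is reverse-keyed here (7 - r) -- an assumption.
    Higher totals indicate stronger attachment to the possession."""
    if len(responses) != 9:
        raise ValueError("the attachment scale has nine items")
    return sum((7 - r) if item_no == 6 else r
               for item_no, r in enumerate(responses, start=1))
```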

Rating-ranking scales controversy in values measurement


Researchers are divided over the preferences of making choice between
rating or ranking scales for measuring human values. However, the debate
originates from the false premise that one scale must exclude the other.
Aconceptualization of the values structure that use characteristics of both
rating and ranking systems opens up theory and research to a more complex
understanding of values [Ovadia 2004].
One way in which research on human values differs substantially from research on other scientific topics is the inexact definition of the subject of analysis. While variables such as weight are largely agreed upon in their definition and in how they are to be measured, values typically have neither a clear, unified definition nor a universal model of measurement, as was previously assumed in many historical studies. The choices researchers must make in defining values concern not only the determination of the values structure but also the scale of measurement.
As previously mentioned, in Rokeach's ranking scale the examinee was asked to place a list of values in order of importance [1973]. Rokeach's approach to values measurement captures the real-world notion that values are often in competition with one another; ultimately, he argued that when values are in conflict, individuals should be forced to choose between them. Ball-Rokeach, Rokeach and Grube [1984] later supported ranking over rating, though without empirical support; they held that forced ranking choices are more realistic in a world of limited resources. The advantages of ranking, however, are not clear-cut [Krosnick and Alwin 1985; Schwartz 1992]. Some ranking proponents [Krosnick and Alwin 1988] allow that it is plausible that people do not, in practice, choose between values when faced with action situations.
For the purposes of consumer research, a number of problems are associated with the rank-order technique. Firstly, the ranking procedure forces the examinee to indicate differences where none may actually exist, i.e. equally attractive values are forced into separate rankings; moreover, wide gaps in preference are treated as no different from very small gaps. Secondly, many subjects confirm Miller's admonition [1956] that most people cannot adequately evaluate more than a few items (7 plus or minus 2) at a time. Finally, the ranking instructions bias the rankings in favor of deprived values and against satiated values: for instance, the lowest-income examinees rate a comfortable life relatively high, while wealthy examinees rate it quite low [Clawson and Vinson 1978].
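The first of these problems can be made concrete with a short sketch: an examinee whose ratings contain wide gaps and one whose ratings are nearly tied collapse into identical forced rankings, so the gap information is lost.

```python
def to_ranks(ratings):
    """Force ratings into a ranking: 1 = highest-rated value. Ties are broken
    arbitrarily by list position, as a forced-choice procedure requires."""
    order = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    ranks = [0] * len(ratings)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

# A profile with wide gaps and one with near-ties yield identical rankings:
profile_gappy = [90, 40, 39, 10]   # large gap between the 1st and 2nd value
profile_tight = [90, 89, 88, 87]   # all four values nearly tied
```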
To avoid some of these problems, the vast majority of empirical research has employed direct rating of each value. On a rating scale, values are rated by the examinee independently, typically on a Likert scale or some variant thereof25. While this offers a number of methodological advantages, it does not address the issue of clusters of values that are consonant with one another or mutually exclusive.
25
Rating scales are also important instruments used not only to measure people's values; they can also be adapted to measure a variety of stimuli including products, services, places, institutions, advertisements, etc. A rating scale is only useful if it provides reliable and valid measurements. There is a considerable amount of research dealing with the problems of constructing questions and rating scales that are reasonably objective and relatively free of measurement errors [Payne 1951; Sudman and Bradburn 1974, 1982; Schuman and Presser 1981; Churchill and Peter 1984; Biemer et al. 1991; Tanur 1992; Friedman and Amoo 1999].

Schwartz [1994] offered justification for the conceptual superiority of rating over ranking. In his opinion, rating has more useful statistical properties: it allows researchers to use longer lists of values and to note negative values (important in cross-cultural work), and it does not force examinees to discriminate among equally important values. Schwartz also argued that rating may be more accurate phenomenologically than ranking in capturing how values enter into situations of behavioral choice. People do not necessarily rank one value over another in action; different values may be equally compelling, and it is an empirical possibility that people are only vaguely aware of contradictions between values, something the forced-choice approach of ranking fails to capture. Besides, the rating approach obtains greater variance if examinees are first asked to pick their most- and least-important values from the list before rating the items [McCarty and Shrum 2000]. It has also been suggested that rating offers more validity: examinees forced to rank values made trivial and less valid distinctions between them, which led to smaller empirical relationships with related attitudes.
Rokeach's ranking scale (which requires the examinee to place items in an order of preference) is a lengthy process compared to asking for individual ratings. Rokeach himself admitted that many examinees reported the ranking task to be a very difficult one: first, such examinees have little confidence that they have completed it in a reliable manner and, second, they are often sure they have completed it more or less randomly. Moreover, the Rokeach ranking scale yields a dataset that cannot easily be analyzed with advanced statistical methods because of the interdependence of the ranks. For instance, once an examinee has ranked seventeen of the values in the RVS, the rank of the final (eighteenth) value is predetermined. This characteristic of the data (sometimes referred to as being ipsative) requires researchers to make complex adjustments in order to perform statistical analyses and obtain final scores. Besides, ranking data (having the form of an ipsative scale) cause serious problems in factorial/covariance structure models26.
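The interdependence of ranks is easy to verify numerically: with the eighteen RVS values, any seventeen ranks determine the eighteenth, because a complete ranking always sums to 171 (1 + 2 + ... + 18).

```python
N_VALUES = 18                                  # values in the RVS list
RANK_SUM = N_VALUES * (N_VALUES + 1) // 2      # 1 + 2 + ... + 18 = 171

def missing_rank(seventeen_ranks):
    """Recover the predetermined eighteenth rank from the other seventeen,
    using the fixed-sum (ipsative) constraint of a complete ranking."""
    if len(seventeen_ranks) != N_VALUES - 1:
        raise ValueError("expected seventeen of the eighteen ranks")
    return RANK_SUM - sum(seventeen_ranks)
```

It is this fixed-sum constraint that induces the artificial negative correlations and singular covariance structures that complicate factor-analytic work on ranking data.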
26
One way to overcome such problems is the solution proposed by Bentler and Chan [1993, 1996]. Suppose, for instance, that consumers were asked to rank 8 values from one (the least preferred) to eight (the most preferred), and the researcher's interest was in the underlying factors affecting consumers' value preferences. If ties were not allowed, the rankings from each individual would constitute a vector of ordinals with the ipsative property, because the sum of the rankings would be 36 (1 + 2 + ... + 8). As Bentler and Chan [1998] explained, three kinds of ipsative data are commonly encountered in social research: additive ipsative data (AID), multiplicative ipsative data (MID), and ordinal ipsative
In contrast, the rating scale (e.g. the LOV scale) is advocated for its simplicity of execution and statistical analysis, as the examinee can quickly respond to each item without much effort to compare the value with the others on the list. However, rating has also been subject to considerable criticism. Researchers have found that rating scales allow examinees with low or no motivation to give largely uniform responses (non-differentiation) and thereby understate the differences among values. Rating often yields statistical ties between values, which may be the result of indifference to the question rather than a true equivalence of importance. Because the ranking procedure forces the examinee to give each value a different position, it prevents non-differentiation, irrespective of motivation level. However, it is also possible that ranking procedures force individuals to overstate the differences between values, to make comparisons of values the examinee considers non-comparable, or to respond randomly simply to meet the requirements of the survey.
The fact that values are desirable may lead people to produce very little variation among their ratings of the items on the list. Krosnick and Alwin [1988] found evidence that, after removing non-differentiating examinees from the sample, the results of analyses of ranking data resembled those of rating data. Still, ranking by necessity forces negative correlations between values, whereas rating encourages positive correlations. Krosnick and Alwin [1985; 1988] also concluded that their work can be interpreted as support for ranking, though such an approach potentially produces an artificial contrast between values. The fact is that examinees using a rating scale often cluster their responses within a narrow subset of the range of options, whereas no such underdispersion is possible in a ranking scale, where the examinee must use the entire range of the scale. A rating scale also cannot ensure that it is used consistently, either between or within examinees: it is possible that one examinee considers a score of 70 (out of 100) a high level of importance, while another considers 70 a moderate level of importance.
Since it is clear that both scales (ranking and rating) for measuring human values have advantages and disadvantages, one should ask the further question of what each scale assumes about the subject of analysis itself,
data (OID). AID refers to centered scores obtained by subtracting an individual's average (over the attributes) from his or her raw scores. MID refers to proportional data in which variables are measured as relative proportions with respect to their sum [Bentler and Chan 1998]. OID refers to the ordering of k different values in ranks; hence a values-ranking study is a typical case of OID.
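The AID transformation described in the footnote can be shown directly: centering a respondent's raw scores around their own mean yields scores that always sum to zero, which is exactly the ipsative constraint.

```python
def to_aid(raw_scores):
    """Additive ipsative data (AID): subtract the individual's own mean
    (taken over the attributes) from each raw score. The centered scores
    of every respondent then sum to zero -- the ipsative constraint."""
    mean = sum(raw_scores) / len(raw_scores)
    return [score - mean for score in raw_scores]
```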

the values. In other words, what is a value and how is it organized? By looking more closely at both scales, one can infer that values are expressions of the characteristics of the overall value system that is assumed to exist in a person. Therefore, the question of ratings or rankings is not only one of method, but of the nature of values and their organization in the human mind. In an ipsative system27, such as the Rokeach value survey or any other ranking, values are arranged in a zero-sum structure by definition: if the ranking of one value increases by one rank, another value must decline by one rank. For Rokeach and other advocates of the ranking method, values represent mutually exclusive choices. In situations that call multiple values into possible action, one value must be prioritized over the other. This means that as the values structure changes over time or differs across groups, the higher importance of one value must come at the expense of the importance of another. In contrast, a rating scale does not require changes in the importance of some values to be compensated for by changes in other values. Since the examinee can give each value any rating without regard to the ratings given to the others, the ratings of the values have no restrictions. The value structure assumed in a rating approach is one in which the importance of
27 Rating scales might be ipsative too. However, in most situations there are two response styles that operate in many surveys. They are based on:
– yea-saying/nay-saying, the general tendency to agree or disagree with statements and questions independent of specific item content [Couch and Keniston 1960],
– standard deviation [Hui and Triandis 1985; Wyer Jr. 1969], the tendency to use a wide or narrow range of response intervals about the individual's mean response.
The examinee's level of yea-saying is usually defined as the mean of his/her responses across many rating scale items, whereas standard deviation is defined as the standard deviation of those responses. Standard deviation is sometimes compared with extreme response style, the tendency to mark extreme scale intervals. Though the two are typically highly correlated, they are not identical.
Most work on these two response styles (yea-saying and standard deviation) refers to: stability and reliability issues across different sets of items and across time [Wart and Coffman 1970; Merrens 1970, p. 802; Bachman and O'Malley 1984; Hui and Triandis 1985], the effect of scale design on response styles [Albaum and Murphy 1988], relationships between response styles and demographic or personality characteristics [Hamilton 1968; Das and Dutta 1969; Iwawaki and Zax 1969; Wyer Jr. 1969; Jones and Rorer 1973; Crandall 1982; Bachman and O'Malley 1984], and contextual influences that may moderate response styles [Wyer Jr. 1969; Norman 1969; Biggs and Das 1973; Kiesler and Sproull 1986]. Moods can also affect response styles. For example, people in a good mood may have a more favorable outlook on life than those in a bad mood [Schwarz and Clore 1983]. One important study addressed the question of whether the effect of yea-saying and standard deviation on people's responses to attitude rating scales conveys information about their attitudes, or is an artifact of the scale response process that adds predictable and systematic bias to rating scale scores.
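The two response-style indices described above are simple to compute per examinee; a minimal sketch (the data and the helper name `response_style_indices` are illustrative, not from the cited studies):

```python
from statistics import mean, stdev

def response_style_indices(responses):
    """Return (yea_saying, spread) for one examinee.

    yea_saying: the mean of the responses across many rating items
    (the acquiescence level described above).
    spread: the standard deviation of those responses (a wide spread
    often accompanies an extreme response style).
    """
    return mean(responses), stdev(responses)

# Illustrative 1-5 agreement ratings for two hypothetical examinees.
yea_sayer = [5, 4, 5, 5, 4, 5]   # agrees regardless of item content
extreme   = [1, 5, 1, 5, 1, 5]   # uses mostly the scale endpoints

m1, s1 = response_style_indices(yea_sayer)
m2, s2 = response_style_indices(extreme)
print(m1 > m2)   # True: the yea-sayer has the higher mean response
print(s2 > s1)   # True: the extreme responder has the wider spread
```

Note that the two indices are computed from the same responses, which is why they tend to correlate in practice even though they capture different tendencies.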


values are independent. There is no limit on the total amount of importance that is distributed among the values. Hence, when researchers are interested in measuring values, they often use a rating rather than a ranking because it yields data that are amenable to many statistical analyses. However, because examinees' values are inherently positive constructs, examinees often exhibit
little differentiation among the values and end-pile their ratings toward the
positive end of the scale. Such lack of differentiation may potentially affect
the statistical properties of the values and the ability to detect relationships
with other variables.
In order to show what differences may appear between rating and ranking scales, an example containing only four values: accomplishment, happiness, pleasure and salvation, is illustrated in Figure 11 [Tarka 2008]. Each of them can be associated with a specific amount of importance that is later scaled from 0 (no importance) to 100 (maximum importance). Now let's consider two examinees with the following distributions of importance:
Examinee 1: accomplishment 90, happiness 80, pleasure 70, salvation 40
Examinee 2: accomplishment 70, happiness 60, pleasure 50, salvation 30

Figure 11. Example of answers by importance according to two examinees: rating and ranking scale

At first glance, a ranking survey would show that these two examinees have the same values structure. Each examinee would report that accomplishment is most important, followed by happiness, pleasure, and salvation, respectively. However, a rating survey would show that there are substantial differences between examinee 1 and examinee 2 in their values judgments. First, examinee 1 placed more importance on these values overall than examinee 2. Second, examinee 1 placed more importance on each specific value than examinee 2, even though they ranked the values identically. Therefore, depending on the question being asked and the type of scale being used, conclusions about the values may differ significantly. If the question is which values will be acted upon in situations where choices must be made, the ranking and rating scales will both point in the same direction. However, if the questions of interest concern the relative importance of a value for examinee 1 as compared to examinee 2, the two methods will lead to different conclusions: ranking will indicate no difference, whereas rating will show a difference. Specifically, we would
conclude that while happiness has the same importance for each individual relative to the other values in the structure, the amount of importance that examinee 1 assigns to happiness is greater than the amount assigned to it by examinee 2. Neither a ranking nor a rating measure by itself would reveal this pattern. Both conclusions are equally valid, and therefore the most correct answer about their values is one that reveals both facts.
A second advantage of the rating approach appears when one introduces change in the values of examinee 1. Let's assume that over time, or in response to some stimulus, the importance of accomplishment for examinee 1 declines from 90 to 87, while the importance of happiness increases from 80 to 83. In the ranking approach we might conclude that there has been no change in the values of examinee 1 over time. This is true, as we would expect that in circumstances where accomplishment and happiness were to come into conflict, examinee 1 would choose accomplishment at both times of measurement. However, the rating system also tells us that the importance of accomplishment has declined for examinee 1 over time, while the importance of happiness has increased. If these observations were part of a study to determine whether examinee 1 had been affected by some external stimulus, such as buying a new car or clothes, then the ranking and rating approaches would lead us to two completely different conclusions. The best conclusion uses both pieces of information: the stimulus effected a change, but it was not large enough to change the ordering of the values of interest.
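The contrast described above can be reproduced in a few lines, using the chapter's four example values and its exact numbers; the helper `ranks` is our own illustrative sketch:

```python
def ranks(ratings):
    """Map each value to its rank (1 = most important) implied by its rating."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {value: position + 1 for position, value in enumerate(ordered)}

examinee1 = {"accomplishment": 90, "happiness": 80, "pleasure": 70, "salvation": 40}
examinee2 = {"accomplishment": 70, "happiness": 60, "pleasure": 50, "salvation": 30}

# Ranking view: both orderings are identical, so no difference is detected.
print(ranks(examinee1) == ranks(examinee2))                  # True

# Rating view: examinee 1 rates every value higher than examinee 2.
print(all(examinee1[v] > examinee2[v] for v in examinee1))   # True

# The within-person shift (accomplishment 90 -> 87, happiness 80 -> 83)
# changes the ratings but leaves the ranking untouched.
changed = dict(examinee1, accomplishment=87, happiness=83)
print(ranks(changed) == ranks(examinee1))                    # True
```

The three printed checks correspond to the three observations in the text: identical rankings, systematically higher ratings, and a rating change invisible to the ranking.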
The above example enables us to make a few statements. A comparison of ranking and rating scales and methods of evaluation in human values suggests that there are no perfect instruments. This is because values, being hidden in the examinee's mind, are deeply rooted in subjective and mental expressions. Values are obscure and hard for people to express. Many people may not even know what their values are [Hechter, Nadel and Michod 1993], although psychologists [Rohan 2000; Rokeach 1973; Schwartz and Boehnke 2004] have included a so-called conscious representation of needs as part of their definition of the concept [Waters 1990]. For many researchers, too, values are inscrutable or impenetrable. Truth be told, this is something we must accept and live with. However, the truth is also that ranking and rating scales can be valuable measurement instruments provided they are applied separately. After all, each scale and method has its own unique way of measuring values [Tarka 2010a].

IV. PRINCIPLES OF ITEMS AND SCALE DEVELOPMENT

Single-item vs. multi-item scales in measurement of constructs
Churchill [1979, p. 66] argued that "researchers are much better served with multi-item than single-item scales" when they measure complex constructs. In making that recommendation, Churchill followed the tradition of psychometrics [Guilford 1950; Nunnally 1978]1. Since Churchill's article, academics have increasingly used multi-item instruments to measure every construct. By contrast, in the new approach based on the C-OAR-SE procedure2, Rossiter [2002] proposed that if the object can be conceptualized as concrete and singular, it does not need a multi-item form to represent it in the measure, and if the attribute can be conceptualized as concrete, it does not require multi-items either [Bergkvist and Rossiter 2007].
So the question now is: for what reasons are multi-item measures better than single-item ones? One argument can be
1 Churchill's article [1979], as well as Peter's [1979] article on multi-item reliability, has influenced the measurement of marketing constructs to such an extent that it is virtually impossible to get a journal article accepted in marketing unless it includes multi-item measures of the main constructs. The use of multi-items is also encouraged by the growing popularity of Structural Equation Modeling (SEM), a class of statistical techniques for which multi-item forms are the norm no matter what type of construct is being measured [see e.g. Baumgartner and Homburg 1996].
2 C-OAR-SE is an acronym for the six aspects of the theory: 1) construct definition, 2) object representation, 3) attribute classification, 4) rater-entity identification, 5) selection of item-type and answer scale, 6) enumeration and scoring rule (see the final section of this chapter).


sourced from reliability; that is, multi-item measures are inherently more reliable because they enable computation of correlations between items. If the correlations are positive and produce a high average correlation (i.e. a high coefficient alpha), they indicate the internal consistency of all the items in representing the presumed construct. This reliability argument needs to be clarified further.
First, alpha should never be used without establishing the unidimensionality of the scale [Cortina 1993], which can be investigated by factor analysis or, more safely, by Revelle's [1979] coefficient beta, which is a good test of unidimensionality. Given unidimensionality, alpha is actually an indicator of the reliability of the set of items measuring a certain type of construct, specifically an eliciting one, of which the main exemplars are personality traits and personal values. There is no doubt that alpha reliability will not be relevant for concrete attributes [Rossiter 2002], such as, for example, an ad or brand, and formed attributes such as social class (a composite attribute that sums demographic prestige ratings). If the attribute of the construct is concrete, alpha reliability is not a relevant criterion for evaluating the measure3.
The next argument in favor of multi-item measures is that they capture more information than a single-item measure. This argument appears in two forms. One explains that a multi-item measure is more likely to tap all facets of the construct of interest [Baumgartner and Homburg 1996, p. 143]. The presence of facets, or components, means that the construct cannot be classified as a concrete attribute of a concrete singular object. The other form of the more-information argument stems from the notion that multi-item scales offer more response categories than a single-item measure. It is important to emphasize here that it is not the multi-item instruments that are important but rather the number of categories, or length, of the response scale. Multi-item measures provide a potentially more discriminating response scale than one item [Bergkvist and Rossiter 2007].
In the literature, there are also arguments for using a single-item measure instead of multi-item measures. The first, theoretical (versus empirical), argument has been proposed by Rossiter [2002], who argued that a single-item measure is sufficient if the construct is such that, in the minds of raters (e.g. examinees in a survey): 1) the object of the construct is concrete singular, meaning that it consists of one object that is easily and uniformly imagined, and 2) the attribute of the construct is concrete, again meaning that it is easily and uniformly imagined. In both cases, we need to take into account a criterion from Wittgenstein's [1961] picture theory of language4.
3 Gorsuch and McFarland [1972] pointed out that an unreliable measure cannot form a relationship that yields high predictive validity; therefore, a single-item measure that is equally predictively valid as a multi-item measure must be regarded as sufficiently reliable to replace that measure [Clark and Watson 1995; DeVellis 2003; Haynes, Richard and Kubany 1995].
On the other hand, the empirically based argument for the favorable use of a single item can be made for measures in which the multi-items (representing the attribute in the answer part of the item) are synonyms, or intended synonyms (more precisely, synonymous adjectives). An extreme example is Zaichkowsky's [1985] well-known measure of personal involvement, which as a construct refers to personal involvement with some object, such as a product category or an advertisement5.

Process of scale development in the view of classical test theory
The process of scale development differs from author to author [Churchill 1979; Clark and Watson 1995; DeVellis 2003; Spector 1992]. Typically, scales which contain multi-items involve the following stages, where6:
– a scale cannot be developed until it is clear what kind of theoretical construct will be measured,
– the scale is designed by selecting appropriate response choices and writing instructions for examinees,
– an initial version of the scale is pilot-tested on a small number of examinees who are asked to critique the scale by indicating which items are ambiguous or confusing,
4 According to expert judgment based on the C-OAR-SE procedure, ad and brand might be examples of such constructs [Bergkvist and Rossiter 2007].
5 That measure uses 20 bipolar pairs of synonymous adjectives to measure the attribute of involvement. However, Drolet and Morrison [2001] found that increasing the number of synonymous-answer items may produce a frequent problem. Specifically, the larger the number of synonymous items the researcher attempts to generate, the greater the chance of including items that are not proper synonyms of the original attribute descriptor. Moreover, the non-synonyms are unlikely to be detected. Drolet and Morrison found that examinees were more likely to respond in the same way to an unequivalent (non-synonymous and, therefore, not content-valid) item as to the other items in a scale when the number of items was increased.
6 In the work of Netemeyer, Bearden and Sharma [2003], these steps and procedures were based on scaling self-report paper-and-pencil measures of latent social-psychological constructs.


– a full administration and item analysis is conducted on a sample of, e.g., 100 to 200 examinees,
– the scale is validated and normed7 [Netemeyer, Bearden and Sharma 2003; Malhotra 2009].
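The item-analysis step mentioned above is often implemented as a corrected item-total correlation check; a sketch with made-up pilot data (the 0.3 cut-off is a common rule of thumb, not a prescription from the source):

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / ((len(x) - 1) * stdev(x) * stdev(y))

def corrected_item_total(items):
    """Correlate each item with the sum of the *other* items.

    items[i][j] is examinee j's score on item i. A low or negative
    value flags an item that does not hang together with the pool.
    """
    out = []
    for i, scores in enumerate(items):
        rest = [sum(other[j] for k, other in enumerate(items) if k != i)
                for j in range(len(scores))]
        out.append(pearson(scores, rest))
    return out

pilot = [
    [4, 5, 2, 3, 5, 1],   # coherent item
    [4, 4, 2, 3, 5, 2],   # coherent item
    [5, 5, 1, 3, 4, 1],   # coherent item
    [3, 2, 4, 3, 2, 4],   # runs against the rest of the pool
]
r = corrected_item_total(pilot)
keep = [i for i, ri in enumerate(r) if ri > 0.3]
print(keep)   # [0, 1, 2] -- the deviant fourth item is trimmed
```

Correlating each item with the sum of the remaining items (rather than the full total) avoids inflating the correlation by the item's presence in its own criterion.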
This process is presented, in its broader context, in Figure 12.

Figure 12. Steps in scale development
1. Construct definition and content domain issues to consider:
– the importance of clear construct definition, content domain, and the role of theory,
– the focus on reflective vs. formative items/indicators,
– construct dimensionality: unidimensional, multidimensional or a higher-order construct.
2. Generating and judging measurement items:
– theoretical assumptions about items (e.g. domain sampling),
– generating potential items and determining the response format,
– the focus on content validity in relation to theoretical dimensionality,
– item judging by experts focusing on content and face validity.
3. Designing and conducting studies to develop and refine the scale:
– pilot testing as an item-trimming procedure,
– the use of several samples from relevant populations for scale development,
– designing studies to test psychometric properties,
– initial item analyses via exploratory factor analysis (EFA),
– initial item analyses and internal consistency estimates,
– initial estimates of validity,
– retaining items for the next set of studies.
4. Finalizing the scale:
– the importance of several samples from relevant populations,
– designing the studies to test the various types of validity,
– item analyses via EFA: the importance of EFA consistency; deriving an initial factor structure (dimensionality and theory),
– item analyses and confirmatory factor analysis (CFA): testing the theoretical factor structure and model specification, evaluating CFA measurement models, factor model invariance across studies (e.g. multiple-group analyses),
– additional item analyses via internal consistency estimates,
– establishing norms across studies,
– applying Generalizability Theory.
Source: Netemeyer, Bearden and Sharma 2003, p. 15.
7 A norm describes the distributional characteristics of a given population. Individual scores on the scale are then interpreted in relation to the distribution of the scores in the population.
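The norm-referenced interpretation in footnote 7 can be illustrated with a normal reference distribution; a sketch (the mean of 50 and SD of 10 are assumed T-score-style norming values, not figures from the source):

```python
from statistics import NormalDist

# Assumed population norm for a scale: mean 50, standard deviation 10.
norm = NormalDist(mu=50, sigma=10)

raw_score = 63
percentile = norm.cdf(raw_score) * 100   # share of the norm group scoring lower
print(round(percentile))                 # 90: the examinee outscores about 90% of the population
```

The same raw score would receive a different interpretation under a different norm, which is why norms must be established on samples from the relevant population.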


The conceptual task of defining the construct is the most vital step in the development of a scale. When a construct is not defined well, there is considerable risk that the scale will have poor reliability and validity. In consequence, the connection between the theoretical construct and the scale will be unclear.
The conceptual work begins with a general definition of the construct and then moves to the specifics of the operational definition. As Spector [1992, p. 15] explained, the more clearly delineated the construct, the easier it will be to write items to measure it. In delineating a construct, it is useful to base the conceptual effort on work that already exists, unless the construct is completely new. There might be existing scales available to assess it, and if scales exist to measure the construct of interest, the content of the existing scales may help in the scale development. The items from several scales can then be used as a starting point in writing an initial item pool. These would be modified, and more items added, to create the item pool from which the final scale would be developed.
The literature should serve as a starting point for a better construct definition. If the construct is popular, there are greater chances of finding different definitions of the construct, and if various theories exist, they should be discussed in the context of other, broader theories. In general, theory plays a huge role in the development of scales measuring theoretical constructs [Cronbach and Meehl 1955; Loevinger 1957]. Even narrowly abstracted constructs should be grounded in a theoretical framework.
In the next phase, on the basis of theory, we define the operational indicators (observed variables, items). Methodological trends in the social sciences underline the importance of identifying correspondence rules in theory verification. On their basis, one can specify a range of hypothetical, more detailed empirical indicators in reference to theoretical concepts and the relationships between them. The system of indicators defines the measurement operations in the field of empirical research. The adoption of a theoretical construct allows one to obtain measures through the operational transformation of such a construct into variables. If a construct is to have some empirical value, it must be expressed through observable measures. For example, a construct such as consumption may be expressed by the empirical evidence of a large number of shopping trips, or a lack of criticism and rationality in the selection of goods. In short, we assume that there exist: 1) a theoretical concept which is operationalized by a set of indicators selected from the item pool, and 2) a model that reflects all possible and logically consistent combinations of these indicators. As Hornowska [1989, 2000] argues, the operationalization procedure implies the presence, on the ontological level, of a theory and a theoretical construct, which may have an indirect relationship with other factors in reality.
The procedure of operationalization is presented in Figure 13. In moving from the theoretical concept to the items/indicators, it is important to select only those items that are representative and significant for the theory under study. The selected items should cover the tested concept in depth. Their selection may be conducted on the basis of exploratory research [Abell 1971]. Then (in the next stage, i.e. the transition from items/indicators to their measures) we perform their measurement.
The next stage determines the nature and dimensionality of the measured construct, as well as the scale, on the basis of the scores collected from the indicators/items. If there are k (i.e. many) items, this space will likely be n-dimensional. A theoretical construct can be differentiated according to its dimensionality, which may vary from highly specific and narrowly defined (unidimensional) to more complex (multidimensional). Constructs that are fairly homogeneous are typically unidimensional, and those which have broad facets to be covered will be treated as multidimensional.
Figure 13. Stages of operational procedure
Theoretical construct/concept → Items/indicators → Measures of items/indicators, then:
– formative level of measurement → Index,
– reflective level of measurement → Space of items/indicators → Scale.
Source: own construction based on Abell 1971.

According to Rószkiewicz [2011], a scale which uses a set of items in measuring a given theoretical construct, and which is composed of a battery of unidimensional, ordinal Likert items, is typically based on summated ratings. This type of scale is appropriate if the analysis confirms the unidimensional structure of the items. However, when we use a set of items to diagnose more than just one facet of the measured construct, then we turn to factors. In the latter case, techniques of multivariate statistical analysis are used, such as exploratory factor analysis or confirmatory factor analysis8, in which several
8 An in-depth comparison is presented in chapters 6 and 7.


factors (and relations among the factors) can be specified and evaluated to assess dimensionality (e.g. fit indices, presence of correlated measurement errors, and degree of cross-loading) [Gerbing and Anderson 1984; Floyd and Widaman 1995; Hattie 1985; Kumar and Dillon 1987]. In exploratory or confirmatory factor analysis, we divide one complex (multidimensional) scale into subscales. As Spector claims, the ultimate answer about how finely to divide such a scale must be based on both theoretical and empirical utility. If subdividing a scale adds significantly to the explanatory power of a theory, and if it can be supported empirically, then subdividing is recommended. On the other hand, if the theory becomes overly complex and unwieldy, or empirical support cannot be found, then subdividing should not be done [Spector 1992].
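A quick unidimensionality screen of the kind discussed above can be run on the inter-item correlation matrix: when its first eigenvalue absorbs most of the total variance, a single factor is plausible. A self-contained sketch (illustrative data; power iteration stands in for a full factor analysis, and the 0.7 threshold is only a rough heuristic):

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / ((len(x) - 1) * stdev(x) * stdev(y)))

def first_eigenvalue(matrix, steps=200):
    """Largest eigenvalue of a symmetric matrix via power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(steps):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector gives the eigenvalue.
    return sum(v[i] * sum(matrix[i][j] * v[j] for j in range(len(v)))
               for i in range(len(v)))

items = [
    [4, 5, 2, 3, 5, 1],
    [4, 4, 2, 3, 5, 2],
    [5, 5, 1, 3, 4, 1],
]
R = [[pearson(a, b) for b in items] for a in items]
lam1 = first_eigenvalue(R)
# Total variance of a correlation matrix equals the number of items,
# so lam1 / len(items) is the share explained by the first component.
print(lam1 / len(items) > 0.7)   # True: a single factor is plausible here
```

When this share is low, or a second eigenvalue is also large, the multidimensional route (subscales identified via EFA/CFA) described in the text becomes the more defensible choice.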

Formative and reflective indicators measuring theoretical construct
In the course of scale construction we may encounter problems pertaining to the relationships between the theoretical construct and the observed variables. These relationships can be described in two different ways. Most of the existing literature guidelines [e.g. Spector 1992; DeVellis 2003] focus almost exclusively on scale development, whereby the items (i.e. observed variables) composing a scale are perceived as reflective (effect) indicators of an underlying construct (Figure 14). An alternative measurement perspective is based on formative (cause, causal) indicators (Figure 15) and involves the creation of an index rather than a scale [Bollen and Lennox 1991; Diamantopoulos and Winklhofer 2001]9.
Figure 14. Effect model with reflective indicators (Construct → Indicator 1, Indicator 2)


9 Under traditional psychometric criteria (e.g. classical test theory), the common practice of scale development is dominated by reflective indicators.


Figure 15. Causal model with formative indicators (Indicator 1, Indicator 2 → Construct)

According to Coltman et al. [2008], we can distinguish a few differences between the reflective and formative approaches to measurement (see Table 17). They can be framed in the context of theoretical considerations, especially: 1) the nature of the construct; 2) the relationship (or causation)10 between the items and the latent construct; 3) the characteristics of the indicators used to measure the construct. On the other hand, we have empirical considerations, which can be described by: 4) item intercorrelation; 5) item relationships with construct antecedents and consequences; and 6) measurement error and collinearity.
The issue of the appropriate choice between formative and reflective measurement models was broadly discussed by Wilcox, Howell and Breivik [2008], who claimed that in the social sciences (e.g. in marketing research practice), many constructs which are currently operationalized by means of reflective indicators would be better captured if approached from a formative perspective11. According to Podsakoff et al. [2003], some constructs are fundamentally formative in nature and should not be modeled reflectively. Likewise, Rossiter [2002] suggested a different paradigm for measure development and cited multiple examples of constructs inappropriately measured as reflective12.
10 Causation is in fact a principle by which cause and effect are established between two variables. It requires a sufficient degree of association between the two variables, that one variable occurs before the other (that one variable is clearly the outcome of the other), and that no other reasonable causes for the outcome are present.
11 In most social science studies, the development of the measurement scale is based on reflective indicators. For example, attitude scales such as the Likert, Guttman and Thurstone scales are developed in this way.
12 Rossiter's C-OAR-SE model [Rossiter 2002] assumes judge or rater input to determine the nature of the construct. If constructs are inherently either formative or reflective, the researcher would be obliged to measure them accordingly. For example, Heise [1972, p. 153] suggested that SES would be a construct induced from observable variations such as income, education, occupational prestige, and so on. On the other hand, Kluegel, Singleton and Starnes [1977] developed subjective SES measures that function acceptably as reflective indicators.


Another aspect pertains to the issue of whether the observed variables alone (of either reflective or formative nature) can assist us in deciding which model to use for the measured construct. While in some cases determining the direction of causation between measures and their constructs appears to be easy [Diamantopoulos and Winklhofer 2001; Jarvis, Mackenzie and Podsakoff 2003; Podsakoff et al. 2003], many instances exist in which a potential indeterminacy may arise from an examination of the items alone, and the larger research context must be considered.
Edwards and Bagozzi [2000] suggested several criteria derived from the literature on causation that might be employed in this regard, including association, temporal precedence, and the elimination of rival causal explanations. They showed that a simple formative/reflective categorization may be overly simplistic. Bollen and Ting [2000] also suggested that a simple examination of a set of items, along with a mental experiment, may be insufficient to make the determination. The issue might be further complicated by the likelihood that the items of psychological constructs are a mixture of effect and causal items [Bollen and Ting 2000]13.
If the constructs are not inherently reflective or formative, and the items themselves do not always provide guidance as to which model to choose, can the relationships among the items provide insight? MacKenzie, Podsakoff and Jarvis [2005] suggested that items in a reflective measurement model should be highly correlated, and items in a formative measurement model should be uncorrelated. As Jarvis, Mackenzie and Podsakoff [2003] argued, for formative measures, covariation among the items is not necessary. However, does this mean that formative indicators are not correlated at all, and that correlation is not relevant? The answer is no. According to Jarvis, MacKenzie and Podsakoff [2003], the source of the covariation cannot come from the latent variable being formatively measured. Thus, to the extent that formative variables are correlated, the correlation must come from somewhere else14.
13 Bollen and Ting [2000, p. 4] noted that establishing the causal priority between a latent variable and its indicators might be difficult. These authors also offered an empirical tool for determining whether the covariance structure among a set of items is more consistent with a formative or reflective measurement model, based on vanishing tetrad analysis.
14 Correlation among items is not only of little value in determining the appropriateness of formative or reflective measurement models; such correlation is problematic in its own right with regard to unwanted meaning. For example, unrecognized correlation among items may lead to a lack of stability in regression coefficients. At worst, the items may carry meaning from some unintended and undesirable construct. Thus, the statement that items in a formative measure need not correlate can lead to unwarranted confidence in the quality of the measure [Wilcox, Howell and Breivik 2008].


Table 17. A framework for assessing reflective and formative measurement scales

Theoretical considerations
1. Nature of construct
– Reflective approach: the latent construct exists; latent constructs exist independent of the measures used.
– Formative approach: the latent construct is formed; the latent construct is a combination of its indicators.
– Relevant literature: Borsboom, Mellenbergh and van Heerden [2003, 2004].
2. Relationship between items and latent construct
– Reflective approach: causality (or otherwise reflection) from construct to items; variation in the construct causes variation in item measures; variation in item measures does not cause variation in the construct.
– Formative approach: causality from items to construct; variation in the construct does not cause variation in item measures; variation in item measures causes variation in the construct.
– Relevant literature: Bollen and Lennox [1991]; Edwards and Bagozzi [2000]; Rossiter [2002]; Jarvis et al. [2003].
3. Characteristics of items used to measure construct
– Reflective approach: items are manifested by the construct; items share a common theme; items are interchangeable; adding or dropping an item does not change the conceptual domain of the construct.
– Formative approach: items define the construct; items need not share a common theme; items are not interchangeable; adding or dropping an item may change the conceptual domain of the construct.
– Relevant literature: Rossiter [2002]; Jarvis et al. [2003].

Empirical considerations
4. Item intercorrelation
– Reflective approach: items should have high positive intercorrelations. Empirical tests: assessing internal consistency and reliability by Cronbach alpha, average variance extracted, and factor loadings (e.g. from common or confirmatory factor analysis).
– Formative approach: items can have any pattern of intercorrelation but should possess the same directional relationship. Empirical test: no empirical assessment of indicator reliability is possible; various preliminary analyses are useful to check directionality between items and construct.
– Relevant literature: Cronbach [1951]; Nunnally and Bernstein [1994]; Churchill [1979]; Diamantopoulos and Siguaw [2006].
5. Item relationships with construct antecedents and consequences
– Reflective approach: items have similar sign and significance of relationships with the antecedents/consequences as the construct. Empirical tests: establishing content validity by theoretical considerations, assessing convergent and discriminant validity empirically.
– Formative approach: empirical tests: assessing nomological validity by using a MIMIC model, and/or structural linkage with another criterion variable.
– Relevant literature: Bollen and Lennox [1991]; Diamantopoulos and Winklhofer [2001]; Diamantopoulos and Siguaw [2006].
6. Measurement error and collinearity
– Reflective approach: identifying the error term in items is possible. Empirical test: identifying and extracting measurement error by common factor analysis.
– Formative approach: identifying the error term is not possible if the formative measurement model is estimated in isolation. Empirical tests: using the vanishing tetrad test to determine if the formative items behave as predicted; collinearity should be ruled out by standard diagnostics such as the condition index.
– Relevant literature: Bollen and Ting [2000]; Diamantopoulos [2006].

Source: Coltman et al. 2008, p. 1252.


In case of the reflective approach to measurement, Bollen [1984] demonstrated analytically that true reflective items must be strongly correlated
with one another. However, considering measurement models which appear
to be reflective (i.e. the arrows go from the construct to the items or in Bollens terms, appear to be effect indicators) Borsboom, Mellenbergh and van
Heerden [2003] suggested that the items used to measure the construct may
be alternate manifestations of the construct. Thus, in at least some exceptional cases, items which appear to be caused by alatent trait do not correlate, suggesting simultaneously that inter-item correlation is not auseful
criterion for distinguishing formative and reflective measures.
Scales based on reflective indicators are mainly used to measure examinees' subjective states and their internal reactions to stimuli (e.g. attitudes) [Sagan 2003]. If the observed variables are assumed to reflect a latent variable, its presence is inferred from the pattern of covariation among the items. If, on the other hand, the items cause the latent variable rather than reflect it, they are termed formative items [Bollen 1984]. Such items are assumed to be measured without error, and they are formed objectively, because the scores obtained do not depend solely on a human evaluation. A measurement instrument based on formative items is called an index, and the latent variable is computed as a weighted or unweighted sum of the observed variables.
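The index computation described above reduces to a weighted (or unweighted) sum. A minimal sketch with invented item scores and weights (the socio-economic-status reading of the items and the weight values are assumptions for illustration):

```python
# Hypothetical scores of one examinee on three formative items,
# e.g. income, education and occupation bands forming a
# socio-economic status index (an illustrative choice).
items = [3, 4, 2]
weights = [0.5, 0.3, 0.2]   # assumed weights, not estimated here

# Index as a weighted sum of the observed variables.
index_weighted = sum(w * x for w, x in zip(weights, items))

# Index as an unweighted sum.
index_unweighted = sum(items)

print(index_weighted, index_unweighted)
```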

Items development for the measured construct and respective scale

Items identification
As a general rule, the process by which a construct is translated into a specific set of items forming a scale has in many instances remained informal. In the literature there are numerous suggestions for writing good items, yet, as Wesman [1971, p. 86] argued, none of the studies is definitive and item writing still continues to be an art. However, the most important guidelines for developing the item format can be summarized in the following hints: soundness of values, precision of language, imagination, knowledge of the subject matter, and familiarity with the examinees.
Cronbach [1970], writing in the context of attitudes, explained that the scale constructor should typically conceptualize one or more types of attitudes which are believed to manifest the construct and the formed scale. After that, the researcher must simply try to think up items that require these attitudes to be demonstrated. Unfortunately, this approach could result in the omission of important areas of the attitude, or in the inclusion of areas that are relevant to the construct only in the mind of the scale developer. In consequence, this may result in a highly subjective and idiosyncratic definition of the construct. So if we want to truly broaden, refine or verify the view of the construct to be measured, we need to engage in one or more of the following activities:
– content analysis: with this method, open-ended questions are posed to examinees about the construct of interest, and their responses are sorted into topical categories; those topics which occur predominantly are taken as major components of the construct,
– review of research: those attitudes that have been most frequently studied by others are used to define the construct of interest; the researcher may use an eclectic approach or select the work of one particular theorist in specifying the categories to be represented by scale items,
– critical incidents: a list of attitudes is identified that characterizes the extremes of the performance continuum for the construct of interest,
– direct observations: the researcher identifies the attitudes by direct observation,
– expert judgment: the scale constructor obtains input from one or more individuals who have first-hand experience with the construct; written questionnaires or personal interviews are used to collect the information,
– instructional objectives: experts in a subject are asked to review instructional materials and develop a set of instructional objectives when the scale is being developed.

Items construction
Lindquist [1936] characterized the process of scale development in the context of two major decisions, i.e. what to measure and how to measure it.
During item construction, the latter type of decision is the most important.
By custom, developing a pool of items to measure a construct, and thus forming a scale, entails the following activities:
– selecting an appropriate item format and verifying that the proposed format is feasible for the examinees,
– selecting and training the item writers, as well as writing the items,
– monitoring the progress of the item writers and the quality of the obtained items.
As Crocker and Algina [2008, p. 134] argued, if item writers work from item specifications, the structure and format of the items may already be determined. If item specifications are not being used, it is still important for decisions about item format to be made at the outset of the item-construction phase, rather than being left to the idiosyncratic tastes of individual writers. In deciding on a common format, the researcher may wish to review similar instruments in the field and study the reports of their development. The opinions of experts may also be helpful in deciding such matters as whether the examinees are sufficiently literate to take group-administered pencil-and-paper tests, for example, whether examinees can distinguish among five points on an agree-disagree continuum.
For an appropriate selection of the item format, the researcher should browse standard sources on item writing to collect suggestions on writing items of a given type. After that, a list of guidelines should be prepared and distributed to the item writers, particularly if non-professional writers are to be employed. For optimal performance tests, a wide variety of item formats may be considered. Popham [1981] divided these formats into two major categories: 1) those that require the examinee to generate the response (e.g. essay or short-answer open-ended questions), and 2) those that provide two or more possible responses and require the examinee to make a selection. Because the latter can be scored with little subjectivity, they are often called objective items. An important point, common to all objective formats, is that all responses should appear logically reasonable to an examinee who does not have the knowledge or skill that the item was designed to measure in the scale under preparation.
Scale developers are also advised to give careful thought to selecting an item format that is appropriate to the needs of the examinees, and to avoid novel or untried formats without a sound rationale for their use. At the same time, some authors have even called for moving beyond the use of the highly popular multiple-choice format, although the appropriate direction to take has generated strongly contrasting points of view. Such being the case, on the one hand, Ebel [1982] advocated broader use of more highly structured alternate-choice items. His point was that in many cases the domain of knowledge to be sampled could actually be expressed as a series of precisely stated functional relationships or principles, and that each of these can form the basis of a highly structured item15. On the other hand, Frederiksen [1981, p. 19] suggested that in the pursuit of greater scoring efficiency, scale developers may rely too heavily on highly structured formats, and that this interest in what can be easily measured may overshadow considerations about what should be measured.

Dichotomous and Thurstone item formats


In this section the two most popular item formats are presented along with their response systems: the dichotomous agree-disagree format and the Thurstone format (with the exception of the Likert scale, which was already discussed in chapter 3). The former typically consists of a simple declarative statement followed by two response options. The simplest procedure for scoring such items is to decide which end of the (e.g. attitudinal) continuum should be associated with high scores. For example, for the three items in Table 18, the underlying attitude could be described as a continuum between authoritarian and permissive views of some kind of control. The researcher has the option of specifying whether examinees who hold more authoritarian values should receive higher scores on the instrument, or whether examinees who hold more permissive values should receive higher scores. If one wishes high scores to reflect more permissive attitudes, then a particular item is identified as being positively worded or negatively worded with respect to the
measured construct. In our example, because the continuum is characterTable 18. The dichotomous agree-disagree format for declarative statements
No.
List of statements
1 People should obey general social rules without any question
2 People need today astronger discipline in their life
3 People should rebel against too much control from side of the
government

Response format
agree
disagree
agree
disagree
agree
disagree

Source: own construction based on Crocker and Algina 2008.


15

Although it would appear that each item can measure only alimited piece of information, Ebel [1982] argued that:
a large number of such items can be asked in arelatively short amount of time, thus permitting more thorough sampling of the content domain.
performance on such items is less subject to the influence of extraneous factors, such as
the higher level of reading ability that may be required by more complex formats,
such items have greater conceptual clarity for examinees about the nature of what is being asked.

154

Principles of items and scale development

ized in terms of permissiveness, items 1 and 2 would be considered negatively


and item 3 would be considered positively. Items would be scored by awarding one point to the examinee for each agree response given to apositively
worded item and one point for each disagree response given to anegatively
worded item. The examinees total score will be the total of the item scores.
For the present example, examinee who would have marked disagree, disagree, and agree to items 1, 2, and 3, respectively, would receive item scores of
1, 1 and 1, for atotal of three points [Crocker and Algina 2008].
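This scoring rule can be written down in a few lines. A sketch assuming the keying from Table 18, with high scores reflecting permissive attitudes (the function name and data layout are illustrative choices):

```python
# Items 1 and 2 are negatively worded (a point for "disagree"),
# item 3 is positively worded (a point for "agree").
keyed_positive = [False, False, True]

def score(responses):
    """responses: one 'agree'/'disagree' answer per item, in order."""
    total = 0
    for answer, positive in zip(responses, keyed_positive):
        if answer == ("agree" if positive else "disagree"):
            total += 1
    return total

# The examinee from the text: disagree, disagree, agree -> 3 points.
print(score(["disagree", "disagree", "agree"]))
```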
An alternative scoring approach for the agree-disagree format involves item weighting, where each item is assigned a weighted value associated with the perceived strength of sentiment expressed toward the construct of interest. Although a variety of scaling procedures may be used to arrive at the item weights, the most popular is the equal-appearing intervals procedure proposed by Thurstone [1928], in which the researcher produces a large number of statements (Thurstone suggested as many as 100) ranging from extremely positive to extremely negative with respect to the construct. Some of these statements should be neutral in affect.
In Thurstone's approach each statement may be written on a separate card, and the collection of statements is presented to a sample of examinees for rating. Each examinee is then instructed to read each statement and place it along a continuum divided into 7, 9, or 11 intervals of equal width. The positive and negative ends of the continuum are identified in advance. The statements that are most negatively worded are placed in categories 1 and 2, and those most positively worded in categories 10 and 11. On an 11-interval continuum, the most neutral statements are placed in category 6.
The data collected from this judgment task can be used for two purposes: 1) selection of items for the final version of the scale, and 2) assignment of weights for scoring the scale. The median of the examinees' ratings is the item weight. Furthermore, the items to appear on the scale are chosen from the initial large set of items on the basis of a statistic such as Q, the semi-interquartile range:

    Q = (X75 - X25) / 2,    (4.0)

where X75 is the numeric value corresponding to the 75th percentile rank, and X25 is the numeric value corresponding to the 25th percentile rank in the distribution of ratings. Smaller values of Q indicate that examinees are in fairly close


agreement about the strength of sentiment expressed by the item. Hence items with smaller Q are favored in item selection, although the scale constructor may still try to include some items from each category. When the scale is later administered to a group of examinees, each time an examinee endorses a statement, the weight value for that item is added to the examinee's total score. This total score is then divided by the number of items endorsed, to obtain the average weight of the statements endorsed. The average weight is used for comparing or describing examinees' attitudes.
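The weighting and selection steps of the equal-appearing intervals procedure can be sketched as follows (judges' ratings and endorsed items are invented; note that different texts use slightly different percentile conventions, so the exact Q value is method-dependent):

```python
import statistics

# Hypothetical ratings (on an 11-interval continuum) given by
# seven judges to a single statement.
ratings = [8, 9, 9, 10, 9, 8, 10]

# Item weight: the median of the judges' ratings.
weight = statistics.median(ratings)

# Semi-interquartile range Q = (X75 - X25) / 2, formula (4.0).
x25, _, x75 = statistics.quantiles(ratings, n=4)
q = (x75 - x25) / 2   # small Q -> judges agree -> candidate for selection

# Scoring an examinee: average weight of the endorsed statements.
endorsed_weights = [weight, 3.0, 6.5]   # weights of three endorsed items
examinee_score = sum(endorsed_weights) / len(endorsed_weights)
```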

Items review
As the scale items are drafted, the researcher should ask qualified experts to review them informally for accuracy, wording, grammar, ambiguity and other technical flaws.
Different types of expertise are required on the item review panel. Experts in the subject matter are best qualified to certify that the items are clearly stated and correctly keyed. They are also qualified to judge whether the items are appropriate for the scale specifications or the item specifications. Some general expertise in scale construction is important for the reviewers who must certify that the items are free from construction flaws. For example, if items are prepared in the multiple-choice format, the technical expert should look for common flaws affiliated with this particular format. Naturally, every item on a scale should be free of grammatical errors, including spelling errors. In particular, flaws in punctuation or unwieldy sentence construction may result in misinterpretation.
Also, one or more members of the review panel should have expert familiarity with the population for whom the scale is intended. These reviewers should consider whether the content might be construed as offensive or seemingly biased toward any particular subgroup, perhaps by the use of undesirable cultural stereotypes. They should also flag content that is unfamiliar to certain subgroups when this content is unrelated to the construct or knowledge domain being measured.
Finally, the items review can be carried out either before or after preliminary pilot tests. The choice of sequence is made on the basis of convenience and economy. If expert reviewers are readily available and their time is not costly, the items review can be conducted before the pilot test, so that time in pilot tests will not be wasted on faulty or biased items. On the other hand, after the pilot test many items will inevitably be revised or reworded. This creates the necessity for an additional items review by the expert panel. Thus, if substantial costs or effort are involved in assembling the review panel, many scale developers choose to defer this activity until after the preliminary pilot tests and subsequent revisions. If the results of the items review are to be reported as evidence of content validity, it is especially important for the review panel to examine the items in their final form.

Preliminary pilot tests of items


Finally, before we print out the items in their final form for a field study, we should test the items on a small sample of examinees. If only a limited number of observations is available, it might be necessary to use as few as 15 to 30 examinees for the preliminary pilot tests. In contrast, items developed for commercial use may be tested on samples as large as 100 to 200.
Pilot tests are fairly informal, and the scale developer should use this opportunity to observe the examinees' reactions during testing, noting such behaviors as long pauses, scribbling, or answer-changing, which may indicate confusion about particular items. After the study session, a debriefing should take place in which examinees are invited to comment on each item and offer suggestions for possible improvements.
Examination of descriptive statistics for the response distribution of each item is also recommended. This enables the researcher to obtain a rough idea of whether the items seem to be at the appropriate level of difficulty for the group as a whole, and whether there is sufficient variation in the responses to justify proceeding to a larger field test. It is important to recognize that although the final decisions about which items to retain and which to eliminate are made on the basis of the large field test, items are often revised extensively after reviewing the results of the preliminary pilot tests.
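Such a screening of pilot responses can be sketched as below (the responses and the variance threshold are invented for illustration; a real analysis would also inspect the full frequency distribution of each item):

```python
import statistics

# Hypothetical pilot responses: rows are examinees, columns are
# items rated on a 1-5 scale (all values invented).
responses = [
    [4, 5, 2],
    [5, 5, 3],
    [4, 5, 1],
    [3, 5, 4],
    [4, 5, 2],
]

for item in range(len(responses[0])):
    scores = [row[item] for row in responses]
    mean = statistics.mean(scores)
    variance = statistics.pvariance(scores)
    # Flag items whose responses barely vary (threshold is arbitrary).
    flag = "  <- too little variation" if variance < 0.1 else ""
    print(f"item {item + 1}: mean={mean:.2f}, variance={variance:.2f}{flag}")
```

Here the second item would be flagged: every examinee chose the same category, so it cannot differentiate respondents.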
In conclusion, once a set of items has been developed for the scale to be measured, the researcher is often faced with the problem of having developed too long a list of items. To solve this problem, one may conduct a pilot test. Based on the subsequent statistical analysis, the list of items should be refined, and thus a smaller subset of items should be introduced into the final scale. If time or financial resources do not allow a pilot test to be conducted, the researcher may use personal judgment to include only those items that are considered the best measures of the particular scale. But this choice is rather risky and dangerous.


Rossiter's C-OAR-SE: a new concept for scale development and its criticism
Rossiter [2002] provided an alternative procedure for scale development. Rossiter argued that researchers should focus only on theoretical considerations and resist the temptation to conduct empirical tests. However, Diamantopoulos [2005] argued that both theoretical and empirical criteria are necessary to design and validate measurement models. Empirical analyses provide an important foundation for content validity, especially for detecting errors, misspecifications or wrongly conceived theories16.
Rossiter proposed a general procedure for developing scales (called C-OAR-SE) which includes a six-fold classification of measures, allowing for both reflective and formative perspectives as well as single-item and multi-item scales. The OAR in C-OAR-SE signals the central theoretical idea that a construct consists of three elements (see Figure 16):
1) O, the object to be rated,
2) A, the attribute on which it is to be rated,
3) R, the rater entity who does the rating.

Figure 16. The OAR structure of measurement
Object (the focal object being rated): concrete; abstract collective; abstract formed.
Attribute (a dimension of judgement): concrete perceptual; concrete psychological; abstract achieved; abstract dispositional.
Rater entity (the person or persons doing the rating): expert(s); coders; managers; consumers; individual(s).
Legend: the elements of the construct (object, attribute, rater entity) and a preview of the classifications of each element. In the rating, the object is projected onto the attribute (hence the one-directional arrow). The rater has to apprehend both the object and the attribute (hence the two arrows from the rater entity).
Source: Rossiter 2011, p. 3.

16 Still, according to Diamantopoulos [2005, p. 1], Rossiter brought a breath of fresh air into the marketing literature on measure development, demonstrating that there is life beyond classical test theory, the domain sampling model and coefficient alpha.
In C-OAR-SE, the first step of scale development involves the conceptual definition of the construct, which [Rossiter 2002, p. 309] should specify simultaneously the object, the attribute and the rater entity, because otherwise the conceptual definition of the construct will be inadequate for indicating how the construct should be operationally measured. A problem with this line of argument, however, is that it goes against the fact that constructs, by their nature, are abstract. The notion of construct definition as used by C-OAR-SE would benefit from some clarification. Such clarification is important for two reasons. Firstly, the degree of acceptable aggregation during construct definition according to C-OAR-SE is not entirely clear. Secondly, C-OAR-SE gives the impression that conceptual definitions can offer operational guidance for measuring a construct. This, however, is the purpose of an operational (rather than a conceptual) definition. It is the latter [Nachmias and Nachmias 1976, p. 17] that seeks to bridge the gap between the theoretical conceptual level and the empirical observational level, and assigns meaning to a construct by specifying the activities or operations necessary to measure it [Kerlinger 1964, p. 28].
The second step in the C-OAR-SE procedure is the classification of the focal object into one of three categories: concrete singular, abstract collective and abstract formed. According to Rossiter [2002], different types of objects require different types of measures. Unfortunately, the basis of the object classification under C-OAR-SE is not clear. Specifically, if the denotation of an object is open to multiple interpretations, then its conceptual clarity would be negatively affected. If, however, differences in interpretation are at the level of the connotation of the object, then they should be treated as normal. With specific reference to the three-fold classification of objects under C-OAR-SE, concrete objects are differentiated from abstract objects in that for the former nearly everyone (in a sample of raters) describes the object identically, whereas for the latter the object suggests different things to the sample of raters. Do such differences relate to the denotative or the connotative meaning of the object(s) under consideration? In the case of abstract formed objects, it would appear to be the latter, since people's interpretations of the object differ, requiring explicit efforts towards identifying the main components of the object's meaning. However, in the case of concrete


singular and abstract collective objects, the underlying basis of classification is more ambiguous, in that it may result in the same classification although a different interpretation is assigned to the object by the judges [Diamantopoulos 2005].
The next step in the C-OAR-SE procedure is to classify the attribute of the construct, that is, the dimension on which the object is being judged [Rossiter 2002, p. 313]. Three distinct types of attributes are identified: concrete, formed, and eliciting.
A potential problem with the above classification is that while formed and eliciting attributes are clearly defined with explicit reference to the causal relation linking the attribute and the measuring items representing their components, for concrete attributes the causal relation between attribute and measure is not considered. However, a concrete attribute can be considered a special case of either a formed or an eliciting attribute (i.e. when an abstract attribute has only a single component). Thus, issues of causal priority would appear to be just as relevant.
A second issue relates to the contention that, in the case of a concrete attribute, there is no need to use more than a single item to measure it in a scale [Rossiter 2002, p. 313]. Practical considerations justifying a single-item measure can be difficult, as one still has to address the epistemic relationship between the item and the latent variable. If only one specific item should be used to operationalize the latent variable and no other item would be suitable, then inevitably a concept becomes its measure and has no theoretical meaning beyond that measure [Bagozzi 1982]. This implies the adoption of pure operationism, a perspective which has received substantial criticism in the philosophy of science literature [Hempel 1953, 1956]. If, on the other hand, a single good item is to be chosen from a set of potential candidates (which implies that other items could have been used instead), the question becomes how to choose the best or, at least, a good item.
The fourth step in the C-OAR-SE procedure is the identification of the rater entity, which is considered to be an intrinsic component of the construct, e.g. in marketing, as Rossiter [2002, p. 319] argued. Three types of rater entity are identified: individual raters, expert raters, and group raters. However, it is difficult to see the rationale underlying the decision to include the rater as an integral part of a construct. Rossiter's C-OAR-SE concept maintains that constructs differ depending on whose perspective they represent. He claims that the rater entity is part of the construct, yet no explanation is given as to why this is the case. In fact, taking the point concerning perspective quite literally would result in the absurd situation that a different construct would be involved every time a different individual rater is involved [Diamantopoulos 2005].
Because of the different types of objects and attributes, scale enumeration rules under C-OAR-SE range from a single-item score equaling the total score, to two types of index, a double index, an average, and averages which are then indexed [Rossiter 2002]. Attention needs to be drawn to those enumeration rules associated with formed and eliciting attributes, for which some sort of index or average is recommended by C-OAR-SE. The difficulty here is that a linear composite (i.e. a weighted or unweighted sum) is not the same as the latent variable defined by (in the formed attribute case) or reflected in (in the eliciting attribute case) a set of indicators.
Finally, we need to mention the validity and reliability aspect. Rossiter customized the assessment of reliability according to the type of rater entity involved and the type of attribute to be rated. Interestingly, while he acknowledged that highly precise, reliable scores can be obtained from non-valid scales [Rossiter 2002, p. 328], for some reason he failed to point out that a lack of reliability provides negative evidence of the validity of a measure. The indirect assessment of validity through reliability assessment is not considered in C-OAR-SE.

V. RELIABILITY AND VALIDITY IN A VIEW OF CLASSICAL TEST THEORY (CTT)

Principles and meaning of reliability and validity


Due to the fact that most measurement instruments in marketing research are based on scientific experience and knowledge derived from psychology, some concepts of validity and reliability evaluation associated with this field of science are highlighted here. In this chapter we will focus on the reliability and validity aspects pertaining to CTT.
In practice, marketing researchers rarely assess the reliability, and much less the validity, of scales [Heeler and Ray 1972, p. 369]. As Parameswaran et al. [1979, p. 18] explained, even when reliability is treated in marketing studies, validity is usually not considered. Returning to history, in 1968 Hughes wrote an article, Measurement, the neglected half of marketing theory, identifying major disadvantages in the marketing discipline. In his view, marketing scholars should be urged to pay more attention to measurement, because theory construction is a product of the interaction between data and models. So if the state of the art in marketing is to develop beyond its current condition, a starting point should be the regular assessment of reliability and validity in marketing research studies and the development of highly reliable and valid scales1.
Reliability, in a one-word expression, refers to the level of precision. As a result, a measurement instrument is reliable in the sense of the precision and consistency of the scores derived from the conducted measurements. A scale forming a respective measuring instrument should exclude random measurement error. The magnitude of this error is related to the degree of reliability of the measurement instrument [Schmidt and Hunter 1999]. For example, if a scale yields grossly imprecise indications of the weight of objects, then such a scale is unreliable. Similarly, if the shots fired from a well-anchored rifle are scattered widely about the target, then the rifle is unreliable. But if the shots are concentrated around the target, then the rifle is reliable (see Figure 17, p. 164). In this context, a highly reliable indicator of a theoretical concept is one that leads to consistent results, usually on repeated measurements, and which does not fluctuate greatly due to random error [Zeller and Carmines 1979]2. In short, reliability defines the resolution of the measurement precision, and tells us how small the differences are that we can talk about, given the presence of true variance and error variance making up the total variance [Tarkkonen and Vehkalahti 2005]3.

1 Generally, in marketing research (which is based on psychometrics), when we decide to develop an appropriate scale, we must first think of, and control, the whole process of measurement. Thus, we need to learn how to control the prospective errors which may occur in this process.
Reliability certainly has many positive connotations. For anything to be characterized as reliable is to be described in positive terms. So it is with any type of test, scale, experiment or measuring procedure. If it is reliable, then it has gone a long way toward gaining scientific acceptance.
2 Cronbach regarded reliability as the consistency of repeated measurements of the same subject by the same process. Moreover, as he explained, there are two fundamental differences between the physical and the psychological context of research [Cronbach 1947, pp. 1-2]: the physical scientist makes two assumptions, both of which are adequate for him. Firstly, he assumes that the entity being measured does not change during the measurement process. By controlling the relevant conditions he can hold nearly constant the length of a rod or the pressure of a gas. When measuring a variable quantity, where this assumption is no longer valid, he abandons the method of successive observations and employs instead simultaneous observations. The psychologist cannot obtain simultaneous measurements of behavior, yet the quantities that interest him are always variable. The second assumption of the physical scientist is that his measurements are independent. If one rules out his remembering prior measurements, this assumption can usually be made true. Repeated measurements of psychological quantities are rarely independent. Thus, the reliability of a test score has generally been defined in terms of the variation of scores obtained by the individual on repeated testings. Neither the assumption of constancy of true scores nor the assumption of experimental independence is realized in practice with most psychological variables.
3 For example, the mean of the squared deviations of the observed scores about the obtained mean is the error variance. Any deviation of the observed scores from the mean would be an error, which constitutes the unreliability of the measurement instrument. In other words, a dispersion of scores from, e.g., repeated measurements of the same distance (under the same research conditions) can be regarded as a sign of unreliability. The greater the difference between the measurements of the same trait, the lower the reliability [Cronbach 1951].


Also, when a measurement is relatively reliable, it means that it is minimally affected by chance disturbances (e.g. random/chance measurement errors). However, as Zeller and Carmines argued [1979, p. 11], the measurement of any phenomenon in the social sciences always contains a certain amount of error. Hence the goal of error-free measurement, while laudable, is never attained. Instead, the amount of error may be large or small, but it is universally present to some extent. For example, two sets of measurements of the same traits of the same individuals will never exactly duplicate each other. Hence repeated measurements will never be exactly equal to one another. Unreliability is always present to at least a limited extent. But while repeated measurements of the same phenomenon will never precisely duplicate each other, they do tend to be consistent from measurement to measurement. The more consistent the results given by repeated measurements, the higher the reliability of the measuring procedure, and vice versa [Zeller and Carmines 1979].
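The decomposition of observed scores into a true score and random error, and the resulting definition of reliability as the share of true variance in total variance, can be illustrated with a small simulation (a hedged sketch; the distributions, sample size and seed are arbitrary choices, not from the source):

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Classical test theory: observed score X = true score T + error E.
# Simulate 1000 examinees with true scores ~ N(50, 10) and
# independent random measurement errors ~ N(0, 5).
true_scores = [random.gauss(50, 10) for _ in range(1000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Reliability = true variance / total (observed) variance.
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)

# Theoretical value here: 10**2 / (10**2 + 5**2) = 0.8;
# the simulated estimate will be close to it.
print(round(reliability, 2))
```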
A scale that is reliable has only come halfway toward achieving scientific acceptance. It must also be valid. Here, in the context of validity, we simply mean that it is appropriate or right, and in this particular sense we can talk about a valid theory, valid argumentation and valid reasons [Hornowska 2001].
In the early stages of the development of validity theory, validity was described as the level of accuracy with which the research objectives could be realized. However, when the works of Cronbach and Meehl [1955] as well as Messick [1989] were published, the understanding of validity changed4.
⁴ Perhaps those who invented the concept of validity were Cronbach and Meehl [1955], with their classic article entitled "Construct validity in psychological tests". In the literature, the question of validity has evolved from the question of whether one measures what one intends to measure [Kelley 1927; Cattell 1946], to the question of whether the empirical relations between test scores match the theoretical relations in a nomological network [Cronbach and Meehl 1955], and finally to the question of whether interpretations and actions based on test scores are justified not only in the light of scientific evidence but also with respect to social science research and the consequences of test use [Messick 1989]. Thus, validity theory has gradually come to treat every important test-related issue as relevant to the validity concept and has aimed to integrate all these issues under a single heading.
However, as Borsboom, Mellenbergh and van Heerden [2004] claimed, descriptions of validity in the literature sometimes failed to articulate the validity problem clearly or missed the point entirely. In their opinion, validity is not complex, faceted, or dependent on nomological networks and social consequences of testing. It is a very basic concept and was correctly formulated, for instance, by Kelley [1927, p. 14] when he stated that a test is valid if it measures what it purports to measure. This argument is exceedingly simple, so simple, in fact, that it articulates an account of validity that may seem almost trivial.

Reliability and validity in a view of Classical Test Theory (CTT)

In consequence, the question of how well a test fulfils the intentions of its author was replaced by the questions of what it measures and how well. Cronbach and Meehl posed a question which was not about a property of tests, but about a property of test score interpretations. It was not the simple, factual question of whether a test or scale measures an attribute, but the complex question of whether test score interpretations are consistent with a nomological network involving theoretical and observational terms [Cronbach and Meehl 1955], or with an even more complicated system of theoretical rationales, empirical data and social consequences of testing, as Messick stated in his articles [1989, 1998]. Messick assumed that validity reflects an integrated evaluative process in which empirical evidence and theoretical considerations confirm the adequacy and correctness of the interpretations and programs of action inferred from test scores or other measurement instruments.
In sum, validity is more of a theoretically oriented issue, because it raises the question "valid for what purpose?". However, validity must lead, through empirical evidence, to reasonably consistent results under, e.g., repeated measurements, and to the theoretical concept intended by the author. Validity is concerned with whether a measuring instrument, e.g. a scale, measures what it is supposed to measure in the context in which it is applied. Such an instrument should be
free of systematic errors which affect empirical measurements. These errors have a systematic biasing effect on the measurement instrument.

Figure 17. Reliability and validity of measurement instrument
[The figure contrasts three target patterns: a reliable and valid measurement (r_tt → max and r_v → max); an imprecise, unreliable measurement affected by random/chance error (r_tt → min); and a non-valid measurement affected by systematic error and bias (r_tt → max and r_v → min).]
Legend: r_tt – reliability and r_v – validity coefficients
Source: own concept based on Best 1978.

For example, a scale that always registers the weight of an object two pounds below its
actual weight is affected by systematic error. Similarly, in the case of the rifle mentioned above, if the shots aimed at the bull's eye hit approximately the same location but not the bull's eye, then some form of non-random error has affected the targeting of the rifle (Figure 17). Thus, just as reliability is inversely related to the magnitude of random error, validity depends on the extent of non-random (systematic) error present in the measurement process. Systematic errors prevent the indicators in the constructed scale from representing what they are intended to represent, i.e. the theoretical concept being measured. They may indicate something other than the intended theoretical concept, perhaps a different concept entirely [Hammersley 1987; Zeller and Carmines 1979].
Now, in the framework of CTT measurement (discussed next), an observed score X is defined in terms of the true score T, but also in terms of the systematic error of measurement E_S and the random error of measurement E_R. As a result, we obtain the following equation [Zeller and Carmines 1980]:

X = T + E_S + E_R,  (5.0)

where the expected value of the random error E_R is 0 and E_R is uncorrelated with the other components. Also, the random errors of different tests cannot be correlated with each other.
Based upon the assumptions derived from (5.0), which pertain to repeated measurements across a number of examinees, we may rewrite it in variance form. In consequence, we obtain V(X) = V(T) + V(E_S) + V(E_R) + 2cov(T, E_S). Zeller and Carmines [1980, p. 31] used this formula to express the reliability coefficient, [V(X) − V(E_R)]/V(X), and validity, V(T)/V(X), respectively.
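The decomposition in (5.0) and the two variance ratios can be illustrated with a short simulation; all score components and their variances below are hypothetical, chosen only to make the arithmetic visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical components of Eq. (5.0): X = T + E_S + E_R.
T = rng.normal(50, 10, n)        # true scores
E_S = 0.1 * (T - T.mean())       # systematic error, correlated with T
E_R = rng.normal(0, 4, n)        # random error: mean 0, uncorrelated
X = T + E_S + E_R

# Variance decomposition: V(X) = V(T) + V(E_S) + V(E_R) + 2cov(T, E_S).
lhs = X.var()
rhs = T.var() + E_S.var() + E_R.var() + 2 * np.cov(T, E_S, bias=True)[0, 1]

# Zeller and Carmines' ratios (V(E_R) is known here only because we
# simulated it; in practice it must be estimated):
reliability = (X.var() - E_R.var()) / X.var()   # ≈ 0.88
validity = T.var() / X.var()                    # ≈ 0.73
print(round(lhs, 1), round(rhs, 1), round(reliability, 2), round(validity, 2))
```

With a systematic error present, reliability exceeds validity, exactly as the two ratios suggest.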

Reliability estimation
In classical test theory (CTT) (2.5), we approach reliability estimation by comparing the observed variance attributable to the model as a proportion of the total variance, including error variance. The true score is a perfect measure of the property being measured. However, as Cronbach [1947], Zeller and Carmines [1979] and many other authors noticed, in practice the perfect true score can never really be known, and it is generally assumed to be the mean score of a large number of administrations of the same measurement instrument to the same subject. On the basis of the examination of a certain number of examinees with a specific test, we determine the frequency distribution of the observed scores of individuals. Such a distribution (denoted as X) is formed by overlapping distributions T and E, that is, distributions which contain true scores and errors (see Figure 18) [Magnusson 1981].

Figure 18. Distributions of true scores T, error scores E and observed scores X in the same data set
[The figure shows, for individual persons, the positions T_l, E_l, X_l and T_i, E_i, X_i within the overlapping distributions of true scores, errors and observed scores.]
Legend: each j-th person, whose scores form the respective X distribution, occupies a position l in the T and E distributions.
Source: based on Magnusson 1981, p. 101.

Assuming that the correlation between true scores and error scores equals zero, one may infer that the covariance components (on the right-hand side of Eq. (5.1)) will equal zero:


σ²_X = σ²_T + σ²_E,  (5.1)

where: σ²_X – observed score variance, σ²_T – true score variance, σ²_E – error score variance.
The discussion of reliability should start in the context of the correlation between, e.g., parallel tests. For such tests, if we rely on the squared linear correlation coefficient:

ρ_tt = ρ²_XT = σ²_T / σ²_X,  (5.2)

reliability is given as the ratio of true score variance to observed score variance. Now, based on formula (5.1) and assuming that σ²_T = σ²_X − σ²_E, the equation can be rewritten in the following way: ρ_tt = 1 − σ²_E / σ²_X.
Observed scores explain true scores best when the linear correlation ratio equals 1.0, that is, when all observed score variance is true score variance. A correlation coefficient of 1.0 means that the distributions of true and observed scores for both tests are perfectly equal, or parallel. Theoretically, parallel measurements ought to have the same average, variance and correlation between pairs of measurements; intuitively, two measurements made precisely, with reliability equal to 1.0, should have the same true score without error.
The correlation Eq. (5.2) between parallel measurements requires, for each element of the population, quite rigorous assumptions such as [Aranowska 2005]:

T_1 = T_2,  (5.3)

σ²_X1 = σ²_X2,  (5.4)

ρ_E1E2 = 0,  (5.5)

which means that the true scores and the observed score variances are equal, and that there is no linear relationship between the errors. In the view of Lord and Novick, measurements should be experimentally independent. However, as Gulliksen explained, this is not a question of true values. It is rather an issue of the expected value (mean) T of the distribution of the ability of the j-th examinee examined in the first and in the second test. For the population this assumption is expressed as follows:


T_1 = E(X_1) and T_2 = E(X_2),  (5.6)

that is, the true value is defined as the expected value.
From (5.3) we may infer the equality of the true score variances, σ²_T1 = σ²_T2, and consequently the equality of the error variances, σ²_E1 = σ²_E2, as well as the equality of the observed score variances.
In sum, if two measurements in the same population are parallel, then they have the same level of reliability, which is equal to the linear correlation coefficient [Aranowska 2005]⁵:

ρ_X1X2 = ρ_tt = ρ²_XT = σ²_T / σ²_X.  (5.7)

More importantly, the correlation between the true scores of two separate measurements X and Y equals the correlation between these measurements, weighted by the inverse of the reliabilities of both of them, namely:

ρ_TXTY = cov(T_X, T_Y) / (σ_TX σ_TY) = ρ_XY / √(ρ_XX ρ_YY),  (5.8)

that is,

ρ_TXTY = cov(X, Y) / (σ_TX σ_TY) = [cov(X, Y) / (σ_X σ_Y)] · [(σ_X σ_Y) / (σ_TX σ_TY)] = ρ_XY / √(ρ_XX ρ_YY),

where:
ρ_XX – reliability of the X measurement,
ρ_YY – reliability of the Y measurement.
Eq. (5.8) represents the correction for attenuation. If, additionally, measurements X′ and Y′ were parallel to X and Y, Eq. (5.8) would become [Aranowska 2005]:

ρ_TXTY = ρ_XY / √(ρ_XX′ ρ_YY′).  (5.9)

⁵ According to the theory of statistics, searching for a linear relationship makes sense only for the comparison of traits based on a two-dimensional normal distribution. The robust theory for the r-Pearson test, however, permits slight deviations from normality. Generally, the distributions P(X_2|X_1 = x) and P(X_1|X_2 = x) should be unimodal and more or less symmetric.


Equation (5.9) expresses the correlation between the observed scores of the two measurements divided by the square root of the product of the correlations of each observed measurement score with a parallel score. The scores X and X′ represent observed scores on parallel measurements, and the scores Y and Y′ are observed scores on two other measurements that are parallel to each other, with Y = T_Y + E_Y and Y′ = T_Y + E_Y′.
The product ρ_XX′ ρ_YY′ must be less than or equal to 1, so ρ_TXTY must be greater than or equal to ρ_XY. Theoretically, a measurement cannot correlate more highly with any other score than it correlates with its own true score; that is [Aranowska 2005]:

ρ_XY ≤ ρ_XT,  (5.10)

or

ρ_XY ≤ √ρ_XX′.  (5.11)

We can also correct for attenuation due to unreliability in the predictor only or in the criterion only. For example,

ρ_XTY = ρ_XY / √ρ_YY′.  (5.12)

This would be the correlation between X and Y if the measurement of Y were perfectly reliable.
By using the correlations between observed scores, r_XY, r_XX′ and r_YY′, in the correction for attenuation, it is possible to estimate the correlation between the true scores of any two measurements in a sample.
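The correction for attenuation can be sketched as follows; the correlation and reliability values are purely hypothetical:

```python
import math

# Hypothetical inputs: r_xy is the observed correlation between two
# scales; r_xx and r_yy are their reliabilities from parallel forms.
r_xy, r_xx, r_yy = 0.42, 0.80, 0.70

# Eq. (5.8): correlation between true scores, corrected for attenuation.
r_true = r_xy / math.sqrt(r_xx * r_yy)
print(round(r_true, 3))   # 0.561

# Eq. (5.12): correcting only for unreliability in the criterion Y.
r_x_ty = r_xy / math.sqrt(r_yy)
print(round(r_x_ty, 3))   # 0.502
```

The disattenuated correlation is always at least as large as the observed one, consistent with (5.10)–(5.11).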
Finally, test scores may be considered as the sum of the scores of a certain number of test parts (i.e. the sum of the particular items). In its simplest form, a test consists of two parts. In CTT, the score of a measurement instrument may be considered an aggregated sum of the particular scores of the whole test. Denote, for example, X_1 and X_2 as separate measurements (two parts of the test) with true scores T_1 and T_2 and errors E_1 and E_2, where X = T + E, T = T_1 + T_2 and E = E_1 + E_2. It is now possible to express the variances of the observed score X, the true score T and the measurement errors E as the variances of sums of two random variables, namely [Nowakowska 1975; Aranowska 2005]:

σ²_X = σ²_X1 + σ²_X2 + 2ρ_X1X2 σ_X1 σ_X2,  (5.13)


σ²_T = σ²_T1 + σ²_T2 + 2ρ_T1T2 σ_T1 σ_T2,  (5.14)

σ²_E = σ²_E1 + σ²_E2 + 2ρ_E1E2 σ_E1 σ_E2,  (5.15)

where the errors in the separate measurements are uncorrelated, so the last term in (5.15) vanishes.


If, additionally, the tests X_1 and X_2 were parallel, then, on the basis of the known assumptions for the equality of such tests [Nowakowska 1975; Aranowska 2005]:

T_1 = T_2,  ρ_T1T2 = 1,  (5.16)

σ²_T1 = σ²_T2,  (5.17)

σ²_X1 = σ²_X2,  (5.18)

σ²_E1 = σ²_E2.  (5.19)

Under (5.16)–(5.19), Eqs. (5.13)–(5.15) take the following form:

σ²_X = 2σ²_X1 (1 + ρ_X1X2),  (5.20)

σ²_T = 4σ²_T1,  (5.21)

σ²_E = 2σ²_E1.  (5.22)

Such being the case, comparing (5.21) and (5.22) we can infer that the true score variance of the whole test has increased four times relative to the true score variance of one part of the test, while the error variance has increased only two times. The reliability of the whole test is therefore greater than the reliability of each separate part (i.e. subtest).
Eqs. (5.20)–(5.22) yield the reliability, which is given as:

ρ_tt = σ²_T / σ²_X = 4σ²_T1 / [2σ²_X1 (1 + ρ_X1X2)] = 2ρ_X1X2 / (1 + ρ_X1X2).  (5.23)

Because the correlation coefficients here take positive values (0–1), hence:

ρ_tt > ρ_X1X2 = ρ_X1X1′ = ρ_X2X2′.  (5.24)


And if X_1, X_2, …, X_n are parallel, then the correlations among them will be equal:

ρ_X1X2 = ρ_X2X3 = … = ρ_Xn−1,Xn = ρ = const.  (5.25)

From Eq. (5.7) we infer that the correlation between two parallel measurements X_i, X_k expresses the reliability of any of them:

ρ_XiXk = ρ_tt = σ²_Ti / σ²_Xi,  (5.26)

where ρ_tt represents the reliability of the parallel measurement. If the reliability (5.26) is correct, then, knowing this relationship, we may express the reliability of the whole test containing n parallel components as follows:

ρ_tt = nρ / (1 + (n − 1)ρ).  (5.27)

Formula (5.27) is otherwise called the Spearman-Brown formula.
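A minimal sketch of formula (5.27), assuming a hypothetical pairwise correlation of 0.5 between parallel components:

```python
def spearman_brown(rho: float, n: float) -> float:
    """Eq. (5.27): reliability of a test built from n parallel
    components whose pairwise correlation is rho."""
    return n * rho / (1 + (n - 1) * rho)

# Hypothetical: each part has intercorrelation (reliability) 0.5.
print(round(spearman_brown(0.5, 2), 3))   # doubling the test: 0.667
print(round(spearman_brown(0.5, 4), 3))   # quadrupling the test: 0.8
```

Lengthening the test raises reliability, but with diminishing returns as n grows.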


If X = X_1 + X_2 + … + X_n, where the X_i are separate measurements, and if it is possible to speak of their parallel structure, we can estimate the test reliability assuming that [Nowakowska 1975; Aranowska 2005]:

ρ_tt = ρ²_XT ≥ [n / (n − 1)] · (1 − Σⁿᵢ₌₁ σ²_i / σ²_X).  (5.28)

The right-hand side of Eq. (5.28) was given the name alpha reliability coefficient by Cronbach [1951]. As a result, the reliability of each test is greater than or equal to the alpha coefficient. When the test components are parallel, alpha reliability equals the right-hand side of the Spearman-Brown formula, that is:

α = nρ / (1 + (n − 1)ρ),  (5.29)

where ρ = const. denotes the correlation coefficient between each pair of parallel measurements.
If now:

σ²_X = nσ²_Xi (1 + (n − 1)ρ),  (5.30)

σ²_T = n²σ²_Ti,  (5.31)

σ²_E = nσ²_Ei,  (5.32)

then the reliability of a measurement increases along with its length (the variance of the true score increases faster than the error variance).
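The alpha coefficient of (5.28) can be computed directly from an item-score matrix; the data below are hypothetical Likert-style responses, used only to illustrate the formula:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Eq. (5.28): alpha = n/(n-1) * (1 - sum of item variances /
    variance of the total score). Rows are examinees, columns items."""
    n = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return n / (n - 1) * (1 - item_vars / total_var)

# Hypothetical 5-examinee, 3-item data.
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [1, 2, 1],
                   [3, 3, 3]])
print(round(cronbach_alpha(scores), 3))   # 0.98
```

These strongly intercorrelated items yield a very high alpha, as expected for a homogeneous scale.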
Finally, we can sum up the interpretation of reliability as follows:
ρ_tt = ρ_X1X2 – the reliability of a test equals the correlation of its observed scores with the observed scores on a parallel test. If examinees obtain the same observed score when tested with a parallel form and there is some variance in observed scores within each testing, the tests have perfect reliability, ρ_tt = 1. On the other hand, if examinees obtain observed scores on one test that are uncorrelated with their observed scores on a parallel test, ρ_tt = 0, and the tests are completely unreliable.
The proportion of variance in X_1 explained by a linear relationship with X_2 is the squared correlation ρ²_X1X2, which can be interpreted as the proportion of variance in one of the variables that is explained by a linear relationship with the other variable. Here, ρ²_X1X2 can be viewed as the proportion of variance in one test score explained by its linear relationship with scores on a parallel test.
Yet another interpretation of reliability might be the following [Allen and Yen 1979]:
ρ_X1X2 = σ²_T / σ²_X represents the ratio of true score variance to observed score variance. For a perfectly reliable test, ρ_X1X2 = 1, so σ²_T / σ²_X = 1, and all of the observed variance reflects true score variance rather than error variance. When ρ_X1X2 < 1, error is present in the measurement.
ρ_X1X2 = ρ²_XT defines the reliability coefficient as the square of the correlation between observed scores and true scores. For example, if ρ_X1X2 = 0.81 then ρ_XT = 0.9; if ρ_X1X2 = 0.25 then ρ_XT = 0.5. This relationship is illustrated in Figure 19. Whenever 0 < ρ_X1X2 < 1, we can see that ρ_XT > ρ_X1X2: an observed test score will correlate more highly with its own true score than with an observed score on a parallel test. In fact, since a test score cannot correlate more highly with any other variable than with its own true score, the maximum correlation between an observed test score and any other variable is √ρ_X1X2 = ρ_XT.
ρ_X1X2 = 1 − ρ²_XE explains that the reliability is 1 minus the squared correlation between observed and error scores. Ideally, ρ_XE should be 0, but ρ_XE = 0 only when ρ_X1X2 = 1.0.


Figure 19. Reliability coefficient as the square of the correlation between observed scores and true scores
[The figure plots ρ_XT against ρ_X1X2: for ρ_X1X2 = 0.25, ρ_XT = 0.5; for ρ_X1X2 = 0.81, ρ_XT = 0.9; both reach 1.0 together.]
Source: based on Allen and Yen 1979.

ρ_X1X2 = 1 − σ²_E / σ²_X relates the reliability to the error score variance and the observed score variance. So when ρ_X1X2 = 1, then σ²_E / σ²_X = 0. Simultaneously, when ρ_X1X2 = 0, then σ²_E = σ²_X.
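These interpretations can be checked numerically by simulating two parallel measurements under assumptions (5.3)–(5.5); all distribution parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parallel measurements: the same true score T and
# independent errors with equal variance.
T = rng.normal(0, 3, n)          # sigma_T^2 = 9
X1 = T + rng.normal(0, 2, n)     # sigma_E^2 = 4, so sigma_X^2 = 13
X2 = T + rng.normal(0, 2, n)

r_tt = np.corrcoef(X1, X2)[0, 1]   # reliability, Eq. (5.7): ≈ 9/13
r_xt = np.corrcoef(X1, T)[0, 1]    # observed-true correlation: ≈ sqrt(9/13)
print(round(r_tt, 2), round(r_xt, 2))
```

The parallel-forms correlation recovers σ²_T/σ²_X, while the observed-true correlation is its square root, so r_xt exceeds r_tt, as stated above.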

Homogeneity or heterogeneity of the group


We should start the discussion here with the correlation coefficient: a lower level of diversity in the sample leads to a lower correlation coefficient. The same refers to correlation coefficients expressing relationships between parallel tests, i.e. reliability coefficients [Magnusson 1981]. As mentioned, the magnitude of error dispersion depends on the precision of the measurements with reference to the true scores of each examinee, and can be accepted as constant for different samples (although particular samples yield different levels of heterogeneity). On the other hand, the dispersion of true scores in the respective variable (with regard to which the group is heterogeneous) will not be equal in different samples.
Let us now consider the reliability coefficient for a sample with known total variance, which is expressed as follows:

r_tt = 1 − s²_E / s²_X.  (5.33)


If we examine a homogeneous sample, the error variance will remain unchanged but the total variance will decrease, because the variance of the distribution of true scores will be lower. In Eq. (5.33) the effect of this change will be discernible as a decreasing coefficient.
Based on the assumption of equal error variances at different levels of test difficulty, we may construct an equation for calculating test reliability which can be applied in a sample whose total variance differs from the total variance of the sample that constituted the basis for calculating the original reliability coefficient.
If by the symbol u we denote the sample in which the reliability is to be estimated, we obtain the formula for the test reliability calculated in that group [Magnusson 1981]:

r_uu = 1 − s²_E / s²_u.  (5.34)

Equation (5.33) enables us to write down the variance of the error distribution:

s²_E = s²_X (1 − r_tt).  (5.35)

Because the error distribution has the same variance in samples with different levels of heterogeneity, in Eq. (5.34) we need to replace s²_E by s²_X (1 − r_tt). In consequence, if the total variance for sample u is identified, we obtain the formula for the reliability estimate in this sample as follows:

r_uu = 1 − s²_X (1 − r_tt) / s²_u,  (5.36)

where:
r_uu – estimated reliability in sample u,
s²_X – variance of the sample in which the known reliability coefficient was calculated,
s²_u – variance of the sample in which the reliability is being estimated,
r_tt – known reliability in the sample.
In sum, the level of the reliability coefficient is significantly affected by the nature of the explored group as a whole, whose empirical results form the basis for calculating this coefficient. Each correlation coefficient depends on the individual differences in the tested group. Such being the case, if we consider a sample composed of a number of classes (for example, according to age, socio-economic status, etc.), the reliability coefficient may not yield the correct level. If a test is to be used to differentiate individuals within a more homogeneous group, then the reliability coefficient should be re-estimated for the whole group and for each subgroup. It is better to calculate the coefficient again (using the empirical data obtained for the group as a whole) and compare it across the groups identified in the study.
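Formula (5.36) can be sketched as follows, with hypothetical variances for the original sample and a more homogeneous subgroup:

```python
def reliability_in_subgroup(r_tt: float, s2_x: float, s2_u: float) -> float:
    """Eq. (5.36): reliability re-estimated for a sample u with total
    variance s2_u, given the reliability r_tt found in a sample with
    total variance s2_x (error variance assumed constant)."""
    return 1 - s2_x * (1 - r_tt) / s2_u

# Hypothetical: r_tt = 0.90 in a heterogeneous sample with variance 100;
# a more homogeneous subgroup has total variance 40.
print(round(reliability_in_subgroup(0.90, 100, 40), 3))   # 0.75
```

Restricting the range of true scores cuts the reliability coefficient sharply even though the instrument itself is unchanged.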

Standard error of measurement and estimation


Reliability explains the proportion of true score variance in the observed test scores. However, in many situations the researcher is more concerned with how measurement errors affect the interpretation of examinees' scores. Although it is hard to determine the exact amount of error in a given score, classical test theory provides a method to describe the expected variation of each examinee's observed scores about the examinee's true score. Recall that the true score has been defined as the mean, or expected value, of the examinee's observed scores obtained from a large number of repeated measurements.
Now, in order to calculate the standard error of measurement we make use of the linear correlation ratio [Aranowska 2005]:

ρ²_XT = σ²_T / σ²_X = ρ_tt.  (5.37)

With some minor transformations of Eq. (5.37) we obtain:

σ²_E = σ²_X (1 − ρ_tt) = σ²_X (1 − ρ_X1X2),  (5.38)

which is:

σ_E = σ_X √(1 − ρ_tt) = σ_X √(1 − ρ_X1X2).  (5.39)

The measurement error variance is thus given in the following form:

σ²_E = σ²_X (1 − ρ_tt),  (5.40)

where the error variance is equal to the observed score variance times 1 minus the correlation between observed scores on parallel measurements. This formula can be used similarly in a sample:

s²_E = s²_X (1 − r_tt),  (5.41)


where:
s²_E – sample error variance,
s²_X – sample variance of variable X,
r_tt – sample estimate of the reliability coefficient.
By taking the square root of (5.40), we obtain the standard error of measurement (SEM) for the population:

σ_E = σ_X √(1 − ρ_tt).  (5.42)

The SEM indicates the accuracy of discrimination between observations. For a sample it may be rewritten as follows:

s_E = s_X √(1 − r_tt),  (5.43)

where s_E denotes the error which may be committed for the j-th examinee, assuming that the obtained score (over infinitely repeated measurements) would be the j-th examinee's true score. The standard error of measurement obtained at a certain level of the measurement instrument is the same for all examinees and does not depend on the true scores. If, for example, the reliability of a test equals 0.84 in a ten-examinee group (with the standard deviation of the observed scores equal to 10), and these values were substituted into Eq. (5.43), we would find that the SEM (for any examinee) would amount to 4.0.
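The numerical example above can be reproduced directly from Eq. (5.43):

```python
import math

def sem(s_x: float, r_tt: float) -> float:
    """Eq. (5.43): standard error of measurement from the sample
    standard deviation s_x and the reliability estimate r_tt."""
    return s_x * math.sqrt(1 - r_tt)

# The values from the text: r_tt = 0.84, standard deviation 10.
print(round(sem(10, 0.84), 2))   # 4.0
```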
Now, because all measurement errors are: 1) independent of true scores, 2) independent of each other, 3) the same for all examinees, and 4) distributed normally, the researcher should analyze the SEM in a broader context, that is, in confidence intervals [Magnusson 1981]. However, in most testing situations the examinee is tested once, and only one observed score is obtained. Thus, even if one obtains an estimate of the standard error of measurement for the test, one cannot construct such an interval around an examinee's true score, because the actual value of the true score is unknown. That is why the researcher should estimate the value of the standard error in order to create a confidence interval around the examinee's observed score. The estimation of the SEM (by Eq. (5.43)) supports the estimation of confidence intervals in accordance with the probability levels within which the scores of particular examinees (participating in the empirical study) fall. Direct access to the true score probability distribution is not available. However, with some degree of certainty, one can determine the range in which the true score falls for each observed score obtained [Magnusson 1981].


Based on the SEM, one may estimate the true value for any examinee by applying the statistical theory of estimation, where confidence intervals are constructed as follows [Aranowska 2005]:

P(X − z_α s_E ≤ T ≤ X + z_α s_E) = 1 − α,  (5.44)

where z_α is the critical value of the standard normal deviate at the desired probability level, X is the observed score, and T the true score.
As σ_E sometimes remains unknown, the confidence intervals must be transformed to probability distributions dependent on the sample size. In place of z we choose the (higher) corresponding value of the t-Student probability distribution, that is t_α;f, f = n − 1:

P(X − t_α;f s_E ≤ T ≤ X + t_α;f s_E) = 1 − α.  (5.45)
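A sketch of the confidence interval (5.44) around an observed score, with hypothetical values and the usual z = 1.96 for a 95% interval:

```python
import math

# Hypothetical values: observed score 28, s_X = 10, r_tt = 0.84.
x, s_x, r_tt, z = 28, 10, 0.84, 1.96

sem = s_x * math.sqrt(1 - r_tt)        # Eq. (5.43): 4.0
lo, hi = x - z * sem, x + z * sem      # Eq. (5.44)
print(round(lo, 1), round(hi, 1))      # 20.2 35.8
```

Strictly, (5.44) brackets the true score around the observed score, which is the practical substitute discussed above.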

Eventually, one may construct the standard error of estimation of the true score in the reliability framework. According to the theory of statistics, the prediction of T based on X equals the expected value E(T|X), hence the linear regression:

T̂ = βX + α,  (5.46)

where β = ρ_TX σ_T / σ_X and α = E(T) − βE(X).
And in a sample:

T̂ = bX + a.  (5.47)

The expression β = ρ_TX σ_T / σ_X is then transformed according to the theoretical reliability assumptions, as follows:

β = √ρ_tt · √ρ_tt = ρ_tt,  (5.48)

where ρ_tt denotes the reliability of the measurement. And therefore:

T̂ = ρ_tt X + E(T) − ρ_tt E(X) = ρ_tt (X − E(X)) + E(T).  (5.49)

Because E(X) = E(T), hence T̂ = ρ_tt X + (1 − ρ_tt) E(X). In a sample it is expressed as follows:

T̂ = r_tt X + (1 − r_tt) X̄.  (5.50)


The error of the point estimation of the true score reflects the difference (T − T̂), with variance denoted σ²_R:

σ²_R = σ²(ρ_tt X + (1 − ρ_tt) E(X) − T) = ρ_tt σ²_E.  (5.51)

And the standard error of estimation, denoted SEE, takes the form of:

SEE = √ρ_tt · SEM.  (5.52)

In a sample, respectively:

SEE = √r_tt · SEM = s_X √(r_tt (1 − r_tt)).  (5.53)

In general, Eq. (5.43) indicates that the size of the standard error of measurement is a function of the reliability coefficient. Reliability is a special way of expressing the inaccuracy of the instrument/scale, namely the extent to which the total variance results from the true variance. However, the reliability coefficient may itself be unreliable. The SEM provides more realistic guidance on the uncertainty of the estimated true score (regardless of how distant the estimation uncertainty of that score is) [Green Jr 1950].
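Equations (5.50) and (5.53) can be sketched together; the sample values below are hypothetical:

```python
import math

# Hypothetical sample values: reliability, SD, one observed score, mean.
r_tt, s_x, x, x_bar = 0.84, 10.0, 28.0, 24.0

# Eq. (5.50): regressed estimate of the true score (pulled toward the mean).
t_hat = r_tt * x + (1 - r_tt) * x_bar
print(round(t_hat, 2))   # 27.36

# Eq. (5.53): standard error of estimation.
see = s_x * math.sqrt(r_tt * (1 - r_tt))
print(round(see, 2))     # 3.67
```

Note that the SEE (3.67) is smaller than the SEM (4.0) for the same data, since √r_tt < 1.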

Reliability estimation for unidimensional reflective indicators


In general, we use three principal methods for assessing the reliability of a test or scale, i.e. test-retest, internal consistency and alternate forms (see Table 19).
In test-retest, we apply the same set of measures to the same examinees at two different times. The two sets of obtained scores are then correlated. The correlation between test scores in this case is called the coefficient of stability, and as Cronbach stated [1947, p. 5], it refers to the part of reliability which is "the degree to which the test score indicates unchanging individual differences in any traits".
In parallel measurements we administer two different forms of a test, based on the same content, at one time to the same examinees. In this case, the correlation coefficient is known as the coefficient of equivalence, which [Cronbach 1947, p. 6] "represents the reliability as the degree to which the


Table 19. Various methods of test-scale reliability estimation: some differences

Kuder-Richardson
Procedure and function: used in the true score theory approach (raw scores), for dichotomous responses, when the same examinees are measured with the same test at one time.
Statistical method: internal consistency and precision based on KR-20/KR-21, Cronbach's alpha, Hoyt's variance analysis or some other.

Cronbach's alpha
Procedure and function: used in the true score theory approach (raw scores), for polytomous responses, when the same examinees are measured with the same test at one time.
Statistical method: internal consistency and precision based on KR-20/KR-21, Cronbach's alpha, Hoyt's variance analysis or some other.

Split-half
Procedure and function: used in the true score theory approach (raw scores), for polytomous responses, where one test (with one version of contents) is divided into halves or k parts.
Statistical method: for equivalence of halves, the Spearman-Brown r or Rulon's formula; for equivalence of k parts, the generalised Spearman-Brown r and the average intercorrelation between the k parts.

Spearman-Brown
Procedure and function: the analysis of the relationship of the tested items with the total score.
Statistical method: the Spearman-Brown formula.

Test-retest
Procedure and function: used when 1) the same examinees are measured with the same test over and over, or 2) the same examinees are measured with the same test again at two different times/occasions (with a time interval).
Statistical method: for precision and relative stability, Pearson's r or another measure.

Alternate forms of a parallel test
Procedure and function: used when 1) two forms of a test are parallel and applied at one time, or 2) two forms of a test are applied at two times, with a minimum time interval.
Statistical method: for test equivalence, Rulon's reliability of the whole test and then the Spearman-Brown formula for test length; for relative stability, Pearson's r or another measure.

Source: own construction based on: Magnusson 1981; Wilson 2005; Aranowska 2005.

test score indicates the status of the individual at the present instant in the general and group factors defined by the test". On the other hand, we could administer two alternate test forms on separate testing occasions (with some minimum time interval), yielding a coefficient of stability and equivalence. The test-scale items on one form are designed to be similar (but not identical) to the test-scale items on the other form. The resulting scores from the two administrations of the alternate forms are then correlated⁶.
⁶ Each of these coefficients will probably be an underestimate of the theoretical reliability coefficient which would be obtained from truly parallel measurements. Coombs [1950]


In internal consistency methods, the test/scale is applied to examinees at one point in time. Items within a particular subset of the test/scale are then correlated. A number of different practical methods exist for obtaining internal consistency coefficients.
In the context of obtaining higher reliability estimates, test-retest or parallel-forms estimates should preferably be used. Use of coefficient alpha or Kuder-Richardson will produce a lower bound for the test's reliability [Berge and Zegers 1978]⁷. Coefficient alpha and the Kuder-Richardson formulas should rather be used for homogeneous tests, since they basically reflect item homogeneity. If a test measures a variety of traits, the alpha and Kuder-Richardson reliability will be inappropriately low. On the other hand, the Kuder-Richardson formulas 20 (KR-20) and 21 (KR-21) give a good level of test reliability, provided the dichotomous items⁸ in the test have equal item difficulties [Jackson 1942].
Coefficient alpha is typically used in the case of summated scales. Usually in the Cronbach or Kuder-Richardson methods we have a simple structure of the data, with one latent variable. Moreover, alpha is a very general reliability coefficient, encompassing both the Spearman-Brown prophecy formula and Kuder-Richardson 20. Alpha is also easy to compute, especially if one is working with a correlation matrix. The minimal effort required to compute alpha is more than repaid by the substantial information it conveys about the reliability of a scale [Zeller and Carmines 1979].
Finally, referring to the Spearman-Brown formula, one can fairly say that it can over- or underestimate a test's reliability if the components of the test are not parallel⁹. The Spearman-Brown formula is useful for judging the effects of test length on reliability. In other words, it is useful for estimating the reliability of a test of altered length, and it offers reasonable estimates if the test length is changed by adding or omitting parallel versions of the original test items [Allen and Yen 1979].
7 This lower bound equals the test reliability if the components in the test are essentially tau-equivalent.
8 Dichotomously scored items are items measured with a binary system of answers, such as 0 and 1.
9 In the Spearman-Brown prophecy we estimate the reliability of a composite of parallel tests when the reliability of one of those tests is known.

Selected methods of reliability estimation


Test-retest method
In the test-retest method, examinees are measured twice with the same test. In the approach called 'over and over', there is no defined time interval between the two tests, which means the second test is conducted straight away after the first is finished. A short time interval might cause carry-over effects due to the examinees' memory, practice or mood. On the other hand, a long interval would also make effects due to changes in information or mood likely. If the trait being measured varies over time, long intervals tend to underestimate the reliability of the test for one occasion [Hoffman 1963].
In this method, if every examinee receives exactly the same observed score on the second test as on the first, and if there is some variance in the observed scores among examinees, then the correlation of 1.0 indicates perfect reliability. But if the scores from the first measurement are unrelated to the scores from the second, the estimate of ρtt is 0. Obviously, this method of reliability estimation works better where a stable construct is measured with forgettable items than where a less stable construct is measured with memorable items.
In test-retest, the researcher investigates variation in item locations due to the measurement instrument, not due to real change in the examinees' locations. The two administrations should be close enough in time to assume that there has been little real change. The total variance is decomposed in the following way [Magnusson 1981]:

σ²X = σ²T + σ²Ep + σ²Ebad + σ²Ez + σ²Eoc + σ²Tosc,    (5.54)

where:
σ²X – total variance of the test,
σ²T – true variance of the test,
σ²Ep – error variance due to the effects of reminders,
σ²Ebad – error variance comprising components associated with running a pilot test,
σ²Ez – error variance of the guessing effect,
σ²Eoc – error variance due to subjectivism in giving responses,
σ²Tosc – true variance pertaining to the oscillation of the true score.

The variance involved in calculating the correlation between the scores of the two tests is the error variance, composed of [Magnusson 1981]:
σ²E = σ²Ebad + σ²Ez + σ²Eoc + σ²Tosc.    (5.55)

The error variance corresponds to fluctuations occurring from one test to another. This variability may be due to uncontrolled testing effects such as sudden changes in the weather, noise and other minor distractors such as a broken pencil. However, this error is also due to changes in the tested examinee, which can be caused by illness or fatigue, emotional stress, life troubles, or recent pleasant or unpleasant experiences.
As a result, the reliability coefficient, based on the correlation between the scores of the two tests, is expressed as follows:

ρtt = 1 − (σ²Ebad + σ²Ez + σ²Eoc + σ²Tosc) / σ²X.    (5.56)

And for a sample (transforming Eq. (5.56)) we obtain, respectively:

rtt = 1 − (s²Ebad + s²Ez + s²Eoc + s²Tosc) / s²X.    (5.57)

Another test-retest method assumes that the same examinees are measured with the same test again at two different times (with a time interval). In reliability assessment, the scores from the two administrations are correlated, and the resulting coefficient is interpreted in terms of the stability of the performance of the measures over time10. A test-retest or stability coefficient is estimated by the magnitude of the correlation between the same measures on different occasions. If the stability coefficient is low, with no change in the construct over time, the reliability of the measure is in doubt. If the stability coefficient is high, with no change in the construct over time, the reliability of the measure is enhanced.
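As a minimal sketch of this computation, a stability coefficient is simply the Pearson correlation between the two administrations; the scores below are hypothetical illustrations, not data from the text:

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson product-moment correlation between two lists of scores.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical total scores of five examinees on two administrations
# of the same test, separated by a time interval.
first = [12, 15, 9, 20, 14]
second = [13, 14, 10, 19, 15]

# The test-retest (stability) coefficient is their correlation.
stability = pearson_r(first, second)
print(round(stability, 3))  # 0.978
```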
The interpretation of the stability coefficient as an estimate of reliability raises some questions, e.g. when a low coefficient is obtained, does this indicate that the test provides unreliable measures of the trait, or does it imply that the trait itself is unstable? If the test developer believes that the amounts of the trait that examinees possess should change over time, a basic assumption of the classical true score model is violated, and the obtained correlation coefficient is not an appropriate estimate of test score reliability. A second issue is whether an examinee's behavior is altered by the first test administration, so that the second test score reflects the effects of memory, practice, learning, boredom, sensitization or any other consequences of the first measurement.

10 Note here that test-retest reliability is concerned with the stability of item responses over time.
In sum, test-retest as a method of reliability estimation involves the following problems.
Firstly, different scores may be obtained depending on the length of time between the first test and the second. That is, the longer the time interval, the lower the reliability. On the other hand, as Anastasi and Urbina [1997, pp. 131–132] explained, if the time interval between the first and second test is relatively short, the examinees may be able to recall many of their previous answers. In other words, the same pattern of good and bad answers can be repeated simply because the answers have been memorized. The scores of the two studies are therefore not obtained independently, and the high correlation between them is an artifact11.
Secondly, as research is repeated over time, the measurement instrument itself is also subject to serious change. This is the case for items related to reasoning and creativity. If an examinee discovers the rule upon which the task is based, or has already solved the problem, that examinee may later provide the correct answer without going through the solution process again.
Thirdly, if any change occurs between the first and second administration of the test, there is no way to distinguish between that change and unreliability.
Fourthly, the test-retest correlation depends on the correlation between the different items composing the scale, because a portion of the correlation of sums includes the correlation of each item with itself. Such correlations would be expected to be much higher than those found between different items and could produce a substantial correlation between test and retest [Nunnally 1967, p. 215].
Finally, test-retest as a method for estimating reliability is suitable for tests whose results are not significantly affected by repetition. Although this method provides useful information about the stability of measures, these problems suggest that it should not be used as the sole method of reliability assessment. Rather, it should be supplemented with internal consistency estimates.
11 Considering pros and cons, Bohrnstedt suggested a two-week interval, which is the generally recommended period for a retest [Bohrnstedt 1970, p. 85].

Parallel-test, alternate forms


In the parallel-test method12, two alternate forms of the instrument are administered and calibrated, and the two sets of scores produce the alternate-forms reliability coefficient. The correlation coefficient between the two sets of scores is computed with, e.g., the Pearson product-moment formula. This coefficient is also called the coefficient of equivalence. The higher the equivalence, the more confident the researcher can be that scores from the different test forms may be used interchangeably.
Parallel tests should be constructed so that administering the two tests simultaneously would give the same correlation between the two score distributions as would be obtained from two administrations of one of the tests, as in test-retest, assuming that in the second study any influence of the first administration can be excluded. Items in parallel tests should be similar in content and level of difficulty in order to obtain the same results. Items presented in one of the tests must correspond to items in the second test not only in terms of content, but also in terms of form, the style of the posed questions, etc.
In practice, each parallel test is designed to be similar in content to the other, but different enough that the first will not substantially affect the other. The resulting scores obtained from the two administrations of the alternate forms are then correlated. Of course, administration procedure and scoring errors, guessing effects, and temporary fluctuations in examinees' performance may contribute to the inconsistency of scores.
As Brzeziński [1978] explained, this method performs well if the following measurement criteria are met:
– equal average scores on the first and second forms of the test,
– equal variances,
– equal intercorrelations for each measured item on the two forms,
– no wide time interval between the two tests.
In the context of the last condition, Brzeziński [1978] claims that the second form of the test must come right away, after the first is finished. However, it is possible that the two forms may be administered within a very short time interval, allowing enough time between tests so that examinees will not be
fatigued. Also, it is considered desirable to balance the order of administration of the forms, so that half of the examinees will be randomly assigned to form 1 followed by form 2, whereas the other half will take form 2 followed by form 1.

12 For computational examples, see Kerlinger [1964, pp. 447–451].
The primary problem with the use of alternate forms lies in the development of substantially equivalent alternate measures. For example, strict definitions of alternate forms state that the mean, variance and intercorrelation of items on each form must be equivalent [Gulliksen 1950]. Though this problem has been overcome in psychometrics, it still remains a serious consideration for the measurement of other behavioral constructs.
An even more perplexing problem with alternate forms is proving that the two measures are equivalent. For example, if the correlation between scores on the two forms is low, it is difficult to determine whether the measures have intrinsically low reliability or whether one of the forms is simply not equivalent in content to the other [Nunnally 1967, pp. 211–213].
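A rough sketch of how the equivalence criteria above might be screened in practice; the scores and the tolerance thresholds below are hypothetical assumptions, not values from the text:

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Population (divide-by-n) variance.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical scores of six examinees on two alternate forms.
form1 = [21, 17, 25, 14, 19, 24]
form2 = [20, 18, 24, 15, 17, 26]

# Crude screen for the strict equivalence criteria (equal means and variances);
# the tolerances here are arbitrary illustrative choices.
means_close = abs(mean(form1) - mean(form2)) < 1.0
vars_close = abs(variance(form1) - variance(form2)) < 2.0

equivalence = pearson_r(form1, form2)  # coefficient of equivalence
print(means_close, vars_close, round(equivalence, 3))  # True True 0.933
```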

Internal consistency reliability methods


Due to constraints such as time, cost and the availability of examinees, it is not always possible to repeat tests. In such cases, the concept of internal consistency is strongly preferred for estimating reliability; its methods are based on a statistical analysis of items from a single administration. However, before we proceed to the analysis of internal consistency, we need to follow some strict rules.
Firstly, items must be free of technical flaws that may cause examinees to respond on some basis unrelated to the content. Furthermore, when items on a single test are drawn from diverse areas of examinees' knowledge, examinees probably will not perform consistently across these items. At the same time, even if all items are fair representatives of the content domain but some are poorly written, so that examinees may misinterpret the questions, this will also lower internal consistency.
Secondly, in conducting an internal consistency analysis, we should be concerned about the errors caused by content sampling, although errors due to faulty administration, scoring, and guessing by examinees may also affect the internal consistency coefficient. If examinees perform consistently across items, the test is said to have a homogeneous structure of items, which obtain the same level of performance or represent the same content domain.
Thirdly, internal consistency is strongly associated with the development
of unidimensional scales containing items which result in maximum normal

and uniform distributions13. As Boyle [1985] argued, if a test-scale assumes that only a single latent variable is the source of all covariation among the items, then the more reasonable approach to the items is a solution based on internal consistency. Factor models detect the presence of multiple latent variables (a few types of subscales within one construct) that serve as causes of larger variation in a set of items. In Boyle's opinion [1985] (meaning here the solution based on factor models), items should be selected so as to be maximally loaded by one factor, but to exhibit moderate to low item inter-correlations in order to maximize the breadth of measurement of the given factor.
The argument, referring to normal and uniform distributions of items,
means that when items with extreme skew or positive kurtosis appear, they
will not correlate well with other normally distributed items. Items that are
skewed, leptokurtic or both will result in unacceptable levels of reliability
[Enders and Bandalos 1999]14.
Fourthly, test-scale construction involves two potentially conflicting goals, i.e. achieving satisfactory levels of reliability without compromising the ability of a test-scale to adequately measure the construct of interest. Procedures undertaken in order to maximize internal consistency may do so at the cost of reducing validity [Nunnally and Bernstein 1994]. Deleting skewed items on the basis of low item-total correlations may compromise the ability of the measurement instrument to discriminate at extreme values of the construct, thus limiting the validity of the instrument. The deletion of difficult or easy items might increase reliability, but would sacrifice the ability to discriminate in the tails of the construct, thereby decreasing validity.
The distribution of items and its shape influence the correlation coefficient [Carroll 1945; Bernstein and Teng 1989]. Quite interesting empirical research on this question was conducted by Nunnally and Bernstein [1994], who discussed the effects of distribution shape on the correlation coefficient in
13 While it is desirable that items should measure something in common (i.e. exhibit unidimensionality), Hattie [1985, pp. 157–158] has indicated that there is no satisfactory solution in that case. This researcher pointed out that a unidimensional scale (having an underlying latent trait) is not necessarily reliable, internally consistent or homogeneous. Hattie concluded that the frequent use of Cronbach's alpha coefficient as a measure of unidimensionality is not fully justified.
14 Montenei, Adams and Eggers [1996] employed such a tactic when selecting items (as in the case of the Attitudes towards diversity scale development), which included high corrected item-total correlations, large standard deviations, and low skewness/kurtosis for the final scale. Vispoel [1996] and Friedman [1996] excluded items that did not have means near the mid-point, that is, skewed items.

some detail. As they explained, the amount of reduction in covariance due to distributional differences depends on three factors:
– the size of the original correlation, which plays a significant role: high correlations are affected to a much greater extent than low correlations,
– the degree of disparity among item distribution shapes, which affects the amount of reduction: the reduction in covariance increases as the disparity between distribution shapes increases,
– the level of measurement, which influences the degree of restriction in covariance: continuous items are less sensitive to distributional disparities than ordered categorical variables.
The last assertion, that correlational methods using ordered categorical variables are more sensitive to distributional disparities than those using continuous variables, has significant implications for reliability estimation. For example, the number of response categories used in Likert-type scales may be a pertinent mediating variable with regard to the relation between item distribution shape and reliability. Reliability increases as a function of the number of response categories until that number reaches five or seven, at which point the increases level off. However, the degree to which the number of response categories interacts with the distributional shapes of the items in affecting reliability is not clear [Enders and Bandalos 1999]15.
Split-Half method
In this method, the whole test is divided into two parts. If the split halves of the test are parallel, the reliability of the whole test is estimated by the Spearman-Brown formula. When the halves are essentially tau-equivalent, the alpha coefficient of the entire test is computed [Jackson 1979]. In the case of the
15 Feldt [1993] conducted a study investigating another type of relation between distributional shape and test reliability, for dichotomous items. He pointed out that conventional measurement practice suggests that cognitive tests should be constructed using item difficulties concentrated at about 0.50, as this will result in maximum reliability. In order to assess the tenability of this assumption, he constructed a number of hypothetical tests having different distributions of item difficulties and compared reliability estimates to those gathered from tests comprised solely of items with optimal p-values of 0.50. The results indicated that reliability is affected very little by utilizing a spread of item difficulties. The largest drop in reliability (0.05) occurred in his study when the distribution of item difficulties was skewed with a mean of 0.73. With respect to this, it should be noted that dichotomizing the underlying continuous distribution affects the reliability to a much greater extent than do differences in item difficulties.

Spearman-Brown formula, scores obtained from the parallel test halves, denoted Y and Y′, are correlated, producing ρYY′. This correlation is a measure of the reliability of one half of the test, and the reliability of the entire test, X = Y1 + Y2, will be greater than the reliability of either half taken alone. The Spearman-Brown formula in this case is expressed as [Allen and Yen 1979]:

tt =

2 YY'
.
1 + YY'

(5.58)

If the scores for the halves have unequal variances or the halves are not parallel, then the alpha coefficient should be used to calculate the reliability of the whole test. If the halves Y1 and Y2 are essentially tau-equivalent, then the alpha coefficient gives the reliability of the whole test. However, if the halves are not tau-equivalent, the alpha coefficient gives a lower bound for the test's reliability (i.e. the test's reliability must be greater than or equal to the number produced by the alpha coefficient).
If alpha produces a high value, then the test reliability must be high. If alpha is low, then we may not know whether the test actually has low reliability or whether the halves of the test are not essentially tau-equivalent. The formula for coefficient alpha for split halves is given as follows:

2 2X (Y21 + Y22 )
,
tt =
2
X

(5.59)

where:
Y2Y12,1 Y2Y222 2X2Xthe variances of scores on the two halves of the test,
Y21 Y22 2X
is the variance of scores on the whole test, with X = Y1 + Y2.
Values produced by the alpha coefficient and the Spearman-Brown formula will be large if the test halves are highly correlated and small if they are not. The halves will correlate highly only if they measure traits that are the same or highly correlated; thus, the Spearman-Brown and alpha coefficient reliabilities are indices of the test's internal consistency or homogeneity.
On the other hand, if the variances of the observed scores for the test halves are equal, the Spearman-Brown formula and the alpha coefficient are equal. If the variances of the observed scores for the halves are equal but the halves are not essentially tau-equivalent, both the Spearman-Brown formula and the alpha coefficient will underestimate the test reliability.
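Formulas (5.58) and (5.59) can be sketched as follows; the half scores are hypothetical, and population (divide-by-n) variances are used:

```python
from math import sqrt

def variance(xs):
    # Population (divide-by-n) variance.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical half-test scores Y1, Y2 for five examinees.
y1 = [8, 5, 9, 4, 7]
y2 = [7, 6, 9, 5, 6]
x = [a + b for a, b in zip(y1, y2)]  # whole-test score X = Y1 + Y2

r_half = pearson_r(y1, y2)    # correlation of the halves
sb = 2 * r_half / (1 + r_half)                                           # Eq. (5.58)
alpha = 2 * (variance(x) - (variance(y1) + variance(y2))) / variance(x)  # Eq. (5.59)
print(round(sb, 3), round(alpha, 3))  # 0.942 0.918
```

With these unequal-variance halves the alpha value falls below the Spearman-Brown value, in line with the discussion above.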
An alternate method for estimating reliability was proposed by Rulon [1939]. This researcher started from the split of the whole test into halves, although he did not require equal variances in the two halves. However, for the estimation of reliability, some measure of the magnitude of the error variance is needed. Rulon treated the variance of the distribution of differences between the two halves as defined entirely by the error variance of the two halves. These error variances together form the error variance of the whole test. Therefore, the variance of the distribution of the obtained differences was used to estimate the reliability of the whole test. It is concluded that [Rulon 1939]:

tt = 1

2
D

2X

,

(5.60)

where:
2 2
D
Xvariance of distribution differences on total true scores in the first
and second half,
2 2
D
X variance of distribution scores across whole test.
And for sample respectively:

rtt = 1

2
sD

s 2X

.

(5.61)

The difference between the scores obtained by an examinee on the two halves of the same test forms an unintended variance, in other words error variance. The variance of these differences divided by the total variance informs us about the share of error variance in the scores. By subtracting this share from 1.0 we obtain the share of true variance.
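Rulon's coefficient (5.61) can be sketched with hypothetical half scores; the data below are illustrative only:

```python
def variance(xs):
    # Population (divide-by-n) variance.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical half scores for five examinees.
y1 = [8, 5, 9, 4, 7]
y2 = [7, 6, 9, 5, 6]

d = [a - b for a, b in zip(y1, y2)]  # difference scores (the error part)
x = [a + b for a, b in zip(y1, y2)]  # whole-test scores

r_rulon = 1 - variance(d) / variance(x)  # Eq. (5.61)
print(round(r_rulon, 3))  # 0.918
```

Algebraically, 1 − s²D/s²X reduces to the split-half alpha of formula (5.59), so the two computations agree on any data set.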
There are many ways of splitting a set of items. One is called first-half and last-half, where the items interact with each other and thus affect each subset. This is the case when items are scattered throughout a lengthy questionnaire and examinees might be more fatigued when completing the second half of the test. Fatigue would then differ systematically between the two halves and would make them appear less similar. However, the dissimilarity would be characteristic not of the items per se but of their position in the item order of the test. As a result, such fatigue would lower the correlation between the halves because of the order in which the items were presented, not because of the quality of the items [DeVellis 2003].
In forming test halves one may use the odd-even method, where the subset of odd-numbered items is compared with the even-numbered items. In order to split the halves properly, we need to sort and rank the items according to their level of difficulty. For example, the process of item extraction might be as follows:

A) 1, 3, 5, 7 – the subset of odd-numbered items,
B) 2, 4, 6, 8 – the subset of even-numbered items.
For the examinees participating in a test, the items are scored in this alternating way. The score of each examinee on the first half is obtained by summing the number of items solved correctly among the odd-numbered items. The score on the second half is obtained by adding the number of items solved correctly among the even-numbered items. In the next phase, we calculate the correlation coefficient between the scores of the two halves and estimate the reliability of the test using the Spearman-Brown formula16.
Other alternatives are balanced halves and random halves. In balanced halves, one identifies some potentially important item characteristics (such as item length or type of response). Two halves are then formed so that these characteristics are equally represented in the first and second half, each with the same level of item wording and so on. In contrast, in the random halves approach we obtain halves based on the random allocation of each item to one of the two subsets. The quality of this approach depends on: 1) the number of items chosen for analysis, 2) the number of characteristics considered in the analysis, and 3) the degree of independence among the items.
Random halves involve a few important steps. Firstly, two statistics are computed for each item: 1) the proportion of examinees passing the item and 2) the biserial or point-biserial correlation between the item score and the total test score. Then each item is plotted on a graph using these two statistics. Items that are close together on the graph are paired, and one item from each pair is randomly chosen for one half of the test. The remaining items form the other half of the test. This method helps ensure that the two halves are of approximately the same difficulty and measure approximately the same thing.
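The random-halves steps above can be sketched as follows. This is a simplified illustration: 'closeness on the graph' is approximated by sorting on the two statistics and pairing neighbours, rather than by true nearest-neighbour matching, and the score matrix is hypothetical:

```python
import random
from math import sqrt

def point_biserial(item_col, totals):
    # For a 0/1 item this is just the Pearson correlation with the total score.
    n = len(item_col)
    mi, mt = sum(item_col) / n, sum(totals) / n
    cov = sum((a - mi) * (b - mt) for a, b in zip(item_col, totals))
    si = sqrt(sum((a - mi) ** 2 for a in item_col))
    st = sqrt(sum((b - mt) ** 2 for b in totals))
    return cov / (si * st)

def random_halves(scores, seed=0):
    # scores: rows are examinees, columns are dichotomous (0/1) items.
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    stats = []
    for j in range(n_items):
        col = [row[j] for row in scores]
        p = sum(col) / len(col)  # proportion of examinees passing item j
        stats.append((j, p, point_biserial(col, totals)))
    # Approximate "items close together on the graph" by sorting on (p, r_pb)
    # and pairing neighbours, then assign one item of each pair at random.
    stats.sort(key=lambda t: (t[1], t[2]))
    rng = random.Random(seed)
    half_a, half_b = [], []
    for k in range(0, n_items - 1, 2):
        i, j = stats[k][0], stats[k + 1][0]
        if rng.random() < 0.5:
            i, j = j, i
        half_a.append(i)
        half_b.append(j)
    return sorted(half_a), sorted(half_b)

# Hypothetical answers of six examinees to four items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
a, b = random_halves(scores)
print(a, b)
```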
In summary, in the split-half method, item scores obtained from one administration of the test are split in half and the resulting half scores are correlated. The test-scale is usually split in terms of odd- and even-numbered items or on a random basis. Furthermore, although the split-half method is the basic form of internal consistency estimation, a serious problem appears in using it in practice: different results may be obtained depending on how the items are split. So the question is how to split the full set of items in order to get the most effective solution [Anastasi and Urbina 1997].
16 In Magnusson's [1981] point of view, this is one of the best ways of splitting a test in order to estimate its reliability.

Kuder and Richardson (KR-20 and KR-21)


Kuder and Richardson tried to find a solution to the problem that split-half methods fail to yield a unique result for a given test. Their landmark paper contained two estimation formulas, known today as KR-20 and KR-21. The formula KR-20 was so abbreviated because it was the 20th formula presented by Kuder and Richardson [1937]. Another name for this formula in the literature is coefficient alpha-20, α(20), after Cronbach [1951].
For items that are scored dichotomously, Kuder and Richardson proposed formulas that split a test consisting of k items into k parts. For answers 0 or 1 with an equal level of difficulty, the fraction of good answers p equals the fraction of bad answers q. As a result, one obtains the maximum variance for the i-th item:
σ²Xi = pi qi,    (5.62)

where qi = 1 − pi and pi is the proportion of examinees getting the item correct, that is, passing it (coded as 1).
The KR-20 reliability formula is expressed as follows [Kuder and Richardson 1937]:

ρtt = [k/(k − 1)]·[1 − Σ piqi / σ²X],    (5.63)

where:
σ²X – the total variance of the test score,
piqi – the variance of the i-th item (the sum Σ piqi runs over all k items),
k – the number of items included in the test.
The p's and q's are calculated for each item; when they are summed, we obtain Σ piqi. In the process of scale development, however, we usually also take the p values into account in order to determine the level of each item's difficulty.
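A minimal sketch of formula (5.63) on a hypothetical 0/1 score matrix (rows are examinees, columns are items); population (divide-by-n) variances are used:

```python
def kr20(scores):
    # Eq. (5.63): KR-20 for dichotomously scored (0/1) items.
    k = len(scores[0])
    n = len(scores)
    totals = [sum(row) for row in scores]
    m = sum(totals) / n
    var_x = sum((t - m) ** 2 for t in totals) / n  # total score variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n      # item difficulty p_j
        sum_pq += p * (1 - p)                      # item variance p_j * q_j
    return (k / (k - 1)) * (1 - sum_pq / var_x)

# Hypothetical answers of six examinees to four items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))  # 0.656
```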
Equation (5.63) has a very definite interpretation. It measures the homogeneity of the items in a scale. When it has the value 1, the items are perfectly intercorrelated with equal variance, and when it has the value 0, the items are mutually independent. Perfect homogeneity of the items simply means that every item measures the same quality, and therefore the same quality is measured by the test as a whole. From this perspective, one can note that ρtt reflects a sort of composite measure of item validity [Dressel 1940]. This property is not unique to this formula but is, as was proved by Richardson, a property of the ordinary reliability coefficient [Richardson 1936].
If one does not know the difficulty levels of the particular items (as may happen with the KR-20 formula), then one may adopt the KR-21 formula, where this level is estimated approximately or is comparable among the tested items. Here, one part of formula (5.63):

Σ piqi,    (5.64)

is replaced with k·p̄·q̄. In consequence, the Kuder-Richardson KR-21 formula is calculated as follows:

ρtt = [k/(k − 1)]·[1 − k·p̄·q̄ / σ²X],    (5.65)

where:
p̄ – the average value of p over the k tested items (i.e. the average item difficulty),
q̄ – the average value of q over the k tested items.
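Formula (5.65) sketched on a hypothetical 0/1 matrix. Since the item difficulties below are unequal, KR-21 comes out below the KR-20 value that formula (5.63) gives for the same matrix (approximately 0.656), as the text predicts:

```python
def kr21(scores):
    # Eq. (5.65): KR-21, using only the average item difficulty p-bar.
    k = len(scores[0])
    n = len(scores)
    totals = [sum(row) for row in scores]
    m = sum(totals) / n
    var_x = sum((t - m) ** 2 for t in totals) / n  # total score variance
    p_bar = sum(totals) / (n * k)                  # average item difficulty
    q_bar = 1 - p_bar
    return (k / (k - 1)) * (1 - k * p_bar * q_bar / var_x)

# Hypothetical answers of six examinees to four items (unequal difficulties).
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr21(scores), 3))  # 0.6
```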
Yet another modification of Σ piqi in KR-20 was proposed by Ferguson and Takane [1999]. This modification introduced weights w for the response variables, e.g. +1, 0, −1 or 0, 1, 2, 3, 4, where the variance of the i-th item in the test is calculated as follows:

σ²Xi = Σ(k=1..m) wk² pk − [Σ(k=1..m) wk pk]², for k = 1, …, m, i = 1, …, n.    (5.66)

Having then summed up the variances of the items, we place this sum into the term:

Σ piqi,    (5.67)

of formula (5.63).
Generally, the two formulas (KR-20 and KR-21) will be equal if the item difficulties are all equal. If the item difficulties are not equal, KR-21 will be lower than KR-20 and will underestimate the test-scale's reliability. For this reason, it is not acceptable for the researcher to report only the KR-21 estimate for a set of test scores.
Tucker [1949] claimed that KR-21 seriously underestimates the reliability of a test and that KR-20 yields a much better estimate. He argued that if the KR-20 formula were rewritten, its estimate would be largely improved17. Both formulas have been in use for years, especially number 20. The principal advantages claimed for them were ease of calculation and uniqueness of the estimate (compared with the split-half method).
Hoyt's method
Another formula for reliability estimation was developed by Hoyt. Working independently of Kuder and Richardson, Hoyt [1941] developed an approach to the estimation of reliability which yields results identical to those obtained from coefficient alpha. Hoyt's method was based on the analysis of variance, treating the examinees and items as sources of variation. The method proceeds from a partitioning of the sums of squares for the various effects of the test to the determination of the residual error18.
Using standard analysis of variance notation, Hoyt [1941] defined the reliability estimate as follows:

ρtt = (MSPersons − MSResidual) / MSPersons,    (5.68)
where: MSPersons is the mean square term for examinees, taken from the analysis of variance summary table, and MSResidual is the mean square term for
the residual variance in the same table.
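A sketch of Hoyt's ANOVA computation (Eq. (5.68)) on a hypothetical persons-by-items 0/1 matrix; the sums of squares are partitioned by hand:

```python
def hoyt_reliability(scores):
    # Two-way ANOVA with persons and items as sources of variation, Eq. (5.68).
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_persons - ms_residual) / ms_persons

# Hypothetical answers of six examinees to four items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(scores), 3))  # 0.656
```

For this matrix the result coincides with coefficient alpha (here equal to KR-20, since the items are dichotomous), illustrating the identity stated above.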
Hoyt related this formula to the theoretical definition of the reliability
coefficient by noting that MSPersons represents the observed score variance
and MSResidual represents the error variance in the theoretical reliability expression:
r_{tt} = \frac{\sigma_X^2 - \sigma_E^2}{\sigma_X^2}.  (5.69)

17 For further details of the proposed algorithm see [Tucker 1949].
18 As Hoffman [1963, p. 276] stated, depending upon the type of model employed and the assumptions involved, the error may be considered an unbiased estimate of the error of measurement. Appropriate F ratios may then be employed to establish the significance of individual variation.
Reliability and validity in a view of Classical Test Theory CTT
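The identity between formula (5.68) and coefficient alpha can be verified numerically. The sketch below (illustrative only; the simulated data and function names are assumptions) derives the mean squares from a persons × items matrix and confirms that Hoyt's coefficient coincides with alpha:

```python
import numpy as np

def hoyt_reliability(X):
    """Formula (5.68): r_tt = (MS_persons - MS_residual) / MS_persons."""
    n, k = X.shape
    grand = X.mean()
    ss_persons = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_items = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_residual = ((X - grand) ** 2).sum() - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_persons - ms_residual) / ms_persons

def cronbach_alpha(X):
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1)) + rng.normal(size=(50, 6))  # true score + item noise
print(hoyt_reliability(X), cronbach_alpha(X))  # identical values
```

The two numbers agree to machine precision, which is Hoyt's [1941] result restated: the ANOVA partition and the variance-ratio definition of alpha are algebraically the same.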

Cronbach method
Cronbach [1951] created an alternative solution for reliability estimation, derived from Kuder and Richardson's earlier KR-20 formula19 [Kuder and Richardson 1937]. Referring to the earlier works of Kuder and Richardson [1937], Hoyt [1941] and Guttman [1945], he claimed that [Cronbach 1951, p. 299] making the same assumptions but imposing no limit on the scoring pattern will permit one to derive the formula in the form of alpha, given as:
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{X_i}^2}{\sigma_X^2}\right),  (5.70)

where:
k – number of items in the scale, where k ≥ 2,
\sigma_{X_i}^2 – variance of the i-th item,
\sigma_X^2 – total variance of the scale.
As a result, the original assumption of dichotomous variables was extended to a more general one; however, the rigid assumptions of equal variances and equal covariances of the items remained hidden. Instead of k equal variances, Cronbach used a sum of the observed variances. In that sense, alpha was algebraically identical to Guttman's formula:
\lambda_3 = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{X_i}^2}{\sigma_X^2}\right).  (5.71)

Guttman derived it as a lower bound, noting that the equality holds only if the variances and covariances are all equal. Cronbach did not make such an explicit assumption. Instead, he wrote: since each writer offering a derivation used his own set of assumptions, the precise meaning of the formula became obscured. The original derivation unquestionably made much more stringent assumptions than necessary [...]. In this paper, we take formula [of alpha] as given, and make no assumptions regarding it. Instead, we proceed in the opposite direction, examining the properties of alpha and thereby arriving at an interpretation [Cronbach 1951, p. 299].

19 Cronbach published the article in the Psychometrika journal. Since that time, the names of Kuder and Richardson were quickly dropped and the formula has been referred to as Cronbach's alpha. The article, Coefficient alpha and the internal structure of tests, has become probably the most cited paper in the psychometric literature.
If we look at the structure of the alpha coefficient, we notice that it is concerned with the variance that is common among items. For a set of items composing a scale, the variance in that set is composed of true variance and error variance. Therefore, alpha describes the partitioning of the total score variance into true and error components. In short: 1 − error variance = alpha, and 1 − alpha = error variance. Alpha represents the proportion of a scale's total variance that is attributable to a common source (that common source being the true score of the latent construct which is measured) [Netemeyer, Bearden and Sharma 2003].
The variance of the sum (the scale score) will be larger than the sum of the item variances if the items measure the same variability between objects, that is, if they measure a common true score. For example, the variance of the sum of two items equals the sum of the two variances plus twice their covariance, and that covariance is the amount of true score variance common to the two items. The proportion of true score variance is therefore estimated by comparing the sum of the item variances with the variance of the scale score. If there is no true score but only error in the items (error which is random and unique to each item), then the variance of the sum will be the same as the sum of the variances of the particular items. In consequence, coefficient alpha will equal zero. If all items are perfectly reliable and measure the same latent construct, then coefficient alpha equals 1.
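The two boundary cases can be demonstrated directly. The sketch below is a hedged illustration (the simulated data sets and the function name are assumptions): items sharing no true score give alpha near zero, and identical items give alpha of exactly 1:

```python
import numpy as np

def cronbach_alpha(X):
    """Formula (5.70) computed from a persons-by-items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(42)
noise_only = rng.normal(size=(1000, 5))                     # items share no true score
perfect = np.repeat(rng.normal(size=(1000, 1)), 5, axis=1)  # five identical items
print(cronbach_alpha(noise_only))  # near zero
print(cronbach_alpha(perfect))     # exactly 1
```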
Because the total variance can be restructured as the sum of the item variances plus two times the sum of the item covariances, alpha can be restructured into the following formula [Peter 1979]:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{X_i}^2}{\sum_{i=1}^{k} \sigma_{X_i}^2 + 2\sum_{i>j} \sigma_{X_{ij}}}\right),  (5.72)

where:
k – number of items in the scale,
\sigma_{X_i}^2 – variance of the i-th item in the test scale,
\sigma_{X_{ij}} – covariance of the i-th and j-th items.

Now, let's assume, e.g., a four-item covariance matrix:

\begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix},  (5.73)

where the total variance of X1 + X2 + X3 + X4 equals:

\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \sigma_4^2 + \sigma_{12} + \sigma_{13} + \sigma_{14} + \sigma_{21} + \sigma_{23} + \sigma_{24} + \sigma_{31} + \sigma_{32} + \sigma_{34} + \sigma_{41} + \sigma_{42} + \sigma_{43} = \sum_{i=1}^{4} \sigma_{X_i}^2 + \sum_{i \neq j} \sigma_{X_{ij}}.  (5.74)

Note that the variance of the total score equals the sum of all the variances and covariances in the covariance matrix. The diagonal elements represent the covariance of an item with itself, that is, the variability of the item's score in a given group. As such, the diagonal elements are unique sources of variance, not variance that is common or shared among items. The off-diagonal elements are covariances that represent the variance shared by pairs of items in the scale.
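The identity in (5.74) can be confirmed in a few lines. This is an illustrative sketch (the simulated four-item data set is an assumption): the variance of the total score equals the sum of every entry of the item covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 1)) + rng.normal(size=(100, 4))  # four correlated items
C = np.cov(X, rowvar=False)              # 4 x 4 item covariance matrix
total_var = X.sum(axis=1).var(ddof=1)    # variance of the total score
# total-score variance = sum of all variances and covariances, eq. (5.74)
print(total_var, C.sum())
```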
Formula (5.72) represents the unstandardized alpha coefficient, for it consists of variances and covariances. If the items had markedly different variances, those with larger variances would be given greater weight than those with smaller ones. In such a case, the researcher should use the standardized alpha. In the standardized formula, we use a correlation matrix with 1s on the main diagonal (i.e. the correlation of an item with itself) and the correlations among pairs of items as the off-diagonal elements (i.e. standardized covariances). It is calculated as follows [Netemeyer, Bearden and Sharma 2003]:

\alpha = \frac{k\bar{r}}{1 + (k-1)\bar{r}},  (5.75)

where \bar{r} is the average correlation among the items.


Formulas based on covariances and on correlations are sometimes referred to as the raw score and standardized score solutions for alpha, respectively. The raw score formula preserves information about item means and variances in the computation, because covariances are based on values that retain the original scaling of the raw data. In contrast, the standardized score formula is based on correlations, i.e. on standardized covariances. Hence, all items are placed on a common metric and thus weighted equally in the computation of alpha. Which is better depends on the specific context and on whether the weighting is desired [DeVellis 2003].
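The equivalence between the two solutions can be made concrete: the standardized formula (5.75) gives the same value as the raw formula applied to z-scored items. A minimal sketch (simulated data and function names are assumptions):

```python
import numpy as np

def standardized_alpha(X):
    """Formula (5.75): k * r_bar / (1 + (k - 1) * r_bar)."""
    k = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    r_bar = (R.sum() - k) / (k * (k - 1))   # mean off-diagonal correlation
    return k * r_bar / (1 + (k - 1) * r_bar)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 1)) + 0.8 * rng.normal(size=(300, 6))

# raw (covariance) formula applied to z-scored items gives the same value
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
k = X.shape[1]
raw_on_z = k / (k - 1) * (1 - Z.var(axis=0, ddof=1).sum()
                          / Z.sum(axis=1).var(ddof=1))
print(standardized_alpha(X), raw_on_z)  # agree
```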
Finally, Cronbach, together with Rajaratnam and Gleser, changed the course of reliability estimation by giving a completely new definition for alpha. One of the aims was to discard the restrictive assumptions of the classical model and measures. Cronbach thus admitted that alpha was too limited in its original form. He believed that [Cronbach, Rajaratnam and Gleser 1963, pp. 154-155] obscurities and inconsistencies in the choice of formulas would be eliminated by the new development. The new formulation was called the theory of generalizability and was based on additive analysis of variance models and the intraclass correlation. The concept was somewhat more general than before, although not completely new, since reliability based on the analysis of variance had been considered earlier [Hoyt 1941]. However, Cronbach, Rajaratnam and Gleser [1963] did not generalize the measurement model, and thus the coefficient derived was essentially the same as before.

Alpha factor analysis and principal components reliability estimation
If there are more latent variables, that is, when we face a multidimensional construct, we may apply alpha to check the reliability of each set of items in such a construct separately. However, alternative solutions have also appeared in the literature; they are discussed below.
The first work on the use of factor analysis with Cronbach's alpha was begun by Kaiser and Caffrey [1965], who tried to combine the factor analytic approach with alpha by developing a new method of factor analysis. Their suggestion, called alpha factor analysis, was analogous to Rao's [1955] canonical factor analysis. However, the weighting of the items differed dramatically: instead of using the inverses of the unique variances as weights, Kaiser and Caffrey [1965] used the inverses of the communalities, thus giving more weight to items with lower communality. This was surely not the goal (the goal was to develop a psychometric factor analysis), but the contradictory results could not be avoided, since the idea was based on maximizing the generalizability of the factors, that is, Cronbach's alpha.
Following Lord's [1958] treatment, Kaiser and Caffrey [1965] ended up with:

\alpha = \frac{k}{k-1}\left(1 - \frac{1}{\lambda_i}\right),  (5.76)

where \lambda_i are the eigenvalues of the weighted correlation matrix of the items. The maximization procedure of alpha led to the principal components of the standardized items, as Lord [1958] had earlier proved.
From Eq. (5.76) we notice that alpha turns negative as soon as \lambda_i drops below one. Kaiser and Caffrey formulated this as a clear rule, explaining that only those alpha factors which have positive generalizability, i.e. associated eigenvalues greater than one, should be accepted [Kaiser and Caffrey 1965, p. 11]. Kaiser [1960] had earlier suggested the same procedure in the context of principal components. For the special case of one alpha factor, Kaiser and Caffrey [1965, p. 8] reasoned that it is always perfectly generalizable.
Bentler [1968] criticized alpha factor analysis and suggested instead alpha-maximized factor analysis (alphamax). Both methods carrying the name of alpha were based on maximizing Cronbach's alpha. The difference, however, is that in alphamax the variables are weighted by the inverses of the unique variances, not by the inverses of the communalities. Bentler [1968] showed that maximization of alpha in the traditional form leads to principal components analysis, while maximizing the weighted alpha leads to factor analysis.
Heise and Bohrnstedt [1970] suggested a reliability measure for composite variables in the context of factor analysis. They worked on the basis of the sample correlation matrix R, with the basic equation of factor analysis as:

R = FF^T + V,  (5.77)

where F denotes the factor matrix, and V represents the diagonal matrix of unique variances.
This reliability measure, weighted by a vector w of factor loadings, is:

\Omega = \frac{w^T (R - V) w}{w^T R w},  (5.78)

where R − V is the correlation matrix with the communalities on the diagonal. When the communalities are known, omega was claimed to be exactly equal to the reliability of a composite [Heise and Bohrnstedt 1970, p. 117].
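This claim can be illustrated with a correlation matrix whose communalities are known exactly. The sketch below is an assumption-laden illustration, not the authors' computation: the loadings f are invented, R is built from a single factor via eq. (5.77), and unit weights w are assumed for the composite:

```python
import numpy as np

f = np.array([0.8, 0.7, 0.6, 0.5])   # assumed one-factor loadings
V = np.diag(1 - f ** 2)              # unique variances (1 - communality)
R = np.outer(f, f) + V               # model-implied correlation matrix, eq. (5.77)
w = np.ones(len(f))                  # unit weights for the composite
omega = w @ (R - V) @ w / (w @ R @ w)  # formula (5.78)
print(omega)
```

Here the numerator reduces to (Σf)² = 6.76 and the denominator to 6.76 plus the sum of the unique variances (2.26), so omega is 6.76/9.02, about 0.75.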
The relations between alpha and omega were studied by Smith [1974], who concluded that if the research design limits one to internal-consistency estimates of reliability, then omega is clearly the choice. He thought that Heise and Bohrnstedt chose their symbol with a bit of frivolity [Smith 1974, p. 507]. Harman [1976] also discussed the concepts of reliability, communality and specific variance in factor analysis. In his opinion, if the communalities can be estimated by the item reliabilities, then omega (5.78) is equal to Mosier's formula [1943, p. 162], presented a long time ago:
r_{tt} = \frac{w^T R^* w}{w^T R w},  (5.79)

where R^* is the correlation matrix whose diagonal terms are the item reliabilities.
Armor criticized the assumptions of Cronbach's alpha and the methods of item analysis, claiming that [Armor 1974, p. 18] the mathematical assumptions for alpha reliability are often not met, and that the usual steps of item analysis (throwing out bad items to enhance alpha reliability) may not in fact produce optimal alpha reliability.
Armor worked with principal components and came up with the following solution:

\theta = \frac{k}{k-1}\left(1 - \frac{1}{\lambda_1}\right),  (5.80)

where \lambda_1 is the largest eigenvalue of the correlation matrix of all the items in the scale.
It may be noted that formula (5.80) is identical to Kaiser and Caffrey's [1965] alpha in (5.76), but this fact seems to have been ignored by Armor. Besides, although Armor referred to Bentler [1968], it seems that he did not grasp Bentler's critique concerning the use of principal components.
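Because theta is an alpha maximized over item weights, it can never fall below the standardized alpha of (5.75). A minimal sketch of both quantities (simulated data and function names are assumptions):

```python
import numpy as np

def armor_theta(X):
    """Formula (5.80): theta from the largest eigenvalue of R."""
    k = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    lam1 = np.linalg.eigvalsh(R).max()   # largest eigenvalue of R
    return k / (k - 1) * (1 - 1 / lam1)

rng = np.random.default_rng(11)
X = rng.normal(size=(200, 1)) + rng.normal(size=(200, 5))

# standardized alpha for comparison: theta cannot fall below it
R = np.corrcoef(X, rowvar=False)
k = X.shape[1]
r_bar = (R.sum() - k) / (k * (k - 1))
alpha_std = k * r_bar / (1 + (k - 1) * r_bar)
print(armor_theta(X), alpha_std)
```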
Greene and Carmines [1979] summarized the variations of Cronbach's alpha, concluding that Armor's theta (5.80) is equal to a maximized alpha, and that the alpha related to Bentler's alpha-maximized factor analysis is equal to a maximized omega [Bentler 1968]. Greene and Carmines stressed that essential tau-equivalence is neither a necessary nor a sufficient condition for alpha to equal the true reliability, except in the case of equal weights.


Greene and Carmines [1979] also claimed that there are two important general differences between theta and omega:
– Firstly, they are based on different models. Theta is grounded in the principal components model, whereas omega is based on factor analysis. This means that one always uses 1.0s in the main diagonal to compute the eigenvalues on which theta is based, while the value of omega depends, in part, on communalities, which are estimated quantities, not fixed ones. And because omega is based on estimated communalities, there is an element of indeterminacy in its calculation that is not present in theta.
– Secondly, unlike theta, omega does not assess the reliability of separate factors in the case of multiple dimensions. Omega rather provides a coefficient that estimates the reliability of all the common factors in a given item set.

Types of validity
In order to plan a good validation study, the desired inference must be clearly identified in advance by the researcher. Then a study is designed to gather evidence of the usefulness of the scores for such inferences. Three major types of validation are discussed below [Guion 1950]: 1) content validity, which includes face validity and logical validity; 2) pragmatic (criterion) validity, which is split into predictive and concurrent validity; and 3) construct (theoretical) validity.

Content validity
Content validity is based on two major approaches: face validity and logical validity. Face validity is established when the researcher examines the test and concludes that it measures the relevant trait or field of research. If examinees disagree in an exploratory study, face validity is in question. Face validity can be crucial for effective test use, although in some cases it is not essential if the test is valid in other ways. By comparison, logical validity is a more sophisticated version of face validity. It involves the careful definition of the domain (e.g. of behaviors) to be measured by a test, and the logical design of items to cover all the important areas of this domain.
Content validity, while necessary, may be insufficient to assess the true validity of the measured construct and of the scale under development. This type of validity refers to the extent to which the items fairly represent the value dimension being measured. The question is, how can researchers be sure that their measures are correct and have an appropriate level of content? The best way is to conduct thorough exploratory research before the implementation of the final research project. Within such a procedure, however, the first problem that appears is the question of an adequate selection of items from the entire universe of items20. So, in order to ensure that the test items are completely included, and in the right proportions, all major aspects of the explored domain which are to be the subject of research should be pre-analyzed. For example, from exploratory research we may conclude that the test over-represents items which do not cover particular aspects of the domain. In this context, the research area should also take into account both the core objectives of the research and the rules for data application and interpretation.
In addition to exploratory studies, we may also apply a more advanced empirical procedure that provides additional information about content validity and may help in selecting the most needed items for the test. Lawshe [1975] proposed the following coefficient, known as the content validity ratio (CVR):

r_{CVR} = \frac{n_E - \frac{n}{2}}{\frac{n}{2}},  (5.81)

where:
n_E – number of examinees who defined the i-th item as essential for the test scale being constructed,
n – total number of examinees in the sample.
20 The selection of items should be carried out through consultations with experts in the explored domain. Based on the information gathered in this way, one can prepare a specification of the preliminary contents of items, useful for those experts who will continue further work on the final preparation of items. This specification should include the content and topics to be covered in the research, the test objects or processes to be taken into account, and the relative importance of particular subjects and objectives in general. Additionally, there should be included information about how many items of each type should be developed for each subject within the bounds of the explored domain. Finally, if experts were engaged, we need to specify their number and qualifications. If they acted as judges in the process of items classification and clarification, we need to cite all the instructions they received and present the degree of agreement between them. Also, because the content may vary over time, one needs to provide the dates of the consultations with experts.


If CVR takes a negative value, then fewer than half of the examinees in the group considered the i-th item essential. A value of 0 reflects exactly half of the group, and positive values mean that more than half considered the item essential for the whole test.
Finally, because content validity is based on subjective judgments, the determination of validity is more subject to error than for other types of validity.
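Formula (5.81) is simple enough to compute by hand; the sketch below (a hypothetical panel of judges, invented for illustration) shows the three interpretive regions:

```python
def content_validity_ratio(n_essential, n_total):
    """Lawshe's CVR, formula (5.81); ranges from -1 to +1."""
    half = n_total / 2
    return (n_essential - half) / half

# hypothetical panel of 10 judges rating one item
print(content_validity_ratio(8, 10))   # 0.6  -> clear majority rates it essential
print(content_validity_ratio(5, 10))   # 0.0  -> exactly half of the panel
print(content_validity_ratio(3, 10))   # -0.4 -> fewer than half
```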

Pragmatic-criterion validity
In pragmatic validity (also referred to as criterion validity), the researcher is primarily interested in some criterion which he wishes to predict. In order to do so, he must administer the test, obtain an independent criterion measure on the same subjects, and then compute a correlation coefficient. If the criterion is obtained some time after the test was performed, he studies predictive validity. If the test score and criterion score are determined at essentially the same time, he studies concurrent validity. Thus pragmatic (criterion) validity includes predictive and concurrent validity [Cronbach and Meehl 1955, p. 283]21. In pragmatic (criterion) validity, construct measures should be correlated with the criterion variable(s) selected for the analysis. As a result, the researcher may try to explain to what extent the obtained scores of a particular test are related to the outer variable, i.e. a variable based on some predefined criterion.
When calculating the predictive validity coefficient (denoted as a correlation), we wish to predict the items and the respective scores given by examinees in the research; especially, we strive to predict the characteristics of the item score distribution to be obtained at a later time. Thus, by the predictive validity test we obtain a kind of prognosis of the scores. However, as Bechtoldt [1959, pp. 619-623] claimed, the time interval, and simultaneously the distinction between the obtained criterion and the test itself, is not always appropriate. For example, a serious risk (related to a short period) lies in the possibility of uncritically generalizing the conclusions derived from the research to a longer period, and we all know that such assumptions require separate studies. Generalizing the research results to contemporary events while assuming various time intervals is, on the other hand, due to the wrong assumption of invariance of behaviors (e.g. of people), which cannot be taken seriously in this situation [Bechtoldt 1959].

21 The term prognosis can be used in a wider sense, to denote any predictions made on the basis of the obtained test scores, or in a narrower sense, meaning a prediction of what will take place after a certain time. Here the narrower sense is the most significant for predictive validity.
Extending validation studies over the time needed to determine predictive validity is often not feasible. A compromise solution is to examine a test group for which the criterion data are already available in advance. In concurrent validity, one test is proposed as a substitute for another, or a test is shown to correlate with some contemporary criterion. In this case, the measurement of the criterion variable is undertaken at the same time at which the test is executed. This solution reduces time and unnecessary research costs. It should also lead to the same score as the measurement of the criterion. It is worth noting that concurrent validity is calculated for tests that are used in direct diagnostic situations. Moreover, in estimating concurrent validity coefficients, one should also bear in mind that the criteria may sometimes be of very different quality [Aranowska 2005].
Pragmatic (criterion) validity can be verified by the correlation coefficient between the test scores and the criterion-based scores. This correlation is symbolized as \rho_{XY}, where X is the test score and Y is the criterion score. The validity coefficient is estimated in two settings, resulting in either a predictive or a concurrent validity estimate. In both cases there is no computational difference between the predictive and concurrent estimates; the validity is expressed by the same correlation coefficient. Its interpretation is simple: the higher the coefficient, the higher the validity of the test in general. However, depending on the specificity of the collected data, we should choose wisely between the Pearson correlation and Spearman's rho coefficient. Moreover, as with the reliability coefficient, the dispersion of the data plays a key role in the criterion-based validity coefficient: if there is lower dispersion of scores, a lower correlation will be obtained [Rosenberg 1965; Park, Mothersbaugh and Feick 1994].
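The two coefficients mentioned above can be sketched side by side. This is an illustration only (the simulated test and criterion scores are assumptions); with continuous, untied data, Spearman's rho is simply Pearson's r computed on ranks:

```python
import numpy as np

def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman_rho(x, y):
    # Spearman's rho = Pearson's r on rank-transformed scores (no ties here)
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson_r(rank(x), rank(y))

rng = np.random.default_rng(5)
test_score = rng.normal(size=100)
criterion = 0.6 * test_score + 0.8 * rng.normal(size=100)  # outer criterion variable
print(pearson_r(test_score, criterion), spearman_rho(test_score, criterion))
```

For data with ties or a need for significance tests, a dedicated routine such as `scipy.stats.spearmanr` would be the usual choice.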
In the literature one can also find other types of criterion-related validity [Netemeyer, Bearden and Sharma 2003]: convergent and discriminant validity. Both the convergent and the discriminant approaches to validity can be analyzed with the MTMM (multi-trait-multi-method) matrix [Campbell and Fiske 1959]22.

22 See further discussion in the text.


Discriminant validity requires that a measure does not correlate too highly with measures from which it is supposed to differ. For convergent validity, as Churchill and Iacobucci stated [2002, p. 413], a measure will be convergent if measures of the same construct converge or are highly correlated. Evidence of convergent validity is offered by significant and strong correlations between different measures of the same construct. Obviously, problems in generating evidence of convergent validity often occur in the early phase of measurement development, when alternative measures are not available. These instances are frequently encountered for constructs that have not been studied previously or have been investigated using ad hoc or inadequately developed operationalization methods.
More importantly, convergent validity is used when there is no acceptable single method to serve as an absolutely valid standard for measuring the construct of interest. If the results of two maximally different independent methods are in close agreement, both are said to share in establishing convergent validity. However, if the results demonstrate a low level of agreement between the independent attempts to measure the construct, at least one of the two methods must be invalid. Of course, there is nothing in the convergent validity procedure to indicate whether one or both of the methods are invalid. Even if one knows that just one of the methods is invalid, the procedure will not identify the invalid method. The degree to which a single method is declared invalid depends on the face validity a researcher ascribes to each method and on any relevant empirical observations [MacKay and Summers 1977, p. 263].

Construct validity
Construct validity is the heart and soul of validity analysis. It is the degree to which a test measures the theoretical construct it was designed to measure. Establishing construct validity is an ongoing process, which we call validation. On the basis of theory regarding the construct being measured, the researcher makes predictions about how test scores should behave in various situations. These predictions are then tested. If the predictions are not supported by the data, there are at least three alternative conclusions that can be drawn23:
– the research was flawed,
– the theory was wrong and should be revised,
– the test does not measure the construct.
Cronbach and Meehl [1955, pp. 282-283] explained that construct validity is not to be identified solely by particular investigative procedures, but throughout the orientation of the investigator in general24. Unlike criterion-oriented validity, as Bechtoldt emphasized [1951, p. 1245], it involves the set of operations as an adequate definition of whatever is to be measured. When a researcher believes that no criterion available to him is fully valid, he becomes interested in construct validity, because this is the only way to avoid the infinite frustration of relating every criterion to some more ultimate standard.
The procedure of construct verification does not differ much from the procedure commonly used in the hypothetical-deductive methods of any kind of social research. Such a procedure follows the classical scheme: theory → deduction → hypothesis → test → data analysis that supports or rejects the hypothesis [Magnusson 1981, p. 194]. Collecting the data (which we need to determine construct validity) starts from the formulation of hypotheses about the subject of inquiry and the characteristics of examinees receiving, e.g., high test scores as opposed to those receiving low scores. A set of hypotheses of this type creates an initial theory about the essence of the construct, i.e. whether it measures what it was intended to measure. When the researcher's theory about what the test measures is essentially correct, most of the predictions should be confirmed.

23 The construct validity and scale validation procedure is similar to the estimation of an unknown population parameter. Just as the researcher can never know the true value of the parameter, the researcher can never truly know the construct validity of measures. However, the researcher can make at least some educated guesses. He knows that a scale does not have construct validity if it fails any of the reliability and validity tests, e.g. low coefficient alphas, low item analysis coefficients, poor pragmatic validity or, simply put, deficiencies in content validity.
24 In the opinion of Brzeziński [1978], construct validity studies represent a cumulative process of collecting the scores of many studies that seek to answer the question of the functional relationships of many variables.

Validation methods
More than one approach to validation of a construct requires the compilation of multiple types of evidence. Three widely used validation methods are discussed below.


Group differences analysis and measuring change in scores with time lapse
If a theory implies group differences in test scores, the analysis should proceed by collecting data and conducting a reasonable statistical test. An example of this approach is contrasting the mean scores of males and females (based on ratings from a scale measuring personal values) in order to see if they differ in the hypothesized direction. Failure to find the expected differences would raise doubts about the construct of personal values, about the adequacy of the instrument as a measure of the construct, or about both. Such studies may also be experimental in design, with the goal of demonstrating that examinees who received a specific treatment, designed to alter their standing on the construct, differ from examinees who received no treatment. If the expected differences are not found, the possible explanations are failure of the theory underlying the construct, inadequacy of the instrument for measuring the construct, or failure of the treatment.
Similarly, if differences in test scores appear over time and are related to, e.g., the age or value system of examinees, doubts will be raised about the construct and the adequacy of the instrument. Change analysis over time is applied to check the validity of the measured construct and the adequacy of the instrument, as long as the theory implies that test scores should change with time or after some experimental intervention, based on pre-assumed research conditions. In the latter case, the change of scores in time is estimated using observed pre-test scores X and post-test scores Y, obtained after some time. The classical test theory model is very useful here when we have to decide on the proper way of estimating the change or difference in scores. Both X and Y have respective measurement errors E_X and E_Y. In consequence, the difference of scores, denoted as D, is given as [Allen and Yen 1979]:

D = Y − X = T_Y − T_X + E_Y − E_X = T_D + E_D,  (5.82)

where T_D = T_Y − T_X and E_D = E_Y − E_X.

Correlation between a measure of the construct and designated criteria
In this approach, we establish correlational evidence of the relationship between scores on, e.g., a personal values test which measures the hedonism construct and other measures, such as the spending power of examinees for specific shopping activities25. Although it would seem illogical at first to argue that hedonism and spending money are identical constructs, it can be argued that there is, or should be, a relationship between them. If this were not the case, the usefulness of hedonism as a construct would be diminished.
There are no generally recognized guidelines for what constitutes adequate evidence of construct validation through correlational studies. Individual correlations may, of course, be tested for statistical significance, proportions of shared variance reported, and so on. However, such information alone is probably not sufficient without comparison with the range of values reported previously by others who have developed measures of the same or similar constructs. In many cases the correlational approach involves the application of multiple regression, so that the contributions of the construct of interest to variance in the criterion may be assessed relative to the contributions of other variables.

Factorial validity and multi-trait-multi-method


In the factorial validity approach we obtain a set of n measurements on the same set of examinees and compute the correlation matrix between these measurements. We then use factor analytic techniques to identify latent variables (factors) which account for variation in the original set of observed variables (items). In this solution, a matrix of item intercorrelations (for items on the same instrument) is factored in order to determine whether item responses cluster together in patterns predictable or reasonable in light of the theoretical structure of the construct of interest. Variation in responses to items that form a cluster can be attributed to variation among examinees on a common underlying factor. Such a factor, which is unobservable, will be considered as the construct suggested by a particular set of empirical observations. The issue is, however, whether the constructs empirically identified through the factor analysis correspond to the theoretical constructs which the researcher has earlier hypothesized. For example, some items in a set might be clearly clustered on one factor and some other items might be clearly clustered on a second factor, which would probably fit the researcher's expectation. If, however, the first set of items (loading on factor one) were simultaneously clustered on the second factor or some other different factors, this might raise questions about the validity of the construct measured by this collection of items. In the second case, a correlation matrix for a set of n different measures may be factored to determine the extent to which correlations among the observed scores on these measures are attributable to variance on one or more common, underlying factors.

25 Such information helps potential test users to evaluate the strength of the evidence presented for the construct validity of the scores.
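The clustering check described above can be sketched on simulated data (all loadings and item structure are illustrative assumptions; scikit-learn's `FactorAnalysis` stands in for whatever factoring routine one prefers):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500
f1, f2 = rng.normal(size=(2, n))                 # two latent factors
noise = rng.normal(scale=0.5, size=(n, 6))
# Items 1-3 are driven by factor one, items 4-6 by factor two
items = np.column_stack([0.8 * f1, 0.7 * f1, 0.9 * f1,
                         0.8 * f2, 0.7 * f2, 0.9 * f2]) + noise

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
loadings = fa.components_.T                      # 6 items x 2 factors
primary = np.abs(loadings).argmax(axis=1)        # dominant factor per item
print(primary)
```

If the recovered `primary` assignments match the theoretically expected clusters, that is the kind of factorial evidence the text describes; items clustering on an unexpected factor would raise the validity questions mentioned above.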
The next method of construct validation is called multi-trait-multi-method validity (MTMM). In the early phase of MTMM development, Campbell and Fiske [1959] described this approach as concerned with the adequacy of tests as measures of a construct, rather than the adequacy of a construct as determined by the confirmation of theoretically predicted associations with measures of other constructs. In MTMM two types of validity coefficients are of special interest: convergent and discriminant validity26.
In MTMM, we must think of at least two ways to measure the construct of interest. The researcher can be asked to identify other, distinctly different constructs which can be appropriately measured by the same methods applied to the construct of interest. Using one sample of examinees, measurements are then obtained on each construct by each method, and correlations between each pair of measurements are computed. Each correlation coefficient would be identified as one of three types:
– reliability coefficients – correlations between measures of the same construct using the same measurement method; ideally they should be high,
– convergent validity coefficients – correlations between measures of the same construct using different measurement methods; ideally these should also be high, but possible attenuation, because of unreliability of the measurement methods, should be considered,
– discriminant validity coefficients – correlations between measures of different constructs using the same method of measurement (called heterotrait-monomethod coefficients) or correlations between different constructs using different measurement methods (called heterotrait-heteromethod coefficients); ideally these should be substantially lower than reliability or convergent validity coefficients [Crocker and Algina 2008].
After that, in order to facilitate comparisons among all these different types of coefficients, the correlations are arranged in a multi-trait-multi-method matrix27. The validity matrix is similar to a correlation matrix, which is a rectangular display of correlations. A correlation matrix has 1.0s on its main diagonal; that is, the correlation of each variable with itself, by definition, equals 1.0. However, in the MTMM matrix the 1.0s are replaced by estimates of (internal consistency) reliabilities, which are placed on the main diagonal in the upper part of the matrix. Evidence of convergent validity is provided by the correlations included in the other part of the same matrix (the so-called submatrix, where they are also placed on a diagonal; for example, they may be located in the lower left corner of the matrix).

26 Cook and Campbell [1979, p. 61] summarized this, stating that: at the heart of assessing construct validity are two processes: first, testing for convergence across different measures of the same concept, and second, testing for divergence between measures of related but conceptually distinct concepts.
Finally, although Campbell and Fiske recommended visual inspection for the assessment of construct validity in such a matrix, this approach can be problematic for large data matrices. Hence other authors have offered analytic procedures which may result in clearer interpretations of such data matrices for reasonably large samples [Lomax and Algina 1979; Marsh and Hocevar 1983].
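The three types of coefficients can be illustrated numerically. Below is a simulated toy MTMM setup with two hypothetical traits measured by two hypothetical methods; the generating model (additive trait, method, and noise components) and all numeric values are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
t1, t2 = rng.normal(size=(2, n))                 # two uncorrelated traits
m1, m2 = rng.normal(scale=0.3, size=(2, n))      # shared method influences
e = rng.normal(scale=0.4, size=(4, n))           # measurement noise
# Measure order: trait1/method1, trait2/method1, trait1/method2, trait2/method2
measures = np.column_stack([t1 + m1 + e[0], t2 + m1 + e[1],
                            t1 + m2 + e[2], t2 + m2 + e[3]])
R = np.corrcoef(measures, rowvar=False)          # the 4 x 4 MTMM matrix

convergent = (R[0, 2] + R[1, 3]) / 2             # same trait, different methods
het_mono = (R[0, 1] + R[2, 3]) / 2               # different traits, same method
het_hetero = (R[0, 3] + R[1, 2]) / 2             # different traits and methods
print(round(convergent, 2), round(het_mono, 2), round(het_hetero, 2))
```

In line with Campbell and Fiske's criteria, the convergent coefficients should come out high while both kinds of heterotrait coefficients stay low.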

Items analysis in reference to difficulty and discrimination indices
An important aspect to be considered in the design of a test-scale is not only the identification and formulation of items, but also the additional verification of item difficulty and item discrimination, in order to differentiate examinees' responses so that they give a joint picture of the individual differences within the measured construct. Thus, the primary measures used in item analysis are the item-difficulty index (IDI) and the item-discrimination index. Both statistics indicate whether an examinee who does well on the test as a whole (that is, a person who presumably is high on the construct being measured) is more likely to get a particular item correct than an examinee who does poorly on the test as a whole. In other words, the item-discrimination index indicates whether an item will discriminate between examinees who do well and those who do poorly on the test as a whole. Taking the item-difficulty and item-discrimination indices into consideration, we hope to construct a test that conveys the most information possible about differences in the examinees' levels on the construct.

27 In fact, there are MTMM correlation matrices which comprise the (linear) relationship indexes among several traits evaluated by different measurement methods. These matrices have often been used over the years by psychologists in various substantive areas [Raykov 2011, p. 38–39].
The item-difficulty index (IDI) enables evaluation of whether the difficulty of an item is suited to the level of the examinees taking the test. Item difficulty for the i-th item is defined as the proportion pi of examinees who get that item correct. Thus, an item with a difficulty of, e.g., 0.3 is more difficult than an item with a difficulty of 0.8, because fewer examinees responded correctly to the former item. If pi is close to 0 (no one got the item right and no differential information is provided) or 1 (everyone got the item correct and again no differential information is provided), the item should be altered or discarded, because it is not yielding any information about differences among examinees' values.
The IDI for the i-th item, di (difference), is calculated by the formula [Allen and Yen 1979]:

$$ d_i = \frac{U_i}{n_{iU}} - \frac{L_i}{n_{iL}}, \qquad (5.83) $$

where:
Ui – number of examinees who obtained total scores in the upper range of total test scores and who have the item correct,
Li – number of examinees who obtained total scores in the lower range of total test scores and who have the item correct,
niU – number of examinees who obtained total scores in the upper range of total test scores,
niL – number of examinees who obtained total scores in the lower range of total test scores.
If niU = niL = ni, this formula reduces to:

$$ d_i = \frac{U_i - L_i}{n_i}. \qquad (5.84) $$

The difference di is thus between the proportion of high-scoring examinees who get the item correct and the proportion of low-scoring examinees who get the item correct. Upper and lower ranges generally are defined as the upper and lower 10 to 33% of the sample, with the examinees ordered on the basis of their total scores. If total scores are normally distributed, it is optimal to use the 27% of the examinees with the highest total test scores as the upper range and the 27% of the examinees with the lowest total test scores as the lower range.
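The computation of Eq. (5.84) with the 27% rule can be sketched on simulated responses (the logistic response model and all parameter values are illustrative assumptions, not part of the source):

```python
import numpy as np

def discrimination_index(responses, item, fraction=0.27):
    # d_i = proportion correct in upper group minus lower group (Eq. 5.84)
    total = responses.sum(axis=1)
    order = np.argsort(total)
    k = int(len(total) * fraction)
    lower, upper = order[:k], order[-k:]
    return responses[upper, item].mean() - responses[lower, item].mean()

rng = np.random.default_rng(3)
ability = rng.normal(size=1000)
difficulty = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # five hypothetical items
# Pass probability rises with ability (illustrative logistic model)
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty)))
responses = (rng.random((1000, 5)) < p).astype(int)

p0 = responses[:, 0].mean()              # item-difficulty index p_i for item 1
d0 = discrimination_index(responses, 0)  # item-discrimination index d_i
print(round(p0, 2), round(d0, 2))
```

A well-behaved item yields a positive d value, reflecting that high scorers pass it more often than low scorers.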


The item-discrimination index riXpb is an alternative to di. It is sometimes called the item/total-test-score point-biserial correlation. This index marks the degree to which responses to one item are related to responses to the other items. The item/test-score point-biserial correlation riXpb denotes the correlation between scores on the i-th item and the total test score X:

$$ r_{iXpb} = \frac{\bar{X}_i - \bar{X}}{s_X}\sqrt{\frac{p_i}{1 - p_i}}, \qquad (5.85) $$

where: $\bar{X}_i$ is the mean of the X scores among examinees passing the i-th item; $\bar{X}$ and $s_X$ are the mean and standard deviation of the X scores among all examinees; pi is the item difficulty.
Sometimes riXpb is converted to a biserial correlation as follows:

$$ r_{iXbis} = \frac{\bar{X}_i - \bar{X}}{s_X}\cdot\frac{p_i}{f(z)}, \qquad (5.86) $$

where $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ is the standard normal density evaluated at the value of a normal variate z above which pi of cases fall.

The biserial correlation coefficient riXbis is related to the point-biserial correlation in the same way that the tetrachoric correlation is related to the phi-coefficient, which is:

$$ \varphi = \frac{p_c - p_X p_Y}{\sqrt{p_X(1 - p_X)\, p_Y(1 - p_Y)}}, \qquad (5.87) $$

where X and Y are variables, pc is the proportion of examinees scoring 1 on both X and Y, and pX and pY are the proportions of examinees scoring 1 on X and Y, respectively.
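Eq. (5.87) can be checked numerically: for two dichotomous variables, the phi coefficient computed from the proportions coincides with the ordinary Pearson correlation. The latent-variable generating model below is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
latent = rng.normal(size=n)                          # shared influence
x = (latent + rng.normal(size=n) > 0).astype(int)    # dichotomous item X
y = (latent + rng.normal(size=n) > 0).astype(int)    # dichotomous item Y

p_x, p_y = x.mean(), y.mean()
p_c = (x & y).mean()                                 # scoring 1 on both
phi = (p_c - p_x * p_y) / np.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
print(round(phi, 3))
```

The equality with the Pearson correlation is exact up to floating-point error, since phi is simply the Pearson formula specialized to 0/1 variables.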
The biserial correlation is used to correlate an artificially dichotomized, normally distributed variable with a continuous or multi-step variable. The calculations make corrections for the dichotomization, yielding an estimate of the Pearson correlation that would have been found in the data if it had not been dichotomized.
The correlation between item scores and total test scores, whether it is a point-biserial or a biserial correlation, behaves similarly to di: as responses on an item become more highly related to total test scores, di and riXpb will both increase.


When an item is uncorrelated with all the other items in a test, the item/total-test-score point-biserial will still be positive, because the item score is included in the total score. To control for this effect, an item point-biserial can be calculated using test scores X′, in which the item score is not included. For example, if we are dealing with the point-biserial for the first item in a 25-item test, the examinees' scores on the 24 items excluding item 1 would be used in place of X in evaluating Eq. (5.85). The point-biserial correlation for the second item would be based on the examinees' scores on the 24 items excluding item 2, and so on. Obviously, the calculation of point-biserials using X′ scores is more laborious than the calculation of point-biserials using total test scores. The correlation riX′pb will be lower than riXpb if there are few items in a test. However, if there are a large number of items in the test, riX′pb and riXpb will be similar in value, and little will be gained by the use of X′ in place of X.
The item-discrimination index di and the item/test correlation riXpb are valuable pieces of information. On any reasonable item, di and riXpb should be positive, i.e. more high-scoring examinees than low-scoring examinees should answer the item correctly. An item with a negative value for di or riXpb apparently measures the opposite of what the test measures. A negative di or riXpb might suggest that an error was made in the scoring of the item, or that the item is poorly worded, etc. Items with low di and riXpb generally should be improved or eliminated [Allen and Yen 1979].

VI. EXPLORATORY (EFA) AND CONFIRMATORY (CFA) FACTOR ANALYSIS FOR SCALE DEVELOPMENT

Relationship between factor analysis and classical test theory CTT
In the process of scale development according to CTT, the true score may be obtained either by summing responses across the items which form a respective summated scale, in relation to a one-factor model, or through a multi-factor model, where more than one factor and true score is generated1. In CTT we use two general types of measurement approaches. The former is focused on items and their relationships to one latent variable [Guttman 1945; 1954]. The latter is focused on mathematical transformations of observed scores into multiple true scores/latent variables, which pertains to multidimensional models (such as exploratory or confirmatory factor models).
As mentioned, the assumptions of factor analysis (FA) are based on classical test theory (CTT) due to the following formula:

$$ X = T + E. \qquad (6.0) $$

1 In 1904 Spearman proposed a formula for correcting the effects of measurement errors in order to find the true relation between two variables. This idea, together with his famous application, measuring general intelligence, marks the introduction of factor analysis (FA). Spearman's original theory of the general factor and the specific factor corresponds to the one-factor case in modern terminology.


Legend: A – classical test theory, B – factor analysis model.

Figure 20. Difference between classical test theory and factor analysis model
Source: Netemeyer, Bearden and Sharma 2003, p. 42.

However, in FA we assume that X = λF + δ (see Figure 20). In the light of CTT, when the true score T varies, so does the observed score X, because the true score influences the observed score. Also, if the true score and error are uncorrelated, then the variance of the observed score σ²(X) is equal to the variance of the true score σ²(T) plus the variance of the error σ²(E), which is expressed as follows:

$$ \sigma^2(X) = \sigma^2(T) + \sigma^2(E). \qquad (6.1) $$

In CTT the reliability of measurement ρXX′ is defined as the ratio of the variance of the true score to the variance of the observed score:

$$ \rho_{XX'} = \frac{\sigma^2(T)}{\sigma^2(X)}. \qquad (6.2) $$

In the factor analysis model, with a minor exception to CTT, we have X – observed scores, F – latent variable(s) (factor(s)), u or δ – which represents a unique variance, and λ – a factor loading describing the extent to which the latent variable affects the observed score. As the factor varies, so does the observed score. However, the extent to which it varies is determined by the value of the factor loading λ. It is therefore clear that F is equivalent to the true score T in the CTT model [Netemeyer, Bearden and Sharma 2003]. Similarly, the reliability of the observed measure, including the factor loading in the FA model, is given by the following equation:

$$ \rho_{XX'} = \frac{\lambda^2 \sigma^2(F)}{\sigma^2(X)}. \qquad (6.3) $$

In a strict sense, one might still argue that, conceptually, the FA model and CTT are not equivalent. In the FA model, δ (otherwise defined as the indicator of uniqueness u) includes, in addition to random measurement error, the specific-systematic error. As a result, its composite form is defined as the sum of the specific error s and the random measurement error e [Harman 1976]2. A factor model that holds for the observed scores also holds for the true scores. However, in CTT, e is considered as pure measurement error. The error term used in factor analytic models is thus more encompassing than just the random error defined by classical test theory [Lord and Novick 1968].
Lord and Novick [1968, p. 535], when decomposing the true score, specified that there may exist in the measurement error some unwanted component that is, or should be, stable across successive measures3. As Gerbing and Anderson pointed out, the representation of the true scores as factors implies that factors should be the stable components over successive measurements. However, constructs are usually not sufficiently defined by the criterion of consistency over successive measurements. And to some extent, the measures may be systematically influenced by extraneous variables. In other words, the true score would only be equivalent to the construct of interest if systematic error were not present [Gerbing and Anderson 1984].
Tarkkonen proved empirically that it is rather impossible to separate these two types of errors, i.e. s and e. In consequence, at an empirical level it is not possible to differentiate between the CTT and FA models [Tarkkonen 1987]. This situation occurs especially in cross-sectional data, where the two errors s and e are functionally similar, since neither one correlates with the remaining indicators [Smith 1974]. However, the use of correlated measurement errors in longitudinal analysis represents the opposite situation, where the specific and random error can in fact be potentially separated. This is because the specific component can be stable over time, and the error terms for the same indicator at two different times can correlate [Jöreskog 1974].

2 Despite the fact that the unique variance is composed of specific variance and error variance, we typically assume that specific variances are small relative to measurement error variance.
3 For example, this component s represents the aspects of each measure not shared with any other indicator in the model. Kaplan [2000] argued that if s does not correlate with the remaining indicators in the model, then it does not influence the correlations among them. This may happen, for instance, due to the selection of observed variables in the models.

Underlying aims of factor analysis in the field of statistics


In assigning a place to factor analysis in the general field of statistics, we follow Kendall, who drew a distinction between the analysis of dependence and of interdependence. As he explained: in the latter we are interested in how a group of variables are related among themselves, whereas in the former type of analysis (based on dependence) we are interested in how a certain specified group (i.e. dependent variables) depends on the others [Kendall 1950, p. 60]. The position of factor analysis in the selected group of techniques using analysis of interdependence is shown in Figure 21.
Multivariate analysis
– Analysis of dependence: analysis of variance and covariance, regression analysis, discriminant analysis
– Analysis of interdependence: product moment correlation analysis, rank correlation analysis, association and contingency tables, principal component analysis, factor analysis

Figure 21. Selected group of techniques using analysis of interdependence and dependence
Source: based on Kendall 1950, p. 61.

Kelley mentioned three general aims of factor analysis. These aims are the same as those which give rise to other branches of statistics. As Kelley put it: the


first function of statistics is to be purely descriptive, the second function is to enable analysis in harmony with hypothesis, and its third function is to suggest by the force of its virgin data analyses not earlier thought of [Kelley 1940, p. 22]. Kelley also added: we may say that there are two occasions for resort to statistical procedures, the one dominated by a desire to prove a hypothesis, and the other by a desire to invent one [Kelley 1940, p. 12].
As a result, we accept the use of factor analysis at three levels, as discussed by Eysenck [1953]:
1. Factors as descriptive statistics – whatever else may be the function of a factor, it is always descriptive of a given sample or population. Holzinger and Harman [1941, p. 1] wrote: factor analysis is a branch of statistical theory concerned with the resolution of a set of descriptive variables in terms of a small number of categories or factors. The chief aim is to attain scientific parsimony or economy of description. Similarly, Kelley [1940, p. 120] explained that: factor analysis represents a simple straightforward problem of description in several dimensions of a definite group functioning in definite manners.
2. Factors suggesting a hypothesis to the researcher. In so far as it does that, the factor ceases to be merely descriptive and becomes part of theoretical assumptions.
3. Factors supporting or disproving a hypothesis – it can be debated whether factor analysis can be used as a formal part of the hypothetico-deductive process in relation to just any type of hypothesis. The great majority of hypotheses in social sciences research require some form of analysis of dependence. But there are also a number of hypotheses, particularly those concerned with unknown structure, which require factor-analytic solutions, and which are difficult to disprove or support by non-factorial methods.
In passing from the purely descriptive use, there appeared a definite change in the implication of the term factor. For Kelley [1940], there was no causal reference implied in a factor, but for Spearman [1927], Thurstone [1947], and those who followed their methodology, such a reference was obvious. Causal implication characterizes not only the interpretation of factors as suggestive of a hypothesis, but also the next level of factors as proving a hypothesis. Thus, one may accept the following definition, where a factor is a hypothetical causal influence underlying and determining the observed relationships between a set of variables. This definition is helpful in drawing attention to the close link between the hypothesis-generating and the hypothesis-proving functions of factor analysis, as opposed to the purely descriptive option. However, it may often be found that in one and the same investigation there will be factors which support a hypothesis and factors which generate one. Thus, we can find in the same analysis confirmation of one hypothesis, and suggestions for further hypotheses.
In sum, the orientation of factor analysis will be as follows:
– Factor analysis is a mathematical procedure which resolves a set of descriptive variables into a smaller number of factors. These factors themselves, in the first instance, may be regarded as having a purely descriptive function.
– Under certain circumstances, factors may be regarded as hypothetical causal influences underlying and determining the observed relationships between a set of variables.
– The term cause is a concept which aids in the simplification and unification of natural phenomena. Like all scientific concepts it is abstract and consequently an artifact. A scientific concept is not a part of nature; it is rather a way of comprehending nature.
– The factorial method, no more than any other, can guarantee the correctness of the causal hypotheses suggested by it. Historical evidence suggests, however, that it is more successful than any alternative method, and that the hypotheses generated by it have proved to be remarkably accurate.

Differences between EFA and CFA


Both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) aim to reproduce the observed relationships among a group of indicators with a smaller set of latent variables, but they differ fundamentally in the number and nature of a priori specifications and restrictions made on the factor model [Gerbing and Hamilton 1996].
EFA is often used as the initial stage of scale development4, especially when the scale under study is meant to be multidimensional, in which case we identify so-called subscales among the set of intercorrelated items5. In this case, a scale contains more than one factor. However, if one measures a unidimensional construct, one may also profit from EFA by eliminating unnecessary items6. This choice, in fact, depends on the type of construct under investigation, whether it is unidimensional or multidimensional. In the latter case, items should be preselected by examination of the factor structure, and then in the second phase the respective factors should be verified by internal consistency analysis. Some authors have applied this procedure, but in the opposite order. For example, Martin and Eroglu [1993] at first used internal consistency analysis to remove non-performing items, then checked the factor structure with EFA, and when it was not as they expected, they decided to reformulate the dimensions originally and theoretically expected of the measured construct. However, this seems to have given the developmental-stage items too much say in the final scale. The effort becomes too data-driven at too early a point in the scale development. Thus, perhaps it would be better to conduct EFA at the beginning and compare its results to those obtained from the internal consistency analysis7.

4 Kim and Mueller [1978] argued that the main motivation behind the use of factor analysis is not only in ascertaining the factor structure among a set of variables, but in achieving data reduction and obtaining better factor scales which can be used in a different study.
5 As Spector claimed, a subscale or one factor of a multidimensional scale should be treated as a separate scale for development purposes. He argued that the subsets should be developed in a parallel way [Spector 1992, p. 390].
In the course of EFA we may sometimes need to delete an item, which is a reasonable part of the process of forming a sound scale. This seems especially important if the theoretical construct being measured is somewhat ill-defined, if, for instance, one is unsure about the potential number of dimensions [Kline 2010]8.
In contrast, CFA is the resultant model, which is derived in part from theory and in part from a respecification based on the analysis of model fit. This analysis (otherwise termed restricted factor analysis, or the measurement model in structural factor analysis) is typically used to test hypotheses regarding unmeasured sources of variability responsible for the communality among a set of observed variables' scores. It can be contrasted with EFA, which addresses the same basic question but in an inductive or, rather, discovery-oriented mode. As Mulaik suggested [1987, p. 302]: exploratory factor analysis is regarded as a hypothesis-generating method, providing information for the researcher to use in formulating hypotheses. This, however, as he further claimed, demands from us finding a way of using experience, going beyond the specific set of data stimulating the hypothesis, which can be done only by testing hypotheses with additional data. Hence, confirmatory factor analysis is a logical sequel to exploratory factor analysis [Mulaik 1987, p. 302].

6 Such being the case, we can ask the question, as Flynn and Pearcy [2001] did: should one simply ignore the presence of other factors or treat the set of items as if they form one factor?
7 Pecheux and Derbaix [1999], in the construction of an attitudinal measure, performed internal consistency computations and EFA simultaneously in order to make optimal decisions about item deletion. They reported a final scale with solid psychometric properties and consistent dimensionality.
8 For example, a researcher could have generated 20 questionnaire items that are believed to be indicators of a unidimensional construct such as hedonism. In the early stages of scale development, the researcher might use factor analysis to examine the plausibility of this assumption (i.e. the ability of a single factor to account for the intercorrelations among the 20 indicators) and to determine if all 20 items are reasonable indicators of the underlying construct of hedonism (i.e. how strongly each item is related to the factor).
CFA is defined as a decision rule to accept or reject one or more hypotheses about a population factor structure based on sample data. Techniques commonly employed in CFA do not always correspond to the hypotheses we have in mind when we employ them. So, the strongest possible use of CFA would be to hypothesize a set of parameters (e.g. factor loadings, correlations, and uniquenesses) and test the fit of a reproduced matrix to sample data without estimating any parameters based on sample data [Hurley et al. 1997].
The CFA models provide some sort of validation or confirmation of the theoretical construct. In construct validity, CFA has a limited but important role. Specifically, CFA can be used to examine factorial validity; it can establish norms and test the invariance of the factor structure. More importantly, it involves not only a mechanism to test a structure, but is also a tool to further reduce items. In the last case, although the use of EFA to retain important items is strongly advised, EFA will tell us little about a potential threat to dimensionality, namely the presence of correlated errors among the items. EFA and item-based statistics can inform us as to the magnitude of an item loading, as well as the potential cross-loading of an item onto another factor9; however, EFA does not reveal potentially correlated errors among the items. To recall: if two items are highly correlated and share variance beyond the variance that is accounted for by their factor, they can result in correlated errors. This situation violates a basic tenet of classical test theory, where error terms among items should be uncorrelated10. And when the errors are highly correlated, dimensionality is threatened. These threats often reveal themselves in a number of CFA diagnostics, including fit indices, standardized residuals, and modification indices11.

9 Cross-loading means that a variable has more than one significant loading. Such being the case, a difficulty arises because a variable with several significant cross-loadings must be used in labeling all the factors on which it has a significant loading. Yet how can these factors be distinct and potentially represent separate concepts when they share variables? Ultimately, the objective is to minimize the number of significant loadings on each row of the factor matrix. The researcher may try different rotation methods to eliminate any cross-loading and thus define a simple structure. If, however, a variable persists in having cross-loadings, it becomes a candidate for deletion [Hair et al. 2010, p. 119].
In order to illustrate associations between the observed and latent variables, patch diagrams either for CFA or EFA were drawn in Figure 22. Here,

Latent
variable

Latent
variable

Latent
variable

Latent
variable

X1

X2

X3

X4

X5

X6

X1

X2

X3

X4

X5

X6

Legend: Unidirectional arrows extending from the factors (latent variables) represent the effects (structural relationship) latent variables impact/influence on observable variables. Typically, these effects are
called factor loadings. Curved bidirectional arrow between the two factors in the model indicates the
correlation (covariance) of these factors.
Indicators X4, X5, and X6 (on the right) are congeneric [Jreskog 1971b] because they share a common
factor/latent variable 2. An indicator would not be considered congeneric if it loaded on more than one
factor. In case of congeneric factor loadings, the variance of an indicator is reproduced by multiplying its
squared factor loading by the variance of the factor, and then summing this product with the indicators
error variance. The predicted covariance of two indicators that load on the same factor is computed as the
product of their factor loadings times the variance.

Figure 22. Path diagrams of two correlated factors as modeled using exploratory
factor analysis (EFA) (on the left, with cross-loadings) and confirmatory factor
analysis (CFA) (on the right, with oblique rotation)
Source: own construction based on Hoyle 2000.
10) Note that in the case of CFA we speak of correlations, but covariances among the items should preferably be used as input to CFA.
11) On the other hand, although a high level of intercorrelation results in a high level of internal consistency, highly disproportionate correlations among some items relative to others can also result in correlated errors. As a result, when item intercorrelation is desired and multiple items that tap the domain of the construct are needed, adequately fitting such items to a CFA structure can be problematic [Floyd and Widaman 1995].


Exploratory (EFA) and confirmatory (CFA) factor analysis for scale development

the rectangles represent observed variables, generally referred to as indicators. Circles represent unobserved (latent) variables: a large circle represents a factor, whereas the small circles represent uniquenesses, the unobserved sources of influence unique to each indicator. The single-headed arrows indicate causal influence, specifically that each indicator is caused by two unmeasured influences: 1) a causal influence it shares with the other indicators, and 2) an additional causal influence not shared with the remaining indicators. As also observed, these diagrammed relationships may be converted into more advanced mathematical models, for example in structural equation modeling (SEM).
By convention, CFA requires the imposition of restrictions on the pattern of weights, or factor loadings. Below are the CFA equations (6.4) for a prototypic two-factor CFA model involving six indicators, shown also in the path diagram in Figure 22. The zeros indicate no influence of a factor on an indicator (in CFA language, these parameters are fixed to zero). For instance, F1 influences X1 but not X4, and F2 influences X6 but not X3.
X1 = λ11F1 + 0·F2 + δ1,
X2 = λ21F1 + 0·F2 + δ2,
X3 = λ31F1 + 0·F2 + δ3,
X4 = 0·F1 + λ42F2 + δ4,
X5 = 0·F1 + λ52F2 + δ5,
X6 = 0·F1 + λ62F2 + δ6.   (6.4)
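The rules stated in the legend of Figure 22 and in Eq. (6.4) can be sketched numerically. In the snippet below all loadings and the factor correlation are made-up illustration values, not estimates from any data set; it builds the two-factor loading matrix with fixed zero cross-loadings and reproduces the model-implied covariance matrix as ΛΦΛᵀ + Θ:

```python
import numpy as np

# Hypothetical loading matrix Lambda for the model of Eq. (6.4):
# rows = indicators X1..X6, columns = factors F1, F2;
# cross-loadings are fixed to zero, as the CFA specification requires.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.75],
                   [0.0, 0.65],
                   [0.0, 0.55]])

# Factor covariance matrix Phi (unit variances, assumed correlation 0.4)
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])

# Unique (error) variances chosen so that each indicator has unit variance
Theta = np.diag(1.0 - np.sum(Lambda**2, axis=1))

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta

# Variance of X1: squared loading times factor variance plus error variance
assert np.isclose(Sigma[0, 0], 0.8**2 * 1.0 + Theta[0, 0])
# Covariance of the congeneric indicators X1, X2: product of their loadings
# times the variance of their shared factor
assert np.isclose(Sigma[0, 1], 0.8 * 0.7 * 1.0)
```

The two assertions restate the legend's rules for reproducing an indicator's variance and the covariance of two congeneric indicators.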

As observed from Figure 22, EFA and CFA differ markedly in the manner in which indicators' cross-loadings are handled in a solution entailing two factors. In EFA, all indicators may freely load on all factors. The factors are then rotated to maximize the magnitude of primary loadings and minimize the magnitude of cross-loadings. Factor rotation does not, however, apply to CFA, because the identification restrictions associated with CFA are achieved in part by fixing most or all indicator cross-loadings to zero¹². In other words, rotation is not necessary in CFA because simple structure is obtained by specifying indicators to load on just one factor. In general,
when the EFA simple structure is a target of inductively oriented extraction and rotation algorithms, in CFA the simple structure typically is assumed or imposed on the pattern of factor loadings [Brown 2006].
12) A possible consequence of fixing cross-loadings to zero in CFA is that factor correlation estimates in CFA tend to be of higher magnitude than in EFA.
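The EFA side of this contrast can be sketched in code. The example below is a minimal illustration on simulated data with made-up loadings; it fits an EFA in which every indicator loads freely on both factors and then applies a varimax rotation, using scikit-learn's FactorAnalysis (which supports a rotation argument in recent versions):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 500 observations from a hypothetical two-factor model
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.75], [0.0, 0.65], [0.0, 0.55]])
F = rng.normal(size=(500, 2))                      # factor scores
X = F @ Lambda.T + rng.normal(scale=0.5, size=(500, 6))

# EFA: all six indicators load freely on both factors; rotation is then
# used to approach simple structure (CFA instead imposes it a priori
# by fixing the cross-loadings to zero).
efa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
loadings = efa.components_.T                       # (6 indicators, 2 factors)
print(np.round(loadings, 2))                       # small cross-loadings remain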
Unlike EFA, the CFA framework offers the researcher the ability to specify the nature of the relationships among the measurement errors (unique variances) of the indicators (see Figure 23). Because CFA typically entails a more parsimonious solution (i.e. CFA usually attempts to reproduce the observed relationships among indicators with fewer parameter estimates than EFA), it is possible to estimate such relationships when this specification is substantively justified and other identification requirements are met. This specification, shown in Figure 23, depicts an approach to modeling error covariances (i.e. zero-order relationships freely estimated between pairs of given indicators), which suggests that two indicators covary for reasons other than the shared influence of the latent factor¹³.


Figure 23. Confirmatory factor analysis with oblique rotation (two correlated factors) and correlated measurement errors
Source: own construction based on Hoyle 2000.

In measurement models, the specification of correlated errors is justified on the basis of source or method effects that reflect additional indicator covariation resulting from assessment methods (e.g. observer ratings, questionnaires), reversed or similarly worded items, or differential susceptibility to other influences such as response set, reading difficulty, or social desirability [Brown 2006]. Brown argued that the inability to specify correlated errors (i.e. the nature of the relationships among unique variances) is a very significant limitation of EFA. For instance, in applied factor-analytic research on questionnaires comprising a combination of positively and reverse-worded items, a common consequence of this EFA limitation is the tendency to extract and interpret factors that have little substantive basis [Brown 2006].
13) Some other sources of relationships in CFA are also possible. For example, when the CFA model consists of two or more factors, a factor covariance (a factor correlation being the completely standardized counterpart) can be specified to estimate the relationship between the latent dimensions. However, one may fix factor covariances to zero, akin to an orthogonal EFA solution [Gerbing and Anderson 1984]. This issue will be addressed in a further part of this chapter.
In some situations the use of correlated errors can be meaningful and specified a priori (e.g. in longitudinal research). In practice, however, such errors are often employed in a post hoc manner to obtain an acceptable fit of the model to the data [Bagozzi 1983; Fornell 1983]. The post hoc addition of a correlated error means that the observed covariation between a given pair of indicators has not been adequately accounted for by the factors present in the original model. Positive values of correlated errors mean that the original model under-estimates the particular indicator covariance, whereas negative values mean that the model over-estimates this covariance. Poor fit, defined as the difference between the implied and observed indicator covariances, can nearly always be remedied by adding a correlated error term. Examples of this use of correlated errors can be found in Sörbom [1975], Bearden and Mason [1980], Werts et al. [1980], Reilly [1982], and Aneshensel, Clark and Frerichs [1983].
While the use of correlated measurement errors improves fit by accounting for the unwanted covariation, it does so at a corresponding loss of the meaning and substantive conclusions which can be drawn from the model. Their post hoc use means that indicator covariance is due to at least one unknown common source. As correlated measurement terms are added, the correspondence between the construct of interest and the empirically defined factor becomes unclear. A preferred substantive representation of this covariation would be to model it explicitly, apart from the construct of interest.
Proponents of CFA believe that researchers need to have a strong theory underlying their measurement model before analyzing data [Williams 1995]. CFA is often used in data analysis to examine the expected causal connections between variables. On the other hand, supporters of EFA believe that CFA is sometimes overapplied and used in many inappropriate situations. Some researchers even believe that CFA is often used with little theoretical foundation [Brannick 1995; Kelloway 1995]. However, each of these models is appropriate in its own way. EFA is most appropriate in the initial phase of scale development, while CFA is especially useful when a measurement model and the final scale must be confirmed [Hurley et al. 1997].
The choice between EFA and CFA is a question of the researcher's general purpose, the method of estimation, or the methodological approach employed in a particular analysis. EFA is a type of analysis whose purpose is to identify the underlying dimensional structure, if any, of a set of items. CFA, in turn, is a type of analysis whose purpose is to test whether an a priori dimensional structure (which measures the theoretical construct) is consistent with the structure obtained in a particular set of items.
In contrast to EFA, CFA requires at least two conditions to be met: 1) a genuine, strong theory that posits a strong and unambiguous structure of relations among the latent variables of the theoretical construct and the observable variables that represent them, and 2) a strong and unambiguous a priori structure that serves as the basis for the test of fit.
An analytical procedure such as EFA confronts us with a decision each time we need to move to another procedure such as CFA. In each of these procedures we first need to clarify what the results should mean for us beyond the data at hand. In other words, EFA does not automatically give us final and exact meanings of the construct; it is we who create meanings for the explored issues. If EFA cannot tell us sufficiently what something is, perhaps we should consider other forms of factor analysis, beginning a priori with a well-defined, theoretically multidimensional construct corresponding to latent variables, and then seeking to study the relations of these latent variables in a way that lets us decide objectively whether they apply. One way to do this is CFA [Mulaik and McDonald 1978].

Common factor analysis model CFAM


Generally, there are two distinct but equivalent ways to express the CFAM. For example, we may assume a number of common factors which explain the observed correlations, in the sense that when these are partialled out, the partial correlations of the variables become zero. If there is a factor Fj and it is partialled out, we can expect that no further intercorrelation remains between the observed variables. If so, the partial correlations between any pair of variables (e.g. X1 and X2) must vanish after the first factor F1 has been eliminated. Alternatively, we can say that each observed variable can be expressed as the sum of a (common) part, that is, its regression on a number of common factors, plus a residual about that regression.
When we decide to construct a common factor analysis model, we strive to explain the behavior of p observable, mutually correlated variables (items). The CFAM is but a template imposed upon the correlations among a set of variables to see what things would be like if the variation of these variables were produced by variation in a set of common variables [Mulaik 1990].
The data in the CFAM are collected in the form of a matrix of observations on the observed variables X, which is then used to calculate a sample correlation matrix R or covariance matrix S. In the typical common factor model, the researcher has two options: he may use either the matrix of covariances for a set of variables, or the matrix of correlations in its standardized form. If the observed variables are standardized to have unit variance, the linear weights are called standardized regression coefficients (in regression analysis), path coefficients (in causal analysis), or factor loadings (in factor analysis)¹⁴.
Such matrices, based either on covariances sij or correlations rij, express the interdependence taking place in a set of observed variables (X1, X2, …, Xp), where the information (contained in the data matrix) is reproduced by a smaller number of factors. In other words, the number k of factors always needs to be smaller than the number p of observed variables. The factors are then interpreted as latent (unobserved) common characteristics of the observed variables which have been entered into the analysis (X1, X2, …, Xp). So, factor analysis helps us to replace (X1, X2, …, Xp) with a new set of variables (factors). The case just described occurs when an observed variable Xi (i = 1, …, p) is expressed as follows:
Xi = Σ(j=1..k) λijFj + δi,   (6.5)

where:
Fj – factors (latent variables, common factors), j = 1, …, k; they appear in more than one observed variable. If a factor appears in all variables, it is sometimes called a general factor, and when it appears only in certain variables, it is called a group factor;
λij – loading of the i-th variable on the j-th factor, estimated from the data. These are the parameters (coefficients) of the model, reflecting the intensity or weight of the j-th factor in the i-th variable. Factor loadings indicate how strongly each factor influences a given observable variable;
δi – uniqueness component of variable Xi, where each variable has its own specific-systematic element and a random measurement error element¹⁵.
14) The covariance between standardized variables (with a mean of 0 and a variance of 1) equals the correlation coefficient, i.e. the product-moment (Pearson's) correlation coefficient.
Because in the CFAM δi is simultaneously perceived through the lens of a specific-systematic error si and a random measurement error ei, the term δi represents the sum of two parts:

δi = si + ei.   (6.6)

From model (6.5) we can observe three sets of components, namely the observed scores, the latent variables (factors), and the unique elements.
In the case of p observed variables and k factors, Eq. (6.5) may be further generalized to the following form:

Xi = λi1F1 + λi2F2 + … + λikFk + ei   (i = 1, …, p),   (6.7)

where, for example, three different observed variables may be linked with two factors by the appropriate weighting coefficients λ and residuals e:

X1 = λ11F1 + λ12F2 + e1,
X2 = λ21F1 + λ22F2 + e2,
X3 = λ31F1 + λ32F2 + e3.

Expression (6.7) represents a solution similar to the regression of the observed variable Xi on the factors F1, F2, …, Fk, with residuals ei¹⁶.
15) Uniqueness is sometimes denoted by the symbol U.
16) According to McDonald [1985], factor analysis will always remain a further extension of regression models, though perhaps at first sight a strange one. As he claimed, 'a specialized language has been adopted by factor analysts, a fact that seems regrettable, because virtually none of the words in their special vocabulary is necessary, given the existing technical language of statistics' [McDonald 1985, p. 14].

The following linear combination:

ci = λi1F1 + λi2F2 + … + λikFk   (6.8)

shows the linear estimate of the observed variable Xi, which we treat as its common part ci, the source of the communality hi² (common variance) of Xi that is explained by the k common factors.
Moreover, in factor analysis, the sum of squared factor loadings of the j-th common factor (over all observed variables):

λ1j² + λ2j² + … + λpj² = Σ(i=1..p) λij²   (6.9)

measures the relative importance of this factor in reference to the other factors (assuming there are k factors).
Now, if we assume a zero mean vector (e.g. in the case of standardized variables), we obtain a similar expression of the factor-analytic model to that shown in (6.5):

Zi = Σ(j=1..k) λijFj + δi.   (6.10)

Thus, for standardized variables, model (6.5) after a minor modification is as follows:

Zi = λi1F1 + λi2F2 + … + λikFk + δi   (i = 1, …, p).   (6.11)

Finally, the same model (6.5) can be expressed in matrix notation:

X = ΛF + δ,   (6.12)

where:
X = [X1, …, Xp]ᵀ – vector of observed variables,
F = [F1, …, Fk]ᵀ – vector of common factors,
Λ = [λij], i = 1, …, p, j = 1, …, k – (p × k) matrix of loadings on the factors F,
δ = [δ1, …, δp]ᵀ – vector of unique elements of the observed variables.


As far as the most important conditions of CFAM construction are concerned, they are the following:
– the factors Fj should be centered, standardized and uncorrelated,
– the errors δi are not correlated with each other, have zero correlation with the common factors, and have zero expected value,
– the observed variables Xi are standardized,
and more precisely, as Kim and Mueller [1978] have explained:
E(Fj) = 0,  V(Fj) = 1,  cov(Fi, Fj) = 0,
E(δi) = 0,  V(δi) = ψi²,  cov(δi, δj) = 0,
cov(Fj, δi) = 0,
E(Xi) = 0,  V(Xi) = 1.   (6.13)

Since there is no covariance between Fj and δi, or between δi and δj, then:

cov(Fj, δi) = cov(δi, δj) = 0.   (6.14)

In the end, we need to underline a few distinctions concerning the relations between factors in the CFAM. In the orthogonal case, the factors are uncorrelated; the factor pattern (which holds both for the orthogonal and the oblique solution) contains the elements λij, which are the covariances between observed variables and factors, i.e. λij is the covariance between variable Xi and factor Fj. When:

Φ = I   (6.15)

(I denotes the identity matrix, that is, the factors are uncorrelated), the factor pattern and the factor structure are both given by Λ.
In contrast, in the oblique case the factors are intercorrelated. This fact must be taken into account when computing the covariances between factors and variables. The matrix giving the covariances between variables and factors is termed the factor structure and is given by ΛΦ, where Φ denotes the relationships between factors [Jöreskog and Reyment 1996].


Variance decomposition, matrix of correlation and factor loadings

Another important aspect of factor analysis is knowledge about the configuration of variance. To proceed with this topic, we consider the following equation:
Xi = ci + δi = ci + si + ei.   (6.16)

The left side represents the total variance of the observable variable Xi. Likewise, there is a variance, for the most part unobservable, for the terms on the right side of Eq. (6.16). Here, we distinguish the variance due to measurement error and the specific-systematic error variance. As a whole, we call it the unique variance (uniqueness)¹⁷, which is a combination of variance that is specific to the observed variable and random error variance (i.e. measurement error or unreliability in the observed variable); see Figure 24.
[Figure 24 diagram: the total variance divides into the reliable variance and the measurement error variance; the reliable variance comprises the common variance (communality) and the specific variance (specificity).]

Figure 24. Decomposition of the original variable variance
Source: Balicki 2009, p. 141.

Generally, the rule of decomposition can be presented on the basis of the observed variance-covariance matrix S (for a sample), which consists of two matrices representing the common elements H and the uniqueness V [Balicki 2009]:

S = H + V.   (6.17)

An exemplary way to decompose the total variance of the i-th variable is expressed as the sum:

s²(Xi) = s²(ci) + s²(δi) = s²(ci) + s²(si) + s²(ei),   (6.18)

17) These terms are derived from psychological research, including aspects such as the reliability and validity of tests, in which factor analysis plays a significant role.


where:
s²(ci) – communality, the part of the variance of Xi that is accounted for by the k common factors,
s²(δi) – unique variance (uniqueness) of Xi, unaccounted for by the factors and therefore not shared with other variables,
s²(si) – specific variance of variable Xi,
s²(ei) – random measurement error variance of Xi.
The common variance of Xi is defined as the part of the total variance accounted for by the factors; this variance is shared with other variables. On the other hand, the unique variance of Xi is defined as the part of the total variance associated with the specific-systematic and random errors which influence the variable.
Moreover, the total variance of an observed variable can be expressed through the agency of the squared factor loadings, as well as the unique component:

s²(Xi) = λi1² + λi2² + … + λik² + ψi² = Σ(j=1..k) λij² + ψi².   (6.19)

In the case of a standardized observed variable we obtain:

s²(Xi) = hi² + ψi² = 1.   (6.20)

This being the case, the communality of an observed variable is the sum of the squared factor loadings for that variable (or, equivalently, the squared correlation between that variable and the common factors). The uniqueness component is given as ψi² = 1 − hi², i.e. it is computed as one minus the communality of the i-th observable variable.
In sum, the total variance has to be shared between two parts: the communality, i.e. the sum of squared factor loadings Σ(j=1..k) λij² = hi², and the unique variance.
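In code, the decomposition for a single standardized variable looks as follows (the loadings are made-up illustration values):

```python
import numpy as np

# Hypothetical loadings of one standardized variable on k = 3 common factors
lam = np.array([0.6, 0.4, 0.2])

h2 = np.sum(lam**2)    # communality: sum of squared loadings
psi2 = 1.0 - h2        # uniqueness: one minus the communality

# For a standardized variable the two parts sum to the total variance of 1
assert np.isclose(h2 + psi2, 1.0)
print(round(h2, 2), round(psi2, 2))  # 0.56 0.44
```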
Figure 25 illustrates a pattern of shared variance among scores on three observed variables. Each shaded circle represents the variance in one of the measured variables X1, X2 and X3. The overlap of the circles represents shared variance, or covariance. The black area, labeled F, denotes the region of overlap involving all three circles, which corresponds to the common factor.
[Figure 25: three overlapping shaded circles representing the variances of X1, X2 and X3; the area common to all three is the factor F.]

Figure 25. Communality between three observed measures (variables X1, X2 and X3) composing factor F

Factor loadings can be interpreted in Eq. (6.11) as the correlation coefficients between the i-th variable and the j-th factor. If they have a similar interpretation

as the regression coefficients, and if we assume that the factors and observable variables are standardized, we can infer that such loadings are equivalent to the correlation coefficients between pairs of observed variables and factors. The higher such a correlation, the more the observed variable is saturated with its respective factor, and simultaneously the greater the meaning of the factor.
Using now the matrix X of observations and the corresponding correlation matrix R = [rij], i, j = 1, 2, …, p, one can verify the patterns of correlation coefficients among all observed variables, assuming that each variable will be more or less (or not at all) correlated with the conceptual latent variable.
It is worth mentioning that R can be very useful if we view it through the agency of the total variance components expressed with factor loadings. We can, for instance, prove (assuming all observed variables are standardized) for i ≠ j = 1, …, p that:
rij = λi1λj1 + λi2λj2 + … + λikλjk = Σ(m=1..k) λimλjm,   (6.21)

and for i = j = 1, …, p:

rii = λi1² + λi2² + … + λik² + ψi² = hi² + ψi² = 1.   (6.22)

From (6.21) we infer that two observed variables are strongly correlated if they have strong factor loadings on the same common factor. And because correlation coefficients meet the condition −1 ≤ rij ≤ 1, the factor loadings need to meet the same criterion, −1 ≤ λij ≤ 1, since communalities cannot exceed 1.0.
Now, if we have the correlation matrix R, we can further clarify the nature of the factor loadings. Skipping the specific factor and simultaneously assuming δi = 0 (which means that the total variance of the variables is purely accounted for by the common factors), we modify R by placing the communalities hi² on the diagonal (instead of 1s). In consequence we obtain the reduced correlation matrix [Balicki 2009]:

     | h1²  r12  …  r1p |
Rh = | r21  h2²  …  r2p |   (6.23)
     |  …    …   …   …  |
     | rp1  rp2  …  hp² |

On the other hand, the matrix of factor loadings Λ groups the loadings λij of all common factors on all observed variables, assuming however that some of the λij = 0:

    | λ11  λ12  …  λ1k |
Λ = | λ21  λ22  …  λ2k |   (6.24)
    |  …    …   …   …  |
    | λp1  λp2  …  λpk |

If matrix (6.24) does not include loadings of the specific factors, then we define it as a reduced matrix.
Between the reduced matrix Rh and the matrix Λ of factor loadings there is a relation which is the synthesis of (6.21) and (6.22):

Rh = ΛΛᵀ.   (6.25)

This relation makes up the principle of factor analysis. Expressing Rh as the product of the two forms Λ and Λᵀ (the latter after transposition), we arrive at factorization. The transformation of the matrix R into the reduced matrix Rh = ΛΛᵀ simplifies Eq. (6.12) to its standardized form Z = ΛF.
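The principle Rh = ΛΛᵀ can be verified numerically; the loadings below are invented for illustration, and the factors are assumed orthogonal:

```python
import numpy as np

# Hypothetical loadings of p = 4 variables on k = 2 orthogonal factors
Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.2, 0.7],
                   [0.1, 0.6]])

Rh = Lambda @ Lambda.T   # reduced correlation matrix, Eq. (6.25)

# Off-diagonal entries reproduce the correlations of Eq. (6.21)
assert np.isclose(Rh[0, 1], 0.8 * 0.7 + 0.1 * 0.2)
# The diagonal contains the communalities h_i^2, as in Eq. (6.23)
assert np.isclose(Rh[0, 0], 0.8**2 + 0.1**2)
```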
Finally, there is one more important issue to be mentioned with reference to factor loadings. In determining a significance level for the interpretation

Table 20. Guidelines for identifying significant factor loadings based on sample size

Factor loading    Sample size
0.30              350
0.35              250
0.40              200
0.45              150
0.50              120
0.55              100
0.60               85
0.65               70
0.70               60
0.75               50

Note: sample size needed for significance based on a 0.05 alpha level, a power level of 80%, and standard errors assumed to be twice those of conventional correlation coefficients.
Source: Hair et al. 2010, p. 117.

of factor loadings, an approach similar to determining the statistical significance of correlation coefficients should be adopted. However, research findings [Hair et al. 2010, p. 117] suggest that factor loadings have substantially larger standard errors than typical correlations. Thus, factor loadings should be evaluated at considerably stricter levels. The researcher can employ the concept of statistical power to specify the factor loadings considered significant for different sample sizes. With the objective of obtaining a power level of 80 percent, the use of a 0.05 significance level, and the proposed inflation of the standard errors of factor loadings, we can predict the sample sizes necessary for each factor loading value to be considered significant, as presented in Table 20.
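Table 20 can also be used programmatically. The helper below is a hypothetical utility, with thresholds transcribed from the table; it returns the smallest tabled sample size at which a given loading may be treated as significant:

```python
# Thresholds transcribed from Table 20 (alpha = 0.05, power = 80%)
MIN_N = {0.30: 350, 0.35: 250, 0.40: 200, 0.45: 150, 0.50: 120,
         0.55: 100, 0.60: 85, 0.65: 70, 0.70: 60, 0.75: 50}

def required_sample_size(loading: float) -> int:
    """Smallest tabled sample size for which |loading| is significant."""
    eligible = [n for threshold, n in MIN_N.items() if abs(loading) >= threshold]
    if not eligible:
        raise ValueError("loadings below 0.30 are not covered by the table")
    return min(eligible)

print(required_sample_size(0.62))  # 85: a loading of 0.62 clears the 0.60 row
print(required_sample_size(0.75))  # 50
```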

Principal component analysis PCA vs. common factor analysis


The fundamental differences between principal component analysis (PCA) and factor analysis (FA), as Jöreskog and Reyment [1996, p. 78] explained, depend upon the ways in which factors are defined and upon assumptions about the nature of the errors¹⁸. In their view, in PCA the components are determined so as to account for the maximum variance of all the observed variables. In factor analysis, the factors are defined to account for the intercorrelations of the variables. Thus, principal component analysis is variance-oriented, whereas factor analysis is correlation-oriented [Jöreskog and Reyment 1996].
Gorsuch [1990] explained that the main theoretical difference between FA and PCA is that common factor analysis includes error explicitly in the model. These theoretical differences between component and common factor analysis can be viewed as paradigmatic. The component model follows the mathematical paradigm in the sense that it perfectly reproduces each variable. In essence, PCA assumes that the sample matrix perfectly reflects the population matrix. In contrast, the common factor model reflects the statistical paradigm. In FA it is explicitly noted that the variables may be fallible and that the sample matrix need not be the population matrix; hence estimations from the correlation matrix most accurately represent the task at hand [Gorsuch 1990].
Although there exists a theoretical gulf between PCA and FA, in practice the solutions from both analyses often appear similar. A series of empirical investigations comparing component and common factor analysis concluded that the two methods produce essentially equivalent solutions [Velicer 1974, 1976, 1977; Velicer and Fava 1987; Velicer, Peacock and Jackson 1982]. Velicer and Jackson, in a review of the literature comparing component analysis and factor analysis, stated that no distinction has been demonstrated at the empirical level [Velicer and Jackson 1990, p. 20].
The major empirical distinction between PCA and FA is that component analysis gives higher loadings than common factor analysis regardless of the type of rotation used [Gorsuch 1974; Lee and Comrey 1979; Velicer et al. 1982]. This is because the loadings obtained from component analysis include, from the common factor analysis perspective, both common and unique variance¹⁹.
18) Jöreskog and Reyment [1996] also claimed that the errors are usually assumed to be small in principal component analysis, whereas this is not so in factor analysis. Basically, this implies that component analysis accepts that a large part of the total variance of a variable is important and in common with other observed variables. Factor analysis, on the other hand, allows a considerable amount of uniqueness to be present in the data and utilizes only the part of a variable that takes part in correlation with other variables.
19) Some authors claim that PCA is not a true method of factor analysis, and there is disagreement among statistical theorists about when it should be used, if at all. Some argue for severely restricted use of component analysis in favor of factor analysis [Bentler and Kano 1990; Ford, MacCallum and Tait 1986; Gorsuch 1990; Loehlin 1990; MacCallum and Tucker 1991; Mulaik 1990; Snook and Gorsuch 1989; Widaman 1990, 1993]. Other authors disagree and point out either that there is almost no difference between principal components and factor analysis, or that PCA is preferable [Arrindell and van der Ende 1985; Guadagnoli and Velicer 1988; Steiger 1990a; Velicer and Jackson 1990].

Differences can also be found in the number of variables and the values of the communalities. For example, a difference exists between the matrices from which components and factors are extracted: component analysis uses unities in the diagonal, whereas factor analysis uses an estimation procedure for obtaining the communalities. As the number of variables decreases, the ratio of diagonal to off-diagonal elements also decreases, and therefore the value of the communality has an increasing effect on the analysis. As the number of variables increases, communality estimates and the method by which exploratory factors are extracted both become less important²⁰.
In PCA we begin with the correlation matrix among the variables, which is then factored through an eigenvalue-eigenvector decomposition, where the eigenvectors are placed in columns and the eigenvalues along the main diagonal. When selecting fewer components than observed variables and approximating the fit to the data, we need to rotate these components to obtain simple structure [Schneeweiss and Mathens 1995]. In factor analysis, by contrast, because we have common versus unique factors, we begin with the correlation matrix R but immediately make initial estimates of the communalities shared among the variables. Usually the squared multiple correlation (SMC, for each variable predicted from all the other variables) serves as that communality estimate [Costello and Osborne 2005]. The matrix of factor loadings is defined on the basis of this modified main diagonal, the area of the correlation matrix that contains the information about variances. While we speak of modeling the variance in principal components, we speak of modeling the covariance in factor analysis. That is, for components we seek to maximally account for the variance among all the observed variables with a smaller number of components; for factors, we seek to maximally account for the covariance among all the observed variables with a smaller number of common factors.
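The SMC-based initial communality estimate can be sketched as follows; the correlation matrix is made up, and the identity SMC_i = 1 − 1/(R⁻¹)ii is a standard result:

```python
import numpy as np

# A hypothetical 3x3 correlation matrix R
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

# Squared multiple correlation of each variable on all the others:
# SMC_i = 1 - 1 / (R^-1)_ii
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# The SMCs replace the 1s on the diagonal to form the reduced matrix
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

assert np.all((smc >= 0) & (smc < 1))   # SMCs lie in [0, 1)
print(np.round(smc, 3))
```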
A general process of principal component extraction starts with a model of the following form [Sztemberg-Lewandowska 2008]:

zi = bi1s1i + bi2s2i + … + binsni = Σ(j=1..n) bijsji,   (6.26)

20) Gorsuch suggested that, for example, once 30 variables have been reached, the differences between the two analyses for a rotated solution with the same number of factors will be small and unlikely to lead to different interpretations [Gorsuch 1974].


where:
zi – value of the i-th variable for the respective observation,
sji – value of the j-th component for the respective observation, j ∈ {1, …, n},
bij – principal component coefficients.
This model in matrix notation is denoted as follows:

Z = BS,   (6.27)

where:
Z = [Z1, …, Zn]ᵀ – matrix of standardized variables,
B = [bij]n×n – matrix of principal component coefficients,
S = [S1, …, Sn]ᵀ – matrix of principal components, Sj = (sj1, …, sjn),
i ∈ {1, 2, …, n} – number of the variable,
j ∈ {1, 2, …, n} – number of the principal component.
Starting with Hotelling's algorithm [Hotelling 1933], in which the principal component coefficients are derived iteratively, in the first stage the coefficients of the first principal component S1 are set so as to maximize the share of this component in the total variance of all observed variables:

V(S1) = Σ(i=1..n) bi1².   (6.28)

This function is maximized with Lagrange multipliers, under the restriction R = BBᵀ.
In second stage, amatrix correlation of residuals is formed:

TT
R 1==
R
=T B

1B1,

(6.29)

where B_1 = [b_i1] is the vector of loadings of the first component. The matrix R_1 defined in this way is substituted for R in the equation R = BB^T, and then the loadings of the second principal component S_2 are calculated. Analogously, the loadings of the third and fourth components are determined, until the required level of total variance is accounted for, e.g. 75 or 87%.
In order to calculate the loading of the j-th principal component on the i-th variable, we use the following formula:

b_ij = √λ_j · v_ij / √(Σ_{i=1}^{n} v_ij^2),  (6.30)

where v_ij is an element of the j-th eigenvector.


Exploratory (EFA) and confirmatory (CFA) factor analysis for scale development

The communality (in percent) which is accounted for by the j-th component is calculated as follows:

h_j = (λ_j / n) · 100% = (λ_j / Σ_{i=1}^{n} λ_i) · 100%,  (6.31)

where λ_i – the i-th eigenvalue.
The percentage share of total variance which is accounted for by the k first principal components is calculated as the sum of the first k communalities:

H_k = Σ_{j=1}^{k} h_j.  (6.32)

So, in order to extract the principal components one needs to find the eigenvalues Λ and eigenvectors V of the correlation matrix R, with:

RV = VΛ.  (6.33)

Eq. (6.33) leads to the characteristic equation:

det(R − λI) = 0,  (6.34)

hence, for two variables:

det [1 − λ, r_12; r_12, 1 − λ] = 0  ⇒  (1 − λ)(1 − λ) − r_12 r_12 = 0  ⇒  λ^2 − 2λ + (1 − r_12^2) = 0.  (6.35)

Theoretically, the eigenvalues λ_i for a correlation matrix of order two are, respectively:

λ_1 = 1 + r_12 and λ_2 = 1 − r_12.  (6.36)

In essence, an eigenvalue represents the information explained by a principal component. Also, the sum of the eigenvalues equals the number of observed variables. This being the case, one can divide the i-th eigenvalue by the number of variables to obtain the proportion of variance explained by a particular component:

p = λ_i / n,  (6.37)

where:
p – proportion of explained variance on the i-th component,
n – number of variables.
The consequence of this relationship (between the information that is quantified and the number of items in the analysis) is that an eigenvalue of 1.0 corresponds to 1/n of the total variance in a set of items. Put differently, a principal component that achieves an eigenvalue of 1.0 contains the same proportion of total information as the typical single item. If the objective of PCA is to arrive at a smaller number of components that substantially capture the information included in the observed variables, the components should be more information-loaded than the observed variables.
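As an illustration of the computations above, the loadings of Eq. (6.30) and the variance proportions of Eq. (6.37) can be obtained directly from the eigendecomposition of a correlation matrix. The sketch below uses NumPy; the 3×3 matrix R is a hypothetical example, not data from this book:

```python
import numpy as np

# Hypothetical correlation matrix for three observed variables.
R = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])

# Eigendecomposition R V = V diag(lambda) (cf. Eq. 6.33),
# sorted so that the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings b_ij = sqrt(lambda_j) * v_ij (cf. Eq. 6.30); eigh returns
# unit-norm eigenvectors, so the denominator in Eq. 6.30 equals 1.
loadings = eigvecs * np.sqrt(eigvals)

# Proportion of variance per component, lambda_i / n (cf. Eq. 6.37);
# for a correlation matrix the eigenvalues sum to n.
prop = eigvals / R.shape[0]
```

With all components retained, the loadings reproduce the correlation matrix exactly (R = BB^T), which is the restriction used in the extraction.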

Methods of factor loadings estimation and factors extraction


At this stage one should not be concerned with whether the underlying factors are orthogonal or oblique, nor with whether the extracted factors are interpretable or meaningful. The chief concern here is whether a smaller number of factors can account for the correlation among a much larger number of observed variables, through the agency of a particular estimation method [Kim and Mueller 1978].
There are several methods of factor loadings estimation and, simultaneously, factor extraction, for example: principal axes factoring, the centroid method, maximum likelihood, generalized least squares, alpha factoring, minres, image factoring, and canonical factor analysis21. Choosing a method from a long list of methods is not always straightforward. As Cudeck [2000, p. 268] argued: most computer programs prespecify a default method. This can make the decision seem non-existent, but there are practical differences between estimation methods that should be understood by a user of the model. Extended treatments of the major alternatives can be found in texts on factor analysis [e.g. Harman 1976]. Most of them are available in major software packages. Some of these methods are briefly discussed below.
21 Comparative research on these various methods for conducting an exploratory factor analysis suggests that, in most cases, the same structure is identified regardless of the method employed [Stewart 1981].


Principal axes factoring PAF


Principal axes factoring (PAF) is similar to principal components analysis (PCA). However, in contrast to PCA, it places communality estimates h_i^2 on the main diagonal of the reduced correlation matrix of variables (instead of 1s, as in PCA), where the communality h_i^2 of each observed variable is calculated from its squared multiple correlation with the other observed variables in the set. The factorial solution is achieved in a similar way as in PCA, through Eq. (6.34) applied to the reduced correlation matrix R̃:

det(R̃ − λI) = 0.  (6.38)

In the course of factor extraction we use the sum of the correlations of the i-th variable with all the other variables in the matrix, which is weighted proportionally to its size in relation to the sums of the correlations of the other variables, giving each variable a different weight [Gatnar 2003].
In PAF, the analysis of the data structure is focused on the common variance, not on the sources of error that are unique to individual measurements. The conceptual approach in PAF (i.e. understanding the shared variance in a set of observed variables X through a smaller set of latent variables) may be more convenient than the mathematically simpler PCA approach, which represents all the variance in the X variables through a smaller set of components.
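The PAF procedure just described can be sketched as an iterated eigendecomposition of the reduced correlation matrix, starting from SMC communalities. The function below is our own minimal illustration (not the book's code), and the example matrix is hypothetical:

```python
import numpy as np

def paf(R, n_factors, n_iter=50):
    """Iterated principal axis factoring (sketch).

    Places communality estimates (initially squared multiple
    correlations, SMCs) on the diagonal of the reduced correlation
    matrix and re-estimates them until convergence.
    """
    R = np.asarray(R, dtype=float)
    # Initial communalities: SMC = 1 - 1 / diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)             # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R_red)
        order = np.argsort(eigvals)[::-1][:n_factors]
        lam, V = eigvals[order], eigvecs[:, order]
        lam = np.clip(lam, 0, None)             # guard against negatives
        loadings = V * np.sqrt(lam)
        h2_new = (loadings ** 2).sum(axis=1)    # implied communalities
        if np.allclose(h2_new, h2, atol=1e-8):
            break
        h2 = h2_new
    return loadings, h2

# Hypothetical 4-variable correlation matrix, one common factor.
R = np.array([[1.0, 0.6, 0.5, 0.3],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.3],
              [0.3, 0.2, 0.3, 1.0]])
L, h2 = paf(R, n_factors=1)
```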
Centroid method CM
The centroid method (CM) was originally invented by Thurstone22. The method is based on a geometrical model in which the axes of the common factors are placed in a configuration of vectors constructed on the basis of the observed variables matrix X. These vectors are drawn in n-dimensional space, starting from one point (where the cosines of the angles between the vectors are defined as the correlation coefficients of the particular observed variables). The vectors are nested with each other and the factor takes its course through their center (the center of the cluster) [Rusnak 1999].
Okoń [1964] and Balicki [2009] explained that the first common factor F_1 can be extracted by projecting the variable vectors onto the axis running through the common origin of the vectors O and the center of the cluster of vectors (otherwise the centroid of the points marking the ends of the vectors) S_1. These two points form the axis direction (O, S_1). If one more factor F_2 is added, then the reference axis standing for F_2 will be perpendicular to the first centroid axis. The sum of positive vector projections on the second axis is equal to the sum of negative projections within the axis. In short, this method of factor extraction requires that the calculations of the former factor loadings be repeated, moving (each time) the origin O of the vector configuration along the first axis to the centroid.
In PAF, similarly as in PCA, the extracted factors account for the maximum of the variance of the observable variables. The first factor is a linear combination of the variables that accounts for the maximum part of their total variance. The second factor (uncorrelated with the first) accounts for the maximum part of the variance which remains after extraction of the first factor. This process proceeds until the common variance of the observable variables is completely explained. In the extraction of subsequent factors, we simply sum up the correlations of the i-th observed variable with the other remaining variables and divide by the sum of all correlation coefficients in the correlation matrix.

22 Eventually, Thurstone [1937] rejected the centroid approach because he suspected it was arbitrary and artifactual. Thurstone was supported in this view by Wilson and Worcester [1939, p. 136].
Method of minimal residuals MINRES
In MINRES, factor loadings are estimated so as to minimize the sum of squared off-diagonal elements of the residual correlation matrix. In other words, the sum of squared deviations of the observed correlation coefficients between the observed variables from the values of these coefficients reproduced by the respective factors is minimized23.
Because the fundamental theorem of factor analysis is expressed as [Harman 1976]:

R = ΛΛ^T,  (6.39)

what is most expected here is the best fit between the observed correlation matrix R and the reproduced correlations R̂ = ΛΛ^T. A least squares fit is obtained by fitting:

R ≅ ΛΛ^T + V,  (6.40)

or by fitting:

(R − I) ≅ (ΛΛ^T − H),  (6.41)

where:

H = I − V = diag(ΛΛ^T)  (6.42)

is the diagonal matrix of communalities determined on the basis of Λ. Minimization of the residuals in (6.40) leads to principal component analysis, and in the case of (6.41) we obtain MINRES. This condition may be expressed more precisely as follows:

min ‖(R − I) − [ΛΛ^T − diag(ΛΛ^T)]‖^2.  (6.43)

23 Conceptually, the idea of extracting factors by minimizing the residual correlations is an obviously direct approach. The idea is certainly not new; however, its full accomplishment depended on the advent of high-speed computers.
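The MINRES criterion (6.43) can be minimized numerically. The sketch below uses SciPy's general-purpose optimizer rather than Harman's original algorithm, and recovers a single factor from a correlation matrix constructed from known (hypothetical) loadings:

```python
import numpy as np
from scipy.optimize import minimize

def minres_loadings(R, n_factors):
    # Minimize the sum of squared OFF-diagonal residuals of R - L L^T
    # (cf. Eq. 6.43); the diagonal is ignored, as in MINRES.
    p = R.shape[0]
    mask = ~np.eye(p, dtype=bool)

    def objective(x):
        L = x.reshape(p, n_factors)
        return np.sum((R - L @ L.T)[mask] ** 2)

    x0 = np.full(p * n_factors, 0.5)   # crude starting values
    res = minimize(objective, x0, method="L-BFGS-B")
    return res.x.reshape(p, n_factors)

# Correlation matrix built from known single-factor loadings.
L_true = np.array([[0.8], [0.7], [0.6], [0.5]])
R = L_true @ L_true.T
np.fill_diagonal(R, 1.0)
L_hat = minres_loadings(R, n_factors=1)
```

Up to an arbitrary sign, the recovered loadings coincide with the generating ones, since the off-diagonal residuals can be driven to zero here.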

Maximum likelihood ML
Maximum likelihood estimation was introduced by Lawley [1940] and further developed by Jöreskog [1967]24. In this method we assume, at the first stage, that the observed variables follow a multivariate normal distribution25. In ML we strive to estimate the model parameters which provide the largest likelihood of reproducing the matrix of observed variables.
Now, let Σ be a variance-covariance matrix whose elements are estimated on the basis of n observations of X. The elements of this matrix can be expressed through factor loadings and residual variances:

Σ = ΛΛ^T + V.  (6.44)

In a sample, its variance-covariance matrix S will be expressed as follows:

S ≅ Λ̂Λ̂^T + V̂.  (6.45)

Using maximum likelihood estimation we simply try to minimize:

F_ML = tr(SΣ⁻¹) − log|SΣ⁻¹| − p,  (6.46)
24 Originally this method was proposed for econometric simultaneous equation models by Koopmans, Rubin and Leipnik [1950] under the name of full-information maximum likelihood.
25 Fabrigar et al. argued that if data are relatively normally distributed, ML is the best choice because it allows for the computation of a wide range of indices of the goodness of fit of the model (for example in CFA models), and permits statistical significance testing of factor loadings and correlations among factors and the computation of confidence intervals [Fabrigar et al. 1999, p. 277]. If, on the other hand, the assumption of multivariate normality is severely violated, they recommended one of the principal factor methods [Fabrigar et al. 1999]. In general, ML or one of the principal factor methods will give the researcher the best results, depending on whether the data are generally normally distributed or significantly non-normal, respectively [Costello and Osborne 2005].


where tr(·) and |·| denote the trace and the determinant respectively, and p denotes the number of observed variables.
Further, if the X's are normally distributed, the elements of S follow a Wishart distribution with n degrees of freedom26. The log-likelihood function, neglecting a function of the observations, is given by [Jöreskog 1967]:

log L = −(n/2)[log|Σ| + tr(SΣ⁻¹)],  (6.47)

and assuming that:

F_ML = log|Σ| + tr(SΣ⁻¹) − log|S| − p,  (6.48)

then maximizing log L is equivalent to minimizing F_ML, and n times the minimum value of F_ML is equal to the likelihood ratio test statistic of goodness of fit27.
This method has a few important advantages. Firstly, it can be used on a correlation or a covariance matrix without causing incompatibility between the two solutions. Secondly, ML does not require pre-determining the common variance of the correlation matrix when the latter serves as the starting point. Thirdly, ML is based on statistical tests of every extracted factor, hence we can use the complex apparatus of statistical inference. Another satisfactory property of ML (as well as of GLS, discussed next) is that it possesses scale invariance and scale freeness28.
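The discrepancy function (6.46) is easy to evaluate directly. A minimal helper (the function name is ours) shows that it vanishes when Σ reproduces S exactly and is positive otherwise:

```python
import numpy as np

def f_ml(S, Sigma):
    # F_ML = tr(S Sigma^-1) - log|S Sigma^-1| - p  (cf. Eq. 6.46)
    p = S.shape[0]
    M = S @ np.linalg.inv(Sigma)
    return np.trace(M) - np.log(np.linalg.det(M)) - p

# Hypothetical sample covariance matrix.
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])
```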
Generalized least squares GLS
The estimation procedure for generalized least squares (GLS) was developed by Jöreskog and Goldberger [1972]. Again, it is assumed that the data are a realization of a multivariate normal distribution.
26 If S has a Wishart distribution (a somewhat less restrictive assumption than the requirement that the observed variables follow a multivariate normal distribution), minimizing the ML discrepancy function produces ML Wishart estimates.
27 For more comments, see the next section on CFA models.
28 Scale invariance refers to the property that the value of the fit function is the same regardless of a change of scale of the measurements. For example, if the value of the fit function is the same when transforming a covariance matrix to a correlation matrix, then the estimator is scale invariant. A similar concept is that of scale freeness, which concerns the relationship between parameter estimates based on untransformed variables and those based on linearly transformed variables. More specifically, if scaling factors can be determined that allow one to obtain transformed estimates from untransformed estimates (and vice versa), then the estimator is scale free [Kaplan 2000].


In GLS we estimate the correlation coefficients remaining after the factors have been extracted, and minimize the sum of squared differences between the model correlations and the observed correlations. In GLS, before the correlation coefficients are taken into account, they are weighted [Gatnar 2003]. Each weight is selected according to the communality of the respective variable: correlations between observed variables with low communality obtain lower weights than those between variables with high communality.
Jöreskog and Goldberger [1972] proposed the following estimation procedure, which calls for minimization of the quantity:

F_GLS = ½ tr[(I − S⁻¹Σ)²].  (6.49)

This yields a scale-free method and, when normality is assumed, it produces estimates which have the same asymptotic properties as maximum likelihood estimates.
In the GLS approach, in practice, Σ is unknown; thus in the rule of GLS:

F_GLS = ½ tr{[Σ⁻¹(S − Σ)]²},  (6.50)

instead of Σ we use S, which gives:

F_GLS = ½ tr{[S⁻¹(S − Σ)]²} = ½ tr[(I − S⁻¹Σ)²],  (6.51)

and this is the criterion to be minimized in the GLS procedure.
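Analogously, the GLS criterion (6.51) can be evaluated as follows (again a sketch with our own naming); note that S, not the unknown Σ, supplies the weight matrix:

```python
import numpy as np

def f_gls(S, Sigma):
    # F_GLS = 1/2 tr[(I - S^-1 Sigma)^2]  (cf. Eq. 6.51)
    p = S.shape[0]
    M = np.eye(p) - np.linalg.inv(S) @ Sigma
    return 0.5 * np.trace(M @ M)

# Hypothetical sample covariance matrix.
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
```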


Alpha, image and canonical factor analysis
Some other important extraction methods are [Anderson and Acito 1980]: alpha factor analysis, image factor analysis and canonical factor analysis.
Alpha factor analysis focuses on creating factors with maximum reliability. This method treats unique factors as errors introduced by psychometric sampling, and the communality estimates are treated as reliabilities in a measurement context. The correlation between, e.g., two standardized variables z_i and z_j, with common parts c_i and c_j, is then given by [Kaiser and Caffrey 1965]:

r_ij = z_i^T z_j / n = c_i^T c_j / n, for all i ≠ j,  (6.52)

because the error components are independent (here c_i denotes the common part of variable z_i).


The c_i^T c_j / n terms represent the covariances between the common elements of variables i and j. The variances of the common elements are the communalities, e.g. c_i^T c_i / n = h_i^2 and c_j^T c_j / n = h_j^2. The correlation between any two variables is obtained by dividing their covariance by the product of the respective standard deviations. Therefore, the correlation between the common elements of two variables, after error has been removed, is obtained by dividing c_i^T c_j / n by √(h_i^2 h_j^2). In matrix notation, this generalizes to the correlation matrix of the common factors. Thus, alpha factor analysis is concerned with factoring the matrix of correlations among the common parts of the original variables29.
The relationship between the alpha value and the eigenvalue λ_i can be expressed as [Rummel 1970]:

α = [p / (p − 1)] (1 − 1/λ_i),  (6.53)

where p is the number of sample variables.
If λ_i ≤ 1, the reliability is less than or equal to 0. This suggests a stopping rule for alpha factor analysis: extract all alpha factors with eigenvalues greater than unity.
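The stopping rule implied by (6.53) can be checked numerically; the helper below (our naming) returns zero reliability exactly at λ = 1:

```python
def alpha_reliability(eigenvalue, p):
    # alpha = p/(p - 1) * (1 - 1/lambda)  (cf. Eq. 6.53)
    return (p / (p - 1)) * (1.0 - 1.0 / eigenvalue)
```

Factors with eigenvalues above 1 get positive reliability, and those below 1 get negative reliability, which is why they are discarded.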
Another method, canonical factor analysis [Rao 1955], determines common factors as linear combinations of the common parts which have maximum correlation with linear combinations of the observed variables30.

29 Also, an iterative procedure suggested by Kaiser and Caffrey [1965] can be used to obtain the alpha solution. Initially, trial values H_1 = I are used as estimates for the communalities. This step results in obtaining the same initial eigenvalues for principal components and alpha analysis. However, the resulting factor pattern for alpha analysis is scaled by the square roots of the communalities.


Canonical factor analysis identifies factors that are maximally related to the measured variables. Rao arrived at his formulation of canonical factor analysis via an attempt to define factors that would have maximum generalizability to other samples of objects. In contrast, Kaiser and Caffrey arrived at their formulation of alpha factor analysis via an attempt to define factors that would have maximum generalizability to other measures of the underlying variables.
Both alpha and canonical factors have the property of invariance under scaling: the same factors are found regardless of the units of measurement of the observed variables, and the loadings are proportional to the scaling constants. The canonical factor and alpha methods can thus be said to be scale-free, in the sense that they yield the same factors when starting from differently scaled variables.
Finally, image factor analysis focuses on factors that exclude or minimize unique factors. This analysis is unlike common factor analysis in that explicit, determinate definitions of the common and unique portions of the data are provided. This is accomplished by defining the common part of each variable as the regression of that variable on all the other variables. These regression estimates become a matrix of images Ẑ, and the unique parts of the data are known as the anti-images V. Therefore, the observed data matrix can be expressed as follows [Guttman 1953]:

Z = Ẑ + V.  (6.54)

The regression estimates (images) are given by:

Ẑ = Z(I − R⁻¹E),  (6.55)

where R⁻¹ is the inverse correlation matrix and E is the diagonal matrix of variances of the regression residuals. The matrix of image covariances G is given by Ẑ^T Ẑ / n, which after substitution and rearrangement of terms becomes:

G = R + ER⁻¹E − 2E.  (6.56)
30 It is desired to determine only those uncorrelated common parts which are predictable from the column vector of p observable variables, or in other words, those which are maximally related to this column.


The expression ER⁻¹E turns out to be equal to the matrix of anti-image covariances A, so that Eq. (6.56) can be written as:

G = R + A − 2E,  (6.57)

which is known as the fundamental equation of image analysis.

Selected approaches to communality estimation


In communality estimation, the researcher can use a number of estimation procedures, many of which have been known since the first applications of factor analysis [Madansky 1965; Ramsey and Gibson 2006, p. 93]. Because one rarely knows in advance what proportion of the variance is unique and what is shared with other variables in the matrix (if one did, one would probably not need to be doing an exploratory analysis), some sort of estimate must be used in the initial phase.
The following approaches can be used [Loehlin 2004]31:
– highest correlation of a variable,
– average correlation or triads,
– squared multiple correlation (SMC),
– iterative improvement of the estimate.
The first of these, which is very serviceable in large matrices, takes as the communality estimate for a given variable the highest absolute value of its correlation with any other variable in the matrix [Rusnak 1999]:

h_i^2 = max_j |r_ij|, for i ≠ j.  (6.58)

In the highest correlation approach, the largest off-diagonal number in each row of the matrix is put into the diagonal with a positive sign. The highest correlation of a variable with another variable in the matrix is of course not its communality, but it will, in a general way, resemble it. Variables that share much variance with the other variables in the matrix will have high correlations with those variables and hence get high communality estimates, whereas variables that do not have much in common with any other variables will have low correlations and hence get low communality estimates. However, in some cases this approach does less well: e.g. a variable that has moderate correlations with different variables might have a high true communality but would receive only a moderate estimate. Nevertheless, in reasonably large matrices this quick and easy method is often adequate.

31 Yet another is Burt's method [Balicki 2009, pp. 151–152].
Average correlation and triad-based methods are a response to criticism of the highest correlation method. The former assumes that the communality is based on the average correlation coefficients of variable X_i with the other variables:

h_i^2 = [1 / (p − 1)] Σ_{j=1, j≠i}^{p} r_ij,  (6.59)

and in the latter method, the communality is expressed with the formula:

h_i^2 = (r_ij · r_im) / r_jm,  (6.60)

where:
r_ij, r_im – the highest correlation coefficients of variable X_i with the other remaining variables,
r_jm – the correlation coefficient of variables j and m.
A more sophisticated approach, but one requiring more computation, is to estimate the communality of a given variable by its squared multiple correlation (SMC) with all the remaining variables in the matrix. In practice, this is usually done by obtaining the inverse of the correlation matrix R. The reciprocals of the diagonal elements of the inverse correlation matrix, subtracted from 1.0, yield the squared multiple correlations [Loehlin 2004].
The SMCs are not communalities; in fact, they are systematically lower than (at most equal to) the true communalities. Nevertheless, they are related to the communalities in a general way: if a variable is highly predictable from the other variables in the matrix, it will tend to share a good deal of variance in common with them, and if it is unpredictable from the other variables, it has little common variance. In large matrices, the SMCs are often only slightly below the theoretical true communalities32.
32 Kline [2010] pointed out that there are certain difficulties in applying SMCs. He mentioned that using the multiple correlation for the initial estimation of the communalities h_i^2 may cause factor analysis to reproduce the common factors incorrectly; this fact was proved by experimental research with reference to already known factors. In the case of large matrices (including a large set of p variables), multiple correlation coefficients reach almost the level of 1.0, which may cause overestimation of the initially defined communalities.


This value is calculated from the following formula:

h_i^2 = 1 − 1/r^ii,  (6.61)

where r^ii is the i-th diagonal element of the inverse correlation matrix R⁻¹.
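Eq. (6.61) makes the SMC estimates a one-liner; the sketch below also allows them to be verified against the explicit regression R² for any variable (the example matrix is hypothetical):

```python
import numpy as np

def smc(R):
    # h_i^2 = 1 - 1 / r^ii, where r^ii is the i-th diagonal
    # element of the inverse correlation matrix (cf. Eq. 6.61).
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Hypothetical correlation matrix.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
h2 = smc(R)
```

For the first variable, h2[0] equals the R² from regressing it on the remaining two, which is exactly what the squared multiple correlation means.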
Finally, in the approach based on iterative improvement of the estimate, the researcher makes an initial communality estimate, obtains a factor pattern matrix, and then uses it to obtain the set of communalities implied by the factor solution. In the typical case of uncorrelated factors, these are just the sums of the squares of the elements in the rows of the factor pattern matrix. One can then take these implied communalities, which should represent a better estimate than the initial ones, put them in place of the original estimates in the reduced correlation matrix, and repeat the whole process. Each new factor pattern matrix should yield better estimates of the communalities, which can be reinserted into the reduced correlation matrix. The process is repeated until successive repetitions no longer lead to changes in the estimates.
A disadvantage of iterative solutions for the communalities is that they will sometimes lead to the so-called Heywood case33, that is, a communality will converge on a value greater than 1.0. This is awkward, because a hypothetical variable that shares more than all of its variance with other variables is not too meaningful. Some factor analysis computer programs will stop the iterative process automatically when an offending communality reaches 1.0, but this is not much better, because a variable with no unique variance is usually not plausible either. A possible alternative strategy in such a case might be to show, e.g. by means of a χ² test, that the fit of the model with the communality reduced to a sensible value is not significantly worse than it is with the Heywood case communality. If this proves not to be the case, the model is unsatisfactory and something else must be considered: extracting a different number of factors, rescaling variables to linearize relationships, eliminating the offending variable, or the like. Another strategy is to limit the number of iterations; two or three will often produce a substantial improvement in communality estimates without taking one across the line into Heywood territory [Loehlin 2004].

33 With the increasing use of good methods of estimation, researchers are encountering cases where the best-fitted estimates are improper, because one or more estimates of uniqueness are negative. This of course is unacceptable, as variances are essentially positive quantities (means of squares). Even zero residual variance is unacceptable, as it implies exact dependence of an observed variable on the common factors; this could only be true if the variable had no measurement error. That negative uniqueness might arise was first pointed out by Heywood [1931], hence it is commonly referred to as a Heywood case. Alternatively, it is known as an improper solution. McDonald [1985, p. 79] summarized this situation in a few points:
– Some researchers tend to regard the fact that Heywood cases can occur as an indication that something is wrong with the basic principles of the common factor model, and that we should use some other technique of multivariate data analysis instead, e.g. principal components.
– A Heywood correlation matrix is a perfectly possible correlation matrix for a population. On the other hand, a non-Heywood population can give samples, by chance, in which estimators of some positive population residual variances are negative; hence a Heywood case in a sample does not prove that the population is a Heywood case. A second sample might yield a different conclusion.
– Sometimes a Heywood case can be cured by fitting fewer factors, but often this gives an unacceptably poor fit.

Number of factors
Another question is how many factors to retain. Both over-extraction and under-extraction of factors have deleterious effects on the results. Determining how many factors to include in the model requires one to balance the need for parsimony (i.e. a model with relatively few common factors) against the need for plausibility (i.e. a model with a sufficient number of common factors to adequately account for the correlations among the measured variables) [McNemar 1942].
Methodologists have regarded the process of specifying too few factors in a model (underextraction) as a much more severe error than specifying too many factors (overextraction) [Thurstone 1947; Rummel 1970; Cattell 1978]. Empirical research has supported this notion: when too few factors are included in a model, substantial error is likely to appear [Fava and Velicer 1992; Wood, Tataryn and Gorsuch 1996]. Observed variables that load on common factors not included in the model can falsely load on factors that are included, and poor estimates of the factor loadings can be obtained for observed variables that do actually load on the included factors. Such distortions result in rotated solutions in which two common factors are combined into a single common factor, and in solutions with complex patterns of factor loadings that are difficult to interpret [Comrey 1978].
Empirical research has also revealed that overextraction introduces much less error into factor loading estimates than underextraction [Fava and Velicer 1992; Wood, Tataryn and Gorsuch 1996]. Such models often result in rotated solutions in which the major factors are accurately represented and the additional factors either have no observed variables that load substantially on them, or have only a single variable loading substantially on each additional factor. Nonetheless, overextraction should be avoided too [Comrey and Lee 1992]. Solutions with too many factors prompt a researcher to postulate the presence of constructs with little theoretical value and thereby to develop unnecessarily complex theories. Additionally, overextraction can accentuate poor decisions made at other steps of a factor analysis.
There are two general methods of finding common factors: one depends on objective and the other on subjective criteria. In objective methods (based e.g. on the data distribution and the application of statistical tests), the process of factor extraction continues until further factors are no longer warranted. In statistically based methods we seek an exhaustive account of the number of factors underlying a set of items, where the source of covariation is maximally accounted for by particular factors. Typically, the process of extraction ends when a sufficient level of likelihood (p value) is attained.
In contrast, subjective methods are heuristic, based mainly on the researcher's point of view and the choice of a better fit in data presentation. Subjective methods must be systematically revised in the context of geometrical data exploration pertaining to the specific characteristics appearing on the extracted factors and their loadings [Tarka 2010b].
Some form of a generalized rule for determining the appropriate number of factors was proposed by Balicki [2009]:

(p + k) < (p − k)^2,  (6.62)

where: p – number of observed variables and k – number of common factors.
The solution for k factors is therefore:

k < (1/2)[2p + 1 − √(8p + 1)].  (6.63)

Thus, based on (6.63), one can find a sufficient number of common factors at a certain level of observed variables (for comparison see Table 21).
Table 21. Number of factors and number of observed variables

p    10    12    13    14    15    20
k     6     7     8     9    10    14

Source: Balicki 2009.
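The bound (6.63) is easy to evaluate in code; the helper below (our naming) takes the floor of the bound, which matches the tabulated value for p = 20 (note that when the bound happens to be an exact integer, the strict inequality in (6.63) formally excludes it):

```python
import math

def max_factors(p):
    # Largest k consistent with k < (2p + 1 - sqrt(8p + 1)) / 2
    # (cf. Eq. 6.63), taken here as the floor of the bound.
    return math.floor((2 * p + 1 - math.sqrt(8 * p + 1)) / 2)
```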


The specific methods for determining the number of factors to extract are described below.
1. Eigenvalue method
This method is also known as the Kaiser-Guttman rule (the eigenvalues > 1.0 rule). The method itself is very straightforward. At first, we obtain the eigenvalues derived from the input correlation matrix. Then, we determine how many eigenvalues are greater than 1.0 and use that number as the number of non-trivial latent variables existing in the input data.
The Kaiser-Guttman rule has wide appeal because of its simplicity. However, some methodologists have criticized this procedure because it can result in either overextraction or underextraction, and because of its somewhat arbitrary nature. For example, sampling error in the input correlation matrix may result in eigenvalues of 0.99 and 1.01, but the Kaiser-Guttman rule would nonetheless indicate that the latter marks an important factor whereas the former does not.
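A sketch of the rule in code, counting the eigenvalues of the input correlation matrix that exceed 1.0 (the example matrices are hypothetical):

```python
import numpy as np

def kaiser_guttman(R):
    # Number of factors = count of eigenvalues of R exceeding 1.0.
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

# One dominant factor: eigenvalues 2.4, 0.3, 0.3.
R1 = np.full((3, 3), 0.7)
np.fill_diagonal(R1, 1.0)

# Two independent blocks: eigenvalues 1.8, 1.8, 0.2, 0.2.
R2 = np.array([[1.0, 0.8, 0.0, 0.0],
               [0.8, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.8],
               [0.0, 0.0, 0.8, 1.0]])
```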
2. Half method
The number of factors cannot exceed the number of observed variables, and each factor should explain at least 1 to 10% of the total variance.
The disadvantage of this method is the subjective criterion used to settle on a sufficient number of factors. The advantage is that it is the simplest procedure and yields easily interpretable results.
3. Jolliffe method
This method retains only those factors whose eigenvalues are greater
than 0.7. In practice, a great disadvantage of Jolliffe's method is that it retains
too many factors, especially when a large set of (weakly correlated) observed
variables is taken into account.
4. Explained variance value method
The number of factors is chosen so as to reach a pre-assumed percentage
of explained variance accepted by the researcher, e.g. ranging from 65 to 85%.
5. Scree plot method
In the scree plot one uses the eigenvalues that can be taken from the input or reduced correlation matrix (Fabrigar et al. [1999] noted reasons why
scree tests based on reduced correlation matrix are preferred). The visible
slope of the line indicates the number of factors that should be retained.
On the other hand, the scree plot marks those factors that should be
eliminated from the analysis. As a result, the scree test involves
examining the graph of the eigenvalues and looking for the natural bend
or break point in the data where the curve flattens out. The number of data
points above the break (i.e. not including the point at which the break occurs) is usually the number of factors to retain, although it can be unclear
if there are data points clustered together near the bend. This can be tested
simply by running multiple factor analyses and setting the number of factors to retain manually: once at the projected number based on the a priori
factor structure, again at the number of factors suggested by the scree test
if it is different from the predicted number, and then at numbers above and
below those numbers.
A serious limitation of this method is that the results of the scree test
may be ambiguous (e.g. no clear shift in the slope) and open to subjective
interpretation. However, as noted by Gorsuch [1974], the scree test performs
reasonably well under conditions such as when the sample size is large and
when well-defined factors are present in the data.
6. Parallel analysis
This method is based on a scree plot of the eigenvalues obtained from the
sample data against eigenvalues that are estimated from adata set of random
numbers (i.e. the means of eigenvalues produced by multiple sets of completely random data) [Horn 1969; Humphreys and Montanelli 1975].
The term parallel analysis refers to the fact that random data set(s) should
parallel aspects of the actual research data (e.g. sample size, number of observed variables). The rationale of parallel analysis is that the factor should
account for more variance than is expected by chance (as opposed to more
variance than is associated with a given observed variable, per the logic of the
Kaiser-Guttman rule). In parallel analysis, both the observed sample and
random data eigenvalues are plotted, and the appropriate number of factors
is indicated by the point where the two lines cross. Thus, factor selection is
guided by the number of real eigenvalues greater than the eigenvalues generated from the random data, that is, if the real factor explains less variance
than the corresponding factor obtained from random numbers, it should
not be included in the factor analysis.
Although parallel analysis frequently performs well, like the scree test it
is associated with somewhat arbitrary outcomes. For instance, chance variation in the input correlation matrix may result in eigenvalues falling just
above or below the parallel analysis criterion.
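The procedure described above can be sketched directly. The following Python fragment (an illustrative implementation on simulated two-factor data, not code from the original text) compares sample eigenvalues with the mean eigenvalues of random data of the same shape:

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_analysis(data, n_sets=100):
    """Retain factors until a sample eigenvalue drops below the mean
    eigenvalue of correlation matrices built from random normal data
    of the same shape (Horn's parallel analysis)."""
    n, p = data.shape
    sample = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_mean = np.zeros(p)
    for _ in range(n_sets):
        noise = rng.standard_normal((n, p))
        random_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    random_mean /= n_sets
    keep = 0
    for s, r in zip(sample, random_mean):
        if s > r:
            keep += 1
        else:
            break
    return keep

# Simulated data with two underlying factors and six observed variables
factors = rng.standard_normal((300, 2))
loadings = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                     [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
data = factors @ loadings.T + 0.5 * rng.standard_normal((300, 6))
k = parallel_analysis(data)
print(k)  # 2
```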


7. Reduced matrix R can be successfully applied to detect the optimal
number of common factors, where the matrix of loadings Λ reinforces the
reconstruction of the reduced correlation matrix R (R = ΛΛᵀ)
If a constrained number of factors is considered, then this reconstruction
will be only approximate, which means that the product ΛΛᵀ is approximately
equal to R. The difference R − ΛΛᵀ is perceived as a measure of precision,
according to which k factors are extracted and simultaneously account for
the observed correlations. The elements of this difference which are placed
off-diagonal in the matrix are called residuals; they have not yet been
accounted for by the extracted factors. Residual correlation coefficients are
somewhat comparable with the analysis of partial correlation coefficients34.
Assuming k (k < p) extracted factors, the difference R − ΛΛᵀ will be
denoted as Rk, with elements rij.k:

Rk = R − ΛΛᵀ = [rij.k].  (6.64)

As a result, we are interested in the extent of correlation remaining among
the items after the first, second and each subsequent factor has been extracted.
If the second factor does not capture all the correlation remaining after
extraction of the first factor, then further analysis may be needed to account
for the remaining correlation. This process is continued iteratively, with each
successive factor extracted from the residual matrix, until the matrix contains
only small residuals.
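The residual check can be illustrated in a few lines. A Python sketch with made-up loadings (not code from the original text), where the residual matrix of (6.64) vanishes off-diagonal because one factor reproduces the correlations exactly:

```python
import numpy as np

def residual_matrix(R, loadings):
    """R_k = R - L @ L.T: off-diagonal elements are the residual correlations."""
    return R - loadings @ loadings.T

# One-factor example whose loadings reproduce the correlations exactly
L = np.array([[0.9], [0.8], [0.7]])
R = L @ L.T
np.fill_diagonal(R, 1.0)

residuals = residual_matrix(R, L)
off_diagonal = residuals[~np.eye(3, dtype=bool)]
print(np.max(np.abs(off_diagonal)))  # 0.0, so one factor suffices here
```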
The most important aspect of factor extraction is the moment when one
should stop the whole process. Because expectations for the number of factors
are known, all that matters is the criterion that is used here. In FA, if the
factors explain the correlation to its maximum, all residuals (or residual
correlation coefficients) should be small. It is therefore necessary to take
a close look at their values. If they do not differ much from 0, the pool of
variability has been exhausted. However, an arbitrary assessment and decision
(depending on whether the residuals are small or large) still raises many
questions. In such cases, one can apply the Saunders criterion [Okoń 1964],
whereby the process of extracting factors is completed when the following
relation holds:

(1/(p(p − 1))) Σ_{i=1}^{p} Σ_{j=1, j≠i}^{p} rij.k < (2p/(n − p)) · (1/(pk)) Σ_{i=1}^{p} Σ_{j=1}^{k} λ²ij,  (6.65)

where:
k – the highest (in a row) extracted factor, which means the number of extracted factors,
rij.k – the residuals remaining after extraction of the highest factor,
n – the number of observations on which the correlations were calculated.

34 If true factors really exist in the data, the partial correlations should be small, because the items are explained by the item loadings on the given factors. The exception regarding high correlations as indicative of a poor correlation matrix occurs when two items are highly correlated and have substantially higher loadings than other items on that factor. Then their partial correlation may be high, because they are not explained to any great extent by the other items, but do explain each other.
However, meeting both of the above conditions (few factors and a residual
matrix nearly equal to 0) is, in practice, a rather impossible task, and some
middle-ground solution must be found. For instance, if the observed variables
have a p-dimensional normal distribution, then one can use Bartlett's test of
the following hypothesis [Bartlett 1950, 1951]:

H0: Σ = ΛΛᵀ + V,  (6.66)

for which the likelihood-ratio statistic will be:

χ² = [n − (2p + 4k + 11)/6] · ln(|ΛΛᵀ + V| / |S|),  (6.67)

where S is the covariance matrix in a sample and (2p + 4k + 11)/6 is the
correction element proposed by Bartlett35. It improves the convergence of
the distribution of the statistic to the chi-square distribution.
In Bartlett's test we reject the hypothesis H0 when χ²ob > χ²α. As a result,
we need to extract more factors, because the differences S − ΛΛᵀ − V are
still large and statistically significant.

35 Bartlett's test is based on two general assumptions. Firstly, the correlation matrix should be tested to evaluate whether the matrix is an identity matrix I. If the null hypothesis cannot be rejected (that the correlation matrix is an identity matrix), then factors cannot sensibly be extracted from the matrix. Secondly, all further factors are successively extracted so as to contain progressively less and less information or variance. As a result, the residual correlation matrix can be employed to evaluate whether any information has been left [Bartlett 1950].


The statistic (6.67) can also be applied to data derived from the correlation
matrix R, and is then expressed as follows [Balicki 2009]:

χ² = [n − (2p + 4k + 11)/6] · Σ_{i<j} r²ij.k / (rii.k rjj.k),  (6.68)

where the sum extends over the elements of the residual correlation matrix
placed below (or above) the diagonal.
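The statistic of (6.67) can be computed directly. The following Python sketch (an illustration added here, not from the original; the example model reproduces S exactly, so the statistic is zero):

```python
import numpy as np

def bartlett_chi2(S, loadings, V, n, k):
    """Bartlett's statistic: (n - (2p + 4k + 11)/6) * ln(|LL^T + V| / |S|)."""
    p = S.shape[0]
    model = loadings @ loadings.T + V
    correction = n - (2 * p + 4 * k + 11) / 6
    _, logdet_model = np.linalg.slogdet(model)   # log-determinants are
    _, logdet_sample = np.linalg.slogdet(S)      # numerically safer than det
    return correction * (logdet_model - logdet_sample)

# A one-factor model that reproduces S exactly gives a statistic of zero
L = np.array([[0.9], [0.8], [0.7]])
V = np.diag(1.0 - (L ** 2).ravel())   # uniquenesses on the diagonal
S = L @ L.T + V
chi2 = bartlett_chi2(S, L, V, n=200, k=1)
print(round(chi2, 10))  # 0.0
```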
In McDonald's [1985, pp. 56–59] point of view, there might appear a problem
when we analyze residuals as a basis for determining whether or not the
model fits the data well enough, namely the lack of the comforting sense of
objectivity that comes from choosing a statistical significance level and
consistently applying it. In many cases both solutions, based on residuals
or on a significance test, may be very helpful. When McDonald performed
tests in his study, he found that with two factors hypothesized, apart from
the fact that the χ² was significant, there were serious discrepancies in the
residual matrix. On the other hand, when he assumed a three-factor model,
apart from a non-significant χ², the residuals showed that a fourth factor
would be ill-defined and unnecessary, because none of the residuals was
larger than 0.022 [McDonald 1985].

Geometrical identification and techniques of factors rotation


One of the objectives of factor analysis is to display the configuration of
variables in a simple way. Geometry helps the researcher to describe the
relationships between observed variables and provides significant information
on how the extracted factors (rotated or unrotated) are made. A geometrical
perspective of factor analysis makes up a significant element of FA which
assists and leads the whole analysis to the end [Yule 1897].
In geometry and linear algebra, a rotation is a transformation in a plane
or in space that describes the motion of a rigid body around a fixed point.
By rotating the factor axes, we place the factors so that each contains only
a few highly loaded variables, whereby the factors become combinations
of interdependent variables. As Jöreskog and Reyment [1996] explained,
it is usually difficult to find an objective definition of the exact position of
the factors. The concept of simple structure (put forward in 1935 by
Thurstone) goes a long way toward doing this. The positions of the unrotated
factors are strongly dependent on the particular set of variables in the analysis.

Variance decomposition, matrix of correlation and factor loadings


So when one variable is left out, the factor positions will almost certainly
change.
In order to explain the above assumptions, we should focus on the graphical
presentation of correlation coefficients in n-dimensional space, assuming
that the space dimension is defined by the number of observations in matrix R.
This space can be reduced to a k-dimensional (k < p) subspace, where p denotes
the collection of vector-variables Xi in the n-dimensional space. Variables
that are correlated with each other are represented as vectors, that is,
segments of a straight line with a defined length and direction (in analytical
geometry, a vector is defined as a directed segment with a beginning a and
an end b). As a result, a vector is denoted [a, b], and its length by |a, b|.
The length of the vector representing variable i is denoted hi; that of
another variable, hk.
If one presents two observed variables as vectors, the correlation between
them equals the scalar product of the two vectors, which is the product of
the lengths of the vectors and the cosine of the angle between them
[Thurstone 1947]. It can be written as rik = hi hk cos φik; provided both
vectors are of unit length, hi = hk = 1, the correlation equals the cosine
itself. In other words:

r12 = h1h2 cos φ12,  (6.69)
where:
r12 – correlation coefficient between observed variables 1 and 2,
h1 – length of the vector representing variable 1,
h2 – length of the vector representing variable 2,
φ12 – angle between the vectors representing variables 1 and 2.
If we assume that h1, h2 > 0 and r12 = 0, then φ12 = 90°; respectively, for
a negative correlation r12 = −0.60 we obtain 90° < φ12 ≤ 180°. Analogously,
for a positive correlation r12 = +0.60, the angle between the vectors satisfies
0° ≤ φ12 < 90°. Both vectors are expressed in unit length (which means they
represent total variance), therefore h1 = 1 and h2 = 1. This implies that the
scalar product of the vectors equals the cosine of the angle between them:
r12 = cos φ12. For example, if φ12 = 45°, then we obtain the correlation
r12 = 0.707.
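The numerical relationship r12 = cos φ12 for unit-length vectors can be checked in a couple of lines (a Python illustration added here, not from the original text):

```python
import math

h1 = h2 = 1.0                                   # unit-length variable vectors
r12 = h1 * h2 * math.cos(math.radians(45))      # correlation at a 45-degree angle
print(round(r12, 3))  # 0.707

# Orthogonal vectors (90 degrees) correspond to zero correlation
print(abs(h1 * h2 * math.cos(math.radians(90))) < 1e-12)  # True
```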
The process of placing points in a geometrical configuration according to
specific vectors arranged between two axes begins with correlation estimation
on hypothetical variables and further transformation of their correlation
coefficients into vectors. In fact, the more variables enter the data analysis,
the more vectors appear in the geometrical space and, simultaneously, the
more complex the geometrical presentation of such data becomes.
In sum, the correlations in the matrix of observed variables reflect the mutual
configuration of vectors and axes within which a given configuration of
vectors makes up the common factors (see, for example, Figure 26, presenting
two vectors X'1 and X'2 of two observed variables, the angle φ'12 between
them as the correlation, and F1 and F2 as the common factors).
[Figure: the vectors X'1(λ11, λ12) and X'2(λ21, λ22), of lengths h1 and h2, plotted in the plane of the common factor axes F1 and F2, with the angle φ'12 between them.]

Figure 26. Vectors configuration defined in the space of two common factors F1 and F2
Source: own construction based on Rusnak 1999.

Factors rotation
In rotation we simplify and clarify the data structure. Usually obtaining the
best solution depends on determining the transformation matrix and finding
an initial orthogonal solution36. Two restrictions are placed on the
transformation matrix. Firstly, and this is essential, the transformation matrix
must be non-singular; if it is not, the common factor space will collapse.
Secondly, and this is for convenience only, the transformation matrix must
be scaled such that the derived factors have unit variance. For uncorrelated
factors, the transformation matrix must be orthonormal, which guarantees
the imposition of the first two restrictions.

36 Conventional wisdom advises researchers to use orthogonal rotation because it produces more easily interpretable results, but this is, on the other hand, a flawed argument. In the social sciences (i.e. marketing research) we generally expect some correlation among factors, since behavior of the people is rarely partitioned. Therefore, using orthogonal rotation sometimes results in a loss of valuable information if the factors are correlated, and oblique rotation should theoretically render a more accurate, and perhaps more reproducible, solution. If the factors are truly uncorrelated, orthogonal and oblique rotation produce nearly identical results [Costello and Osborne 2005, p. 34].
Harris and Kaiser [1964] provided some general conditions to be met
when we approach rotation. First, we must find the most suitable coordinate
axes with the (geometrically) simplest structure, and then decide on the
precise interpretation of the factor loading projections (e.g. the vectors
configuration) placed on the new coordinate axis system.
In rotation we turn the axes; so if, for example, two factors are denoted as F1
and F2, and the rotation angle as φ, then we obtain:

F'1 = −F1 sin φ + F2 cos φ,
F'2 = F1 cos φ + F2 sin φ  (6.70)

for counterclockwise rotation, and

F'1 = F1 sin φ + F2 cos φ,
F'2 = F1 cos φ − F2 sin φ  (6.71)

for clockwise rotation.
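A planar transformation of this kind can be sketched as follows (a Python illustration added here, not from the original; the sign convention follows the counterclockwise form of (6.70)). Whatever the angle, an orthogonal transformation cannot change the communalities:

```python
import numpy as np

def rotate_two_factors(loadings, phi):
    """Apply an orthogonal two-factor transformation, cf. (6.70)."""
    B = np.array([[-np.sin(phi), np.cos(phi)],
                  [ np.cos(phi), np.sin(phi)]])
    return loadings @ B

L = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8]])
L_rot = rotate_two_factors(L, np.pi / 6)

# Rotation leaves the communalities (row sums of squared loadings) unchanged
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))  # True
```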


If the factor loadings matrix Λ is multiplied by an orthogonal matrix B,
then we obtain [Rusnak 1999]:

ΛB(ΛB)ᵀ + V = ΛBBᵀΛᵀ + V = ΛΛᵀ + V = S.  (6.72)

In consequence, different rotation techniques and orthogonal transformations
can be obtained. From matrix Λ there can be a number of new factor loadings
matrices generating the same variance-covariance matrix S.
Now, the observed variables form a vectors configuration in the factor space,
where a coordinate system is chosen which is made of k orthogonal axes
associated with the respective factors F1, …, Fk. The preferable entry of the
reference axes into the vectors configuration yields the factor solution.
However, depending on the position of the coordinate system (which is set
on an arbitrary basis by the researcher), one can obtain a good or bad
representation of the factor loadings as the projections of the observed
variables' vectors on the factor axes [Carroll 1953; Kaiser 1958]. In short,
the effectiveness of rotation depends not only on the factor loadings
structure, where points (representing observed variables) are clustered
around the factor axes; it also depends on an appropriate technique of
rotation of the common factor axes.
Selected techniques of factors rotation
In the literature we find two general types of rotation techniques37. As
a matter of fact, the translation of Thurstone's qualitative criteria into precise
mathematical terms took several years and the efforts of many scientific
workers. In consequence, numerous solutions were developed. We shall
confine ourselves here to the techniques widely used in most computer
programs38.
37 The raw, unrotated factors are rather meaningless mathematical abstractions.
38 Some other analytic rotation techniques used in exploratory common factor analysis can be found in the work of Browne [2001], where he summarized other authors' propositions such as: the Crawford-Ferguson (CF) family of rotation criteria [Crawford and Ferguson 1970], Yates' geomin [Yates 1987], McCammon's minimum entropy criterion [McCammon 1966], and McKeon's infomax [McKeon 1968].
In brief, according to the first proposition, Crawford and Ferguson suggested a family of complexity functions based on a measure of complexity c(s). This family is indexed by a single parameter κ (0 ≤ κ ≤ 1), and the Crawford-Ferguson criterion in general is a weighted sum of a measure of complexity of the p rows of Λ and a measure of complexity of the m columns. In orthogonal rotation the Crawford-Ferguson family is equivalent to the orthomax family: quartimax, varimax, equamax, parsimax, or factor parsimony [Crawford and Ferguson 1970, pp. 324–326]. Quartimax, varimax and equamax were previously members of the orthomax family, and parsimax and factor parsimony were suggested by Crawford and Ferguson [1970].
In another technique (i.e. Yates' geomin) the criterion employed an adaptation of a measure of row complexity first suggested by Thurstone [1935]. Thurstone gave details of an algorithm for minimizing the complexity function f(Λ), but it was not then successful [Thurstone 1935, p. 197], so that his complexity function had little impact at the time. It was modified by Yates [1987, p. 46] by replacing the sum of within-row products of squared reference structure elements by a sum of within-row geometric means of squared factor pattern coefficients. Yates intended geomin for oblique rotation only. However, it can also be used for orthogonal rotation, but may not be optimal for this purpose.
McCammon's criterion [McCammon 1966] was, on the other hand, based on the entropy function of information theory. McCammon intended his minimum entropy criterion for use in orthogonal rotation only. As proved, it is unsatisfactory in oblique rotation.
Finally, McKeon [1968] pointed out that if the squared factor loadings were interpreted as frequencies, the criterion may be regarded as a measure of information about row categories conveyed by column categories and, simultaneously, as a measure of information about column categories conveyed by row categories. He consequently named it infomax. McKeon's infomax criterion gives good results in both orthogonal and oblique rotation.

The first general family of techniques is called oblique, where the factors are
dependent on each other, i.e. correlated. The other family belongs to
orthogonal rotation, where the factors remain independent.
An orthogonal transformation turns the factor loadings matrix Λ into ΛB,
which is made by a rotation of the coordinate configuration in the factor
space around the origin of the coordinate system. On the other side, factors
which capture the extent to which the observed variables are correlated
allow us to examine the extent to which these factors themselves are
intercorrelated (an oblique rotation).
The most often applied orthogonal technique in practice is varimax. It
tends to purify the factors and account for as much of the covariance in the
data domain as possible, so as to make it easier to conceptualize the entire
domain. An algorithm to rotate the loadings was proposed by Kaiser [1958];
it maximizes the variance of the squared loadings for each factor, at a given
number of factors and given communalities:

V = Σ_{j=1}^{k} [p Σ_{i=1}^{p} λ⁴ij − (Σ_{i=1}^{p} λ²ij)²] / p² → max,  (6.73)

where:
k – number of common factors,
p – number of variables,
λij – factor loading of the i-th variable on the j-th factor.
Usually this technique is applied with the loadings normalized (divided
by the square root of the communality) in order to make each row sum of
squares equal unity.
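The criterion of (6.73) can be evaluated directly to see that it rewards simple structure (a Python illustration with hypothetical loading matrices, not code from the original text):

```python
import numpy as np

def varimax_criterion(loadings):
    """Kaiser's varimax criterion V of (6.73): the summed variance of
    squared loadings within each factor (column), to be maximized."""
    p = loadings.shape[0]
    sq = loadings ** 2
    return float(np.sum(p * np.sum(sq ** 2, axis=0) - np.sum(sq, axis=0) ** 2) / p ** 2)

simple = np.array([[0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
mixed = np.array([[0.64, 0.64], [0.64, 0.64], [0.64, 0.64], [0.64, 0.64]])

# Simple structure scores higher than evenly mixed loadings
print(varimax_criterion(simple) > varimax_criterion(mixed))  # True
```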
Another technique, quartimax, is similar to varimax. However, quartimax
is focused on simplifying the rows of a factor matrix: it tends to simplify the
rows but not the columns of the factor pattern. It may then leave a general
factor with no near-zero loadings.
Quartimax rotation is considered less effective than varimax, for it
maximizes the variance of the squared loadings for each variable (in contrast
to varimax), at the respective number of factors and respective communalities,
leading to retained orthogonal factors [Gatnar 2003]:
Q = Σ_{i=1}^{p} Σ_{j=1}^{k} λ⁴ij → max.  (6.74)


Another rotation technique, equimax, combines varimax and quartimax:

αQ + βV → max,

where α and β are the weights of both criteria.
The simplified equimax criterion is expressed as follows:

E = Σ_{j=1}^{k} [Σ_{i=1}^{p} λ⁴ij − (γ/p)(Σ_{i=1}^{p} λ²ij)²] → max,  (6.75)

where γ = β/(α + β). If by any chance γ = 0.5, then we obtain another rotation: biquartimax.
If the restriction of orthogonality is released, it is impossible to apply the
quartimax or varimax criterion directly. This is due to the fact that the
interfactor relationships are not considered when the criteria are in this
form; when applied, all factors will collapse into the same factor, that is,
one factor which best meets the criterion [Carroll 1953]39. In fact, this is
the point where we reach the family of oblique rotations.
The objective of oblique rotation is to identify correlated factors. Among
the variety of techniques there are the oblimin, quartimin, promax, and
orthoblique rotations.
In oblimin we minimize the correlation of squared loadings in distinct
columns. As a result, the sum of squared correlation coefficients for the
variables is minimized, with axes perpendicular to the hyperplanes designated
by the axes fitted to the variables (according to the simple structure of
factors) [Aranowska 2005]:

O = Σ_{j<r} [p Σ_{i=1}^{p} λ²ij λ²ir − Σ_{i=1}^{p} λ²ij · Σ_{i=1}^{p} λ²ir] → min,  (6.76)
where:
λ²ij – squared loading of variable i on factor j,
λ²ir – squared loading of variable i on factor r.

39 Varimax, or any of the orthogonal techniques, should not be used when there is a theoretical expectation of a general factor [Gorsuch 1970]. Because varimax serves to spread variance evenly among factors, it will distort any general factor in the data. Quartimax [Carroll 1953] is probably the orthogonal rotation procedure of choice when a general factor is expected.
Oblimin procedure is based on primary axes configuration represented by
oblique factors, fitted to observed variables, and reference axes.
In the case of quartimin, the function is minimized without the second term:

Qu = p Σ_{j<r} Σ_{i=1}^{p} λ²ij λ²ir → min.  (6.77)

In the third oblique rotation technique, promax [Hendrickson and White


1964], we maximize simple structure while allowing the factors to become
correlated. Promax begins with an orthogonal varimax rotation and then
relaxes the solution to an oblique rotation, using the criteria presented
by Hendrickson and White [1964]. After obtaining varimax solution, it is
transformed to an oblique solution that has the same high and low loadings, but with the low loadings reduced (if possible) to near-zero values40.
The second step is done by direct calculation, not iteration, so that if an
orthogonal solution can correctly identify the factors, promax provides an
efficient route to an oblique solution. The second step of Promax solution is
avariant of aprocedure called Procrustes [Hurley and Cattell 1962], which
forces afactor pattern matrix to abest least squares fit to apre-designated
target matrix41.
In the fourth rotation, orthoblique [Harris and Kaiser 1964], like promax,
one reaches an oblique solution via an orthogonal rotation, but the strategy
is a slightly different one. The first k eigenvectors of the correlation matrix
(where k is the desired number of factors) are subjected to an orthogonal
rotation (originally raw quartimax, although others can also be used). The
transformation matrix developed in this step is then rescaled in its rows or
columns or both by suitable diagonal matrices to become the final
transformation matrix, that is, the matrix that transforms an initial principal
factor solution into the final rotated oblique solution.
40 Promax probably works well and robustly because it is fairly simple. After the varimax loadings have been estimated, they are essentially raised to powers, so that high loadings (e.g. 0.8) become a little lower, but low loadings (e.g. 0.2) disappear to near-zero.
41 It gets its name from the legendary Greek who forced travelers to fit his guest bed by stretching or lopping them as necessary.


Some further techniques are maxplane42 [Cattell and Muerle 1960] and
the KD transformation [Kaiser and Madow 1974], which focus on low
pattern coefficients and work at transforming factors to get as many pattern
coefficients close to zero as possible. Techniques such as these strive directly
for that kind of simplicity in a pattern matrix: a large number of near-zero
paths. They are most often used to apply final touches to an approximate
solution arrived at by another technique.
In the social sciences, especially in marketing research, factor analytic
reports provide only a few examples of oblique rotations. The orthogonal
rotation dominates despite the strong likelihood of correlated factors, and
although hierarchical factor solutions are intuitively attractive and
theoretically justified in many research cases. The careful researcher should
almost invariably perform both an orthogonal and an oblique rotation.
These solutions can then be compared to identify the simpler structure and
to determine whether the oblique rotation produces a marked increase in
the hyperplane count. Oblique solutions have been found particularly useful
in theory construction and are likely to play a significant role in the
development of any theory of consumers' attitude [Stewart 1981].
Graphical display of factors in n-dimensional space
Visual presentation and inspection of factors may be constrained by their
number. A two- or three-dimensional space with points as factor loadings is
easy to imagine and visualize by means of graphics. However, visual
projections with more than three factors make the analysis more difficult.
In order to find the best solution, one has to limit the number of factors
which are displayed on a graph at one time. It is necessary to select only two
or three factors, which will be placed initially in the space. For example, if
there were four factors, we should proceed with the first two factors and
then display the remaining two.
When the need arises (i.e. when additional factors appear in the
n-dimensional space), the number of required graphs should also be increased.
This can be shown as follows:

m = k(k − 1)/2,  (6.78)

where:
m – number of required graphs,
k – number of factors.

42 The important aspects of the maxplane rotation are the following. It: 1) permits obliqueness, 2) works directly by maximizing the hyperplane count, 3) puts no restriction on the factor patterns and their relationships, 4) permits parameters to be inserted conveniently in response to statistical and other properties of the given research, 5) tends to select first the hyperplanes bearing the factors of largest variance [Cattell and Muerle 1960].
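Formula (6.78) is simple combinatorics: the number of distinct factor pairs. A two-line Python check (illustrative, not from the original text):

```python
def n_graphs(k: int) -> int:
    """Number of pairwise factor plots needed for k factors, cf. (6.78)."""
    return k * (k - 1) // 2

print(n_graphs(4))  # 6 plots are needed to display four factors pairwise
```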
Satisfactory principles in FA rotation
At the end it is worth mentioning some core rotational principles in factor
analysis. Most of them are equally applicable to orthogonal or oblique
solutions [Tarka 2013b, pp. 204–206]:
The principle: rotation to agree with factors from past factor analyses
This approach, which has been widely resorted to, especially in the final
stages of a rotation, consists in rotating until as many as possible of the
factors agree with the factors previously established in independent research.
The principle: rotation to put axes through the center of clusters
This may be done either by picking out the outstanding correlation clusters
in the original correlation matrix, or by considering the clusters which exist
in the projection on a single plane when the number of factors is known and
plotted. In general, if there are two factors operating fairly evenly in a certain
matrix, the noticeable correlation clusters are likely to occur in the regions of
overlap of the two factors. In these regions the shared variance (communality)
is higher. Such a comparatively even distribution of loadings is likely to
occur when the total variance is accounted for by a considerable number of
factors. In such circumstances, a cluster is more likely to represent a region
of overlap of several factors than the region of strong influence of one factor.
On the other hand, with one or two factors, the high points (clusters) of the
matrix may well be the variables best defining the factors. For example, in
a matrix satisfying the two-factor theory we put the axis through the center
of the most highly intercorrelating bunch of variables. However, since both
possibilities exist, there is no guarantee that a salient cluster is anything more
stable than a province of overlap of two or more real, functional factors.
The principle of orthogonal additions: rotation to agree with successively
established factors
In an n-dimensional orthogonal system, if the position of n − 1 axes is
known from previous sources of evidence, the position of the n-th axis is
automatically established. One can begin therefore with variables which,
apart from specific factors, measure only the known factors, or even only
a single known factor. By trial and error, guided by the researcher's insight,
one then attempts to add variables to the matrix which will introduce, apart
from specifics, only one new factor. When the new factor is determined, a further


set of variables can be added introducing another new factor, the position
of which in turn becomes fixed by the earlier factors. In this way, starting
with one factor of known position, it should be possible, theoretically, by
successive additions to fix the rotation of a most complex multi-dimensional
factorization.
The principle of expected profiles: rotation to produce loading profiles
congruent with the researcher's general expectations
It is possible that on general psychological grounds one could validly conclude that certain kinds of sources, e.g. traits, should manifest certain general forms of factor loading pattern in certain batteries of variables. One would
then rotate so that the maximum number of factors would give loading profiles, i.e. factor patterns, of the kind required. According to this principle, one
would rotate to get profiles of loadings having a relationship to the nature of
the source traits as shown by the nature of the trait elements in which the
factor tends to appear most persistently.
The principle of parallel proportional profiles
This begins with the same general scientific principle of parsimony which
forms the premise for Thurstone's simple structure, but arrives at a different
formulation of the meaning of the principle in the field of factor analysis.
The principle of parsimony should not demand "which is the simplest set of
factors for reproducing a particular correlation matrix?" but rather "which set
of factors will be most parsimonious at once with respect to this and other
matrices considered together?". This parsimony must show itself especially
when the correlations emanate from many diverse fields of observation.
The criterion is then no longer that the rotation shall offer the fewest factor
loadings for any one matrix, but that it shall offer the fewest dissimilar (and
therefore the fewest total) loadings in all the matrices together.
Higher-order factors
Sometimes, when we fit oblique factors, we need to investigate the model
further by applying exploratory factor analysis once again. In oblique
structures we may, for example, fit higher-order factors (common generic properties of two or more given distinct generic properties) to the correlations between the factors. Such being the case, there can be more than one
level of analysis.
The correlated factors which explain the correlations of the observed variables are called the first-order or primary factors; the factors which explain the correlations between the primary factors are the second-order factors, with oblique simple

Variance decomposition, matrix of correlation and factor loadings


[Figure 27 is a path diagram: a general (second-order) factor influences Factor 1 and Factor 2, which in turn load on the observed variables O.V.1–O.V.6, each observed variable having its own error term E1–E6.]

Legend: in squares, O.V. represents the i-th observed variable and E in circles denotes the i-th error of the measured variable.

Figure 27. A path diagram for the higher-order factor analysis model
Source: Bollen 1989b, p. 315.

structure. Such a process could, in theory, be continued until the correlations between the highest-order factors reach zero (that is, until we obtain
the orthogonal simple structure) or until the number of higher-order factors
is two or less, so that their correlations cannot be non-trivially explained by common factors. In general, the factors influencing the observed variables may themselves be
influenced by other factors that need not have direct effects on the observed
variables (this situation is presented in Figure 27).
The potential for relations between higher- and lower-order factors
has long been recognized by Thurstone [1947]. Gerbing and Anderson
[1984] argued that the failure to consider higher-order factors may explain
the correlated errors that are common in CFA models. For instance, we may
find correlated errors between series of tests that tap different dimensions of
personal values.
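As an illustration of the idea, one can factor-analyze the correlation matrix of the primary factors themselves. The sketch below (Python/NumPy; the 3×3 primary-factor correlation matrix is invented for illustration, not taken from this book) extracts a single second-order factor by a principal-axis-style eigendecomposition:

```python
import numpy as np

# Hypothetical correlation matrix of three oblique primary factors
# (values invented for illustration only).
phi = np.array([
    [1.00, 0.45, 0.38],
    [0.45, 1.00, 0.52],
    [0.38, 0.52, 1.00],
])

# One-factor extraction on the factor correlations: loadings of the
# primary factors on a single second-order (general) factor.
eigvals, eigvecs = np.linalg.eigh(phi)          # eigh returns ascending order
v = eigvecs[:, -1] * np.sqrt(eigvals[-1])       # largest root is last
second_order_loadings = np.sign(v.sum()) * v    # fix the arbitrary sign

print(second_order_loadings)
```

If the resulting loadings are substantial, the correlations among the primary factors can be attributed to one general factor, as in the structure shown in Figure 27.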

Factor scores analysis


After finding an appropriate factor solution based on rotation, the researcher may calculate factor scores, which are used for various purposes, such as serving as proxies for latent variables and determining the examinee's relative


standing on the latent dimension. For studies directed toward the ordination of objects on the basis of composite variables we use plots of the factor
scores. And since composite variables are fewer in number than the observed variables, a more parsimonious classificatory scheme often emerges
[Jöreskog and Reyment 1996].
Conceptually, a factor score is the score that would have been observed for
the examinee if it had been possible to measure the factor directly. In other
words, factor scores are composite measures of each factor computed for each object.
A factor score represents the degree to which each individual scores high on
the group of items with high loadings on a factor. Thus, higher values on the
variables with high loadings on a factor will result in a higher factor score43.
In applied research, factor scores are often computed as so-called coarse
factor scores, which are unweighted composites of the raw scores of observed
variables (e.g. averages or sums) found to have salient loadings on
the factor. However, there are many reasons why coarse factor scores may
poorly represent factors (e.g. they may be highly intercorrelated even when
the factors are truly orthogonal) [Grice 2001].
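A minimal sketch of the coarse approach (Python/NumPy; the data matrix and the assignment of salient items are invented for illustration):

```python
import numpy as np

# Invented raw scores: 5 respondents x 4 items.
X = np.array([
    [4, 5, 2, 1],
    [3, 4, 2, 2],
    [5, 5, 1, 1],
    [2, 2, 4, 5],
    [1, 2, 5, 4],
], dtype=float)

# Suppose items 0-1 load saliently on factor 1 and items 2-3 on factor 2.
salient = [[0, 1], [2, 3]]

# Coarse factor scores: unweighted means of the salient items per factor.
coarse = np.column_stack([X[:, idx].mean(axis=1) for idx in salient])
print(coarse)
```

Weighting every item by its loading, as the refined estimators discussed next do, is what the coarse approach gives up.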
Alternatively, factor scores can be estimated by multivariate methods that
use various aspects of the reduced or unreduced correlation matrix and factor analysis coefficients. The resulting values are called refined factor scores.
A frequently used method of estimating these scores is Thurstone's [1935]
least squares regression approach, although several other strategies have
been developed, e.g. by Bartlett [1937] and Anderson and Rubin [1956]44. In
the majority of instances, refined factor scores have less bias than coarse factor scores and thus are favored as proxies for factors [Harman 1976; Grice
2001].
A complicating issue in factor score estimation is the indeterminate nature of the common factor model. With respect to factor scores, this indeterminacy means that there are an infinite number of sets of factor scores
43
A key characteristic that differentiates a factor score from a summated scale is that the
factor score is computed based on the factor loadings of all variables on the factor, whereas the
summated scale is calculated by combining only selected variables. Therefore, although the
researcher is able to characterize a factor by the variables with the highest loadings, consideration must be given to the loadings of other variables, albeit lower, and their influence on the
factor score [Hair et al. 2010, p. 129].
44
The Bartlett scores method estimates factor score coefficients such that the scores produced have a mean of 0 and the sum of squares of the unique factors over the range of variables
is minimized. The Anderson–Rubin method is a modification of the Bartlett method which
ensures orthogonality of the estimated factors; the scores produced have a mean of 0,
a standard deviation of 1, and are uncorrelated.
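As a sketch of the Bartlett estimator mentioned above (Python/NumPy; the loadings and data are invented, and the weighted least squares form F̂ = ZΨ⁻¹Λ(ΛᵀΨ⁻¹Λ)⁻¹ is the standard textbook expression, not an equation taken from this book):

```python
import numpy as np

# Hypothetical orthogonal loading matrix (4 items x 2 factors).
L = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.7],
    [0.0, 0.6],
])
psi = 1.0 - (L ** 2).sum(axis=1)   # unique variances implied by the model
Psi_inv = np.diag(1.0 / psi)

rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 4))   # stand-in for standardized data

# Bartlett (weighted least squares) factor score weights and scores.
W = Psi_inv @ L @ np.linalg.inv(L.T @ Psi_inv @ L)
F_hat = Z @ W
print(F_hat.shape)
```

Note the defining property of the Bartlett weights: L.T @ W equals the identity, which is what makes the estimator conditionally unbiased.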


that could be computed from any given factor analysis and that would be equally
consistent with the same factor loadings [Grice 2001]. The degree of indeterminacy depends on several aspects, such as the ratio of items to factors and
the size of the item communalities (e.g. factors defined by several items with
strong communalities have better determinacy). If a high degree of indeterminacy is present, the sets of factor scores can vary so widely that an
individual ranked high on the dimension in one set may receive a low ranking on the basis of another set. In such scenarios, the researcher has no way
of discerning which set of scores is most accurate. Thus, although typically
neglected in applied factor analytic research, the degree of factor score indeterminacy should be examined as part of EFA, especially in instances when
factor scores are to be computed for use in subsequent statistical analyses.
Grice [2001] has specified three criteria for evaluating the quality of factor scores, namely:
– validity coefficients – correlations between the factor score estimates
and the factors they estimate,
– univocality – the extent to which the factor scores are excessively or
insufficiently correlated with other factors in the same analysis,
– correlational accuracy – how closely the correlations among factor
scores correspond to the correlations among the factors45.
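For regression estimates of orthogonal factors, the validity coefficient of factor j can be obtained as the multiple correlation of the factor with the observed variables, √(λⱼᵀR⁻¹λⱼ). A minimal sketch (Python/NumPy; the correlation matrix and loadings are invented for illustration):

```python
import numpy as np

# Hypothetical item correlation matrix R (4 items) and orthogonal
# loading matrix L (4 items x 2 factors); all values invented.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])
L = np.array([
    [0.8, 0.1],
    [0.7, 0.1],
    [0.1, 0.7],
    [0.1, 0.6],
])

# Validity coefficient per factor: correlation between the regression
# score estimate and the factor it estimates.
validity = np.sqrt(np.diag(L.T @ np.linalg.inv(R) @ L))
print(validity)
```

Values at or above the 0.80 threshold recommended by Gorsuch [1997] (see footnote 45) indicate reasonably determinate factors.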
Direct factor scores and regression estimation
The solution for the factor scores matrix is relatively simple and straightforward. If we start with the basic (for standardized variables) factor equation
Z = FΛᵀ, then F is derived from [Jöreskog and Reyment 1996, p. 223]:

F = Z(Λᵀ)⁻¹. (6.79)

This assumes that Λ is a square, non-singular matrix and contains as many
factors as variables. Such a situation might arise when using principal components and retaining all the factors. When fewer factors than variables are
used, the solution is slightly more complicated. If there are k factors and
p variables, we assume that the unique part V is small, so that approximately:

Z(n×p) = F(n×k)Λᵀ(k×p), (6.80)

45
For instance, Gorsuch [1997] has recommended that validity coefficients should be at
least 0.80, although higher values (e.g. greater than 0.90) may be required in some situations
(e.g. when factor scores are used as dependent variables).


and postmultiplying it by Λ we obtain:

ZΛ = FΛᵀΛ. (6.81)

Finally, after postmultiplication by (ΛᵀΛ)⁻¹, we have:

F = ZΛ(ΛᵀΛ)⁻¹. (6.82)

The original observations, the elements of Z, are used only as approximations
to the values necessary to compute the exact factor scores in terms of partial
variance. The factor scores, in this case, are approximations to the factor
measurements in the reduced space delineated by the k factors [Jöreskog and
Reyment 1996].
Although the above direct method yields generally favorable results for
principal components, it is not always appropriate for factor analysis. Once
again, consider the basic equation for the factor scores, this time assuming that V
is not necessarily small:

Z = FΛᵀ + V, (6.83)

where F and Λ are matrices concerned with the common part of Z, whereas
V involves the unique parts of the variables in Z.
In the general case, the common factor score matrix F will have k columns, whereas V will have p columns, one for each of the observed variables.
Therefore, there is actually a total of p + k factor scores to estimate, not just k. In
this situation, a regression model may be utilized to obtain e.g. a least squares
fit of F to the data. If we let F̂ be the matrix of estimated common factor
scores, the usual regression becomes [Jöreskog and Reyment 1996, p. 224]:

F̂(n×k) = Z(n×p)Q(p×k), (6.84)

where Z is the standardized data matrix and Q is a matrix of regression coefficients. The n by k matrix of factor scores F̂ contains elements f̂nj that indicate the amount of factor j in object n. The factor scores are interpreted in
the same way as observations on any variable.
Premultiplication of both sides of (6.84) by Zᵀ and division by n yields:

(1/n)ZᵀF̂ = (1/n)ZᵀZQ, (6.85)


since the term (1/n)ZᵀF̂ is the p by k correlation matrix of the observed variables
with the common factors, and (1/n)ZᵀZ = R, the correlation matrix of the observed
variables. The correlations with the common factors are, however, given by the factor
loadings matrix Λ or B for unrotated and rotated orthogonal factors, respectively, and by the primary structure matrix Sp for oblique factors. Thus, we
may write Λ = RQ and

Q = R⁻¹Λ; (6.86)

substituting (6.86) into (6.84) yields the final solution:

F̂ = ZR⁻¹Λ. (6.87)
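The final solution (6.87) translates directly into code. A minimal sketch (Python/NumPy; the data and the loading matrix are invented for illustration):

```python
import numpy as np

# Invented data: 100 observations on 4 variables with some structure.
rng = np.random.default_rng(0)
raw = rng.standard_normal((100, 4))
raw[:, 1] += 0.8 * raw[:, 0]
raw[:, 3] += 0.8 * raw[:, 2]

# Standardize to obtain Z, as the derivation assumes.
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
R = np.corrcoef(Z, rowvar=False)   # correlation matrix of the variables

Lam = np.array([                   # hypothetical loading matrix (p x k)
    [0.8, 0.0],
    [0.8, 0.0],
    [0.0, 0.8],
    [0.0, 0.8],
])

# Regression factor scores: F_hat = Z R^{-1} Lambda, as in (6.87).
F_hat = Z @ np.linalg.inv(R) @ Lam
print(F_hat.shape)
```

Because the columns of Z are centered, the resulting factor scores also have mean zero, consistent with their interpretation as standardized composite variables.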

Sample size and soundness of observed variables


Methodologists have proposed a host of rough guidelines for estimating an
adequate sample size for an EFA. Most of these guidelines determine sample size from the number of measured variables included in
the analysis, with more measured variables requiring larger sample
sizes. Sometimes such guidelines also specify a minimum sample size regardless of the number of measured variables. Unfortunately, there are serious drawbacks to such guidelines. One is that the recommendations vary
dramatically. For instance, Gorsuch [1974] suggested a ratio of five examinees per measured variable and a sample size of never less
than 100. In contrast, Nunnally [1978] and Everitt [1975] proposed a ratio
of 10 to 1. The primary limitation of such guidelines is that adequate sample size is not a function of the number of measured variables per se but
is instead influenced by the extent to which factors are overdetermined and
by the level of the communalities of the observable variables. When each factor
is overdetermined (i.e. at least three or four measured observable variables
represent each factor) and the communalities are high (i.e. an average of 0.70
or higher)46, accurate estimates of population parameters can be obtained
with samples as small as 100 [MacCallum et al. 1999]. Under more moderate conditions a sample size of at least 200 might be needed. However, when
these conditions are poor, it is possible that samples as large as 400 to 800
might not be sufficient.
46
By custom, item communalities are considered high if they are 0.8 or greater [Velicer
and Fava 1998], but this is unlikely to occur in real data. More common magnitudes in the
social sciences are low to moderate communalities of 0.40 to 0.70. If an item has a communality of less than 0.40, it may either not be related to the other items, or it may suggest an additional
factor that should be explored. The researcher should consider why that item was included in
the data and decide whether to drop it or add similar items for future research.
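The communality thresholds discussed here are easy to check once a solution is in hand. A minimal sketch (Python/NumPy; the orthogonal loading matrix is invented for illustration):

```python
import numpy as np

# Hypothetical orthogonal-solution loadings: 4 items x 2 factors.
L = np.array([
    [0.80, 0.05],
    [0.75, 0.10],
    [0.10, 0.70],
    [0.20, 0.30],   # deliberately weak item
])

# Communality of item i = sum of its squared loadings across the factors.
communalities = (L ** 2).sum(axis=1)

# Flag items below the conventional 0.40 lower bound (see footnote 46).
weak_items = np.flatnonzero(communalities < 0.40)
print(communalities.round(3), weak_items)
```

Items flagged this way are candidates for removal, or signals that an additional factor (or more reliable items) may be needed.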
Research on sample size requirements for common exploratory factor
analysis has also revealed that adequate sample size is partly determined
by the nature of the data [Fabrigar et al. 1999; MacCallum et al. 1999]. The
stronger the data, the smaller the sample can be for an accurate analysis. Strong data in factor analysis means uniformly high communalities
without cross-loadings, plus several observable variables loading strongly on
each factor. In practice these conditions are rare [Mulaik 1990; Widaman
1993]. Hence, if some problems emerge in the data, a larger sample can help
determine whether or not the factor structure and individual items are valid.
It is also worth noting that obtaining parameter estimates that closely
approximate population values is only one criterion which might be considered when we determine sample size. In some situations, additional concerns might also play a role. Most notably, when FA involves the testing of
formal hypotheses regarding model fit or parameter estimates (as is sometimes done in maximum likelihood ML estimation), statistical power might also be
considered. A researcher could specify a hypothesis of interest, a desired level of power, and an assumed population value for model fit. The sample size
necessary to achieve these objectives is then calculated (see, for example, the
work of MacCallum, Browne and Sugawara [1996]).
Another important aspect of FA is the soundness of the observable variables selected for the study [Cattell 1978]. If the researcher inadequately samples
measured variables from the domain of interest, he or she may fail to uncover
important factors. In effect, we miss the point when developing appropriate
latent variables associated with the measured theoretical construct. If variables irrelevant to the domain of interest are included, then spurious factors might emerge or true factors might be obscured. Therefore, we should
carefully define the domain of interest and specify sound guidelines for the
selection of observable variables.
Research suggests that FA procedures provide more accurate results when
each factor is represented by multiple measured variables in the analysis (i.e.
when factors are overdetermined) [MacCallum et al. 1999; Velicer and
Fava 1998]. Usually, it is recommended that at least three to five observed
variables represent each factor in a study [MacCallum et al. 1999].
Thus, when we design a study, we need to consider the nature and number of
factors that might emerge. Alternatively, in cases in which there is little or
no basis to anticipate the number and nature of the factors, we should attempt


to delineate as comprehensively as possible the population of observed variables for the domain of interest. In this case, one should include in the study
a sample of these measured variables that is as large as feasible [Cattell 1978].
Selection of observed variables also requires consideration of the psychometric properties of the measures. When FA is conducted on observable variables with low communalities (i.e. variables for which the common factors
explain little common variance), substantial distortion in results can occur
[MacCallum et al. 1999; Velicer and Fava 1998]. There are reasons why communalities for observed variables might be low. One reason is low reliability:
variance due to random error cannot, by definition, be explained by common factors. Due to this fact, variables with low
reliability will have low communalities and thus should be avoided.
A second reason why an observed variable might have a low communality is that the variable is unrelated to the domain of interest and thus shares
little in common with other measured variables in that domain. Therefore,
to the extent such information is available, a researcher should consider the
validity (e.g. face validity, convergent validity) of measured variables when
selecting items for the analysis.

Interpretation of factors and factor indeterminacy


Guttman [1955] pointed out the implications of factor indeterminacy for the interpretation of factors. In his opinion, such interpretations
are usually based upon the factor loadings produced by the factor analysis,
but these are not sufficient to determine uniquely the observed variables corresponding to the factors. For Guttman, the fact that two distinct but legitimate interpretations of a factor could correspond to observed variables that
were partially opposed to one another was absurd. Something, he believed,
was fundamentally wrong with the model of common factor analysis47.
Mulaik and McDonald [1978], while accepting the indeterminacy of factor interpretation (especially in exploratory analysis), did not regard it
as a fatal flaw requiring the abandonment of the common factor model in
favor of other models. They considered factor indeterminacy just a special
case of a more general form of indeterminacy commonly encountered in
science, known to philosophers of science as the empirical underdetermination of theory by data. Inductive methods of generating theory always contain
47
He urged factor analysts to look at models that were free of indeterminacy [Guttman
1953].


an indeterminate element. Those who fail to recognize the indeterminacy of
induction frequently do so because they are victims of the belief that one can
make inductive inferences uniquely and unambiguously from data without
making prior assumptions [Chomsky and Fodor 1980].
In science, concepts are not uniquely determined by the data of experience. Science is sometimes concerned with generalizing beyond the particulars of experience, but there is no unique way to do this in connection
with a given set of particulars. Trying to confine the basis for making a generalization from experience to specific, determinate phenomena
already observed and defined (as some do by urging, for example, the use of
specific linear combinations of a set of observed variables to stand for what
is common to them) may actually get in the way of formulating creative
generalizations that go beyond what is already known or observed but which
nevertheless are eventually tied to experience by the efforts to test such generalizations empirically with extensions to new data. The implication of this
is that determinate models like component analysis may not have as many
scientific virtues as do indeterminate models like factor analysis [Mulaik and
McDonald 1978]. Still, resolving the indeterminacy in assigning meaning to
the common factors is no different from resolving the indeterminacy that
exists when one tries to see things in the forms of clouds, or to make
sense out of a novel situation. If there is to be a meaning seen at all, one
must project or impose it. That means fixing upon one of several alternative
courses of action and proceeding to test the consequences of this
course of action. In factor analysis this involves the problems of formulating
an appropriate interpretation for the factors.
The interpretation could be, e.g., a hypothesis put forth to account for the
values of the factor structure and possible interfactor correlation coefficients
obtained in a factor analysis of the observed variables. One may identify
the common and unique factors of the analysis with existing variables, and
one may believe that these have the same correlations among themselves and
with the observed variables. But the indeterminacy of the common factor
model means that variables in the world answering to this description are
not uniquely determined by it. So whatever interpretation
one gives to the factors need not be unique; other researchers may form
other, equally viable interpretations.
But there is a way to go on that takes something of value from performing the factor analysis. One can proceed conditionally, on a particular interpretation of the factors, to formulate testable hypotheses involving the
original observed variables and additional variables. For example, one might


assert that these variables in the world stand in the same relationship to the
observed variables as does a common factor to the observed variables in the factor
analysis. Such being the case, one must identify this putative common factor
variable in some way independently of the factor analysis one has just performed. For example, it will not do to identify a common factor as whatever
variable is common to these tests and then to point to the observed variables
of the factor analysis giving rise to the factor in the first place; by doing
this we return only to the impasse. Rather, one must come up with: 1) some
specific measured variable not included in the original analysis, or 2) a definition
of the factor as whatever is common not just to the original observed variables but to
some other, larger set of measurable variables that may include the original
observed variables. After that, one is in a position to test one's interpretation.
There are two cases that may occur. In the first case, one finds the correlations between the putative factor-variable and the observed variables. If they
do not correspond to the original factor structure loadings, then the interpretation that the chosen variable stands in the same relation to the observed variables as does a common factor of the analysis is wrong. In the second case,
if one includes, along with the original observed variables, additional variables
from the larger set of variables, one should be able to identify a common factor on which the original variables have their original pattern of loadings and
the additional variables also have non-zero loadings with the expected signs on
this common factor [Mulaik and McDonald 1978]. If one does not find these
predicted results, then something is wrong: either the interpretation,
or the manner of constructing the additional tests, or both. If, as a result of
these tests of the interpretation of a factor, one does not end up rejecting the
interpretation, then one may want to go on with this interpretation of the
factor as a useful basis for further research. But at this point one will have to
make a very subtle shift from trying to answer an essentially meaningless question, like what are the common factors of these variables, to asking questions
like: if these are the only common linear causal factors standing in this particular relationship to these observed variables, then do these observed variables
exhibit patterns of correlation consistent with the hypothesis? However, at this
point one has actually moved beyond doing just an exploratory analysis to
confirmatory analysis (discussed in the next section), where one may look back
on the original variables and subjects as provoking a series of hypotheses. From
this perspective, one may also see a way to select observed variables and
study hypotheses of causal relations between the now-defined factor of one
set of variables and the defined factors of other variables, say, in structural
equation modeling (SEM).


Confirmatory factor analysis model CFA


Model of CFA
The CFA is strongly rooted in analytical approaches based on a procedure of
explaining many variables with simultaneous parameter estimation in a set
of mutually correlated equations. The CFA is part of the larger
family of methods known as structural equation modeling (SEM)48, which
plays an essential role in the composition of the measurement model through
path analysis [Brown 2006; MacCallum and Austin 2000]. However, when
one is conducting SEM, each of the measurement models is evaluated separately before the next step is undertaken with reference to the structural model.
As Thompson [2004, p. 110] noticed: "it makes little sense to relate any constructs within an SEM model if the factors specified as part of the model are
not worthy of further attention".
In the CFA model, a theoretical structure is specified and tested for its
degree of correspondence (or fit) with the observed covariances among the
items in the factor(s).
Usually, a process of CFA construction runs through the following stages:
– model specification,
– model identification,
– parameter estimation,
– data fit analysis,
– optional modification of the model.
In CFA we identify factors that account for the variation and covariation among a set of indicators. The CFA is based on common factor model
assumptions. However, while EFA is generally a descriptive or exploratory
procedure, in CFA the researcher must prespecify all aspects of the factor
48
The generalized structural equation model combines the multi-equation linear regression
model, often termed path analysis, with confirmatory factor analysis (as the measurement model).
The most famous SEM implementation was invented by Jöreskog and Sörbom [1993]: LISREL (LInear
Structural RELationships), named after the popularized computer software.
Latent variables in SEM are called either exogenous or endogenous. An exogenous variable
is a variable that is not caused by other variables in the solution. Conversely, an endogenous
variable is caused by one or more variables in the model (i.e. other variables in the solution
exert direct effects on the variable). Thus, exogenous variables can be viewed as synonymous
with X – independent, or predictor (causal), variables. Similarly, endogenous variables are equivalent to Y – dependent, or criterion (outcome), variables. In structural models, an endogenous
variable may be the cause of another endogenous variable.


model, e.g. the number of factors, the pattern of indicator–factor loadings, and
so forth [Brown 2006]. In that model, one may constrain certain parameters to mathematically permissible values (e.g. a variance may be constrained to equal any positive number, a correlation may be constrained to
equal –1, +1, or any number in between) [Thompson 2004].
In the process of model construction, we follow rules that are strictly based
on an operational approach to measurement, such as [Hoyle 2000; Kozyra 2004]:
– definition of the theoretical construct under investigation, e.g. personal values measuring hedonism,
– identification of the dimensions of the construct, where each extracted dimension is defined by one measured factor,
– specification of the relationships between the observed variables and the factors; these
relationships express how, and to what extent, latent variables can
be measured through the agency of observed variables; in order to exclude the considerable influence of measurement errors, it is advised to use more than one observable variable per factor.
Due to the fact that the CFA model is based on a linear relationship among
the observable variables and common factors, along with specific and measurement errors, it is expressed as follows [Sztemberg-Lewandowska
2008]:

Xi = λi1ξ1 + λi2ξ2 + … + λikξk + δi  (i = 1, …, p), (6.88)

where ξj for j = 1, …, k denotes the factors or latent variables49.


Once again, a unique component δi represents the specific error and the remaining random error in the observed variable Xi. Both elements are errors in
Xi with respect to the measured factor ξj, and both are uncorrelated with ξj and
with each other.
Let us now assume the CFA model is based on four variables X1, X2, X3, X4
and two factors ξ1, ξ2, as follows:

X1 = λ11ξ1 + δ1,  X2 = λ21ξ1 + δ2,  X3 = λ32ξ2 + δ3,  X4 = λ42ξ2 + δ4. (6.89)

49
Here in the CFA model, as can be noted, we use different notation for the factors as opposed to the EFA model.


In matrix notation, the same CFA model is expressed as:

X = ΛXξ + δ, (6.90)

where:
X – a vector (4×1) of observable variables,
ΛX – a matrix (4×2) of factor loadings,
ξ – a vector (2×1) of latent variables (factors),
δ – a vector (4×1) of measurement errors and specific errors due to
unique variables.
Model (6.90) is defined in terms of the factor loadings ΛX, the relationships
between the factors in the variance-covariance matrix Φ, and the relationships among
measurement and specific errors in the variance-covariance matrix Θδ. As a result (in matrix notation), the four-variable model is the following:

[X1]   [λ11   0 ]        [δ1]
[X2] = [λ21   0 ] [ξ1] + [δ2]
[X3]   [ 0  λ32 ] [ξ2]   [δ3]
[X4]   [ 0  λ42 ]        [δ4]    (6.91)

where factor ξ1 is described by the observed variables X1, X2 and factor ξ2 is
described by X3, X4.
This model must be supplemented with two matrices: 1) the (2×2) variance-covariance matrix Φ of the latent variables ξj:

Φ = [φ1²  φ12]
    [φ21  φ2²]    (6.92)

and 2) the (4×4) variance-covariance matrix Θδ of the errors for the respective variables:

Θδ = diag(θ1², θ2², θ3², θ4²), (6.93)

where the error variances are placed on the diagonal and the error covariances are
equal to zero, which means that the errors remain uncorrelated.


Additionally, in the CFA model we need to assume that [Sztemberg-Lewandowska 2008]:
1) the expected values of the observed variables and of the factors are equal to zero,
E(X) = E(ξ) = 0,
2) the relationships between the observed variables Xi and the factors ξj remain linear;
model identification, estimation of the goodness of fit of the data to the model, and estimation of the parameters are based on the discrepancies between the variance-covariance matrix Σ of the observed variables and the variance-covariance
matrix Σ(θ) for the hypothetical model50, where θ denotes the vector containing
the free parameters of the CFA model,
3) besides, the measurement errors δ:
– have the expected value E(δ) = 0 and stable variance for all observations,
– are independent (though in practice they may also be correlated),
– are not correlated with the factors, E(ξδᵀ) = E(δξᵀ) = 0.
If we have the covariance matrix Σ for the X variables, then Σ is the expected value of XXᵀ, where X = ΛXξ + δ, that is:

Σ = E(XXᵀ) = E[(ΛXξ + δ)(ΛXξ + δ)ᵀ] =
= E(ΛXξξᵀΛXᵀ) + E(ΛXξδᵀ) + E(δξᵀΛXᵀ) + E(δδᵀ) =
= ΛXE(ξξᵀ)ΛXᵀ + ΛXE(ξδᵀ) + E(δξᵀ)ΛXᵀ + E(δδᵀ) =
= ΛXE(ξξᵀ)ΛXᵀ + E(δδᵀ), (6.94)

where E(ξξᵀ) = Φ and E(δδᵀ) = Θδ; the covariance matrix of X is then
decomposed in terms of the elements in ΛX, Φ and Θδ respectively:

Σ = ΛXΦΛXᵀ + Θδ. (6.95)

50
Browne and Cudeck [1993] explained there are three general assumptions in reference
to discrepancy in CFA models:
– Firstly, there is a function referred to as the discrepancy due to approximation, which measures the lack of fit of the model to the population covariance matrix.
– Secondly, there is the discrepancy due to estimation, defined as a function measuring the
difference between the model fit to the sample covariance matrix and the model fit to the
population covariance matrix.
– Finally, there is the discrepancy due to overall error, defined as a function which measures
the difference between the elements of the population covariance matrix and the model fit
to the sample covariance matrix.

280

Exploratory (EFA) and confirmatory (CFA) factor analysis for scale development

This formula (describing the variance-covariance matrix) is implied by the hypothetical model and, for a given model, depends on the vector θ:

Σ(θ) = Λ_X Φ Λ_Xᵀ + Θδ.     (6.96)
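The decomposition in (6.96) can be computed directly. Below is a minimal numpy sketch for the two-factor, four-indicator model of (6.91); the numeric loading, factor-covariance and error-variance values are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Lambda_X: factor loadings -- X1, X2 load on xi1; X3, X4 load on xi2
# (illustrative values).
Lam = np.array([[0.9, 0.0],
                [0.7, 0.0],
                [0.0, 0.8],
                [0.0, 0.6]])

# Phi: variance-covariance matrix of the latent variables xi1, xi2.
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Theta_delta: diagonal matrix of error variances (uncorrelated errors).
Theta = np.diag([0.19, 0.51, 0.36, 0.64])

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta_delta.
Sigma = Lam @ Phi @ Lam.T + Theta
print(Sigma)
```

With these values each indicator has unit variance, so Sigma has ones on its diagonal; the off-diagonal elements are the model-implied covariances.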

Thus, the CFA model is defined by the respective parameters of the model collected in the matrices Λ_X, Φ and Θδ. There are three types of parameters to be estimated in these matrices: free (unknown) parameters, constrained (unknown) parameters and assigned (known) parameters.

CFA unstandardized vs. standardized solution and covariance/mean structures
The CFA may include an unstandardized solution (parameter estimates are expressed in the original metrics of the indicators) or a standardized one. In CFA, instead of using a correlation matrix (i.e. a completely standardized variance-covariance matrix), one may consider a variance-covariance matrix (needed to produce an unstandardized CFA solution) or raw data that the software uses to produce an input variance-covariance matrix.
Of particular note here is the fact that many aspects of CFA are based on unstandardized estimates, such as the standard errors and significance tests of model parameters. Various forms of measurement invariance (e.g. the equivalence of parameter estimates within and across groups, discussed further) are also based on the unstandardized solution [Loehlin 2004]. In contrast, EFA focuses rather on standardized values.
Methodologists in applied CFA research often express a strong preference for reporting unstandardized solutions, because the analysis itself is based on unstandardized variables, and completely standardized values are potentially misleading. For instance, the true nature of the variance and of the relationships among indicators and factors can be masked when these variables are standardized. When the original metric of the variables is expressed in meaningful units, unstandardized estimates more clearly convey the importance or substantive significance of the effects [Loehlin 2004].
Now let us briefly discuss the analysis of covariance structures and of mean structures in the CFA model. In the former instance, parameters (related to factor loadings, error/unique variances and covariances, factor variances and covariances) are estimated to reproduce the input variance-covariance matrix. As MacDonald [1985, p. 98] noted: the analysis of covariance structures includes all the theory of common factor analysis and other applications that are not usually thought of as common factor analysis. We might claim that the analysis of covariance structures is to correlational statistics as the analysis of variance is to experimental statistics⁵¹. It allows the researcher to set up and test very detailed hypotheses about the relations of the measures, but it requires detailed specification of the design of the study and conversion of that design, by mathematical analysis, into a specific hypothesis to fit the test.
Because the analysis of covariance structures is based on the implicit assumption that indicators are measured as deviations from their means (all indicator means equal zero), the CFA model can be extended to the analysis of mean structures, where CFA parameters reproduce the observed sample means of the indicators, which are included along with the sample variances and covariances as input data. Such CFA models also include parameter estimates of the indicator intercepts (the predicted value of an indicator when the factor is zero)⁵² and can be extended to the latent factor means, which are often used in multi-group CFA models to test whether distinct groups differ in their relative standing on certain latent dimensions. However, this is not limited to multi-group comparisons: even a single group can be analyzed with mean values and intercepts. For example, in panel data a contrast between the means of the same latent variable at two time points may reveal whether a group's average has shifted. If the same group of indicators is employed at two or more time points, we can test whether the intercepts in the CFA measurement equations are stable over time. Finally, two latent variables that are distinct but are assigned the same units can also be compared.

Scaling the latent variable (factor)

In the CFA model, every factor must have its scale identified. Due to the unobservable nature of latent variables and the lack of a defined metric (units of measurement), we need to set these units of measurement ourselves. In CFA, this is typically accomplished in one of two ways.
In the first and by far the most popular method, the researcher fixes the metric of the factor to be the same as that of one of its indicators. The indicator selected to pass its metric onto the factor is often referred to as a marker or reference
51 It is a matter of combining hypotheses about means with hypotheses about covariances in a single model, for what is then called the analysis of moment structures.
52
Akin to multiple regression, an indicator intercept is interpreted as the predicted value
of the indicator when the latent factor is zero [Brown 2006].


indicator⁵³. When a reference indicator is specified, a portion of its sample variance is actually passed onto the factor. This specification assigns to the factor a scale related to that of the explained (common, shared) variance of the reference indicator. In this method, we impose constraints based on the unit loading identification criterion, which means that, for each factor, the unstandardized coefficient (loading) for the direct effect on one of its indicators is fixed to 1.0. Any other positive scaling constant would do, but 1.0 is the default in most CFA software [Kline 2010].
In the second method, the variance of the factor is fixed to a specific value, usually 1.0 (on the basis of the unit variance identification criterion)⁵⁴, and consequently a standardized factor is produced. Although the factors have been standardized (i.e. their variances are fixed to 1.0), the fit of this model is identical to that of the unstandardized model (i.e. a model estimated using marker indicators). While useful in some circumstances (e.g. as a parallel to the traditional EFA model), this method is used less often than the marker indicator approach. The marker indicator strategy produces an unstandardized solution (in addition to a completely standardized one), which is useful for several purposes, such as the already mentioned tests of measurement invariance across groups and evaluations of scale reliability.
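The equivalence of the two scaling methods noted above can be checked numerically: fixing the factor variance to 1.0 (unit variance identification) and fixing a marker loading to 1.0 (unit loading identification) imply exactly the same covariance matrix, and hence identical fit. A sketch with illustrative (assumed) parameter values:

```python
import numpy as np

# Unit variance identification: factor variance fixed to 1.0.
lam_uvi = np.array([[0.8], [0.6], [0.7]])
phi_uvi = np.array([[1.0]])

# Marker identification: loading of X1 fixed to 1.0; the factor variance
# becomes the marker's common variance (0.8^2 = 0.64).
lam_uli = lam_uvi / lam_uvi[0, 0]
phi_uli = np.array([[lam_uvi[0, 0] ** 2]])

theta = np.diag([0.36, 0.64, 0.51])  # error variances (illustrative)

sigma_uvi = lam_uvi @ phi_uvi @ lam_uvi.T + theta
sigma_uli = lam_uli @ phi_uli @ lam_uli.T + theta

# Both identifications imply exactly the same covariance matrix.
print(np.allclose(sigma_uvi, sigma_uli))  # True
```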

CFA model identification


Generally, a model is considered identified if all unknown parameters can be determined, which means that whenever two vectors θ1 and θ2 imply the same variance-covariance matrix, Σ(θ1) = Σ(θ2)⁵⁵, the vectors are equal: θ1 = θ2. Thus, the model is identified if, on the basis of the known information (e.g. the variances and covariances in the input data), we can obtain a unique estimate for each parameter in the model whose value is unknown (e.g. factor loadings,
53
Assuming that scores on each multiple indicator of the same factor are equally reliable,
the choice of which indicator is to be the reference variable is generally arbitrary. One reason
is that the overall fit of the model to data is usually unaffected by the selection of reference
variables. Another is consistent with the domain sampling model, wherein effect (reflective)
indicators of the same factor are viewed as interchangeable. However, if indicator scores are
not equally reliable, then it makes sense to select the indicator with the most reliable scores as
the reference variable.
54 It is much more common to impose the unit loading identification constraint.
55 Σ(θ1) and Σ(θ2) are the implied covariance matrices evaluated at θ1 and θ2; if no two different sets of parameter values lead to the same implied covariance matrix, then the model is identified.


factor covariances, etc.)⁵⁶. The θ from a CFA model is identified when we can show that the elements of θ are uniquely determined functions of the elements of Σ.
In the literature, the following options of model identification dominate [Brown 2006].
In an under-identified model, the number of unknown (freely estimated) parameters exceeds the number of pieces of known information (e.g. elements of the input variance-covariance matrix). An under-identified model cannot be solved, because there is an infinite number of sets of parameter estimates that result in perfect model fit. Clearly, without any restrictions on the elements of Λ_X, Φ and Θδ, the model is under-identified. Hence, in order to eliminate the indeterminacy, one constrains some of the parameters. One may set these parameters to fixed, known constants, mostly zero. For example, the hypothesis of uncorrelated errors of measurement constrains the off-diagonal elements of Θδ to zero. Yet another type of constraint is to set parameters equal to one another.
On the other hand, if the number of knowns equals the number of unknowns, the model is just-identified. Unlike under-identified models, just-identified models can be solved: because the number of knowns equals the number of unknowns, there exists a single, unique set of parameter estimates that perfectly reproduces the input matrix.
Finally, a model is considered over-identified if the number of knowns (i.e. the number of variances and covariances in the input matrix) exceeds the number of freely estimated model parameters.
As a consequence of these differences between the number of knowns and the number of unknowns (i.e. freely estimated parameters), we obtain different numbers of degrees of freedom (df) in CFA models. An over-identified model has positive df, a just-identified model has 0 df (because the number of knowns equals the number of unknowns) and an under-identified model has negative df (and cannot be solved or fit to the data).
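The bookkeeping of knowns and unknowns described above is easy to automate; a small sketch (the helper name cfa_df is ours, not from the text):

```python
def cfa_df(p, n_free):
    """Degrees of freedom of a CFA model with p observed variables:
    knowns (unique elements of the input covariance matrix, p(p + 1)/2)
    minus unknowns (freely estimated parameters)."""
    return p * (p + 1) // 2 - n_free

# One factor, two indicators, 4 free parameters: under-identified.
print(cfa_df(2, 4))  # -1
# One factor, three indicators, 6 free parameters: just-identified.
print(cfa_df(3, 6))  # 0
# One factor, four indicators, 8 free parameters: over-identified.
print(cfa_df(4, 8))  # 2
```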
Below, a few examples of under-identified, just-identified and over-identified models are presented.
For the case of an under-identified model, let us assume the following equation:

X + Y = 7.

(6.97)

56 The parameter vectors θ1 and θ2 are any specific values of θ. Bollen [1989b] assumed that θ excludes imaginary numbers and improper values, such as negative variances or undefined solutions.


In this instance, there are two unknowns (X and Y) and one known (X + Y = 7). This model is under-identified, because the number of unknown parameters (X and Y) exceeds the known information. Consequently, there is an infinite number of pairs of values that X and Y could take on to solve X + Y = 7 (X = 1, Y = 6; X = 2, Y = 5, etc.).
Consider now the model depicted in Figure 28. For this model, the input matrix comprises three knowns (pieces of information): the variances of X1 and X2 and the covariance of X1 and X2. The unknowns of the CFA solution are the freely estimated model parameters. In the Figure 28 model, there are four freely estimated parameters: two factor loadings, λX11 and λX21, and two indicator errors, δ1 and δ2. In this example, the metric of ξ1 was set by fixing its variance to 1.0. Thus, because the factor variance φ11 was fixed, it is not included in the count of unknowns. Alternatively, we might have opted to define the metric of ξ1 by choosing either X1 or X2 to serve as a marker indicator. In that case, the factor variance would contribute to the count of freely estimated parameters, but the factor loading of the marker indicator would not be included in this count, because it was fixed in order to pass its metric on to ξ1. Regardless of which method is used to define the units of measurement of ξ1, the count of freely estimated parameters in Figure 28 equals 4. Thus, the CFA model is under-identified, because the number of unknowns (four freely estimated parameters) exceeds the number of knowns (three elements of the input matrix: two variances, one covariance) [Brown 2006].

[Figure 28 depicts a one-factor model with two indicators, X1 and X2 (loadings λX11 and λX21); the input matrix contains 3 elements (σ11, σ21, σ22) and there are 4 freely estimated model parameters (two factor loadings, two error variances).]

Figure 28. Under-identified CFA model (df = −1)
Source: Brown 2006.

Incidentally, it would be possible to identify the Figure 28 model if additional constraints were imposed on the solution. For instance, the researcher could add the restriction of constraining the factor loadings to equality. In this case, the number of knowns (3) would equal the number of unknowns (3) and the model would be just-identified [Brown 2006].
In just-identified models there exists one unique set of parameter estimates that perfectly fits the data. Before further applying this concept to CFA, consider the following simultaneous equations:

X + Y = 7,     (6.98)
3X − Y = 1.     (6.99)

As can be observed, the number of unknowns (X, Y) equals the number of knowns (X + Y = 7, 3X − Y = 1). Through basic algebraic manipulation, it can readily be determined that X = 2 and Y = 5; that is, there is only one possible pair of values for X and Y.
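The same algebra can be verified with a linear solver; a just-identified system pins down exactly one solution:

```python
import numpy as np

# Knowns: X + Y = 7 and 3X - Y = 1; unknowns: X and Y.
A = np.array([[1.0, 1.0],
              [3.0, -1.0]])
b = np.array([7.0, 1.0])

x, y = np.linalg.solve(A, b)
print(x, y)  # X = 2, Y = 5 (up to floating-point rounding)
```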

[Figure 29 depicts a one-factor model with three indicators, X1–X3 (loadings λX11, λX21, λX31); the input matrix contains 6 elements and there are 6 freely estimated model parameters (three factor loadings, three error variances).]

Figure 29. Just-identified CFA model (df = 0)
Source: Brown 2006.


Now consider the CFA model in Figure 29. Here the input matrix consists of six knowns (three variances, three covariances), and the model consists of six freely estimated parameters: three factor loadings and three indicator errors (again we assume that the variance of ξ1 was fixed to 1.0). This CFA model is just-identified and produces a unique set of parameter estimates (λX11, λX21, λX31, δ1, δ2, δ3) that perfectly reproduce the correlations among X1, X2 and X3. Thus, although just-identified CFA models can be fit to the sample input matrix, goodness-of-model-fit evaluation does not apply, because by nature such solutions always have perfect fit. This is also why goodness of fit does not apply to traditional statistical analyses such as multiple regression; these models are inherently just-identified [Brown 2006].
It is important to note that while a CFA model of a construct consisting of three observed measures may meet the conditions of identification (as in Figure 29), this is true only if the errors of the indicators are not correlated with each other. For instance, the model depicted in Figure 30 is identical to that of Figure 29, with the exception of a correlated error between the indicators X2 and X3. This additional parameter brings the count of freely estimated parameters to 7, which exceeds the number of elements of the input variance-covariance matrix (6). Thus, the Figure 30 model is under-identified and cannot be fit to the sample data.

[Figure 30 depicts the same one-factor, three-indicator model as Figure 29, with an added error covariance between X2 and X3; the input matrix contains 6 elements and there are 7 freely estimated model parameters (three factor loadings, three error variances, one error covariance).]

Figure 30. Under-identified CFA model (df = −1)
Source: Brown 2006.
Another model will be over-identified if the number of knowns (i.e. the number of variances and covariances in the input matrix) exceeds the number of freely estimated model parameters. For example, the one-factor model depicted in Figure 31 is structurally over-identified, because there are 10 elements of the input matrix (four variances for X1–X4, six covariances) but only eight freely estimated parameters (four factor loadings, four error variances; the variance of ξ1 is fixed to 1.0). The difference between the number of knowns (b) and the number of unknowns (a, i.e. freely estimated parameters) constitutes the model's degrees of freedom (df). Thus, the model in Figure 31 is over-identified with df = 2 [Brown 2006]:

df = b − a = 10 − 8 = 2.     (6.100)

[Figure 31 depicts a one-factor model with four indicators, X1–X4 (loadings λX11–λX41); the input matrix contains 10 elements and there are 8 freely estimated model parameters (four factor loadings, four error variances).]

Figure 31. Over-identified CFA model (df = 2)
Source: Brown 2006.

The next model (see Figure 32) is also over-identified, with df = 1 (assuming that ξ1 and ξ2 have a non-zero correlation). In this solution, there are 10 elements of the input matrix, but the model consists of nine freely estimated parameters (four factor loadings, four error variances, one factor covariance), thus resulting in one df.

[Figure 32 depicts a two-factor model (φ21 ≠ 0) with indicators X1, X2 loading on ξ1 and X3, X4 loading on ξ2; the input matrix contains 10 elements and there are 9 freely estimated model parameters (four factor loadings, four error variances, one factor covariance).]

Figure 32. Over-identified CFA model (df = 1)
Source: Brown 2006.

An important aspect of over-identified solutions is that goodness-of-fit evaluation applies; specifically, it assesses how well the model is able to reproduce the input variances and covariances (i.e. the input matrix) using a smaller number of unknowns (i.e. freely estimated model parameters). Thus, as in just-identified models, the available known information indicates that there is one best value for each freely estimated parameter in the over-identified solution. Unlike just-identified models, however, over-identified models rarely fit the data perfectly (a perfectly fitting model is one whose parameter estimates recreate the input variance-covariance matrix exactly) [Brown 2006].
The specification of a model with at least zero df is a necessary but not a sufficient condition for identification. Figure 33 illustrates an example of an empirically under-identified solution [Kenny 1979].
In an empirically under-identified solution, the model is statistically just- or over-identified, but aspects of the input matrix prevent the analysis from obtaining a unique and valid set of parameter estimates. The most obvious example of empirical under-identification would be the case where all covariances in the input matrix equal 0. However, empirical under-identification
can result from other patterns of (non)relationships in the input data. For example, Figure 33 depicts a model that is identical to the just-identified model presented in Figure 29, yet its input matrix reveals an absence of any relationship between X3 and both X1 and X2.

[Figure 33 depicts a one-factor model with three indicators, X1–X3, whose input matrix contains σ31 = 0 and σ32 = 0; the input matrix contains 6 elements and there are 6 freely estimated model parameters (three factor loadings, three error variances).]

Figure 33. Empirically under-identified CFA model (df = 0)
Source: Brown 2006.

This aspect of the input matrix renders the Figure 33 model functionally equivalent to the under-identified solution presented in Figure 28.
Similarly, Figure 34 presents a model that is identical to the over-identified model in Figure 32. However, because the factor correlation is 0 (due to a lack of relationship of X1 and X2 with X3 and X4), this solution would be analogous to simultaneously attempting to estimate two solutions of the Figure 28 type (under-identified). For these and other reasons (e.g. increased power and precision of parameter estimates), methodologists [Marsh et al. 1998] have recommended a minimum of three indicators for each latent factor to avoid this possible source of under-identification.
[Figure 34 depicts a two-factor model with indicators X1, X2 on ξ1 and X3, X4 on ξ2, where the factor covariance φ21 = 0 and the input covariances σ31 = σ32 = σ41 = σ42 = 0; the input matrix contains 10 elements and there are 9 freely estimated model parameters (four factor loadings, four error variances, one factor covariance).]

Figure 34. Empirically under-identified CFA model (df = 1)
Source: Brown 2006.
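Empirical under-identification can also be seen algebraically. For a just-identified one-factor model with three indicators and the factor variance fixed to 1.0, the loadings have the closed-form solution λ1 = √(σ21σ31/σ32), λ2 = σ21/λ1, λ3 = σ31/λ1 (a standard result for this model, not printed in the text); when σ31 = σ32 = 0, as in Figure 33, the solution is undefined. A sketch (the helper name is ours):

```python
import math

def one_factor_three_indicators(s21, s31, s32, s11, s22, s33):
    """Closed-form parameter recovery for a just-identified one-factor,
    three-indicator CFA with the factor variance fixed to 1.0.
    Fails (ZeroDivisionError) when the covariances make the model
    empirically under-identified, e.g. s31 = s32 = 0 as in Figure 33."""
    l1 = math.sqrt(s21 * s31 / s32)
    l2 = s21 / l1
    l3 = s31 / l1
    thetas = [s11 - l1 ** 2, s22 - l2 ** 2, s33 - l3 ** 2]
    return [l1, l2, l3], thetas

# Covariances generated by loadings (0.8, 0.6, 0.7) and unit variances.
loads, thetas = one_factor_three_indicators(0.48, 0.56, 0.42, 1.0, 1.0, 1.0)
print(loads)  # recovers [0.8, 0.6, 0.7] up to rounding

# one_factor_three_indicators(0.48, 0.0, 0.0, 1.0, 1.0, 1.0)
# -> ZeroDivisionError: the input covariances carry no information on l3.
```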

The t-rule and the three- and two-indicator rules of model identification

Finally, we present the most popular rules that can be applied in the process of model identification. They are as follows [Bollen 1989b]⁵⁷: the t-rule, the three-indicator rule and the two-indicator rules. For all of these rules, we assume that each latent variable is assigned a scale; otherwise, identification is not possible.
For the t-rule we have Σ(θ) = Λ_X Φ Λ_Xᵀ + Θδ, where Λ_X is a p × q matrix, which means pq elements; Φ has ½q(q + 1) non-redundant parameters; and Θδ has ½p(p + 1) unique parameters. Thus, Σ(θ) is decomposed into pq + ½p(p + 1) + ½q(q + 1) parameters. With p variables in X, Σ has ½p(p + 1) elements known to be identified. If we know none of the parameters in Λ_X, Φ or Θδ, then it is impossible to solve for the pq + ½p(p + 1) + ½q(q + 1) elements in θ using only the ½p(p + 1) known elements in the covariance matrix of X.
The t-rule requires that t ≤ ½p(p + 1), where t is the number of free parameters in θ. In other words, the number of free parameters t must be less than or equal to the number of unique elements in the covariance matrix of X. The t-rule is a necessary but not a sufficient condition for model identification.
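The t-rule amounts to a one-line check; a sketch (the helper name is ours):

```python
def t_rule(p, t):
    """Necessary (not sufficient) condition for identification:
    the number of free parameters t must not exceed the p(p + 1)/2
    unique elements of the covariance matrix of X."""
    return t <= p * (p + 1) // 2

print(t_rule(4, 8))  # True: 8 free parameters, 10 unique elements
print(t_rule(2, 4))  # False: 4 free parameters, only 3 unique elements
```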
Another rule is based on the three-indicator approach. For example, in order to identify a one-factor model we need at least three indicators with non-zero loadings and a diagonal Θδ. With more than three indicators, the one-factor model is over-identified. A multifactor model is identified when it has three or more indicators per factor, each row of Λ_X has one and only one non-zero element, and Θδ is diagonal. There are no restrictions on Φ.
Finally, the two-indicator rule is an alternative sufficient condition for measurement models with more than one factor. As in the three-indicator rule, Θδ is assumed to be diagonal. Each factor is scaled (e.g. one factor loading set to 1 for each factor). Under these conditions, having two indicators per factor is sufficient to identify the measurement model, provided that the
57 However, in the SEM literature one can also find the rank rule [Reilly 1995] and the side-by-side rule [Reilly and O'Brien 1996].


factor complexity of each X is one and that there are no zero elements in Φ. Bollen [1989b] generalized this rule even further by loosening the requirements for Φ. He distinguished four conditions that should be sufficient for model identification:
– each row of Λ_X has one and only one non-zero value,
– there are at least two indicators per latent variable,
– each row of Φ has at least one non-zero off-diagonal element,
– Θδ is diagonal.
A major difference from the first two-indicator rule is that some off-diagonal elements of Φ can be zero, and the rule still applies.
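Bollen's four conditions can be checked mechanically from the patterns of Λ_X, Φ and Θδ; a sketch with illustrative matrices (the helper name is ours):

```python
import numpy as np

def two_indicator_rule(Lam, Phi, Theta):
    """Check Bollen's [1989b] generalized two-indicator conditions:
    1) one and only one non-zero value per row of Lambda_X,
    2) at least two indicators per latent variable,
    3) at least one non-zero off-diagonal element per row of Phi,
    4) Theta_delta diagonal."""
    one_per_row = all(np.count_nonzero(row) == 1 for row in Lam)
    two_per_factor = all(np.count_nonzero(Lam[:, j]) >= 2
                         for j in range(Lam.shape[1]))
    off_diag = Phi - np.diag(np.diag(Phi))
    phi_rows = all(np.count_nonzero(off_diag[i]) >= 1
                   for i in range(Phi.shape[0]))
    theta_diag = np.allclose(Theta, np.diag(np.diag(Theta)))
    return one_per_row and two_per_factor and phi_rows and theta_diag

Lam = np.array([[1.0, 0.0], [0.7, 0.0], [0.0, 1.0], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Theta = np.diag([0.5, 0.5, 0.4, 0.6])
print(two_indicator_rule(Lam, Phi, Theta))  # True
```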
Table 22 presents the above-described rules of identification. As can be observed, even if the model is not identified, individual parameters or equations that are part of the model may be identified. The third column summarizes each rule, and the fourth and fifth columns indicate whether the rule provides a necessary or a sufficient condition for identification [Bollen 1989b].
Table 22. Rules of identification in CFA models

t-rule: identifies the model; summary: t ≤ ½p(p + 1); necessary condition: yes; sufficient condition: no.

Three-indicator rule: identifies the model; summary: q ≥ 1; one non-zero element per row of Λ_X; three or more indicators per factor; Θδ diagonal; necessary condition: no; sufficient condition: yes.

Two-indicator rule 1: identifies the model; summary: q > 1; φij ≠ 0 for all i and j; one non-zero element per row of Λ_X; two or more indicators per factor; Θδ diagonal; necessary condition: no; sufficient condition: yes.

Two-indicator rule 2: identifies the model; summary: q > 1; φij ≠ 0 for at least one pair of i and j, where i ≠ j; one non-zero element per row of Λ_X; two or more indicators per factor; Θδ diagonal; necessary condition: no; sufficient condition: yes.

Source: Bollen 1989b, p. 247.


CFA fitting function and methods of estimation


The CFA involves at least a few approaches to testing the fit of models to the data. A fitting function is the mathematical operation that minimizes, for example, the difference between Σ and S. Most frequently, the F_ML (ML, maximum likelihood) fitting function is used⁵⁸, which is based on the determinant and the trace, summaries of important information about matrices such as S and Σ⁵⁹. The objective of ML is to minimize the differences between these matrix summaries (i.e. the determinant and trace) for S and Σ.
The maximum likelihood function, similarly as in the common exploratory factor model, has several requirements that render it an unsuitable estimator in some circumstances. Compared with some other estimators, ML is more prone to Heywood cases and is more likely to produce markedly distorted solutions if minor misspecifications have been made to the model. To recall, the key assumptions of ML are that: 1) the sample size is large, 2) the observed variables have been measured on continuous scales (i.e. approximately interval-level data), 3) the distribution of the indicators is multivariate normal.
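For reference, the normal-theory ML discrepancy based on the determinant and trace is commonly written as F_ML = ln|Σ| − ln|S| + tr(SΣ⁻¹) − p; this explicit form is not printed above and is stated here as a standard result. A sketch:

```python
import numpy as np

def f_ml(S, Sigma):
    """Normal-theory ML discrepancy between the sample covariance
    matrix S and the model-implied matrix Sigma (p variables):
    F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(f_ml(S, S))          # ~0: a perfectly fitting model
print(f_ml(S, np.eye(2)))  # > 0: misfit raises the discrepancy
```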
Although the actual parameter estimates (e.g. factor loadings) may not be affected, non-normality in ML analysis can result in biased standard errors (and hence faulty significance tests) and a poorly behaved χ² test of overall model fit. If non-normality is extreme (e.g. marked floor effects, as would occur if most of the sample responded to items using the lowest response choice, such as 0 on a 0–10 scale), then ML will produce incorrect parameter estimates; that is, the assumption of a linear model is invalid. So, in the case of non-normal continuous indicators, it is better to use an estimator such as MLM (maximum likelihood with robust standard errors and mean-adjusted χ²) [Bentler 1995], which provides the same parameter estimates as ML, but with both the model χ² and the standard errors of the parameter estimates corrected for non-normality in large samples. The MLM produces the Satorra-Bentler scaled (mean-adjusted) χ², in which the typical normal-theory χ² is divided by a scaling correction to better approximate χ² under non-normality⁶⁰.

58 However, fitting functions based on generalized least squares (GLS), asymptotically distribution-free (ADF) functions and many others can also be applied (see the further description). The principle of ML estimation in CFA is to find the model parameter estimates that maximize the probability of observing the available data if the data were collected from the same population again. In other words, ML aims to find the parameter values that make the observed data most likely or, conversely, maximize the likelihood of the parameters given the data.

59 The determinant is a single number (i.e. a scalar) that reflects a generalized measure of variance for the entire set of variables contained in the matrix. The trace of a matrix is the sum of the values on its diagonal (e.g. in a variance-covariance matrix, the trace is the sum of the variances).
The problem with ML, or the similar GLS (generalized least squares) estimation method, is that in social sciences research the data rarely have a normal distribution. If one or more of the indicators is non-normal or based on categorical responses, normal-theory ML or GLS estimation cannot be used; more specifically, the information in S will be incorrect, and hence the estimates based on S will also be incorrect.
In the process of estimation, the variation in the measured variables is completely summarized by the sample covariances only when multivariate normality is present. If this assumption is violated, the variation of the measured variables will not be completely summarized by the sample covariances; that is, information from higher-order moments is needed. In this situation S⁻¹ is no longer the correct estimator of the weight matrix in the fitting function. The parameter estimates do remain unbiased and consistent (i.e. as the sample size grows larger, the estimate of θ converges to θ), but they are no longer efficient. This suggests that two important theoretical problems occur with normal-theory estimators (i.e. when we apply ML or GLS) when the observed variables do not have a multivariate normal distribution: the χ² goodness-of-fit test is not expected to produce an accurate assessment of fit, rejecting too many (>5%) true models, and the tests of all parameter estimates are expected to be biased, yielding too many significant results [West, Finch and Curran 1995]. In this instance, an estimator such as weighted least squares (WLS) is more appropriate⁶¹.
The weighted least squares approach should be used for non-normal continuous data, although MLM is often preferred, given its ability to outperform WLS in small and medium-sized samples [Hu, Bentler and Kano 1992; Curran, West and Finch 1996]. The WLS method has been developed for estimating a weight matrix based on the asymptotic variances and covariances of polychoric correlations, which can be used in conjunction with a matrix of polychoric correlations in the estimation of SEM models [Browne 1982, 1984; Muthén 1984; Jöreskog 1994; Muthén and Satorra 1995].

60 For the Satorra-Bentler [Satorra and Bentler 1994] scaled statistic, and also for the central χ² distribution (when we evaluate the CFA model), the problem of model rejection appears because of very large sample sizes. Even when the discrepancy between the estimated model and the data is very small, if the sample size is large enough almost any model will be rejected, because the discrepancy is not statistically equal to zero [Hu and Bentler 1995].

61 Some other known methods are robust weighted least squares (RWLS) and unweighted least squares (ULS) [Flora and Curran 2004].
The WLS method applies the following fitting function:

F_WLS = [s − σ(θ)]ᵀ W⁻¹ [s − σ(θ)],     (6.101)

where:
s – a vector of sample statistics (e.g. polychoric correlations),
σ(θ) – the model-implied vector of the population elements in Σ(θ),
W – a positive-definite weight matrix.
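Formula (6.101) can be sketched directly; the sample vector, the model-implied vector and the diagonal weight matrix below are illustrative assumptions:

```python
import numpy as np

def f_wls(s, sigma_theta, W):
    """WLS fitting function (6.101): (s - sigma)' W^-1 (s - sigma)."""
    d = s - sigma_theta
    return d @ np.linalg.inv(W) @ d

s = np.array([0.50, 0.40, 0.35])      # sample polychoric correlations
sigma = np.array([0.48, 0.42, 0.36])  # model-implied counterparts
W = np.diag([0.01, 0.01, 0.01])       # simplified diagonal weight matrix

print(f_wls(s, sigma, W))  # 0.09 up to floating-point rounding
```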
Browne [1982] showed that if a consistent estimator of the asymptotic covariance matrix of s is chosen for W, then F_WLS leads to asymptotically efficient parameter estimates and correct standard errors, as well as a χ²-distributed model test statistic. Furthermore, he presented a solution for estimating the correct asymptotic covariance matrix in the context of continuously distributed observed data using observed fourth-order moments. Because this result holds without specifying a particular distribution for the observed variables, F_WLS is often called the asymptotically distribution-free (ADF) estimator when used with a correct asymptotic covariance matrix. The ADF method makes the promising claim that the test statistics for model fit are insensitive to the distribution of the observations when the sample size is large [Hu and Bentler 1995, p. 79].
Browne [1984] primarily focused on WLS as applied to continuous but non-normal distributions, whereas Muthén [1983, 1984] presented a continuous/categorical variable methodology (CVM) for estimating CFA models that allows any combination of dichotomous, ordered categorical or continuous observed variables. With CVM, bivariate relationships among ordinal observed variables are estimated with polychoric correlations, and the CFA is fit using WLS estimation. The key contribution of CVM is that it essentially generalized Browne's work with F_WLS beyond the case of continuous observed data, as Muthén described the estimation of the correct asymptotic covariance matrix among polychoric correlation estimates [Muthén 1984; Muthén and Satorra 1995]. Thus, unlike normal-theory estimation, CVM provides asymptotically unbiased, consistent and efficient parameter estimates, as well as a correct chi-square test of fit, with dichotomous or ordinal observed variables⁶².
62 Parallel but independent research performed by Jöreskog [1994] similarly generalized Browne's work to the estimation of the correct asymptotic covariance matrix among polychoric correlations.


Exploratory (EFA) and confirmatory (CFA) factor analysis for scale development

The weighted least squares estimation has two potential limitations in research applications of CFA with ordinal data. Firstly, limited prior simulation evidence suggests that the computation of polychoric correlations is generally robust to violations of the latent normality assumption. Thus, although polychoric correlations may be generally unbiased, CFA model test statistics and standard errors might be adversely affected because of biases in the asymptotic covariance matrix introduced by non-normality among latent response variables. Secondly, a frequent criticism of WLS estimation is that the dimensions of the optimal weight matrix W are typically exceedingly large and increase rapidly as a function of the number of indicators in a model. By virtue of its size in the context of a large model (i.e. a model with many observed variables), W is often non-positive definite and cannot be inverted when applying the WLS fitting function [Bentler 1995; West, Finch and Curran 1995]. Furthermore, the calculation of asymptotic values requires a large sample size to produce stable estimates. Specifically, Jöreskog and Sörbom [1996] suggested that a minimum sample size should be available for estimation of W. As the elements of W have substantial sampling variability when based on small sample sizes, this instability has an accumulating effect as the number of indicators in the model increases [Browne 1984].
On the other hand, with reference to ADF, simulation studies have shown that χ2 test statistics are consistently inflated when ADF estimation is applied to sample product-moment covariance or correlation matrices of continuous observed data [Hu, Bentler and Kano 1992; Chou and Bentler 1995; Curran, West and Finch 1996]. Similarly, simulation studies applying WLS estimation to the analysis of polychoric correlation matrices have also reported inflated χ2 test statistics [Dolan 1994; Hutchinson and Olmos 1998] and negatively biased standard error estimates. In particular, Potthast [1993] reported that these problems worsen as a function of increasing model size (i.e. the number of indicators) and decreasing sample size.
To address the problems encountered when using WLS with small to moderate sample sizes, Muthén, du Toit and Spisic [1997] introduced a robust WLS approach based on the work of Satorra et al. [Satorra and Bentler 1990; Chou, Bentler and Satorra 1991; Satorra 1992]. With this method, parameter estimates are obtained by substituting for W in the FWLS function a diagonal matrix whose elements are the asymptotic variances of the thresholds and polychoric correlation estimates (i.e. the diagonal elements of the original weight matrix). Once a vector of parameter estimates is obtained, a robust asymptotic covariance matrix is used to obtain parameter


standard errors. Calculation of this matrix involves the full weight matrix W; however, it need not be inverted. Next, Muthén, du Toit and Spisic [1997] described a robust goodness-of-fit test via calculation of a mean- and variance-adjusted χ2 test statistic. Calculation of this test statistic also involves the full weight matrix W but similarly avoids inversion. An interesting aspect of the robust WLS method is that the value for the model degrees of freedom is estimated from the empirical data, in a manner inspired by Satterthwaite [1941], rather than being determined directly from the specification of the model. The robust goodness-of-fit test presented by Muthén [1997] essentially involved the usual χ2 test statistic multiplied by an adjustment akin to the Satorra and Bentler [1986, 1988] robust χ2 test statistic, with model degrees of freedom estimated from the data.

CFA model evaluation – selected fit indices


The acceptability of the model can be evaluated on the basis of three major aspects [Brown 2006]:
– overall (absolute) goodness of model fit,63
– the presence or absence of localized areas of strain in the solution (i.e. specific points of ill fit),64
– the interpretability and statistical significance of the model parameter estimates.
Brown [2006] argued that a common mistake in applied CFA research is to evaluate models exclusively on the basis of overall goodness of fit. Descriptive fit indices are best viewed as providing information on the extent of a model's lack of fit, and although these indices are informative, they cannot be used in isolation from other information to support the conclusion of a good-fitting model. Goodness-of-fit indices provide a global descriptive
63 Evaluation of the model fit is usually one of the things researchers are most concerned about. In the existing literature there are two main approaches to assessing model fit. One approach is based on goodness-of-fit test statistics, whereas the other emphasizes the so-called fit indices. Actually, these two approaches are closely related, because the definition of many fit indices involves goodness-of-fit test statistics [Zhang 2008, p. 301].
64 For example, residuals can be examined to identify localized areas of strain. Standardized residuals greater than 1.96 (for p < 0.05) or 2.58 (for p < 0.01) may indicate areas of strain. Moreover, positive standardized residuals indicate that the model parameters underestimate the relationship, whereas negative standardized residuals indicate that the model parameters overestimate the relationship [Harrington 2009, p. 54].


summary of the ability of the model to reproduce the input covariance matrix, but the other two aspects of fit evaluation (localized strain, parameter
estimates) provide more specific information about the acceptability and
utility of the final solution.
In this section, although a host of goodness-of-fit indices are available in the literature, only a handful will be described and recommended. They were selected on the basis of their popularity and classified as:
– absolute indices,
– comparative or incremental indices,
– parsimony-based fit indices.
The first classic goodness-of-fit measure (i.e. absolute index) is the chi-square. It is based on the null hypothesis H0: Σ = Σ(θ), where the most desired outcome is simply a failure to reject it. Hypothesis testing of a CFA model focuses on goodness of fit: in the typical situation, a researcher specifies H0 to be tested against an alternative hypothesis H1. So if the hypothesized model is correct, then H0 is true and the residual matrix contains only zeros: Σ − Σ(θ) = 0.
In practice, we do not know the elements of Σ, hence we estimate them by the sample variance-covariance matrix S [Hu and Bentler 1995]. Neither do we know the model parameters, i.e. the elements of the vector θ, which are also subject to estimation. As previously explained, there are a few methods of estimation, each of which depends on minimization of the function F[S, Σ(θ̂)].65
In CFA we strive to obtain estimates of each parameter of the measurement model (e.g. factor loadings, factor variances and covariances, indicator error variances and possibly error covariances) that produce an implied variance-covariance matrix resembling the sample variance-covariance matrix as closely as possible. The observed variance-covariance matrix S is based on sample elements drawn from the population covariance matrix Σ, whose elements are hypothesized to be functions of the parameter vector θ: Σ = Σ(θ), meaning that the population variance-covariance matrix equals the model-implied variance-covariance matrix. Parameters are estimated so that the discrepancy function between the sample variance-covariance matrix and the implied variance-covariance matrix is minimized.66
65 The function F is called the fitting function. It varies according to the estimation method (i.e. maximum likelihood FML, generalized least squares FGLS, unweighted least squares FULS, weighted least squares FWLS (or ADF), or robust weighted least squares FWLSMV).
66 In research practice, the statistical test for the goodness-of-fit test statistic is obtained simultaneously with the estimation. A goodness-of-fit test statistic indicates the similarity between the variance-covariance matrix based on the estimated (implied) model Σ(θ̂) and the popula-


As CFA tests the fit of a hypothesized model against empirical data, the context in which fit is understood is quite specific. This is due to the fact that each model is a simplified theoretical representation of relationships observed in practice, and no model can fit the data perfectly. The researcher can only check the plausible models and compare their goodness of fit. In other words, a good or average model fit simply means that the model belongs to the group of models which are not the worst. Therefore goodness-of-fit indices help us either to reject the model or inform us that the model cannot be rejected at all, while assuming its correctness.67
If we now follow FML, through the agency of which χ2 will be expressed as [Marsh, Balla and McDonald 1988, pp. 391–392]:

χ2 = FML(n − 1), (6.102)

or:

χ2 = (n − 1)F[S, Σ(θ̂)], (6.103)

then we will understand that this function has an asymptotic χ2 distribution with df = (c − p) degrees of freedom, where n is the sample size, c = m(m + 1)/2 is the number of the unreduced variances and covariances, and p is the number of parameters.
The χ2 provides a test of whether the residual differences between Σ and S converge in probability to zero as the sample size approaches infinity [Cudeck and Browne 1983; Netemeyer, Bearden and Sharma 2003]. As typically used, the model will be rejected if the χ2 is large relative to the df, and accepted if the χ2 is non-significant or small. For a true model, the χ2 has an expected value equal to the df and does not systematically vary with sample size [Tucker and Lewis 1973].
tion variance-covariance matrix Σ from which a sample S has been drawn. On the basis of this definition, it is apparent that testing is a prerequisite to interpreting modeling results. If a model cannot be considered consistent with the population variance-covariance matrix, as represented via sample data, there is not much point in interpreting the model parameters [Chou and Bentler 1995, pp. 37–38].
67 Hypothesized models are best regarded as approximations to reality rather than exact statements of truth. For example, any model can be rejected if the sample size is large enough. From this perspective, Cudeck and Browne [1983] argued that it is preferable to depart from the unrealistic assumption of the hypothesis-testing approach that any model will exactly fit the data.


In the literature, criticisms of χ2 have included the following arguments [Marsh, Balla and McDonald 1988; Bollen 1989a; Jöreskog and Sörbom 1989]:
– more complex models obtain a better fitting function (χ2) than simple models, though less complex models are much more preferred,
– it is inflated by sample size, and thus large-n solutions are routinely rejected on the basis of χ2, even when the differences between S and Σ(θ̂) are negligible,68
– it is based on the very exact fit and stringent hypothesis that S = Σ.
The strong dependence of χ2 on sample size forced researchers to propose a normed χ2 index, in which χ2 is divided by the number of degrees of freedom:

χ2/df; (6.104)

χ2 and χ2/df are the most frequently used.


As previously noted, the χ2 for a false model varies directly with sample size, but the χ2 for a true model does not. Because df does not vary with sample size, the effect of sample size on the χ2/df ratio must be the same as for the χ2.
For alternative models fitted to the same data, increasing the number of parameters results in a better (i.e. lower) χ2. The χ2/df ratio incorporates a penalty function for using more parameters. A χ2/df ratio on the order of 3:1 or less is associated with better-fitting models, except in circumstances with larger samples (greater than 750) or other extenuating circumstances, such as a high degree of model complexity [Marsh, Balla and McDonald 1988, p. 392].
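To make Eqs. (6.102) and (6.104) concrete, the following sketch uses an arbitrary, made-up minimized FML value and shows how χ2 grows in direct proportion to n − 1 while the df-based penalty stays fixed:

```python
def chi_square(f_ml, n):
    """Eq. (6.102): chi2 = F_ML(n - 1)."""
    return f_ml * (n - 1)

def normed_chi_square(chi2, df):
    """Eq. (6.104): chi2/df; ratios of about 3:1 or less suggest a better fit."""
    return chi2 / df

# The same (made-up) minimized F_ML at two sample sizes: chi2 scales
# with n - 1, while df does not change.
f_ml, df = 0.30, 24
for n in (101, 1001):
    chi2 = chi_square(f_ml, n)
    print(n, round(chi2, 2), round(normed_chi_square(chi2, df), 2))
```

This is the large-sample paradox in miniature: the same discrepancy function value that looks acceptable at n = 101 produces a χ2 ten times larger at n = 1001.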
Suppose now that Σ ≠ Σ(θ), that is, the null hypothesis is not true. Then the test statistic T will not be χ2 distributed, but it may still be distributed as a noncentral χ2 variate. However, an argument could be made that the null hypothesis is never exactly true and that the distribution of the test statistic will be better approximated by a non-central chi-square with non-centrality parameter λ.69
68 Bentler and Bonett warned that the probability of detecting a false model increases with n even when the model is trivially false. Similarly, Marsh and Hocevar [1985, p. 567] noted that most applications of confirmatory factor analysis require a sometimes subjective evaluation of whether or not a statistically significant χ2 is small enough to constitute an adequate fit.
69 A non-centrality parameter λ and degrees of freedom df are required for the specification of a non-central χ2 distribution, which can be denoted by χ2(df, λ).


The non-centrality parameter λ represents a measure of the discrepancy between Σ and Σ(θ) and should be considered a population badness-of-fit statistic [Steiger 1989]. Thus, the larger the λ, the farther apart the true alternative hypothesis is from the null hypothesis. The usual central χ2 distribution is a special case of the non-central χ2 distribution for which λ = 0.
An estimate of the non-centrality parameter can be obtained as the difference between the χ2 statistic and its associated degrees of freedom. For models that are not extremely mis-specified, an index developed by McDonald and Marsh [1990], the relative non-centrality index (RNI), can be proposed. It is defined as follows:

RNI = [(χi2 − dfi) − (χh2 − dfh)] / (χi2 − dfi). (6.105)

Because the RNI can lie outside the 0 to 1 range, to remedy this, Bentler [1990] adjusted the RNI index so that it would lie within the range of 0 to 1.
Finally, the other well-known indices for measuring the absolute goodness of fit of data in CFA models are: GFI – the goodness-of-fit index [Jöreskog and Sörbom 1981] and AGFI – the adjusted goodness-of-fit index [Mueller 1996; Mulaik et al. 1989; Babakus, Ferguson and Jöreskog 1987], with the range [0, 1]:

GFI = 1 − F[S, Σ(θ̂)] / F[S, Σ(0)], (6.106)

AGFI = 1 − (c / dfh)(1 − GFI), (6.107)

where:
F[S, Σ(θ̂)] – the least value of the fit function F for the hypothesized model,
F[S, Σ(0)] – the least value of the fit function F when there is no hypothesized model,
c – the number of unreduced variances and covariances of the observable variables,
dfh = c − p (p is the number of estimated parameters) – degrees of freedom for the hypothesized model.
Jöreskog and Sörbom [1981, pp. 140–141] stated that GFI is "a measure of the relative amount of variances and covariances jointly accounted for by

the model" and further asserted that, unlike χ2, GFI is independent of the sample size, whereas AGFI "corresponds to using mean squares instead of total sums of squares". Thus, AGFI incorporates a penalty function for additional parameters: when the number of free parameters increases, the number of degrees of freedom decreases, and the level of AGFI decreases too.
Values of both indices (GFI and AGFI) range from 0 to 1, although, as occasionally happens, there may be negative values too. The higher the index, the better the goodness of fit of the data to the model. Many researchers interpret GFI or AGFI scores in the 0.80 to 0.89 range as representing reasonable fit, and scores of 0.90 or higher are considered evidence of good fit.
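A small sketch of Eqs. (6.106)–(6.107); the fit-function values, the number of indicators and the parameter count below are hypothetical:

```python
def gfi(f_hyp, f_null):
    """Eq. (6.106): GFI = 1 - F[S, Sigma(theta)] / F[S, Sigma(0)]."""
    return 1.0 - f_hyp / f_null

def agfi(gfi_value, c, df_h):
    """Eq. (6.107): AGFI = 1 - (c/df_h)(1 - GFI)."""
    return 1.0 - (c / df_h) * (1.0 - gfi_value)

# Hypothetical example: m = 6 indicators give c = m(m + 1)/2 = 21 unreduced
# variances/covariances; with p = 13 free parameters, df_h = c - p = 8.
g = gfi(f_hyp=0.08, f_null=1.60)
print(round(g, 3), round(agfi(g, c=21, df_h=8), 3))  # 0.95 0.869
```

The example illustrates the penalty: the same fit that yields a "good" GFI of 0.95 drops below the 0.90 threshold once AGFI adjusts for the thirteen estimated parameters.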
In confirmatory factor analysis, Bentler and Bonett [1980] explained that valuable information can also be obtained by comparing the ability of nested models to fit the same data. In particular, one can compare the fit of the proposed (hypothesized) model with the fit of a null model in which all the p variables are assumed to be uncorrelated. In general, the null model represents the more restrictive model that should be considered the baseline model. Such types of models can be verified through the agency of incremental fit indices (known also as comparative fit indices).70
For instance, Bentler and Bonett replaced the Tucker–Lewis index (TLI)71 [Tucker and Lewis 1973] by proposing the non-normed (NNFI) and normed (NFI) indices, which are expressed as follows:72
70 If the fit of a baseline model is reasonable (because the sample size is small or because the measured variables are relatively uncorrelated), then the fit of a hypothesized model will automatically be reasonable as well. In that case, however, there is little covariance to explain and there is no basis of support for the hypothesized model, even if it also fits the data.
71 The Tucker–Lewis index (TLI) takes into account the expected value of the χ2 statistic of the target (proposed) model. The NFI and TLI assume a true null hypothesis and therefore a central χ2 distribution of the test statistic. The most important characteristic of NFI is that the index is additive for nested model comparisons. Thus, if one defines the incremental normed fit index comparing models Mj and Mk as NFIjk, it will be obvious that the NFI for model Mk is the additive sum of the component fits [Bentler 1990].
72 When considering the NFI and NNFI, we need first to consider what is meant by nested models. For example, let us assume that we have a model Mj which is constructed from the model Mk by replacing free parameters in Mk with zeros. Model Mj is a more restricted model compared to Mk, which means both models have the same structure, but there are more parameters (set to zero) in Mj. In this way we can define two opposite models: Mi – the independence model or, otherwise, the baseline model, and Ms – the saturated model. Model Mi has a very constrained structure, with no factors based on the observed variables, and where [Sztemberg-Lewandowska 2008]: 1) the observed variables are measured without error,

NFI = (χi2 − χh2)/χi2 = (Fi − Fh)/Fi = 1 − Fh/Fi, (6.108)

NNFI = [(χi2/dfi) − (χh2/dfh)] / [(χi2/dfi) − 1] = [(Fi/dfi) − (Fh/dfh)] / [(Fi/dfi) − 1/(n − 1)], (6.109)
where:
Fh – the lowest value of the fit function F for the hypothesized (target) model,
Fi – the lowest value of the fit function F for the independent (baseline) model,
dfh – degrees of freedom for the hypothesized (target) model,
dfi – degrees of freedom for the independent (baseline) model.
Values of NFI lie in the range 0–1, whereas NNFI values may even fall outside this range; as such, we should interpret only the NFI index. Its values close to 0 suggest that the proposed (hypothesized or target) model is not much better than a model of complete independence (the baseline model), and values close to 1.0 suggest that the proposed model is an improvement over the baseline model. Simply put, NFI assesses the fit of a hypothesized model relative to the fit of the baseline model by scaling the χ2 number from 0 to 1, with large numbers indicating better models.
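Eqs. (6.108)–(6.109) in their χ2 form can be sketched as follows; the baseline and hypothesized χ2 values are invented for illustration:

```python
def nfi(chi2_i, chi2_h):
    """Eq. (6.108): NFI = (chi2_i - chi2_h) / chi2_i."""
    return (chi2_i - chi2_h) / chi2_i

def nnfi(chi2_i, df_i, chi2_h, df_h):
    """Eq. (6.109): NNFI = (chi2_i/df_i - chi2_h/df_h) / (chi2_i/df_i - 1)."""
    return (chi2_i / df_i - chi2_h / df_h) / (chi2_i / df_i - 1.0)

# Invented chi2 values for a baseline (independence) model and
# a hypothesized model fitted to the same data.
chi2_i, df_i = 900.0, 36
chi2_h, df_h = 66.0, 24
print(round(nfi(chi2_i, chi2_h), 3))               # 0.927
print(round(nnfi(chi2_i, df_i, chi2_h, df_h), 3))  # 0.927
```

Both indices compare the hypothesized model against the same baseline, but only NNFI divides each χ2 by its degrees of freedom, which is what allows it to fall outside the 0–1 range.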
Bentler and Bonett cautioned that the absolute values of NFI and NNFI may be difficult to interpret, but lower values, e.g. 0.90, usually mean the model can be improved substantially. Harvey, Billings and Nilan [1985] described well-fitting models as those that yield an NFI of at least 0.90 (i.e. where only a relatively small amount of variance remains unexplained by the model). Hence, values falling between 0.90 and 0.95 are considered marginal, above 0.95 good, and below 0.90 poor.
2) the observed variables are independent. On the other hand, the Ms model has as many free (undefined) parameters as there are variances and covariances of the observed variables, i.e. dfs = (c − p) = 0. More restricted models which are close to Mi have a higher overall χ2 than less restricted ones which are close to Mk. Since the former are nested within the latter, their goodness of fit can be compared statistically with differences in χ2 tests. Probably a good example of these models might be a comparative framework where we present them on a continuum, with the independence model at one extreme, the saturated model at the other extreme, and the hypothesized model somewhere in between [Byrne 2010, p. 73].


A major disadvantage of NFI is that it cannot become smaller when more parameters are added to the model; its penalty for complexity is zero. In other words, the NFI index carries no penalty for adding parameters. Thus, the more parameters are added to the model, the larger the NFI index.
Another disadvantage of NFI is that it is affected by sample size: it may not reach the level of 1.0 in smaller samples. This difficulty was partially resolved by its modified version, namely the NNFI index, in which the degrees-of-freedom adjustment was designed to improve its performance near 1.0 (not necessarily to permit the index to reflect other features such as parsimony) [Bentler and Bonett 1980]. However, as Bentler [1990] claimed, the NNFI can be very small, especially in small samples, implying a terrible fit when other indices suggest an acceptable model fit.
In order to modify the NNFI index so as to maintain its desirable features while minimizing its undesirable ones (in the context of sample size), Bollen [1986] proposed the incremental fit index (IFI). However, this solution did not wholly solve the problem of variability in the index.
Bentler [1990] proposed yet two additional indices, designed to solve two basic problems of NFI and NNFI: the normed comparative fit index (CFI),73 which was designed to correct the model fit when samples were small, and the non-normed fit index (FI), designed to reduce the number of NNFI values which exceed the interval [0, 1].
The CFI and FI indices are the following:

CFI = 1 − l1/l2, (6.110)

FI = 1 − lh/li, (6.111)

where: lh = (n − 1)Fh − dfh, li = (n − 1)Fi − dfi, and l1 = max(lh, 0), l2 = max(lh, li, 0).
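A minimal sketch of Eq. (6.110). Since (n − 1)F equals χ2 by Eq. (6.102), the quantities lh and li are computed here as χ2 − df; the χ2 values themselves are hypothetical:

```python
def cfi(chi2_h, df_h, chi2_i, df_i):
    """Eq. (6.110): CFI = 1 - l1/l2, with l1 = max(l_h, 0), l2 = max(l_h, l_i, 0)."""
    l_h = chi2_h - df_h   # noncentrality estimate for the hypothesized model
    l_i = chi2_i - df_i   # noncentrality estimate for the baseline model
    l1 = max(l_h, 0.0)
    l2 = max(l_h, l_i, 0.0)
    return 1.0 - l1 / l2

# Invented chi2 values: a hypothesized model (chi2 = 66, df = 24) against
# a baseline model (chi2 = 900, df = 36).
print(round(cfi(66.0, 24, 900.0, 36), 3))  # 0.951
```

The max() operations are what keep CFI inside [0, 1]: if χ2 for the hypothesized model falls below its df, l1 becomes zero and CFI equals exactly 1.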
73 In comparison with the TLI: if the CFI index is less than one, then the CFI is always greater than the TLI. The CFI index pays a penalty of one for every parameter estimated. Because the TLI and CFI are highly correlated, only one of the two should be reported. The CFI index should not be computed if the RMSEA (discussed further) of the null model is less than e.g. 0.158, or otherwise one will obtain too small a value of the CFI. Moreover, the TLI and CFI depend on the average size of the correlations in the data. If the average correlation between variables is not high, the TLI will not be very high.


James, Mulaik and Brett [1982] pointed out a serious drawback of both the Bentler–Bonett NFI index and the Jöreskog–Sörbom GFI. They argued that one can get goodness-of-fit indices approaching unity simply by freeing up more parameters in a model. In their opinion, this is due to the estimates of free parameters being obtained in such a manner as to get the best fit to the observed covariance matrix conditional on the fixed parameters. So, each additional parameter freed to be estimated removes one more constraint on the final solution, with a consequently better fit of the model-reproduced covariance matrix to the sample covariance matrix.
Indices that adjust existing fit indices for the number of parameters that are estimated are called parsimony fit indices. James and Mulaik introduced two such indices: the parsimony goodness-of-fit index (PGFI) and the parsimony normed fit index (PNFI), which combine logically interrelated pieces of information about the model, 1) the goodness of fit and 2) the parsimony of the model:

PGFI = (dfh/dfn) GFI, (6.112)

PNFI = (dfh/dfn) NFI, (6.113)

where dfn = c is the value taken when there is no hypothesized model, that is, when all parameters of the model are established, i.e. there is no need to estimate parameters.
These indices provide information about which model among a set of alternative competing models is the best, considering its fit relative to its complexity. A parsimony fit index is improved either by a better fit or by a simpler model; in this case, a simpler model is one with fewer estimated parameters. The basis for these measures is the parsimony ratio, calculated as the ratio of the degrees of freedom used by a model to the total degrees of freedom available.
Parsimonious fit indices adjust the level of model fit by the parsimony penalty coefficient, which for the vast majority of substantive models is less than unity. The adjustments are used to penalize models that are less parsimonious, so that simpler theoretical processes are favored over more complex ones. In general, the more complex the model, the greater the loss in model fit. As a result, we expect parsimonious fit indices to have lower critical values than their non-parsimonious counterparts. However, the simple use of such indices alone in the interpretation of models remains controversial.


There are two problems associated with their use: 1) a high sensitivity to model size, i.e. the number of observed variables, and 2) the possibility that rigid application of the criterion could lead the researcher to select an incorrect model [Konarski 2009].
Finally, we should turn for a moment to indices known as badness-of-fit measures, such as [Steiger and Lind 1980]:
– the root mean square error of approximation (RMSEA),
– the standardized root mean square residual (SRMR),
– the root mean square residual (RMSR).
RMSEA, SRMR and RMSR fall into the category of indices in which high values are indicative of poor fit.
Steiger and Lind [1980] introduced the RMSEA for evaluating covariance structure models. This absolute measure of fit is based on the non-centrality parameter, and its computational formula is as follows:

RMSEA = sqrt[(χ2 − df) / (df(n − 1))], (6.114)

where n is the sample size and df is the degrees of freedom of the model. If χ2 is less than df, then the RMSEA is set to zero. Like the TLI, its penalty for complexity is the χ2-to-df ratio. This index corrects the tendency of the χ2 goodness-of-fit test statistic to reject models with a large sample or a large number of observed variables. It explicitly tries to correct for both model complexity and sample size by including each in its computation. As a result, the problems and paradoxes inherent in testing CFA models with large sample sizes are reduced.
Another advantage of the RMSEA is the fact that it offers a coherent estimation strategy: both a point estimate and a confidence interval are available. Use of confidence intervals and PCLOSE (p of close fit – see its further description in the next chapter) can help in understanding the sampling error in the RMSEA74 [Steiger 1990b; 2000].
MacCallum, Browne and Sugawara [1996] used RMSEA values of 0.01, 0.05 and 0.08 to indicate excellent, good and mediocre fit, respectively. In practice, RMSEA values < 0.05 indicate a good fit, and RMSEA values of 0.1 or more are often taken to indicate poor fit.
74 There is greater sampling error for small-df and low-n models, especially for the former. Thus, models with small df and low n can have artificially large values of the RMSEA. For instance, a χ2 of 2.098 (a value that is not statistically significant), with a df of 1 and an n of 70, yields an RMSEA of 0.126.
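Eq. (6.114) is easy to verify against the footnote's numbers (χ2 = 2.098, df = 1, n = 70):

```python
import math

def rmsea(chi2, df, n):
    """Eq. (6.114): sqrt((chi2 - df) / (df(n - 1))), set to 0 when chi2 < df."""
    if chi2 < df:
        return 0.0
    return math.sqrt((chi2 - df) / (df * (n - 1)))

# The footnote's example: a non-significant chi2 of 2.098 with df = 1
# and n = 70 still yields an artificially large RMSEA.
print(round(rmsea(2.098, 1, 70), 3))  # 0.126
```

The explicit zero branch implements the convention stated above that the RMSEA is set to zero whenever χ2 falls below df.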


The next index, the SRMR, is a popular absolute fit index which can be viewed as the average discrepancy between the correlations observed in the input matrix and the correlations predicted by the model (though in actuality the SRMR is a positive square root of an average). Accordingly, it is derived from a residual correlation matrix. In the SRMR, the bias will be greater for small-n and low-df studies. This measure tends to be smaller as the sample size increases and as the number of parameters in the model increases. In most instances (e.g. models involving a single input matrix) the SRMR can take a range of values between 0 and 1, with 0 indicating a perfect fit (i.e. the smaller the SRMR, the better the model fit).75
A similar index, the RMSR (an average of the residuals), reflects the average discrepancy between observed and predicted covariances. However, the RMSR can be difficult to interpret because its value is affected by the metric of the input variables; thus, the SRMR is generally preferred, especially for comparing fit across models. Although no statistical threshold level can be established, the researcher can assess the practical significance of the magnitude of the SRMR in light of the research objectives and the observed or actual covariances or correlations.
Some other useful measures (based on information criteria) in CFA models are indices such as Akaike's information criterion (AIC) and Schwarz's Bayesian information criterion (BIC) [Akaike 1987]:

AIC = p ln(Σ ei2) + 2K, (6.115)

where:
p – the number of variables,
Σ ei2 – the sum of squared residuals, for i = 1, …, p,
K – the number of estimated model parameters.
The first component of the AIC in Eq. (6.115) describes the goodness of fit of the model to the data, which decreases as the number of factors increases, unlike the second part of (6.115), which increases. The number of factors should be defined in accordance with the lowest level of the AIC index.
Because the AIC is a comparative index, it is meaningful when two different models are estimated. As a result, the AIC index can be useful for com-
75 Smaller values of the RMSR are associated with better-fitting models, with scores below 0.05 considered evidence of good fit [Byrne 1989; Jöreskog and Sörbom 1984].


paring alternative models that vary in the number of parameters used to describe the same data. Lower values indicate better fit; hence the model with the lowest AIC is the best-fitting model.
The AIC index incorporates a penalty function based on the number of parameters that are estimated. The AIC serves as a measure of the relative quality of a statistical model for a given set of data, and as such it provides a means for model selection. The AIC deals with the trade-off between the goodness of fit of the model and the complexity of the model. It is founded on information entropy and offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. However, the AIC does not provide a test of a model in the sense of testing a null hypothesis, i.e. the AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, the AIC will not give any warning of that.
Finally, the BIC index is expressed as follows:

BIC = p \ln \sum_{i=1}^{p} e_i^2 + K \ln p, (6.116)

where lower values indicate a better goodness of fit of the model to the data. The BIC solution is strongly recommended for models consisting of many variables or with few parameters included in the model. The BIC increases the penalty as the sample size increases.
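The selection logic of Eqs. (6.115) and (6.116) can be sketched as follows; the function names and the residual values are illustrative, not taken from the text:

```python
import math

def aic(sq_residuals, k):
    """Eq. (6.115): AIC = p * ln(sum of squared residuals) + 2K."""
    p = len(sq_residuals)
    return p * math.log(sum(sq_residuals)) + 2 * k

def bic(sq_residuals, k):
    """Eq. (6.116): BIC = p * ln(sum of squared residuals) + K * ln(p)."""
    p = len(sq_residuals)
    return p * math.log(sum(sq_residuals)) + k * math.log(p)

# Hypothetical squared residuals from two candidate models fitted to the
# same data; model B fits slightly better but uses more parameters.
candidates = {
    "A": ([0.04, 0.09, 0.02, 0.07, 0.05, 0.03], 12),
    "B": ([0.03, 0.08, 0.02, 0.06, 0.05, 0.03], 18),
}
scores = {name: aic(res, k) for name, (res, k) in candidates.items()}
best = min(scores, key=scores.get)  # the lowest AIC marks the best fitting model
```

Because both criteria are comparative, the absolute values are meaningless on their own; only the ranking across models fitted to the same data matters.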
Burnham and Anderson [2004] presented a few simulation studies suggesting that AIC tends to have practical advantages over BIC. Firstly, AIC is derived from principles of information, whereas BIC is not, despite its name. Secondly, the (Bayesian) derivation of BIC assumes a prior of 1/R (where R is the number of candidate models), which is not sensible, since the prior should be a decreasing function of k. Burnham and Anderson [2004] also proved that AIC can be derived in the same Bayesian framework as BIC, just by using a different prior76.
Given the ambiguity that arises from multiple fit indices, it is essential to develop some unifying set of principles for comparing them. It is clear from a review of the literature that different indices emphasize different aspects of model fit. In the opinion of Hair et al. [2010], using three or four fit indices should provide fully adequate evidence of model fit. Some research findings suggest a fairly common set of indices which perform adequately across a wide range of situations, and the researcher need not report all goodness-of-fit indices because they are often redundant. However, the researcher should report at least one incremental index and one absolute index, in addition to the χ2 value and the associated degrees of freedom, because using a single goodness-of-fit index, even with a relatively high cut-off value, is no better than simply using the χ2 goodness-of-fit test alone. Thus, providing the χ2 value and degrees of freedom, the CFI or TLI, and the RMSEA will usually supply sufficient unique information to evaluate the model. These indices were to some extent described in Table 23, where the guidelines refer primarily to different sample sizes and model complexity. As we can infer, simple models and smaller samples should be subject to more restricted evaluation than more complex models with larger samples. Likewise, more complex models with smaller samples may require somewhat less strict criteria for evaluation with the multiple fit indices [Hair et al. 2010].

76 Another comparison of AIC and BIC, in the context of regression, was provided by Yang [2005]. He argued that AIC is asymptotically optimal in selecting the model with the least mean squared error, under the assumption that the exact true model is not in the candidate set (as is virtually always the case in practice). The BIC is not asymptotically optimal under this assumption. Yang further showed that the rate at which AIC converges to the optimum is, in a certain sense, the best possible.
Table 24 presents six dimensions, from Bollen and Long's [1993] point of view, along which fit indices in CFA might differ. Although it is possible that other potential dimensions of difference could be identified, these dimensions characterize many of the decisions that have been invoked in justifying the use of CFA model fit indices. This classification should help researchers identify which fit indices are best for the models being evaluated in particular applications, making it possible to select indices on the basis of their own particular concerns [Bollen and Long 1993, pp. 15–17].
As far as dimension no. 1 is concerned, we may consider sample fit indices as estimates of population quantities, particularly in reference to standard errors and confidence intervals for fit indices, which provide a stronger inferential focus for these measures. The desire for a firmly statistical framework for fit indices in CFA has existed for a long time [Steiger and Lind 1980], and it led to the development of a class of indices that estimate population parameters. Bentler [1990], Bollen [1989b] and McDonald [1989] have developed such indices, all of which employ information from the non-central χ2 distribution. A related concern [Browne and Cudeck 1989] involved the cross-validity of models in either the multi-sample or single-sample case. In all these cases and in all of the indices, concerns are expressed about the adequacy of data from a single sample to characterize an underlying population structure.

Table 23. Basic characteristics of selected fit indices demonstrating goodness-of-fit across different model situations (k – number of variables)

χ2:
– N < 250: k ≤ 12 – insignificant p-values expected; 12 < k < 30 – significant p-values even with good fit; k ≥ 30 – significant p-values expected
– N > 250: k ≤ 12 – insignificant p-values even with good fit; 12 < k < 30 – significant p-values expected; k ≥ 30 – significant p-values expected

CFI or TLI:
– N < 250: k ≤ 12 – 0.95 or better; 12 < k < 30 – 0.95 or better; k ≥ 30 – above 0.92
– N > 250: k ≤ 12 – 0.95 or better; 12 < k < 30 – above 0.92; k ≥ 30 – above 0.90

RNI:
– N < 250: k ≤ 12 – may not diagnose misspecification well; 12 < k < 30 – 0.95 or better; k ≥ 30 – above 0.92
– N > 250: k ≤ 12 – 0.95 or better, not used with N > 1,000; 12 < k < 30 – above 0.92, not used with N > 1,000; k ≥ 30 – above 0.92, not used with N > 1,000

SRMR:
– N < 250: k ≤ 12 – biased upward, use other indexes; 12 < k < 30 – 0.08 or less (with CFI of 0.95 or higher); k ≥ 30 – less than 0.09 (with CFI above 0.92)
– N > 250: k ≤ 12 – biased upward, use other indexes; 12 < k < 30 – 0.08 or less (with CFI above 0.92); k ≥ 30 – 0.08 or less (with CFI above 0.92)

RMSEA:
– N < 250: k ≤ 12 – values < 0.08 with CFI = 0.97 or higher; 12 < k < 30 – values < 0.08 with CFI of 0.95 or higher; k ≥ 30 – values < 0.08 with CFI above 0.97
– N > 250: k ≤ 12 – values < 0.07 with CFI of 0.97 or higher; 12 < k < 30 – values < 0.07 with CFI of 0.92 or higher; k ≥ 30 – values < 0.07 with CFI of 0.90 or higher

Source: Hair et al. 2010, p. 672.
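As a rough illustration of such a reporting set, the sketch below checks one incremental index (CFI) and two absolute indices (RMSEA, SRMR) against cutoffs of the kind listed in Table 23; the thresholds chosen here correspond to a simple model on a moderate sample and are illustrative only:

```python
def screen_fit(chi2_p, cfi, rmsea, srmr):
    """Screen a CFA solution against illustrative cutoff values
    (chi-square p-value plus one incremental and two absolute indices)."""
    return {
        "chi2 p > 0.05": chi2_p > 0.05,
        "CFI >= 0.95": cfi >= 0.95,
        "RMSEA < 0.08": rmsea < 0.08,
        "SRMR <= 0.08": srmr <= 0.08,
    }

# A hypothetical solution that meets all four criteria
checks = screen_fit(chi2_p=0.12, cfi=0.96, rmsea=0.05, srmr=0.04)
acceptable = all(checks.values())
```

A failed criterion does not by itself condemn the model; per the guidelines above, the verdict should weigh sample size and model complexity together.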
Another basis for categorizing fit indices is the extent to which they penalize heavily parameterized models (dimension no. 2). James, Mulaik and Brett [1982], extending work by Tucker and Lewis [1973] and Jöreskog and Sörbom [1981], discussed penalties for complex, highly parameterized models. The penalty notion was also discussed in the case of Akaike's information criterion (AIC) [Akaike 1987] for model selection, where, all other things being equal, more heavily parameterized models will be penalized relative to simple models with fewer parameters. The AIC clearly represents the penalty logic and encourages researchers to select the simplest solution from a range of alternative models [Browne and Cudeck 1989].
The next dimension refers to the issue of comparability of fit indices (i.e. whether they are expressed in a standard metric, typically ranging from 0 to 1) or incomparability (i.e. when indices are only approximately normed to a 0–1 metric or have relatively arbitrarily defined metrics). In fact, any non-normed measure has the potential for being normed to a 0–1 metric, and some appropriate baseline for the norming can be defined.

Table 24. Dimensions along which fit indices can vary

1. Population based vs. sample based – Population-based fit indices estimate a known population parameter, whereas sample-based fit indices describe the data-model fit in the observed sample at hand.
2. Simplicity vs. complexity – Fit indices that favor simple models penalize models in which many parameters are estimated. Fit indices that do not employ such a correction do not penalize for model complexity.
3. Normed vs. non-normed – Fit indices that are normed are constructed to lie within an approximate [0, 1] range. Non-normed fit indices do not necessarily lie in this range.
4. Absolute vs. relative (otherwise incremental) – Relative fit indices are defined with respect to a specific model that serves as an anchor for subsequent comparisons. Absolute fit indices do not employ such a comparison anchor.
5. Estimation method free vs. estimation method specific – Estimation method-free fit indices provide characterizations of model fit that are unaffected by the choice of a specific estimation method. Estimation method-specific fit indices provide different fit summaries across different methods of estimation.
6. Sample size independent vs. sample size dependent – Sample-size-independent fit indices are not affected by sample size, either directly or indirectly. Sample-size-dependent fit indices vary as a function of observed sample size.

Source: Bollen and Long 1993, p. 16.
In the literature, there are two different normative standards. The null model logic presented by Bentler and Bonett [1980] employs observed variables that are mutually uncorrelated. The second standard involves the models being compared with the observed data in terms of the total observed variance that is accounted for by a hypothesized model77.

In reference to dimension no. 4 (absolute vs. relative incremental indices), we can state that many fit indices rely on a baseline model for comparison. In some cases, the null model can posit zero correlations among the observed variables, but this may not always be appropriate [Sobel and Bohrnstedt 1985]. Bentler [1990] developed fit indices based on non-centrality parameters which employ a baseline criterion for comparison in reference to the degree of model misspecification. More interestingly, Bentler developed a comparative baseline using the uncorrelated variables model, despite the reservations that had been voiced previously about this standard. Such fit indices are described as relative in the sense that they rely on three sources of information for their computation [Bollen and Long 1993]: 1) the sample covariance matrix, 2) the reproduced covariance matrix from the hypothesized model, and 3) a reference point (e.g. a null model) that serves as an anchor for describing fit. Thus, when we think about the issue of absolute and relative fit indices, we may focus on the anchor point for establishing model fit78.

77 In general, most researchers in social sciences prefer measures of fit that are normed in the appropriate range and expressed as either a proportion of total fit or a proportion of total variance [Cohen 1988].
In the case of fit indices based on estimation methods (method-free vs. method-specific), there has been a predominance of maximum likelihood (ML) methods. Other, less popular estimators are: non-iterative estimators [Bentler 1982], estimators asymptotically equivalent to maximum likelihood estimators [Browne 1974], and estimators based on elliptical [Browne 1982; Bentler 1983] or more general distributions [Browne 1982; Bentler 1983; Ihara, Berkane and Bentler 1990; Ding, Velicer and Harlow 1995].
The last dimension, no. 6, pertains to the role of sample size and its influence on the determination of overall model fit in CFA. Some guidance on, and understanding of, the role that sample size plays in the determination of model fit is important in study design and analysis. Sufficiently large samples are necessary if we are to evaluate a particular model of interest reliably. Often, this is defined in terms of asking when samples are big enough [Tanaka 1987] to balance the trade-off between sufficient statistical power to test a model and excessive statistical power, whereby substantively uninteresting model-data deviations are nevertheless statistically significant [Bollen and Long 1993].

Statistical power and significance of the CFA model's parameter estimates

Apart from the restrictive assumptions that govern the statistical test of S = Σ, there is a problem that, with enough statistical power, even trivial departures of S from Σ will result in rejection of the null hypothesis. On the other hand, insufficient statistical power will lead to a failure to detect non-trivial departures of S from Σ.

78 Comparative fit indices such as those presented by Bentler and Bonett [1980] define an arbitrarily selected comparison point, such as the model of mutually uncorrelated observed variables. The logic of these indices implies that no more complicated model can be hypothesized for the data if the data support the mutually uncorrelated model.
Determination of statistical power for CFA models is in many cases ignored. However, some contributions of a technical nature on CFA simplify the determination of statistical power and promise to make it a standard component of the inferential framework for CFA models. Just as there are two general levels of statistical testing involved in CFA (overall fit and parameter estimates), there are two levels of power considerations [Hoyle 1995].

On one level, there is the statistical power of the overall test of fit. Moving beyond the χ2 goodness-of-fit test, MacCallum, Browne and Sugawara [1996] outlined a procedure for evaluating the statistical power of the test of fit based on the RMSEA [i.e. the index introduced by Steiger and Lind 1980]. A virtue of the strategy they proposed is that it requires only knowledge of the number of variables, the degrees of freedom of a specified model, and some hypothetical sample size. Thus, the power of the overall test of fit can be calculated prior to data collection [Hoyle 1995].
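Under the logic of MacCallum, Browne and Sugawara [1996], this calculation needs only the degrees of freedom, a hypothetical N, and the null and alternative RMSEA values; the sketch below (using SciPy's non-central chi-square distribution) assumes the conventional 0.05 vs. 0.08 pair:

```python
from scipy.stats import ncx2

def rmsea_power(df, n, eps0=0.05, eps_a=0.08, alpha=0.05):
    """Power of the RMSEA-based test of close fit: the test statistic follows
    non-central chi-square distributions with noncentrality (n-1)*df*eps**2
    under the null (eps0) and alternative (eps_a) RMSEA values."""
    lam0 = (n - 1) * df * eps0 ** 2
    lam_a = (n - 1) * df * eps_a ** 2
    crit = ncx2.ppf(1 - alpha, df, lam0)   # rejection point under H0
    return 1 - ncx2.cdf(crit, df, lam_a)   # rejection probability under Ha

# Power can be computed before any data are collected
power_small = rmsea_power(df=50, n=200)
power_large = rmsea_power(df=50, n=500)
```

As expected, for fixed degrees of freedom the power of the test grows with the hypothetical sample size.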
On another level, there is the statistical power of individual parameter estimates. At this level, it is important to realize that statistical power is not the same for each and every free parameter in a particular model. Moreover, parameter estimates within a model are interdependent [Kaplan 1995]. Thus, it is difficult to ascertain power for particular parameters without data in hand. Generally, there are two strategies for evaluating the statistical power of parameter estimates. Saris and Satorra [1993] advocate a strategy that involves intentionally misspecifying a parameter of interest and then evaluating the degree to which the misspecification affects overall model fit. Kaplan [1995], in turn, developed a strategy of using specification-search statistics as a means of indexing the statistical power of tests of particular parameters. In either case, overall fit or parameter estimates, knowledge about statistical power can provide important context for understanding the outcome of statistical tests in CFA.
Now, in the case of the interpretability, size and statistical significance of the model parameter estimates, an initial step in this process is to determine whether the parameter estimates make statistical and substantive sense. From a statistical perspective, the parameter estimates should not take on out-of-range values such as completely standardized factor correlations that exceed a value of 1, negative factor variances, or negative indicator error variances. Such out-of-range values, often referred to as Heywood cases (or offending estimates), may be indicative of model specification error or of problems with the sample or model-implied matrices (e.g. a non-positive definite matrix, small n). Thus, the model and sample data must be viewed with caution to rule out the existence of more serious causes of these outcomes [Hoyle 1995].
Statistical significance (indicating whether the estimated parameters are significant) is determined by dividing the unstandardized parameter estimate by its standard error. Because this ratio can be interpreted as a z score, 1.96 is the critical value at an alpha level of 0.05 (two-tailed).
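The ratio test just described is easy to compute directly; a minimal sketch (the estimate and standard error below are invented):

```python
import math

def z_test(estimate, std_error, alpha=0.05):
    """z = unstandardized estimate / standard error; the two-tailed p-value
    follows from the standard normal distribution (|z| > 1.96 at alpha = 0.05)."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed p-value
    return z, p, p < alpha

z, p, significant = z_test(0.80, 0.40)  # z = 2.0, just past the 1.96 cutoff
```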
In addition, it is important to evaluate the standard errors of the parameter estimates and to determine whether their magnitude is appropriate, or problematically too large or too small. Standard errors represent estimates of how much error is operating in the model parameter estimates (i.e. how closely the model parameter estimates approximate the true population parameters). In other words, standard errors provide an estimate of how stable the model parameter estimates would be if we were able to fit the model repeatedly on multiple samples from the population of interest.

Although small standard errors might imply considerable precision in the estimate of the parameter, the significance test of the parameter (i.e. the z statistic) cannot be calculated if the standard error approximates zero. Conversely, excessively large standard errors indicate problematically imprecise parameter estimates (i.e. very wide confidence intervals) and hence are associated with low power to detect the parameter as statistically different from zero. Problematic standard errors may stem from a variety of difficulties, such as an incorrect model, a small sample size, the use of non-normal data, an improper estimator, the matrix type, or some combination of these. Unfortunately, there are no specific guidelines available to assist the researcher in determining whether the magnitude of standard errors is problematic in a given data set. This is because the size of standard errors is determined in part by the metric of the indicators and factors and by the size of the actual parameter estimate, which vary from data set to data set. However, keeping the metric of the variables in mind, the researcher should be concerned about standard errors that have standout values or approach zero, as well as about parameter estimates that appear reasonably large but are not statistically significant [Brown 2006].
Setting aside the problems related to insufficient sample size and inappropriate standard errors, the researcher must consider parameter estimates in the model that fail to reach statistical significance. For example, a non-significant factor loading in a congeneric CFA solution indicates that an observed measure is not related to its purported latent dimension, and would typically suggest that the indicator should be eliminated from the measurement model. In a non-congeneric solution, a non-significant cross-loading would suggest that this parameter is not important to the model and can be dropped. Likewise, a non-significant error covariance suggests that the parameter does not assist in accounting for the relationship between two indicators (beyond the covariance explained by the factors).
Further, a factor variance that does not significantly differ from zero typically signals problems in the solution, such as the use of a marker variable that does not have a relationship with the factor, substantial non-normality in the input matrix, or the use of a sample that is too small. On the other hand, whether non-significant factor covariances (correlations) should be of concern depends on the theoretical context of the CFA solution.

Lastly, error variances are inversely related to the size of their respective factor loadings; that is, the more indicator variance is explained by a factor, the smaller the error variance will be. Thus, assuming no other problems with the solution (e.g. a leptokurtic indicator, small n), non-significant error variances should not prompt remedial action and may in fact signify that an indicator is very strongly related to its purported factor [Brown 2006]79.
Finally, the acceptability of parameter estimates should not be determined solely on the basis of their direction and statistical significance. Because CFA is typically conducted on large samples, the analysis is often highly powered to detect rather trivial effects as statistically significant. Thus, it is important not only to demonstrate that the specified model reproduces the relationships in the input data well, but also that the resulting parameter estimates are of a magnitude that is substantively meaningful. For example, the size of the factor loadings should be considered closely to determine whether all indicators can be regarded as reasonable measures of their latent constructs80.
The interpretability of the size and statistical significance of factor intercorrelations also depends on the specific research context. The size of the factor correlations in multifactorial CFA solutions should also be interpreted with regard to the discriminant validity of the latent constructs. Small, or statistically non-significant, factor covariances are usually not considered problematic and are typically retained in the solution (i.e. they provide evidence that the discriminant validity of the factors is good). However, if factor correlations approach 1.0, there is strong evidence to question the notion that the factors represent distinct constructs. In applied research, a factor correlation that exceeds 0.80 or 0.85 is often used as the criterion to define poor discriminant validity. When two factors are highly overlapping, a common research strategy is to respecify the model by collapsing the dimensions into a single factor and to determine whether this modification results in a significant degradation in model fit [Brown 2006].

79 In applied research, however, indicator error variances almost always differ significantly from zero, because an appreciable portion of an indicator's variance is usually not explained by the factor.

80 The issue of what constitutes a sufficiently large parameter estimate varies across empirical contexts. For example, in applied factor-analytic research on questionnaires, completely standardized factor loadings of 0.30 (or 0.40) and above are commonly used to operationally define a factor loading or cross-loading. However, such guidelines may be viewed as too liberal in many forms of CFA research, such as construct validation studies where scale composite scores, rather than individual items, are used as indicators [Brown 2006].
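A simple screening pass over an estimated factor correlation matrix, using the 0.85 applied-research cutoff mentioned above (the matrix values are invented):

```python
def poor_discriminant_validity(phi, cutoff=0.85):
    """Return factor pairs whose correlation exceeds the cutoff, flagging
    candidates for collapsing into a single factor."""
    n = len(phi)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(phi[i][j]) > cutoff]

# Hypothetical 3-factor correlation matrix
phi = [[1.00, 0.55, 0.91],
       [0.55, 1.00, 0.48],
       [0.91, 0.48, 1.00]]
flagged = poor_discriminant_validity(phi)  # factors 0 and 2 overlap heavily
```

Each flagged pair would then be examined by fitting a respecified model with the two factors collapsed and testing for a significant degradation in fit.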

Alternative models in CFA


In practice, when a single model fits well (as reflected by various statistical fit indices), some researchers may erroneously infer that the correct model has been specified and that, in some sense, the model has been proven. However, it is desirable to test the fit of the model against some alternative models. The fit of the preferred model is more impressive when that fit occurs in the context of testing several models, especially when some of these models are theoretically plausible and are not articulated merely to create straw arguments. Hence, a stronger test of the proposed model is to identify and test alternative models that represent truly different, but highly plausible, hypothesized models.
Alternative models may range from ones with radically different structures to those with minor differences in one or two parameters. In practice, alternative models are usually represented as below [Thompson 2004]81:
1. Independence model, which specifies that the measured observed variables are all perfectly uncorrelated with each other, and thus no factors are present. The model estimates only the variances of each observed variable. Statistics quantifying the fit of this model are very useful as a baseline for evaluating the fit of plausible alternative models.
2. One-factor model. This model is disconfirmable and usually not plausible for most research situations. However, if this model is not theoretically plausible, the fit statistics from this model test can be useful in characterizing the degree of superiority of the alternative multi-factor model. Sometimes it is desirable to rule out the fit of this model, so that support for the multi-factor model will be even stronger.
3. Correlated vs. uncorrelated factors model, where one may quantify the degree of model fit on the basis of correlated vs. uncorrelated factors.

81 However, other models can be distinguished, based on uncorrelated or correlated measurement errors. For more information on these models, see Bollen [1989b, pp. 289–292] or Sobel and Bohrnstedt [1985].
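The baseline role of the independence model can be made concrete: incremental indices such as the CFI are computed from the χ2 statistics of the hypothesized and null models. A sketch with invented χ2 values:

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_m - df_m, chi2_0 - df_0, 0),
    where the null (independence) model anchors the comparison."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    denom = max(d_model, d_null)
    return 1.0 if denom == 0.0 else 1.0 - d_model / denom

# Hypothesized multi-factor model vs. the independence baseline
value = cfi(chi2_model=75.0, df_model=50, chi2_null=900.0, df_null=66)
```

The worse the independence baseline fits relative to the hypothesized model, the closer the index is to 1.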

Respecification of CFA model


If the initial CFA model does not fit the data well, we need to find out the potential causes. A common cause is misspecification, with errors ranging from the incorrect inclusion or exclusion of a parameter to a fundamentally flawed model.
A common response to a poorly fitting model is to respecify it. Since the respecification is often based on the results of the initial model, the analysis then becomes exploratory. A consequence is that the probability levels for the tests of statistical significance for the new models must be regarded as approximations [Bollen 1989b].
There are several ways to respecify models and to test the fit of the revised models. The first approach is based on theory and substantive knowledge. For example, researchers could start with a more parsimonious model and, if it fails to fit, add secondary parameters82, adding further parameters if needed.
The second approach is based on exploratory tests such as the likelihood ratio (LR), Lagrange multiplier (LM) and Wald (W) tests [Bollen 1989b]. For instance, one of the simplest modifications is to introduce a single new parameter (i.e. to remove a restriction). The researcher could estimate all possible identified models that eliminate one restriction at a time and compute a χ2 estimate for each. If none of the new models fits, then all possible two-at-a-time restrictions could be freed and χ2 estimated. Theoretically, this process could continue until no identified alternative models remain.

82 Initial models summarize the researcher's current thinking, but the lack of fit can stimulate researchers to reconsider their ideas. However, the rethinking process is specific to the problem at hand, so one can provide only broad guidelines on how to proceed. Relevant questions include: Could additional latent variables underlie the observed variables? Do some of the measures load on more than one factor? If some of the observed variables are taken from the same source or constructed in the same fashion, is a method factor likely? Are there other reasons to expect correlated errors of measurement?


Alternatively, the researcher could introduce restrictions one at a time, two at a time, etc., to determine whether a simple model with a good fit exists.
Comparisons of the nested models are possible with the LR test statistic. The LM statistic provides an asymptotically equivalent alternative for removing restrictions that reduces the number of estimated models. Obviously, the LM statistic could also be estimated for all possible two-at-a-time, three-at-a-time, etc., freed restrictions. Though the LM statistic drastically reduces the number of estimated models, it still involves many LM estimates if the researcher allows all possible combinations of restrictions. So, when searching for alternatives through the agency of LM, the most common option is to examine the univariate LM statistics (modification indices), to free the restriction that leads to the largest reduction in the χ2 estimate, and to repeat this process with the revised model until an adequate fit is developed [Bollen 1989b].
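For nested models, the LR comparison reduces to a χ2 difference test; the sketch below (with invented fit statistics for a model before and after freeing one restriction) uses SciPy for the p-value:

```python
from scipy.stats import chi2

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """LR comparison of nested models: the drop in chi-square is itself
    chi-square distributed, with df equal to the number of freed restrictions."""
    delta = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    return delta, ddf, chi2.sf(delta, ddf)

# Freeing one restriction drops chi-square from 112.4 (df = 51) to 98.1 (df = 50)
delta, ddf, p = chi2_difference(112.4, 51, 98.1, 50)
```

A small p-value here indicates that the freed parameter significantly improves fit, which is exactly the quantity the univariate modification indices approximate.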
The W test also has applications in exploratory mode, because it identifies which new restrictions (e.g. parameters set at zero) lead to the smallest increases in the χ2 estimate. As with the LM test, it is possible to compute W estimates for all one-at-a-time, two-at-a-time, etc., restrictions and to find which combination is best, but this is not practical. One strategy is to compute univariate W statistics for each free parameter and to see which W is lowest. When the restriction sets a free parameter to zero, the univariate W tests are not needed, since they are equivalent to the square of the usual z-test for single-parameter estimates. Once the restriction with the least significant W (or z) estimate is imposed, the lowest W estimate for a second restriction is sought. This process continues until no further restrictions can be added without a significant increase in the incremental χ2 estimates.
Bollen [1989b, pp. 300–301], comparing the LR, LM and W tests, explained that (apart from their merits) they also have serious limits. One is that the order in which parameters are freed or restricted can affect the significance tests for the remaining parameters. A second difficulty is that the probability levels associated with the W and LM statistics in the stepwise procedures are not likely to be accurate. A third aspect relates to the fact that these tests assess the changes in χ2 estimates, not the size of changes in parameter estimates. A fourth limitation of the LM test is that it assumes the initial model is valid; if it is not, the LM tests are biased versions of the LM tests that would be obtained for the same restrictions under the correct model. A fifth limitation of the W and LM tests is that they are most useful when the misspecification is minor. Finally, a limitation of the empirical search procedures is that the freed parameters introduced into a model may have no clear interpretation. For example, we can introduce correlated errors or an additional path which may improve statistical fit, but without an explanation for them, the substantive gain of their introduction is ambiguous.
Yet another approach that helps the researcher respecify the model is based on empirical procedures such as [Bollen 1989b]: the residual matrix, component fit measures and piecewise fitting strategies.
In the first option, large positive residuals suggest that some parameter has been omitted from the initial model, while small residuals create the impression that the structure of the model concerning the corresponding covariances is sound.
Component fit measures may be useful in locating problems when improper solutions or implausible parameter estimates (e.g. R2 values near zero or far higher than is likely, signs of coefficients counter to those predicted) call attention to troublesome sectors of a model. As with residuals, though, the problem can be located in a part of the model different from where the suspicious component of fit appears.
Finally, piecewise model fitting involves estimating components of the entire model in an attempt to isolate the sources of misspecification. A piecewise approach is to break a poorly fitting model into components and reestimate each part separately.
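The residual-matrix diagnostic can be sketched directly; S and the model-implied matrix below are invented, and the 0.1 flagging threshold is arbitrary:

```python
import numpy as np

def large_residuals(S, sigma_hat, threshold=0.1):
    """Compute the residual matrix S - Sigma_hat and flag entries whose
    absolute value exceeds the threshold (possible omitted parameters)."""
    R = np.asarray(S) - np.asarray(sigma_hat)
    flagged = [(i, j) for i in range(R.shape[0])
               for j in range(i, R.shape[1])
               if abs(R[i, j]) > threshold]
    return R, flagged

S = [[1.00, 0.42, 0.31],
     [0.42, 1.00, 0.55],
     [0.31, 0.55, 1.00]]
sigma_hat = [[1.00, 0.40, 0.12],
             [0.40, 1.00, 0.50],
             [0.12, 0.50, 1.00]]
R, flagged = large_residuals(S, sigma_hat)  # only the (0, 2) covariance stands out
```

Here the single large residual points at a covariance the hypothesized structure fails to reproduce, though, as noted above, the true problem may lie in a different part of the model.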

Sample size and distributional properties of observed variables in CFA

Sample size
Unfortunately, there is no simple rule regarding the minimum or desirable number of observations for CFA. Indeed, any rule regarding sample size and CFA would need to include a variety of qualifications, such as the estimation strategy, model complexity, the scale on which indicators are measured, or the distributional properties of indicators. Sample size in the CFA model is multifaceted and should additionally be considered in the context of the statistical power and reliability of model estimations. Also, the likelihood of a factor structure replicating is at least partially a function of the sample size. In general, the factor pattern that emerges from a large-sample confirmatory factor analysis will be more stable than that emerging from a smaller sample.
A minimal CFA requirement is that the sample size must be greater than the number of covariances (or correlations) in the data matrix. Tanaka [1987] proposed that, minimally, sample size determination should consider model complexity. He recommended a ratio of at least 4 observations per free parameter in the model as a rule of thumb for determining the minimum sample size. Other authors have suggested a minimum of 10 observations per parameter estimated [Jöreskog and Sörbom 1989], although guidelines as low as 5 observations per parameter have also been suggested [Floyd and Widaman 1995].
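The observations-per-parameter rules of thumb cited above can be expressed as a small helper; this is our own illustrative sketch, not a procedure from the literature (the ratios are the cited 4:1, 5:1 and 10:1 guidelines).

```python
# Sketch (our own helper): minimum N implied by each cited
# observations-per-free-parameter rule of thumb.
def minimum_sample_sizes(free_parameters):
    """Return {ratio: minimum N} for the 4:1, 5:1 and 10:1 guidelines."""
    return {ratio: ratio * free_parameters for ratio in (4, 5, 10)}

print(minimum_sample_sizes(30))  # a hypothetical CFA model with 30 free parameters
```

For a model with 30 free parameters, the three guidelines imply minimum samples of 120, 150 and 300 observations respectively, which makes clear how strongly the choice of rule drives the required N.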
Depending on the estimation method, such as ML, a CFA sample should comprise 200 or more observations [Raykov and Widaman 1995]. Although this recommendation (of at least 200 observations) seems somewhat arbitrary, it is advisable for anything beyond the simplest CFA model, and a number of at least 400 is strongly preferable. However, as Netemeyer et al. claimed, these suggestions, based on a "more is better" assumption, may not always be appropriate, because very large samples may show trivial differences between the observed and implied covariance matrices (or parameter estimates) to be significant [Netemeyer, Bearden and Sharma 2003].
Distributional properties: skewness, kurtosis and outliers
CFA is a multivariate statistical model and is therefore based on the multivariate distribution of the data, which affects estimation and testing. Multivariate normality means that all variables are univariate normally distributed, the distribution of any pair of variables is bivariate normal, and all pairs of variables have linear and homoscedastic scatterplots [Kline 2000]. Although it is difficult to assess all aspects of multivariate normality, checking for univariate normality and outliers should help the researcher detect most cases of multivariate non-normality. Of course, observed data are never perfectly normally distributed and in many instances are dramatically non-normal [Micceri 1989].
The process of checking for and detecting a non-normal distribution may be conducted with reference to significant skewness or kurtosis. Skewness is a measure of how asymmetric a unimodal distribution is. If most of the scores are below the mean, the distribution is positively skewed, whereas if most of the scores are above the mean, the distribution is negatively skewed. Kurtosis, on the other hand, is a measure of how well the shape of the so-called bell conforms to that of a normal distribution. Positive kurtosis, or a leptokurtic distribution, occurs when the middle of the distribution has a higher peak than expected for a normal distribution. Negative kurtosis occurs when the middle of the distribution is flatter than expected for a normal distribution.


A number of procedures are available for assessing the univariate or multivariate normality of the measured variables. These procedures depend on the calculation of higher-order moments. A moment is defined as (1/n)Σ(X − μ)^k, where n is the sample size, X is an observed score, μ is the population mean, and k is the order of the moment (k = 1 for the first-order moment, k = 2 for the second-order moment, etc.). When univariate normality is satisfied, only the first- and second-order moments (mean and variance) are needed to describe fully the distribution of the measured variables; the standardized third-order moment is 0 and the standardized fourth-order moment is technically 3 for a normal distribution83.
In the process of normality verification, we should test whether each observed variable has significant skewness or kurtosis by dividing the unstandardized skewness or kurtosis index by its corresponding standard error. This ratio is then interpreted as a z-test of skewness or kurtosis. Ratios greater than 1.96 have p-values less than 0.05, and ratios greater than 2.58 have p-values less than 0.01, indicating significant skewness or kurtosis in the data.
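The moment-based screening just described can be sketched in a few lines of Python. The helper below is our own illustration, not a procedure prescribed in the text; the standard errors sqrt(6/n) and sqrt(24/n) are common large-sample approximations and are an assumption on our part.

```python
import math

# Sketch (our own helper): standardized third- and fourth-order moments
# and approximate z-tests for skewness and kurtosis.
def moments_check(x):
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second-order moment (variance)
    m3 = sum((v - mean) ** 3 for v in x) / n   # third-order moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth-order moment
    skew = m3 / m2 ** 1.5        # standardized: 0 for a normal distribution
    kurt = m4 / m2 ** 2          # standardized: 3 for a normal distribution
    z_skew = skew / math.sqrt(6 / n)           # large-sample SE approximation
    z_kurt = (kurt - 3) / math.sqrt(24 / n)    # large-sample SE approximation
    return skew, kurt, z_skew, z_kurt
```

For a strongly asymmetric sample, z_skew exceeds 1.96, flagging significant skewness at the 0.05 level; for a symmetric sample the skewness index is 0.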
An alternative to the ratio test is to interpret the absolute values of the
skewness and kurtosis indices, with absolute values of skewness greater than
3.0 indicating the distribution is extremely skewed and absolute values of
kurtosis greater than 10.0 suggesting a problem. Values greater than 20.0
indicate a potentially serious problem with kurtosis [DeCarlo 1997; Harrington 2009].
Finally, a CFA model can be biased by outliers [Yuan and Bentler 2001]. Outliers are extreme data points that may affect the results of a CFA model. They typically occur because of errors in the responding process committed by examinees, because of data errors, or because a few examinees represent a population different from the target population under study.
Outliers have dramatic effects on the indices of model fit, parameter estimates, and standard errors. They can also potentially cause improper solutions, in which estimates of parameters are outside the range of acceptable values. Thus, outliers can be problematic because they may cause non-normality and may result in Heywood cases.
83
Univariate distributions that deviate from normality possess significant non-zero skewness and kurtosis, which are reflected in the standardized third- and fourth-order moments, respectively. Non-zero skewness is indicative of a departure from symmetry: negative skewness indicates a distribution with an elongated left-hand tail, and positive skewness a distribution with an elongated right-hand tail. Kurtosis indicates the extent to which the height of the curve (probability density) differs from the normal curve [West, Finch and Curran 1995, p. 60].
Problematic outliers can be dropped from the analysis [Meyers, Gamst and Guarino 2006] if the sample size is sufficiently large to allow that as an option. Possible corrective actions for outliers include: 1) checking and correcting the data for the extreme cases, 2) dropping the extreme cases, 3) redefining the population of interest, or 4) respecifying the CFA model.
Finally, there are also cases when we deal with univariate or multivariate outliers. Univariate outliers have extreme scores on one variable and can be detected by examining z-scores. Cases with z-scores greater than 3.0 in absolute value are unusual and may be considered outliers [Kline 2000]. In large data sets, using z-scores greater than 3.0 may be too conservative, and a cut-point of 4.0 or greater in absolute value identifies outliers more accurately.
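The z-score rule above amounts to a one-line screen; the helper below is a minimal sketch of it (our own, purely illustrative).

```python
# Sketch (our own helper) of the univariate z-score screening rule.
def flag_univariate_outliers(x, cut=3.0):
    """Return indices of cases whose |z| exceeds the cut-point."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5  # sample SD
    return [i for i, v in enumerate(x) if abs((v - mean) / sd) > cut]
```

With twenty scores of 0 and a single score of 50, only the extreme case is flagged; passing cut=4.0 applies the stricter large-sample rule.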
As far as multivariate outliers are concerned, they can be detected using the Mahalanobis distance, which indicates how unusual a case is on the set of variables compared with the sample centroid (e.g. means or midpoints) for all the variables [West, Finch and Curran 1995].
Yet another way to detect outliers is to identify observed data points that are extreme relative to their predicted values based on a specific model. Bollen and Arminger [1991] devised a method based on factor scores, which represent each case's predicted score on the hypothetical factor. These factor scores, in turn, are used to estimate a set of predicted scores on the measured variables for each case. Raw residuals, representing the difference between the predicted and the observed scores for each case on each measured variable, are then calculated. The residuals are standardized using the procedures prescribed by Bollen and Arminger and then plotted on a graph.

Reporting practices and final remarks about the process of CFA model construction

The question now is what, exactly, should be reported in a CFA analysis. Jackson, Gillaspy and Stephenson [2009] recommended a check-list for CFA analyses (see Table 25)84. They opted to organize these recommendations into categories that reflect the activity surrounding a research project in which CFA is to be used.
84
Other recommendations were provided by McDonald and Ho [2002].

Table 25. The CFA reporting guidelines check-list

Theoretical formulation and data collection:
– Theoretical/empirical justification of models tested
– Number and type of models tested (correlated, orthogonal, hierarchical)
– Specification of models tested (explicit relationships between observed and latent variables)
– Graphic representation of models tested
– Sample characteristics (justification, sampling method, sample size)
– Identification of equivalent and theoretically alternative models
– Specification of model: can the models be tested?

Data preparation:
– Screening for univariate and multivariate normality and outliers
– Analysis of missing data
– Scale of observed variables (nominal, ordinal, interval, ratio)
– Description of data transformations

Analysis decisions:
– Type of matrix analyzed (covariance, correlation)
– Estimation procedure and justification given the normality assessment (ML, WLS, etc.)
– Scale of latent variables

Model evaluation:
– Inclusion of multiple fit indices (e.g. χ², df, p; RMSEA, CFI, TLI)

Source: Jackson, Gillaspy and Stephenson 2009, p. 23.

Generally, when preparing CFA reports, we must decide what in particular should be reported, especially given the voluminous output of CFA models. For example, according to Boomsma [2000] and Hoyle and Panter [1995], it is strongly recommended to report the parameter estimates of the measurement model. Researchers should also provide some indication of the variance accounted for in the variables. First and foremost, however, we should report information concerning the process of CFA model construction in relation to the theoretical assumptions.
If there are equivalent models, Boomsma [2000] suggested providing information on how these models were defined and justified for the objective of the study. There is, moreover, strong encouragement to describe not only how we identified equivalent models but also how we identified possible competing (alternative) theoretical models against which the fit of the model of interest was compared [Bentler and Bonett 1980]. Any evaluation of the plausibility of the results should also include a decision about the credibility of the observed measures used to identify the factors under study [Jackson, Gillaspy and Stephenson 2009].

The next stage concerns the description of the data collection process, which includes ensuring adequate and appropriate sampling procedures and, when possible, a suitable power analysis to obtain estimates of an adequate sample size [MacCallum, Browne and Sugawara 1996; Muthén and Muthén 2002]. Another good practice in reporting is to show how the data were prepared. Many activities fit under this heading, from assessing data integrity to evaluating the distributional assumptions of the estimation method to be used. Concerning the latter, the most common estimation procedure in CFA is maximum likelihood (ML), which carries with it the assumption of multivariate normality. Past research has found that failure to meet the assumption of multivariate normality leads to an overestimation of the χ² statistic and, hence, to an inflated Type I error [Curran, West and Finch 1996; Powell and Schafer 2001], and to downward-biased standard errors [Kaplan 2000; Nevitt and Hancock 2001; Bandalos 2002; Yuan 2005]. It should be noted, however, that ML estimation performs well with mild departures from multivariate normality [Chou, Bentler and Satorra 1991; Hu, Bentler and Kano 1992; Fan and Wang 1998].
Another step in the description of data preparation concerns the analysis and treatment of missing data. The effects of missing data depend on the method used to address them, which may include methods of incomplete data analysis such as listwise deletion or pairwise deletion, or imputation methods, e.g. mean substitution, multiple imputation, expectation maximization, and so on. The most common approach is listwise deletion, or available case analysis [McKnight et al. 2007], in which cases with any missing data points involved in the analysis are removed. This is generally an acceptable approach only when data are missing completely at random [Schafer and Graham 2002]. Interestingly, there is evidence that parameter estimates can be biased [Brown 1994] or convergence failures become more likely [Enders and Bandalos 2001], depending on the manner in which missing data are dealt with, even when they are missing at random.
Finally, researchers should report (at the data preparation stage) how they examined the univariate and multivariate normality of the observed variables, what criteria they applied for deleting multivariate outliers, and how they used data transformations, such as the square-root transformation, designed to improve the distribution of measured variables.
Once the data have been adequately prepared for analysis, the researcher still has some decisions to make. They may involve the description of the choice of input matrix and the estimation methods. The default choices tend to be the variance-covariance matrix with ML estimation. And even if this is the case, these choices should be stated explicitly85.
As far as another reporting practice (pertaining to model evaluation and modification) is concerned, researchers need to describe the process of model fit evaluation. Aside from the χ² goodness-of-fit test, there are numerous ancillary indices to be reported, such as GFI, AGFI [Jöreskog and Sörbom 1989], CFI [Bentler 1990] or the root-mean-square error of approximation RMSEA [Steiger and Lind 1980]. Many of the indices have different properties, and some have been recommended against [Bentler and Bonett 1980]. Hu and Bentler recommended relying on fit indices that have different measurement properties, such as an incremental fit index (e.g. CFI) and a residuals-based fit index, i.e. the standardized root-mean-square residual SRMR [Jöreskog and Sörbom 1989; Bentler 1995]. Other findings (from Monte Carlo studies) suggest that, on the basis of effect size, direct measures of fit are more sensitive to model misspecification than incremental fit measures [Fan, Thompson and Wang 1999; Jackson 2007].
Finally, drawing conclusions from past studies [Marsh, Balla and Hau 1996; Fan, Thompson and Wang 1999; Hu and Bentler 1998, 1999; Jackson 2007; Marsh et al. 1998], we can say that the following fit measures tend to perform well with respect to detecting model misspecification and lack of dependence on sample size: gamma hat [Steiger 1989]; RMSEA; the centrality index CI [McDonald 1989]; SRMR; the Tucker-Lewis index TLI [Tucker and Lewis 1973]; the non-normed fit index NNFI [Bentler and Bonett 1980]; the relative non-centrality index RNI [McDonald and Marsh 1990]; CFI; and Bollen's delta 2 [Bollen 1989a].

Measurement invariance and multi group confirmatory factor analysis MGCFA
Measurement invariance relates to one of the categories of quality assessment in intercultural studies. Table 26 shows the general types of invariance occurring at all stages of research. Invariance of measurement concerns the assessment of the extent to which measurements are made in the tested groups using e.g. the same units and measures, relating to the same characteristics of respondents from different conditions and contexts of observation. As a result, measurement is characterized by invariance. In the absence of invariance, any difference between individuals and groups cannot be reasonably interpreted [Sagan 2005].
85
MacCallum and Austin [2000] indicated that in 50% of the studies they reviewed, authors analyzed a correlation matrix, which requires the use of constrained estimation or an approach that ensures the model is scale invariant [Kline 2010].

Table 26. General categories for intercultural and cross-group invariance measurements

Invariance of the research problem:
– Conceptual: the identity of the constructs examined
– Functional: the similarity of the function of concepts and actions, predictive validity

Translation invariance:
– Lexical: the meaning of vocabulary terms
– Idiomatic: the meaning of idiomatic and customary terms
– Grammar: the adequacy of grammatical structures
– Pragmatic: the meaning of colloquial words in everyday life and action

Measurement invariance:
– Global: the similarity of the covariance matrix
– Structural: the adequacy of measurement models
– Metric: comparability of measurement units
– Scalar: similarity of measurement scales
– Measurement errors: homogeneity of the impact of specific factors

Sample invariance:
– Sampling units: comparability of sampling units
– Representativeness: compliance of operational units, the dimensions of socio-demographic stratification

Data collection invariance:
– Communication with the respondent: the similarity of behavior patterns, the definition of private and public spheres
– Context: commonality of questions of cultural context, the areas of social taboos and permissions
– Style and attitude of response: consistency and similarity of responses to the posed questions and themes of nonresponse

Source: own construction based on Sagan 2005.
Cross-cultural methodologists have emphasized that group comparisons in a multicultural environment should assume invariance of the elements of the measurement structure (i.e. factor loadings and measurement errors) and of response biases [Billiet 2002; Little, Slegers and Card 2006; van de Vijver and Leung 1997]. It appears that this sort of comparison helps to ensure that potential differences can be interpreted in a reliable way [Vandenberg and Lance 2000].
Muthén [1989] argued that, in most cases, within-society research continues implicitly to assume homogeneity of the population. This happens especially in field research with convenience samples of social, educational, or occupational sub-groups. However, these groups may differ from one another, or from the overall population, with regard to particular measurement or structural parameters. As a result, in the worst case, researchers may measure different constructs in the groups. Hence, within-society studies should always assess a possible lack of measurement invariance. Without this assumption, one cannot even claim that the construct is the same in the different groups [Vandenberg and Lance 2000; Little, Slegers and Card 2006]. Thus, a comparison of means or structural relations across groups requires invariance of the measurement structures [Ployhart and Oswald 2004; Thompson and Green 2006].
The multi group confirmatory factor analysis (MGCFA) model [Billiet 2002; Jöreskog 1971a]86 permits testing for invariance by setting cross-group constraints and comparing more restricted with less restricted models. MGCFA permits testing for full or partial invariance of the measurement and its parameters (e.g. factor loadings, error variances)87. Simultaneously, both the intercepts of the indicators and the factor means can be estimated and tested for invariance.
When we approach invariance measurement in MGCFA, we address the following questions: are the measurement parameters (such as factor loadings, measurement errors, etc.) the same across groups; are there pronounced response biases in a particular group; is the theoretical construct measured the same in all groups? [Tarka 2012].
This analysis begins in accordance with the factor analytic tradition, i.e. we depict variation in the observed variable X_i as due to factor ξ_j and error (or uniqueness) δ_i. For a single variable X_ig in group g, the causal inference is given as:

X_ig = Σ(j=1..k) λ_ijg ξ_jg + δ_ig, (6.117)

where:
X_ig – the i-th observed variable in a set of variables that measure ξ_jg in the respective group g,
λ_ijg – the factor loading linking X_ig with ξ_jg,
g – a group index, g = 1, 2, …, G.
Eq. (6.117) in its matrix notation is expressed as follows [Millsap and Everson 1991, p. 480]:

X^g = Λ_X^g ξ^g + δ^g, (6.118)

where:
X^g – vector of observed variables X_ig in the respective group g,
Λ_X^g – matrix of factor loadings in the respective group g,
ξ^g – vector of common factors in the respective group g,
δ^g – vector of unique variables in the respective group g.
86
Increasingly, scale developers are using MGCFA to test the invariance of their measures across groups and samples. If evidence for invariance exists, the generalizability of the scale is enhanced [Bollen 1989b; Marsh 1995; Steenkamp and Baumgartner 1998].
87
Although desirable, strict invariance of factor loadings across multiple groups is rare in practical applications [Marsh 1995]. Such a finding is more likely as the number of samples and the number of items becomes larger. As such, partial measurement invariance has been advocated as acceptable for measurement invariance models [Byrne, Shavelson and Muthén 1987]. Partial measurement invariance requires that some, not all, of the factor loadings be invariant before examining the relationships among constructs in a nomological model (i.e. validity). Thus, partial invariance models represent another class of models in the invariance hierarchy.
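The group measurement model can be sketched numerically. All numbers below are invented for illustration; the zero pattern in the loading matrix stands for the fixed (non-loading) part of the configural structure.

```python
import numpy as np

# Illustrative sketch of the group measurement model X^g = Lambda_X^g * xi^g + delta^g
# for one group g and one case; all numbers invented.
lambda_g = np.array([[0.9, 0.0],   # four indicators loading on two factors;
                     [0.8, 0.0],   # the fixed zeros are part of the
                     [0.0, 0.7],   # configural pattern
                     [0.0, 0.6]])
xi_g = np.array([1.2, -0.4])       # common factor scores for one case
delta_g = np.array([0.05, -0.10, 0.02, 0.08])  # unique terms

x_g = lambda_g @ xi_g + delta_g    # implied observed variables X^g
```

In an actual MGCFA, each group g would have its own lambda_g, and the invariance hypotheses below amount to constraining these matrices to be equal across groups.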
The identification conditions (discussed in previous sections describing the general CFA model) also hold here, but now they are modified with regard to each group. In MGCFA each group has its own measurement model (6.118).
In the MGCFA framework, the measurement structure must be equivalent (invariant), albeit not perfectly. In MGCFA we test the invariance of the parameter matrices implied by (6.118) by constraining these matrices to be equal across groups. This is usually done in a stepwise approach, where each step constrains a given matrix (e.g. the Λ_X^g matrix) to be equal across all groups. In MGCFA each restricted model is nested within a less restricted one.
Models in MGCFA are tested hierarchically. The hierarchy begins with the least restrictive model, i.e. the same pattern of fixed and non-fixed parameters across groups. If this model shows reasonable fit, i.e. all loadings are significant and there is evidence of discriminant validity among the factors, it is then used as the baseline for comparison with subsequent models in the hierarchy. The second model tested in the hierarchy specifies invariant factor loadings across groups. If this model does not differ from the baseline one, the factor loadings are invariant across groups. The third model in the hierarchy specifies that the factor loadings and the factor covariances are invariant across groups. Once again, if the fit of this model does not differ from that of the baseline model, then the factor loadings and factor covariances are considered equal. Establishing invariance for the second and third models is generally considered the most important criterion in measurement invariance testing across groups [Bollen 1989b].
The fourth model in the hierarchy specifies that the factor loadings, the covariances among the factors, and the factor variances are invariant across groups. If its fit does not differ from that of the baseline model, then the factor loadings, factor variances, and covariances are considered equal.
The last model in the hierarchy specifies that the factor loadings, the covariances among the factors, the factor variances, and the error terms for individual items are invariant across groups (i.e. the full measurement invariance model). If the fit of this last model does not differ from that of the baseline model, then all measurement parameters are considered invariant across groups [Netemeyer, Bearden and Sharma 2003].
In general, the evaluation of measurement invariance involves carrying out a series of hierarchical tests checking hypotheses about the differences among the groups. These tests should be carried out in sequence, because a poor model fit at one step renders further testing of the level of group equivalence baseless [Meredith 1993].
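Each step in this sequence compares a more restricted model with the less restricted model it is nested in via a chi-square difference test. The sketch below is our own; to keep it dependency-free it uses the closed-form chi-square survival function, which is valid only for an even number of difference degrees of freedom (in practice one would use a statistics library for arbitrary df).

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid only for even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df % 2 != 0 or df <= 0:
        raise ValueError("closed form requires a positive even df")
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(df // 2))

def chi_square_difference(chi2_restricted, df_restricted, chi2_baseline, df_baseline):
    """Delta-chi2 test comparing a restricted (more invariant) model
    with the less restricted baseline model it is nested in."""
    d_chi2 = chi2_restricted - chi2_baseline
    d_df = df_restricted - df_baseline
    return d_chi2, d_df, chi2_sf_even_df(d_chi2, d_df)
```

For instance, with invented fit statistics chi_square_difference(110.4, 49, 100.4, 45) gives Δχ² = 10.0 on 4 df and p ≈ 0.040, so the added equality constraints would be rejected at the 0.05 level.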
Let us now explain, in a few words, the specific components of the MGCFA model: configural, metric and scalar invariance, and the invariance of factor variances-covariances and measurement errors.
Configural invariance of the factor structure implies the same number of factors in each group and the same pattern of constrained and free parameters. It is a prerequisite for the other tests. It is the most basic form of invariance and assesses whether we find the same patterns of loadings between indicators and factors in all groups. The parameter restrictions refer only to the patterns of loading and non-loading. If configural invariance is not supported empirically, there are fundamental distinctions in the measurement structure, which simply means that the observed variables measure different factors [Davidov, Schmidt and Schwartz 2008; Davidov and Depner 2011].
The metric invariance model is more stringent than configural invariance, as additional restrictions are adopted. Metric invariance means that, in addition to the conditions of configural invariance, the factor loadings are equivalent for all groups. If the metric invariance model is maintainable, then the observed variables measure the latent factors equally well. And if the model fit of the metric invariance model does not decrease significantly, metric invariance of all items can be assumed.


For instance, in metric invariance, the loading parameters should be the same in groups A and B. In terms of Eq. (6.118), this is tested by imposing equality constraints on the Λ_X matrices that contain the factor loadings (i.e. we assume the hypothesis H_Λ: Λ_X^A = Λ_X^B = … = Λ_X^G, where the superscripts refer to groups A, B, …, G). Equal factor loadings indicate that groups calibrate their measures in the same way. Hence, the values on the scale have the same meaning across groups.
Metric invariance concerns construct comparability and is a stricter condition of it. According to the common factor perspective, factor loadings indicate the strength of the causal effect of the factor on its observed variables and can be interpreted as validity coefficients [Bollen 1989b]. Hence, significantly different factor loadings imply a difference in the validity coefficients. This raises concerns about whether the constructs are the same across groups. Configural invariance provides evidence that the construct is related to the same set of indicators. However, metric invariance is necessary to infer that the construct has the same meaning, because it provides evidence about the equality of the validity coefficients.
The next aspect relates to scalar invariance, which describes the invariance of the item intercepts (similarly as in regression equations) linking the observed variables X_ig with their factors ξ_jg. Item intercepts should be interpreted as systematic biases in the responses of a group to an item. Hence, the observed mean may be systematically higher or lower (upward or downward biased) than one would expect given the group's latent means and the factor loadings.
Scalar invariance is present if the degree of up- or downward bias of the observed variable is equal across groups. It is absent if one of the groups differs significantly in one or more of the item intercepts. In order to test for scalar invariance, one constrains the tau-vectors to be equal across groups, according to the hypothesis H_τ: τ^A = τ^B = … = τ^G.
In the case of the invariance of factor variances and covariances, invariance of the factor variances exists when groups have the same variances of their respective latent factors. This is tested by constraining the diagonal elements of the phi-matrices to be equal: Φ_jj^A = Φ_jj^B = … = Φ_jj^G. The invariance of the factor covariances refers to the equality of the associations among the factors across groups. It is tested by constraining the subdiagonal elements of the phi-matrices to be equal, i.e. H_Φ: Φ_jk^A = Φ_jk^B = … = Φ_jk^G. Covariances among factors have implications for the constructs' meaning or validity [Cronbach and Meehl 1955]. Hence, unequal covariances raise concerns about the equality of the measured constructs' meanings [Cole and Maxwell 1985].

The next component of MGCFA concerns measurement error invariance. It is based on the hypothesis that the error variances of the observed variables are the same across groups: H_Θ: Θ_δ^A = Θ_δ^B = … = Θ_δ^G.
Finally, we also test for invariance of the latent means, that is, for differences between groups in the latent means. This step is usually done in order to consider whether there are mean differences across groups on the factors (based on mean structure analysis). The problem of estimating factor mean differences was considered by Sörbom [1974].
Based on Eq. (6.117), the mean structure can be expressed as follows:

E(X_ig) = τ_ig + Σ(j) λ_ijg κ_jg, or in matrix notation E(X^g) = τ^g + Λ_X^g κ^g, (6.119)

where:
E(X_ig) – expected value of the i-th observed variable in group g,
κ_jg – the mean of factor ξ_j in group g (representing the latent mean),
κ^g – k-dimensional vector of latent means.
This is done by expanding the model of Eq. (6.118). As a result we obtain:

X^g = τ^g + Λ_X^g ξ^g + δ^g, (6.120)

where the intercept τ^g was added to the model and is estimated by including a vector of the observed variables' means in addition to the vector of observed variables [Bollen 1989b].
Now, as we can observe, a manifest mean depends not only on its latent mean but also on the factor loading and the item intercept. A manifest mean difference can therefore be caused by a latent mean difference or by a difference in the loadings, the intercepts, or both. Hence, a test of the latent mean difference requires the equality of both the factor loadings and the item intercepts [Sörbom 1974]. The equality of the latent means is tested by constraining the kappa vectors to be equal across groups:

H_κ: κ^A = κ^B = … = κ^G. (6.121)
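The point that a manifest mean difference is interpretable only under invariant loadings and intercepts can be illustrated numerically; all numbers below are invented.

```python
import numpy as np

# Numerical sketch of the mean structure (Eq. 6.119) for two groups
# with invariant loadings and intercepts; all numbers invented.
lam = np.array([0.8, 0.7, 0.9])    # factor loadings, equal in groups A and B
tau = np.array([2.0, 1.5, 1.0])    # item intercepts, equal in groups A and B
kappa_a, kappa_b = 0.0, 0.5        # latent means in groups A and B

mu_a = tau + lam * kappa_a         # implied manifest means E(X^A)
mu_b = tau + lam * kappa_b         # implied manifest means E(X^B)

# With invariant tau and lambda, the manifest mean difference reflects
# only the latent mean difference, scaled by the loadings:
diff = mu_b - mu_a
```

If either the loadings or the intercepts differed between the groups, diff would confound measurement differences with the latent mean gap, which is exactly why metric and scalar invariance must be established first.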

VII. SCALE DEVELOPMENT FOR HEDONIC CONSUMERISM VALUES

Scale construction underlying assumptions


Hedonism construct definition
As J. O'Shaughnessy and N. O'Shaughnessy explain [2002, pp. 526-527], hedonism is derived from the Greek word hedone, which means pleasure, enjoyment or delight. In the literature, hedonism shares value emphases with power, achievement and self-interest. These values can simultaneously be shared with stimulation and self-direction, with some unique emphasis on openness to change. However, value theory does not specify which emphasis is stronger. For example, according to Schwartz and Boehnke's [2004] point of view and their empirical research, hedonism is not equally close to self-enhancement and openness: although it correlates significantly with both, it is clearly closer to openness.
For most people, hedonic values focus more on freely experiencing pleasure and less on pursuing pleasure competitively. Hedonism must be considered in view of pleasure which includes the avoidance of pain and which is held to be the only good in life. In philosophical discourse there are variations on this theme. In the psychological area, hedonism claims that pleasure is the only possible object of desire, because all motivation is based on the prospect of pleasure. Psychology interprets pleasure as selfish gratification, which is consistent with a traditional view of man. More interestingly, systems of psychology such as behaviorism, whose categories stress materialistic satisfactions, are by definition hedonistic. In other words, hedonism is pleasure-seeking activity, though to leave it at that makes hedonists of us all, since a preference for pleasure, in the very broad sense, is what structures our lives1.
In the context of ethical hedonism, pleasure is what we ought to pursue. Universal hedonism (which lies behind utilitarianism) has the moral edge in arguing that every man ought to act, in whatever manner, so as to bring about the most pleasure for the greatest number in the long run. Finally, rationalizing hedonism, we define it as the pursuit of pleasure which makes our action rational by making it purposeful; that is, we assume a criterion of rationality and intentional action which demands a foundation in terms of pleasure.
These various points of view explain why hedonism is typically regarded as a form of egoism in which pleasure and the avoidance of pain dominate among the main motives of human action. In the market, consumers are assumed to ask only "does it feel good" without making any serious attempt to calculate the full consequences of action. This is narrow hedonism, which is the hallmark of today's consumer society. In Bourdieu's view [1984], the ethic of hard work was replaced by narrow hedonism as the "fun ethic" of modern-day society, and sellers exploit this trend by focusing on selling through emotional words and images instead of product substance.
Among critics of the consumer society, hedonism is tied to this popular usage, since it is viewed as driven pleasure-seeking. Hedonism is seen as something less than addiction, something more than ideology, something that victimizes consumers, even though they may understand its dysfunctional consequences at a detached intellectual level [Rohatyn 1990]. If hedonism dominates the consumer society, its pleasures are fleeting and uncertain. According to Campbell, modern-day hedonism differs from the narrow hedonism used by critics of the consumer society. Campbell does not
suggest that modern hedonism equates with self-indulgence. He acknowledges that the self-illusory pleasure-seeker may be led in the direction of idealistic commitment. Campbell thus moved from narrow hedonism to a broader view [O'Shaughnessy and O'Shaughnessy 2002].

1 Campbell [1987] distinguishes pleasure-seeking from satisfaction-seeking. Satisfaction-seeking is meant to fulfill biological needs and relieve discomfort arising from deprivation (e.g. hunger). In contrast, pleasure-seeking aims for a quality of experience arising from certain patterns of sensation. For Campbell, the pleasures of consumption reside mainly in imagination. Consumers imaginatively anticipate the pleasure that a novel product might bring, though the reality never lives up to what they anticipate. It is a tragic saga of continuous hope and continuous disappointment, with true pleasure typically lying only in the imagination. According to Campbell's point of view, understanding today's hedonistic buyers means understanding how consumers use fantasizing to generate feelings.

Hedonism in the context of consumerism


Consumerism and hedonism are part of a rhetoric of reproval and reprobation suggesting that selfish, irresponsible pleasure-seeking has come to dominate human life. These terms do not have definitive meanings but are loose conceptual bundles covering multiple diverse phenomena. They have become rhetoric: not exact terms but emotional rhetorical brands. However, they affect attitudes as a social force by equating the simple desire for acquisition with greed.
The term hedonic was first used in a consumption sense by Hirschman and Holbrook [1982, p. 92]. Hedonic consumption referred to those "facets of consumer behavior that relate to the multisensory, fantasy and emotive aspects of one's experience with products". By "multisensory" Hirschman and Holbrook [1982] meant the receipt of experience in multiple sensory modalities, including tastes, sounds, scents, tactile impressions and visual images. While consumer researchers typically assume these experiences to be afferent (e.g. a product taste test), the hedonic perspective also posits efferent experiencing of multisensory impulses as an important form of consumer response. Individuals not only respond to multisensory impressions from external stimuli (e.g. a perfume) by encoding these sensory inputs but also react by generating multisensory images within themselves [Hirschman and Holbrook 1982]2.
In addition to the development of multisensory imagery, another type of response related to hedonic consumption involves emotional arousal. Emotions represent motivational phenomena with characteristic neurophysiological, expressive and experiential components [Izard and Buechler 1980]. They include feelings such as joy, jealousy, fear, rage and rapture [Freud 1955]. Emotive response is both psychological and physiological in nature, generating altered states in both mind and body [Schacter and Singer 1962]. It includes, but also extends beyond, the affect or preference variables often studied by marketing researchers. Rarely in marketing research has the full scope of emotional response to products been investigated. In research on hedonic consumption, however, this range of feelings plays a major role. The seeking of emotional arousal is posited to be a major motivation for the consumption of certain product classes, e.g. plays and sporting events [Holbrook 1980]. Further, as has been empirically evidenced, emotional involvement is tied to the consumption of even simple products such as cigarettes, food and clothing [Levy 1959].

2 For example, smelling a perfume may cause the consumer not only to perceive and encode its scent but also to generate internal imagery containing sights, sounds and tactile sensations, all of which are also experienced.
In sum, hedonic consumption refers to consumers' multisensory images, fantasies and emotional arousal in using products. This configuration of effects may be termed hedonic response. Hedonic value, or the hedonism concept, refers to the esthetic and experience-based subjective aspects of consumption and means regarding products as symbols. The experiential view associated with hedonism takes a far more holistic approach to the consumption process, right from involvement to post-purchase usage. Emotional arousal, seen as a type of consumer response related to hedonic consumption, is considered a major motivation for at least some products, and hedonic value as determining the level of involvement with the purchase of those products. It is reflected across all stages of decision-making: in the involvement (emotional as opposed to thought-based), in the task specification (experience-oriented rather than problem-solving), in the motivation to search for information (more affective than cognitive), and finally in how products are perceived and evaluated (symbolic meaning rather than feature-based evaluation).

Relationship between consumption, hedonism and utilitarianism


While the systematic study of hedonic consumption began in the late 1970s [Hirschman 1982; Hirschman and Holbrook 1982; Holbrook and Hirschman 1982], it also has roots in the research on consumer motivation [Dichter 1960] and product symbolism [Levy 1959]. Theory describing consumers' consumption value is strongly derived from the shopping experience with reference to hedonic or utilitarian values. A good example was provided by Babin, Darden and Griffin [1994] when they developed a scale for consumers' evaluations of a shopping experience along two dimensions: hedonism and utilitarianism. In their study, value was related to hedonic and utilitarian gratification as a result of earlier shopping experience, and hedonic values were mainly associated with fun, playfulness, entertainment and other emotional aspects.
Hedonism (e.g. during shopping activities) represents, to some extent, decisions related to the purchase of goods. People sometimes buy so they can shop, not shop so they can buy. In this sense, hedonically rewarding shopping experiences are not akin to a negative sense of work. Here, increased arousal, heightened involvement, perceived freedom, fantasy fulfillment, and escapism may indicate a hedonically valuable shopping experience [Bloch and Richins 1983; Hirschman 1983]. Furthermore, vicarious consumption can provide hedonic value by allowing the consumer to enjoy a product's benefits without even purchasing it. In sum, shopping with or without purchasing provides hedonic value in many ways [Markin, Lillis and Narayana 1976].
In certain situations, the actual purchase act can produce hedonic value and may serve as the climax of the buying process. In these instances, product acquisitions are not what they may appear to be and can be driven by something other than tangible product attributes. Impulse purchases, for example, result more from a need to purchase than a need for a product [Rook 1987]. Compulsive shoppers, likewise, gain intrinsic value from the act of purchasing itself [Faber and O'Guinn 1989]. A form of hedonic value with less serious consequences results from purchases made by product enthusiasts [Bloch and Bruce 1984]. Quite often, product enthusiasts acquire products for hedonic responses associated with self-concept enhancement rather than for any utilitarian benefits.
Generally, hedonic shopping value refers to the sense of enjoyment and pleasure that the consumer receives from the entire buying experience associated with shopping at a store [Griffin, Babin and Modianos 2000], and this value perception may also vary depending on individual shopping orientations, cultural orientations, and the economic and competitive environment in which the consumer shops [Woodruffe-Burton, Eccles and Elliott 2002]. More interestingly, hedonic value impacts consumers' need-satisfaction behaviors through consumption. Cheng-Lu Wang et al. [2000] argued that consumers with stronger hedonic values tend to consider consumption as more than satisfying basic or survival needs. Their consumption behaviors are characterized by pursuing instant gratification, spending expressively or symbolically, and seeking enjoyment and fun.
Now, while hedonic values are said to be related to consummatory gratification through the experience of values such as fun, fantasy or playfulness, utilitarian values are described as rational and concerned with expectations of consequences. For consumer value derived from the shopping experience, the utilitarian orientation has been described as a tendency to emphasize the perceived functional value or physical performance features (e.g. quality and value) of products in choice behavior [Sheth, Newman and Gross 1991]. Traditionally, the functional value is considered the primary driving force of consumer choice and is more likely to be adopted by consumers with traditional lifestyles. People with greater utilitarian values tend to live simpler lifestyles and may consider consumption necessary for survival or as a tool to reach higher-order life goals rather than as the terminal goal of enjoyment. Consequently, they tend to be more value-conscious and have more positive value perceptions regarding prices paid [Feinberg, Kahn and McAlister 1992; Wakefield and Barnes 1996]. In contrast, those who focus on hedonic values are modern consumers who tend to use surplus income to satisfy their ever-growing new desires for consumption [Campbell 1987]. As such, consumers with stronger hedonic values may not be satisfied by the functional value of a product. Instead, they may be more concerned with the expressive or emotional value of a product, such as brand, design, appearance and packaging, than with quality and price. They appear to derive their gratification from the immediate hedonic pleasure experience of consumption [Fischer and Arnold 1990].

Hedonic vs. utilitarian benefits and the sense of guilt


The distinction between hedonic and utilitarian alternatives may also be characterized against the background of market goods, in the sense that both are expected to offer benefits, and neither is reasonably expected to directly cause any obvious harm. This is consistent with Dhar and Wertenbroch's [2000] conceptualization, in which both hedonic goods, such as audio tapes and apartments with a view, and utilitarian goods, such as computer CDs and apartments close to work, are expected to deliver positive payoffs, but of different types. Hedonic (utilitarian) alternatives can be linked to relative vices (virtues). However, a fundamental difference is that the payoffs from both hedonic and utilitarian consumption lie primarily in the gains domain, and any harm that may ensue in the future is speculative, ambiguous, and indirect. The payoffs from consuming wants (vices) versus shoulds (virtues) explicitly straddle the gain and loss domains. Differences in judgment and behavior in the gain versus loss domains are well documented [Kahneman and Tversky 1979; Thaler 1980; Thaler and Johnson 1990]. Wants and shoulds are defined explicitly in terms of the temporal trade-offs of benefits and costs.
As previously explained, both hedonic and utilitarian goods offer benefits to the consumer, the former primarily in the form of experiential enjoyment and the latter in practical functionality [Batra and Ahtola 1990; Hirschman and Holbrook 1982; Mano and Oliver 1993]. Due to this difference, there often appears to be a sense of guilt associated with hedonic consumption [Kivetz and Simonson 2002a; Strahilevitz and Myers 1998]. And because of this guilt, it is more difficult to justify spending on hedonic goods and easier to justify spending on utilitarian goods [Prelec and Loewenstein 1998]. Intuitively, guilt and justification are interrelated concepts, not competing theories for explaining the choice of utilitarian over hedonic goods. A sense of guilt may arise in anticipation of, or as a result of, making an unjustifiable choice. An alternative may seem unjustifiable if there is a sense of guilt associated with it.
Since utilitarian and hedonic consumption are both discretionary, the difference between the two may be a matter of degree and perception. Hedonic (utilitarian) consumption tends to be perceived as relatively more discretionary (necessary) in nature. For example, the same product, such as a microwave, may be necessary to some and discretionary to others. Thus, it is more difficult to justify spending on hedonic goods and easier to justify spending on utilitarian goods [Prelec and Loewenstein 1998; Thaler 1980]. Two reasons for this relative difficulty in justifying hedonic consumption are that: 1) there is a sense of guilt associated with it, and 2) its benefits are more difficult to quantify.
Hedonic consumption evokes a sense of guilt [Kivetz and Simonson 2002b; Prelec and Loewenstein 1998; Strahilevitz and Myers 1998] and is often construed as wasteful, which may be a reflection of a culture that values hard work and parsimony. When the sense of guilt is mitigated, hedonic consumption increases. After consumers put effort into the acquisition of hedonic goods, they believe that they have earned the right to indulge and thus become more likely to consume.
On the other hand, bundling a hedonic purchase with a promised contribution to charity reduces the sense of guilt and facilitates hedonic purchases [Strahilevitz and Myers 1998]. This basic idea also lies behind gift giving. People enjoy receiving hedonic goods as gifts, even though they may not make such purchases for themselves [Thaler 1980]. It can be argued that guilt makes hedonic consumption more difficult to justify, but likewise, a sense of guilt may arise in anticipation of, or as a result of, making an unjustifiable choice. People try to construct reasons for justification [Shafir, Simonson and Tversky 1993], and it is easier to construct reasons for utilitarian consumption than for hedonic consumption. Hedonic goods deliver benefits which may be more difficult to evaluate and quantify than the practical, functional benefits that utilitarian goods deliver. Quantifiable reasons are easier to justify. Because justifiable options are easier for people to choose [Hsee 1996; Simonson 1989], it should be easier for people to consume hedonic goods when the situation facilitates justification.

Some other influences of hedonic values on consumer behavior


Studies conducted by Wang et al. [2000] have shown that hedonic values were positively associated with novelty seeking, responsiveness to promotion stimuli, brand consciousness, and preference for foreign brands. In the first case, novelty seeking refers to a propensity to seek new experiences and novel stimuli, to try new products or change brands for increased stimulation and variety [Katz and Lazarsfeld 1955; Leavitt and Walton 1975; Raju 1980; Hirschman 1980; McAlister and Pessemier 1982]. Consumers with stronger hedonic values tend to enjoy more colorful lifestyles. Consequently, their behaviors are likely to be motivated by exploration, novelty and variety. It is expected that these consumers are likely to be pioneers in new product diffusion processes. This is because a new product or a different brand may provide them with more ways to satisfy their needs for optimal stimulation [Raju 1980], innovation [Leavitt and Walton 1975] and sensation seeking [Zuckerman 1979]. On the other hand, consumers with weaker hedonic values tend to be value-conscious [Netemeyer, Johnston and Burton 1990] and display propensities for repetitive behavior [Raju 1980]. They are less likely to change brands or try new products and are more likely to be among the late-coming majority or behave as laggards in the market [Robertson 1967].
Researchers have further suggested that novelty-seeking behavior by consumers is related to responsiveness to promotion stimuli, or promotion proneness, which refers to a tendency to use promotion information as a basis for making purchase decisions [Feinberg, Kahn and McAlister 1992; Wakefield and Barnes 1996]. Individuals with a greater novelty-seeking tendency may be interested in promotion stimuli that offer stimulation and added value beyond the typical utilitarian functions of products. Since consumers with stronger hedonic values seek novelty and variety in trying new products, they are expected to be more sensitive and responsive to promotion influences in their product choice and brand-switching behavior.
Hedonism is also related to products; that is, all products have a certain degree of hedonism. This is because they all have some degree of symbolic meaning and arouse at least some degree of hedonic motivation among individuals [Hirschman and Holbrook 1982; Holbrook and Hirschman 1982]. If products vary in the extent of inherent symbolism, then one can expect the hedonic value to vary across product categories. This is supported by research examining the extent of hedonism in different products [Bloch, Sherrell and Ridgway 1986; Batra and Ahtola 1990; Lofman 1991; Babin, Darden and Griffin 1994].
Hedonic value across products seems to vary depending on the intrinsic and extrinsic attributes of the product [Dodds and Monroe 1985]. Utilitarian value is associated with tasks that are easily completed. Thus, any product associated with simple routine task completion, like the purchase of coffee or detergents, is likely to be lower in hedonic value than a product with a higher degree of information processing and involvement, such as cellular phones, where the outlay is much larger and bargain-seeking behavior may impact product purchase [Monroe and Chapman 1987].
Since products have been known to possess symbolic or conspicuous consumption value in excess of their functional utility [Veblen 1899], hedonic consumption acts are based not on what consumers know to be real but rather on what they desire reality to be [Hirschman and Holbrook 1982]. A consumer's choice of a product or a brand is frequently based on the congruency between the consumer's lifestyle and his/her consumption values and the perceived symbolic meaning of the product or brand [Levy 1959, 1963; Hirschman and Holbrook 1982]. Consumers with stronger hedonic values are expected to be more brand-conscious and to choose a product or a brand based on its symbolic or expressive value more than its functional value.
Research also indicates that foreign brands often trigger cultural stereotypes and influence product perceptions and attitudes. For instance, the French pronunciation of a brand name was found to have a positive effect on the perceived hedonic value (e.g. aesthetic sensitivity, refined taste and sensory pleasure) of products and on the attitude toward the brand [Leclerc, Schmitt and Dube 1994]. For some products, such as clothing and cosmetics, the symbolic or hedonic values associated with a foreign brand can be a key determinant of brand selection. Since consumers often use brand names to express their lifestyles, it is expected that consumers with strong hedonic values also show a preference for foreign brands.
More interestingly, studies in the field of luxury goods consumption have shown that luxury products are likely to provide more subjective, intangible hedonic benefits [Dubois and Laurent 1994] such as sensory pleasure, aesthetic beauty, and excitement [Vigneron and Johnson 2004]. Hence, hedonism describes the attractive properties acquired from the purchase and consumption of a luxury brand: arousing feelings and affective states received from personal rewards and fulfillment [Sheth, Newman and Gross 1991; Westbrook and Oliver 1991].

Research methodology
All the theoretical assumptions presented above point to a multidimensional construct (measuring hedonic-consumerism values) rather than a unidimensional one. In this section, we focus on the methodological aspects pertaining to the development of the construct. This process (see Figure 35) was based on commonly accepted methods of scale development in consumer research [Churchill 1979], with some incorporated advancements in the evaluation of multi-item measures [Anderson and Gerbing 1988]. The data was analyzed in a few stages. Firstly, we tried to find the exact contents of items that would fit the scale, where items would cover all necessary aspects of the hedonic-consumerism construct (HCV). When we hypothesized the strength and direction of the hedonic values, we considered at least a few factors, on each of which two or more items would load. In consequence, in this study, we first explored the HCV construct (on the basis of exploratory factor analysis, EFA), which was then purified and confirmed with CFA analytical models. Obviously, the exploratory results formed only empirical hints, which were further developed by means of CFA models. Hence, we proposed an interplay between EFA and confirmatory factor analysis (CFA), whose overall model fit was examined in order to determine whether the factors identified by the exploratory analysis were substantially related to the value dimensions indicated in the structural model.
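The exploratory step described above hinges on deciding how many factors to retain. As a minimal, hedged illustration (not the study's actual procedure or data), the common Kaiser criterion — retain factors whose eigenvalue of the item correlation matrix exceeds 1 — can be sketched on synthetic responses; the two latent factors and the item groupings below are hypothetical:

```python
import numpy as np

def kaiser_factor_count(responses: np.ndarray) -> int:
    """Count factors whose eigenvalue of the item correlation
    matrix exceeds 1 (Kaiser criterion)."""
    corr = np.corrcoef(responses, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))

# Synthetic responses: two latent factors, three items each
# (e.g. hypothetical "entertainment and fun" and "openness to change").
rng = np.random.default_rng(0)
n = 500
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=(n, 6))
items = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

print(kaiser_factor_count(items))  # → 2 factors retained
```

In a full EFA the retained factors would then be rotated and their loadings inspected before the CFA stage; the sketch only shows the retention decision.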

[Figure 35 flow, recovered from the page layout:]
– Hedonic-consumerism values description and their dimensionality
– Qualitative inquiry (depth interviews) and literature review
– Initial pool of items
– Content analysis and categorization
– Expert judgement tasks in content and face validity
Purification and validation:
– Items analysis
– Exploratory factor analysis (EFA)
– EFA reliability assessment based on Cronbach's alpha coefficient
– Confirmatory factor analysis (CFA)
– Reliability assessment based on CFA
– Validity of construct (i.e. convergent, discriminant and nomological validity) in view of CFA

Figure 35. Scale development process
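One of the reliability checks in the process, Cronbach's alpha, is simple enough to compute directly from an item-score matrix. A minimal sketch with made-up Likert responses (the scores below are hypothetical, not data from this study):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Five respondents rating three hypothetical Likert items (1-5).
scores = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 4],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 2))  # → 0.94
```

Values of roughly 0.7 and above are conventionally treated as acceptable internal consistency for a scale under development.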


Based on qualitative exploratory studies, a pilot test and quantitative inquiries, a scale measuring hedonic-consumerism values was developed. The main path of the methodological process was as follows:
– investigate the area of hedonic-consumerism values,
– develop and purify a scale for the measurement of hedonic-consumerism values (HCV)3,
– validate the hedonic-consumerism values scale (HCV).
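For the validation step, CFA-based reliability and convergent validity are commonly summarized by composite reliability (CR) and average variance extracted (AVE), both computed from standardized factor loadings. A minimal sketch; the loadings below are hypothetical placeholders, not estimates from this study:

```python
def composite_reliability(loadings):
    """CR = (sum l)^2 / ((sum l)^2 + sum(1 - l^2)) for standardized loadings l."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for one HCV facet.
loadings = [0.82, 0.76, 0.71, 0.68]
print(round(composite_reliability(loadings), 2))       # → 0.83 (CR > 0.7 acceptable)
print(round(average_variance_extracted(loadings), 2))  # → 0.55 (AVE > 0.5 convergent)
```

The conventional cut-offs noted in the comments (CR above 0.7, AVE above 0.5) are the usual rules of thumb for judging convergent validity of a reflective construct.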

The rationale for choosing young consumers as a sample for personal values analysis
In designing the sample (both in the quantitative and the qualitative inquiry) we took into account mainly the youth population. This market segment consists of prospective consumers who provide large-scale benefits for companies.
In the course of time, the structure of young consumers' personal values and their preferences has considerably evolved as compared to previous decades of societal development. Now these values are constantly changing due to modern technology, globalization, and the wide-ranging exchange of international culture, which has a strong impact on younger people's lifestyles, pushing them to discard values such as stabilization, family, rationality or asceticism in favor of more contemporary values such as materialism, egoism and hedonism, where the mentality "to have" instead of "to be" takes the lead.
In the youth market, one may also note modern individuality, narcissism and an excessive style of consumption, where most young people search for endless fulfillment of needs, permanent self-accomplishment and eternal joy, intertwined with claims to everything and everybody. There is no doubt that such a composition of values creates a promising land for companies, because it provides many opportunities to produce and sell more products and services.
The question is by what specific means and to what extent hedonic values can be related to youth market behavior and to the economy at large. To answer this question, a few facts about the importance of the youth segment and its growth within the market must be noted. Firstly, over the past decade or so, marketers have increasingly targeted the millions of young people. It has been estimated that the youth sector alone directly or indirectly influences over $170 billion in annual sales. For instance, youth accounted for just under 9% of the $12.2 billion recorded music market in 2012 and was the only segment under age 34 to show any increase in purchase levels. The youth market is also of sufficient significance to warrant annual conferences for marketers of Fortune 500 firms interested in better understanding market trends. In that case, identification of the youth values associated with the market segment (representing a major or at least considerable part of the market) and accurate verification of the values that young people hold can tell marketers about the factors of a successful market policy.

3 Scale purification was concerned with exploratory factor analyses, confirmatory factor analyses, and an initial assessment of item analyses, scale reliability and validity, etc. Here, standard [Anderson and Gerbing 1988; Churchill 1979; Hair et al. 2010], as well as emerging, guidance in the literature was employed in item reduction and assessment of the resulting factor structure.
Why else is understanding youth values so important for market implications? Values have the potential to help clarify the understanding of consumers' motivations and may point to the underlying rationality of ostensibly illogical decision processes. For example, consumers may prefer the taste of one beer to another, even though the beers are identical in every respect except the values to which their marketing materials have been tied. Thus, identification of values provides a company with an opportunity to develop better advertising and communication programs that connect the product or other benefits to consumers' personal meanings and values at several, increasingly meaningful levels of abstraction. Furthermore, efforts to measure advertising or communication effectiveness may be improved by assessing how communicated meanings connect to personal values. Even if such information is not directly utilized in communication efforts, marketers can use it to understand the characteristics of certain target segments (in this case the youth segment) [Kahle 1986].
To better understand the significance of youth values, we also need to consider some other facts, such as young people's interest in commercials or their preferences in products and product perception. They are briefly outlined below [Churchill and Moschis 1979; Simmons and Wade 1985; 1993; Achenreiner 1997; Goldberg et al. 2003]:
– gender: as reported elsewhere in the literature, young girls are more market-oriented than boys,
– frequency of shopping: youths select the most preferable types of retail outlets, from supermarkets to stores, bookstores, and the shopping mall,
– product expertise: contrary to adults, youths are experts in those products that are associated with food, clothing, technology or entertainment,
– interest in new products: youths are much more oriented than adults toward, and interested in, testing or buying new products,
– interest in commercials: youths appear to be more interested in TV commercials than adults; it is also known that youths are likely to purchase a product after seeing a famous person speak on radio or on TV,
– TV commercials and direct selling: youths reported that they had answered appeals in TV commercials that provided a telephone number to purchase a product,
– in-store promotions: it is also known that youths appear to be more influenced by in-store promotions than older members of society; a considerable part of youths always talk to their parents about, or point to, signs/advertisements on shelves.

Exploratory interviews and pilot study in reference to the initial pool of items
Through a review of a large base of relevant literature, a list of scale items was initially identified. The list was derived from previous works of the following authors: [Rokeach 1973; Markin, Lillis and Narayana 1976; Hirschman and Holbrook 1982; Hirschman 1983; Kahle 1983; Bloch and Richins 1983; Babin, Darden and Griffin 1994; Griffin, Babin and Modianos 2000; Cheng-Lu Wang et al. 2000; Voss, Spangenberg and Grohmann 2003]. All items were then modified according to domestic requirements (i.e. Polish culture, language, etc.) and rewritten as short sentences (statements). Additionally, the list was supplemented with newly generated items resulting from exploratory interviews conducted before the start of the quantitative study.
Within the exploratory inquiries, the author conducted in-depth interviews with 10 members of the university community based on a convenience sample. At first, students were asked whether they would be willing to participate in an interview at all. They were pre-screened to ensure that the sample would include respondents with differing points of view and backgrounds. Thus, we constructed a sample which consisted of young people (5 men and 5 women) who ranged from 19 to 24 years of age. The author provided the examinees with a brief description of the goal of the interview. Next they were asked to think about hedonism in general and then about the hedonic-consumerism relationship. The author probed the reasons and feelings by asking extensive follow-up questions. After that, all the interviews were read thoroughly many times, and recurring themes in the data were identified and listed. This involved sorting themes into categories based on similar characteristics. The goal at this point was to search for commonalities that allowed for the most accurate representation of each hedonic-consumerism aspect.
In the next stage, in the course of the pilot study, the already generated items underwent semantic modification with reference to their wording (especially language and stylistic improvements, so as to obtain identical interpretation by all examinees reading the survey). The items were verified in terms of their clarity by 30 randomly selected experts from across all 5 years of study at the Poznań universities located in Poland.
When preparing the items we focused, in particular, on declarative statements. We also tried to avoid colloquialisms and jargon, allowing only items with a plain language form. This checkpoint was related to the reading level of the examinees, i.e. their ability to understand all the items.
Because all items were positively worded expressions, none of them was reversed. The strategy of reversed items is usually applied when there is a need to avoid repetition, imprecise reading or the skipping of questions by the examinees; it distracts the examinees' attention from a routine, rhythmical mode of answering. It is also applied to give the examinees a different possible perspective on the items with respect to the analyzed problem.
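Had any negatively worded items been included, their scores would have had to be reverse-coded before analysis. On a 5-point scale this amounts to subtracting each response from 6; a minimal sketch in Python (the sample responses are illustrative, not taken from the study):

```python
def reverse_item(scores, points=5):
    """Reverse-code Likert responses: on a `points`-point scale, 1 <-> points."""
    return [points + 1 - s for s in scores]

# Responses to a hypothetical negatively worded item
print(reverse_item([1, 2, 3, 4, 5]))  # -> [5, 4, 3, 2, 1]
```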
Finally, based on the exploratory and pilot studies, we identified the following categories of hedonic-consumerism values (HCV), labeled as follows: curiosity development and openness to change, self-enhancement, consumption style, and entertainment and fun. For each facet of hedonic-consumerism values we tried to tap respective items, as shown in Table 27, which represents the theoretically based HCV construct. As can be observed, these items were prepared mainly in the context of consumers' personal values, not only in view of market consumption.
The particular items (especially their statements) were worded as follows:
X1 I always strive for new experiences
X2 I like to earn more and spend more on consumption to enjoy myself
X3 Consumption itself is an enjoyable experience in my life
X4 I want to be creative and act with imagination
X5 I explore new things and aspects of life


Scale development for hedonic-consumerism values

Table 27. Theoretically based multidimensional construct of hedonic-consumerism values with respective facets

Facet                                          List of items
Curiosity development and openness to change   X1, X4, X5, X13
Self-enhancement                               X6, X9, X10, X11, X12
Entertainment and fun                          X7, X8
Consumption style                              X2, X3

X6 I care more for myself than for others
X7 I spend my time nicely and have a good time
X8 I search for an adventurous and exciting life
X9 I strive to achieve success in my life
X10 I respect and believe in those people who possess lots of money
X11 I make choices in my life on my own
X12 I like to be praised and admired
X13 I constantly learn something new.
All items were accompanied by a 5-point Likert scale ranging from:
1 = totally unimportant value,
2 = partially unimportant value,
3 = neither unimportant nor important value,
4 = partially important value,
5 = totally important value,
where the examinees' task was simply to choose and mark (on the paper-and-pencil questionnaire) one answer. Such a choice was bipolar and symmetrical around the neutral point 3 = neither unimportant nor important value. Besides, items marked with high numbers reflected higher intensity. The author did not use scales where a higher intensity of particular items would be coded by low numbers, such as 1 = totally agree and 5 = totally disagree (or respectively: 1 = totally important and 5 = totally unimportant)4.
In sum, all the items of the HCV scale were face- and content-validated through exploratory interviews. A pilot test helped to modify the items in terms of their clarity. The items were then checked with respect to the shape, location, dispersion and central tendency of their distributions. The results of these analyses (based on selected descriptive statistics) are presented in Tables 32 and 33, pp. 349 and 350. Further application of reliability estimation procedures allowed us to check the internal consistency of the selected items and the dimensionality (through
4 The author decided to apply a response choice format based on the importance option, instead of agreement.


the agency of EFA analyses) of the HCV construct. For the analyses we used the Statistica 10, SPSS 21 and AMOS 21 software. We did not apply other useful statistical packages such as LISREL, Systat or Mplus5.

Selected sample characteristics from the data collection


In the course of the empirical research, printed questionnaires were handed out to a number of individuals (students) at Polish universities, i.e. the Poznań University of Economics, Adam Mickiewicz University in Poznań and the Poznań University of Technology, all located in the Wielkopolska region. The sample for testing was selected on quota criteria, and its size was n = 285.
Taking into account the sampling method, the sample size and the lack of full compliance of its structure with the structure of the population across all academic levels of education in Poland, we cannot treat it as entirely representative. However, it should be noted that the whole work was focused on methodological aspects, not just descriptive ones. Nevertheless, it would be much more desirable if the examinees were as representative as possible of the whole population in Poland. The scale was developed at the university level, on observations of students. Hence, serious care must be taken by any researcher who wants to use or adapt the HCV scale on non-university samples (drawn strictly from other populations located in Poland or some other country)6.
Data was collected between May and June 2009. Information pertaining to the basic sample characteristics is presented in Tables 28–31, from which we infer that female examinees prevailed over male ones. During their studies, most examinees lived in their parents' own flat or in rented rooms and flats. More interestingly, 1/3 of them combined part-time work with studying.
5 As far as the Statistica software is concerned, it is one of the most popular and most commonly used packages. SPSS is a comprehensive software system designed to handle all steps of an analysis, ranging from data listings, tabulations and descriptive statistics to complex multivariate analyses. The two packages differ in their graphical user interfaces for Windows, menus and dialog boxes. More information pertaining to their applications in various fields of science and practice, with particular emphasis on simple or advanced statistical solutions, can be found in the following works: Statistica [Nisbet, Elder and Miner 2009; Hill and Lewicki 2006; Luszniewicz and Słaby 2008; Stanisz 2006], SPSS [Górniak 2008, 2010; Leech et al. 2004; Leech and Barrett 2005; Landau and Everitt 2004], AMOS [Arbuckle 2007; Byrne 2010].
6 It is likely that new samples and responses will be different, perhaps because of other factors.


Table 28. Sample characteristics – gender

Categories   Frequency   Percent
Male             118        41.4
Female           167        58.6
Total            285       100.0

Table 29. Sample characteristics – examinees' parents' residence (place of living)

Categories   Frequency   Percent
Flat             114        40.0
House            171        60.0
Total            285       100.0

Table 30. Sample characteristics – residence of examinees during studying

Categories          Frequency   Percent
Rented house              2        0.7
Dormitory                 8        2.8
Other                    25        8.8
Own flat                 26        9.1
Rented flat              59       20.7
Rented room              61       21.4
Parents' own flat       104       36.5
Total                   285      100.0

Table 31. Sample characteristics – involvement of examinees in additional activities besides studying

Categories       Frequency   Percent
Work full-time          69      24.2
Only study             106      37.2
Work part-time         110      38.6
Total                  285     100.0


Empirical analysis
Preliminary screening of the items and data
First, we analyzed the specificity of the thirteen items with basic descriptive statistics. Both SPSS and Statistica offer a number of options, such as means, variances, frequencies and correlations among items. Some of them were selected and are discussed below.
The distributions of the collected responses are provided in Table 32 and Figure 36. Taking into account the combined responses from the categories I agree and Totally agree, the greatest value for examinees was observed for items X1 (83.5%), X4 (74.2%), X5 (80.7%), X7 (83.6%), X9 (80%) and X13 (64.2%). The other items were placed somewhat in the middle of the 5-point scale or drew closer to the right side of the distribution (i.e. the response I agree).
When we analyzed the descriptive statistics of specific items (Table 33), one might conclude at first glance that items X1, X4, X5, X7, X9 and X13 obtained a similar level of responses from the examinees. For example,
Table 32. Analysis of 13 items according to categories and examinees' responses (%)

         Response categories
Items      1      2      3      4      5
X1       1.0    2.1   13.4   45.6   37.9
X2       3.2   16.7   43.9   28.1    8.1
X3       7.0   27.3   41.1   17.9    6.7
X4       2.0    3.7   20.1   50.2   24.0
X5       1.0    1.1   17.2   51.9   28.8
X6       5.3   21.4   45.6   23.5    4.2
X7       1.5    2.5   12.4   42.5   41.1
X8       2.2   11.2   33.3   31.9   21.4
X9       0.4    1.0   18.6   52.6   27.4
X10      8.4   18.4   33.3   30.0    9.9
X11      1.8   12.6   34.0   40.4   11.2
X12     14.0   31.5   40.4   11.6    2.5
X13      1.4    3.9   30.5   48.4   15.8

Categories: 1 = totally disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = totally agree.


Table 33. Descriptive statistics – 13 items

Items   Mean   Variance   Standard deviation
X1      4.19     0.57          0.76
X2      3.21     0.86          0.93
X3      2.90     0.98          0.99
X4      3.96     0.53          0.73
X5      4.08     0.50          0.71
X6      3.01     0.83          0.91
X7      4.23     0.56          0.75
X8      3.59     0.79          0.89
X9      4.06     0.53          0.73
X10     3.13     0.84          0.92
X11     3.47     0.83          0.91
X12     2.57     0.90          0.95
X13     3.73     0.67          0.82

Figure 36. Graphical display of the distribution of the 13 items by their categories and the examinees' answers (stacked percentage bars; 1 = totally disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = totally agree)


the median of these items was 4, which means that 50% of the examinees' answers in the data distribution lie below the statement I agree and 50% above it. Their measure of variability (the sample variance) ranges from 0.50 to 0.67. In contrast, items X2, X3, X6, X8, X10, X11 and X12 ranged in their variation from 0.79 to a maximum level of 0.98; these variables were in fact the most diverse. As a reminder, a zero variance indicates that all the item values are identical, and a non-zero variance is always positive. A small variance indicates that the data points tend to be very close to the mean of the respective item, and hence to each other, while a high variance indicates that the data points are spread far from the mean and from each other.
According to the homoscedasticity rule (which means having the same scatter of data), things do not look so promising across all the examined items. However, taking equal variances into account, homoscedasticity appears within two separate groups of items: group one with X1, X4, X5, X7, X9 and X13, and group two with X2, X3, X6, X8, X10, X11 and X12. The first group exhibits the least amount of spread and the second group the greatest. If both were analyzed jointly, they would exhibit heteroscedasticity, which literally means a different scatter. If we had decided to search for a one-dimensional HCV construct (based on one factor), these items would perhaps be insufficiently represented. Here, however, we assumed at least two or more factors, from which the HCV scale was deliberately developed. Hence we can more or less accept these two bundles of items as preliminary input to be evaluated in further EFA models.
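The per-item statistics discussed above can be reproduced with a few lines of numpy; the coded responses below are illustrative only (the actual analysis used the n = 285 questionnaires):

```python
import numpy as np

# Hypothetical 5-point Likert responses of ten examinees to one item
x = np.array([4, 5, 4, 3, 4, 5, 4, 4, 3, 4])

print(x.mean())                  # mean -> 4.0
print(np.median(x))              # median -> 4.0
print(round(x.var(ddof=1), 2))   # sample variance -> 0.44
print(round(x.std(ddof=1), 2))   # sample standard deviation -> 0.67
```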

Items correlation and their adequacy for factorial model


In the next stage, we tried to find out whether the items were sufficiently intercorrelated to produce solid factors. This interrelatedness was assessed from the overall and individual-item perspectives on the basis of the anti-image correlation matrix (containing the negatives of the partial correlation coefficients) and the anti-image covariance matrix (containing the negatives of the partial covariances). In both matrices (see Table 34), if the selected model indicates a good fit of the factor structure, most of the elements lying off the diagonal will be small. The anti-image correlation matrix provides the negative values of the partial correlations. In both cases (partial or anti-image correlations), larger values indicate that the data matrix is perhaps not well suited to factor analysis.
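An anti-image correlation matrix can be obtained from the inverse of the correlation matrix; a minimal numpy sketch (the 3 × 3 matrix is illustrative only, not the study's 13-item matrix):

```python
import numpy as np

def anti_image_correlation(R):
    """Anti-image correlations, i.e. the negated partial correlations,
    computed from the inverse of the correlation matrix R."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    A = Rinv / np.outer(d, d)   # off-diagonal: minus the partial correlation
    np.fill_diagonal(A, 1.0)    # SPSS prints the MSA values on this diagonal
    return A

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
A = anti_image_correlation(R)
```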
Turning to Table 35, it appears we have decent correlation levels to justify the application of common factor analysis. If we had found only low

[352]

X1
X2
X3
X4
X5
X6
X7
X8
X9
X10
X11
X12
X13

0.08
0.720a
0.65
0.10
0.04
0.06
0.01
0.02
0.04
0.03
0.08
0.04
0.02

0.816a
0.08
0.09
0.22
0.08
0.01
0.01
0.02
0.05
0.01
0.02
0.00
0.22

X2

0.05
0.45
0.29
0.05
0.02
0.03
0.01
0.01
0.02
0.02
0.04
0.02
0.01

X1

0.79
0.05
0.05
0.16
0.06
0.01
0.01
0.01
0.03
0.01
0.02
0.00
0.16

X3

0.09
0.65
0.701a
0.04
0.00
0.06
0.01
0.02
0.03
0.06
0.10
0.13
0.13

0.05
0.29
0.44
0.02
0.00
0.03
0.00
0.01
0.02
0.03
0.05
0.07
0.07
0.22
0.10
0.04
0.824a
0.17
0.00
0.11
0.20
0.04
0.15
0.10
0.08
0.07

0.16
0.05
0.02
0.66
0.12
0.00
0.08
0.13
0.03
0.10
0.06
0.05
0.05

X4

0.08
0.04
0.00
0.17
0.825a
0.21
0.07
0.09
0.03
0.08
0.00
0.06
0.11

0.06
0.02
0.00
0.12
0.78
0.15
0.06
0.07
0.02
0.06
0.00
0.05
0.08

X5

0.01
0.06
0.06
0.00
0.21
0.866a
0.06
0.05
0.08
0.16
0.16
0.24
0.08

0.01
0.03
0.03
0.00
0.15
0.62
0.04
0.03
0.05
0.11
0.10
0.16
0.05

X6

0.01
0.01
0.01
0.11
0.07
0.06
0.571a
0.44
0.08
0.07
0.00
0.02
0.12

0.01
0.01
0.00
0.08
0.06
0.04
0.75
0.30
0.06
0.05
0.00
0.02
0.09

X7

0.02
0.02
0.02
0.20
0.09
0.05
0.44
0.712a
0.05
0.11
0.08
0.06
0.11

0.01
0.01
0.01
0.13
0.07
0.03
0.30
0.65
0.04
0.07
0.05
0.04
0.08

X8

0.05
0.04
0.03
0.04
0.03
0.08
0.08
0.05
0.828a
0.15
0.24
0.06
0.23

0.03
0.02
0.02
0.03
0.02
0.05
0.06
0.04
0.71
0.10
0.16
0.04
0.16

X9

Table 34. Anti-image matrices of 13 items


X10

0.01
0.03
0.06
0.15
0.08
0.16
0.07
0.11
0.15
0.778a
0.24
0.13
0.03

0.01
0.02
0.03
0.10
0.06
0.11
0.05
0.07
0.10
0.72
0.16
0.09
0.02

X11

0.02
0.08
0.10
0.10
0.00
0.16
0.00
0.08
0.24
0.24
0.834a
0.14
0.02

0.02
0.04
0.05
0.06
0.00
0.10
0.00
0.05
0.16
0.16
0.65
0.09
0.01

X12

0.00
0.04
0.13
0.08
0.06
0.24
0.02
0.06
0.06
0.13
0.14
0.876a
0.03

0.00
0.02
0.07
0.05
0.05
0.16
0.02
0.04
0.04
0.09
0.09
0.68
0.02

X13

0.22
0.02
0.13
0.07
0.11
0.08
0.12
0.11
0.23
0.03
0.02
0.03
0.821a

0.16
0.01
0.07
0.05
0.08
0.05
0.09
0.08
0.16
0.02
0.01
0.02
0.70

Legend: (a) Measures of Sampling Adequacy (MSA). In the main diagonal, there is required aminimum level of 0.5 in order to start agood model. In our data
they all exceeded this level.

X1
X2
X3
X4
X5
X6
X7
X8
X9
X10
X11
X12
X13

Anti-image covariance

Anti-image correlation

[353]

Table 35. Correlation matrix of 13 items

       X1     X2     X3     X4     X5     X6     X7     X8     X9     X10    X11    X12    X13
X1     1.00   0.19   0.12   0.36   0.25   0.19   0.09   0.20   0.18   0.03   0.18   0.13   0.35
X2     0.19   1.00   0.72   0.29   0.21   0.35   0.08   0.17   0.19   0.21   0.25   0.34   0.25
X3     0.12   0.72   1.00   0.25   0.19   0.34   0.06   0.16   0.21   0.22   0.20   0.37   0.29
X4     0.36   0.29   0.25   1.00   0.33   0.22   0.24   0.41   0.13   0.04   0.22   0.22   0.28
X5     0.25   0.21   0.19   0.33   1.00   0.32   0.03   0.23   0.16   0.00   0.16   0.13   0.30
X6     0.19   0.35   0.34   0.22   0.32   1.00   0.03   0.18   0.33   0.34   0.41   0.45   0.31
X7     0.09   0.08   0.06   0.24   0.03   0.03   1.00   0.47   0.09   0.04   0.10   0.07   0.00
X8     0.20   0.17   0.16   0.41   0.23   0.18   0.47   1.00   0.10   0.03   0.19   0.18   0.22
X9     0.18   0.19   0.21   0.13   0.16   0.33   0.09   0.10   1.00   0.32   0.42   0.28   0.35
X10    0.03   0.21   0.22   0.04   0.00   0.34   0.04   0.03   0.32   1.00   0.39   0.33   0.10
X11    0.18   0.25   0.20   0.22   0.16   0.41   0.10   0.19   0.42   0.39   1.00   0.38   0.24
X12    0.13   0.34   0.37   0.22   0.13   0.45   0.07   0.18   0.28   0.33   0.38   1.00   0.19
X13    0.35   0.25   0.29   0.28   0.30   0.31   0.00   0.22   0.35   0.10   0.24   0.19   1.00


correlations, or if all the correlations were equal (denoting that no structure exists to group the variables), we might question the application of factor analysis. In other words, if a particular item had very small correlations with all the others, we might consider eliminating it in the next run. In our case, visual inspection of Table 35 reveals correlations greater than 0.30. Fortunately, most of the 13 items exceeded the minimum level of 0.30 that is required [Hair et al. 2010, p. 103]. Obviously, the higher the correlations, the better the factor loadings that will be obtained.
Having examined the data with the KMO index (the Kaiser–Meyer–Olkin measure), we confirmed a high level of adequacy of the selected items for the EFA model. The KMO shows the ratio of the items' correlations to their partial correlations. The index reached a level of 0.784 for all 13 items, which indicates a good structure of the correlation matrix, according to the literature guidelines. For example, Hair et al. [2010] accept the following KMO levels: 0.9 – excellent (meaning a very good structure of the correlation matrix as input data), 0.8 – recommendable, 0.7 – decent, 0.6 – average, and 0.5 – indicating a poor structure.
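The overall KMO index can be computed from the correlation and partial-correlation matrices; a minimal numpy sketch (the 3 × 3 matrix is illustrative, not the study's 13-item matrix):

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin index: the sum of squared off-diagonal
    correlations divided by that sum plus the sum of squared off-diagonal
    partial correlations."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    P = -Rinv / np.outer(d, d)          # partial correlations
    off = ~np.eye(len(R), dtype=bool)   # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (P[off] ** 2).sum()
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
print(round(kmo(R), 3))  # -> 0.628
```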
Table 36. Bartlett's test of sphericity

Test components     Values
Approximated χ2      940.0
Df                      78
Significance         0.000

Finally, we investigated Bartlett's test [1950, 1951] (Table 36). This test is sometimes termed the test of sphericity and indicates the suitability of the data for structure detection. It simply tests the hypothesis that the correlation matrix is an identity matrix, which would indicate that the observable variables are uncorrelated and therefore unsuitable for structure detection. A small value (the significance level of 0.000, below 0.05) indicates a good starting point for factor analysis.
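Bartlett's statistic is a function of the determinant of the correlation matrix; a sketch of the usual approximation (with 13 items it has 13 · 12 / 2 = 78 degrees of freedom, as in Table 36):

```python
import math
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test statistic for H0: R is an identity matrix;
    approximately chi-squared with p(p-1)/2 degrees of freedom."""
    p = len(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df

# For perfectly uncorrelated data the statistic is 0 and H0 cannot be rejected
stat, df = bartlett_sphericity(np.eye(13), 285)
print(abs(stat), df)  # -> 0.0 78
```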

Exploration of dimensionality and the number of components (factors)

In this section we consider the dimensionality of the data, that is, whether there are any subscales (factors). With this objective in mind, we used the EFA model, studying the dimensionality of all 13 items. At this point we actually began the explorative process of factor extraction. In defining the


optimum number of factors, we profited from subjective criteria associated with heuristic rules (where only those factors are retained which have a deep sense and meaning, and which can be interpreted on the basis of the theoretical assumptions of the model) and from objective criteria. The objective criteria were as follows: the scree plot, the proportion of variance explained, the eigenvalue approach and the half approach. Some authors even claim that using all these methods may mislead the researcher, because they exclude one another. However, for the sake of comparing different solutions objectively and of a methodical approach to data analysis, we implemented all of them. Finally, we introduced a test of significance based on the χ2 statistic in order to test the adequacy of the k-factor model.
On the basis of principal components analysis (PCA), we could have extracted as many factors (principal components) as there are original items in the data set; in that case the number of eigenvalues equals the number of measured items, and there would virtually be no need to search for factors in the model. Hence, we accepted a proportion of variance (explained by the respective components7) at the 61.27% level, although it is thought that more than 80% or even 95% should be expected [Thompson 2004]. According to the eigenvalue approach8, we kept only those components that exceeded 1.0 (see the last, fourth component, which had a value of 1.21). The fifth component obtained 0.82, which was below the 1.00 level.
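The eigenvalue (Kaiser) criterion and the proportion of variance explained can both be read off the eigenvalues of the correlation matrix; a numpy sketch on an illustrative 3 × 3 matrix:

```python
import numpy as np

# Illustrative correlation matrix (the study used the 13 x 13 matrix of Table 35)
R = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
explained = eigvals / eigvals.sum() * 100        # percentage of variance
cumulative = np.cumsum(explained)
retained = int((eigvals >= 1.0).sum())           # Kaiser criterion: eigenvalue >= 1
```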
Tables 37–39 show basic statistics for each component before and after varimax rotation, that is, before and after the components were extracted. After rotation, the values placed in the column labeled "rotation sums of squared loadings" in the PCA model changed (for comparison see Tables 38 and 39). They differ from the initial eigenvalue extraction, though their total cumulative percentage remains the same.
Also, as observed from Table 37 (before rotation), for the principal components the initial eigenvalues and the extraction sums of squared loadings are the same. They are the same for PCA extraction, but will be different for other
7 The percentage of total variance attributable to each component is displayed in the column labeled "percentage of variance". Together, the first four components account for 61.27% of the original thirteen-item set.
8 The number of eigenvalues equals the number of measured items. An eigenvalue indicates the proportion of information explained in the matrix of analyzed relationships that a given component reproduces; the ratio of the eigenvalues is the ratio of the explanatory importance of the components with respect to the items. If a component has a low eigenvalue, it contributes little to the explanation of variances in the items and may be ignored as redundant relative to more important components.


factor analysis extraction methods. Usually the eigenvalues after extraction are lower than their initial counterparts. For comparison, see Table 38 with extraction methods such as principal axis factoring (PAF), generalized least squares (GLS) and maximum likelihood (ML). This comparison proves that in PAF there are two factors, while in GLS and ML there are three general factors, instead of four as in PCA. Notice that the initial eigenvalues are the same for PAF, GLS and ML, because PAF, GLS and ML start from a PCA and use eigenvalues greater than 1 to determine the number of factors.
When we performed varimax rotation in PCA, the percentage of total variance accounted for by the first four components (61.27%) did not change, but the percentage accounted for by each component did. Thus the distances between the eigenvalues (especially between those of components no. 1 and 2) decreased. Compared to the previous, unrotated solution (Table 37), the eigenvalues of components no. 2, 3 and 4 (Table 39) increased: from 1.69 up to 2.04 (component no. 2), from 1.24 up to 1.93 (component no. 3) and from 1.21 to 1.64 (component no. 4).
Once again, considering the eigenvalues for the multivariate space of the original variables (Table 37 with the PCA solution), we placed each of these components in the scree plot (Figure 37). In the scree plot, components on the
Table 37. Total variance explained in extraction: PCA before varimax rotation

            Initial eigenvalues                      Extraction sums of squared loadings
Component   total   % of variance   cumulative (%)   total   % of variance   cumulative (%)
1            3.82       29.40            29.40        3.82       29.40            29.40
2            1.69       13.00            42.41        1.69       13.00            42.41
3            1.24        9.58            51.98        1.24        9.58            51.98
4            1.21        9.28            61.27        1.21        9.28            61.27
5            0.82        6.34            67.60
6            0.74        5.69            73.29
7            0.65        5.02            78.31
8            0.60        4.64            82.95
9            0.55        4.23            87.18
10           0.50        3.86            91.04
11           0.47        3.63            94.67
12           0.43        3.30            97.96
13           0.26        2.04           100.00


Table 38. Total variance explained in extraction: PAF, GLS and ML methods before varimax rotation
(the initial eigenvalues are the same as in Table 37 for all three methods)

Extraction sums of squared loadings:

Method   Factor   total   % of variance   cumulative (%)
PAF      1         3.30       25.40            25.40
         2         1.14        8.79            34.19
         3         0.83        6.38            40.57
         4         0.63        4.81            45.38
GLS      1         2.38       18.34            18.34
         2         1.89       14.51            32.85
         3         1.13        8.73            41.58
         4         0.72        5.55            47.13
ML       1         2.56       19.10            19.10
         2         1.62       12.45            32.45
         3         1.11        8.53            40.98
         4         0.68        5.25            46.23

Table 39. Total variance explained after varimax rotation and PCA extraction

            Rotation sums of squared loadings
Component   total   % of variance   cumulative (%)
1            2.35       18.05            18.05
2            2.04       15.70            33.75
3            1.93       14.88            48.62
4            1.64       12.65            61.27



Figure 37. Scree plot for the 13 items as components (eigenvalues on the Y-axis against component numbers 1–13 on the X-axis)

shallow slope contribute little to the solution9. The last drop occurs between the third and fourth components, so using the first 3 components would be an easy choice. However, we dared to sustain a model containing 4 components, as observed from the eigenvalues in Table 37. The scree plot in this case only helped us to select a cut point visually, that is, where a relatively large interval between components occurred (for example, between the first and second). Moreover, as Kaiser explained, in practice we should retain only those components with eigenvalues greater than or equal to 1.0. This is exactly what happens in our PCA model (where the plot of the first four eigenvalues lies above the 1.0 level). Thus, we need a four-component model.
9 Cattell's scree plot simply places the components on the X-axis and the corresponding eigenvalues on the Y-axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticized: since picking the elbow can be subjective (the curve may have multiple elbows or may be a smooth curve), the researcher may be tempted to set the cut-off at the number of factors desired by his or her research agenda.


Table 40. Communalities by the PCA extraction method (all initial communalities equal 1.00)

         Extraction
Items   4 components   5 components   6 components   7 components
X1          0.53           0.61           0.84           0.90
X2          0.81           0.83           0.83           0.85
X3          0.82           0.85           0.87           0.87
X4          0.58           0.58           0.65           0.65
X5          0.47           0.74           0.81           0.92
X6          0.53           0.67           0.67           0.67
X7          0.76           0.78           0.80           0.83
X8          0.70           0.70           0.72           0.74
X9          0.55           0.63           0.74           0.75
X10         0.62           0.62           0.62           0.78
X11         0.60           0.61           0.62           0.62
X12         0.50           0.54           0.64           0.81
X13         0.54           0.64           0.71           0.80

In order to find out how well a given item was accounted for, we calculated the communalities derived from the PCA model. Notice that in PCA the communalities are moderate and some of them are high (X2, X3), although no statistical rules indicate exactly what is large or small [Hair et al. 2010]. Practical considerations dictate a lower bound of 0.50 for communalities. Higher communality values indicate that a large amount of the variance in a variable has been extracted, which happens for X2 and X3. If a communality were very low (that is, below 0.50), one should extract another component (as presented in Table 40 with 5, 6 and 7 components)10. On the other hand, small values indicate variables that do not fit well with the factor solution and should be dropped from the analysis.
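In a PCA, the communality of an item for k retained components is the row sum of its squared loadings; a numpy sketch on an illustrative correlation matrix:

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.1],   # illustrative 3 x 3 correlation matrix
              [0.6, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]              # sort components by eigenvalue
vals, vecs = vals[order], vecs[:, order]
loadings = vecs * np.sqrt(vals)             # component loadings

k = 2                                       # number of retained components
communalities = (loadings[:, :k] ** 2).sum(axis=1)
```

Retaining all components reproduces the unit diagonal of the correlation matrix, which is why the initial communalities in Table 40 all equal 1.00.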
Additionally, in order to extract the optimum number of factors, we used the χ2 test. This test should guarantee that the residual correlations become smaller as more and more common factors are added; thus we would obtain smaller χ2 values relative to the number of degrees of freedom as the number of additional common factors
10 Other possible factor solutions (based on extraction methods such as PAF, GLS, ML, etc.) would produce communalities (after extraction) somewhat smaller, at least on average, than PCA, because they reproduce the common variance whereas PCA reproduces the total variance.


would increase endlessly. However, a solution with more than five or six factors would lead to non-interpretable factors, which in fact would have little sense and utility for the general inquiry. Hence, for us, the most important and easily interpretable four-factor model was much more useful than a five- or six-factor solution.
Table 41. Goodness-of-fit test by the GLS method

No. of factors   Hypotheses                                                       χ2       Df   Significance
1                H0: no common factor; H1: at least one common factor            209.20   65      0.000
2                H0: two factors are sufficient; H1: more factors are needed     139.64   53      0.000
3                H0: three factors are sufficient; H1: more factors are needed    76.68   42      0.001
4                H0: four factors are sufficient; H1: more factors are needed     36.81   32      0.258

In order to verify this point of view, we conducted the χ2 test, where the number of extracted factors was changed iteratively with respect to the χ2 test values. With this objective in mind, we set four hypotheses11 on the presence of common factors. The results are presented in Table 41 and, as can be ob

11 In factor analysis, as McDonald argued [1985, p. 55], the need to keep our account of data as simple as possible gives us a desire to affirm the most restrictive hypothesis that is tenable. It should preferably be reasonable and interpretable as well as statistically tenable. However, failure to reject a restrictive hypothesis usually means only that we do not have a large enough sample to reject it.
The same author also commented [McDonald 1985, pp. 55–56] that in EFA the user may not have a hypothesis as to the number of factors. To develop such a hypothesis genuinely, rather than just choosing an arbitrary number, the user would need to classify tests into sets (possibly overlapping) on substantive grounds and hypothesize a number of factors equal to the number of sets. That is, logically we can postulate how many factors we have without postulating what they are; in that case our hypothesis is detailed enough to permit the immediate application of confirmatory factor analysis. Supposing, nevertheless, that the user insists that nothing is known about the tests to be factor analyzed and that the data are to be used to determine how many factors to extract: if we start with an arbitrarily small number of common factors and add factors until the chi-square is not significant at the favorite conventional level (5%), we shall not have much idea of the probability associated with the entire nested sequence of statistical decisions. It is, however, known that the probability of thereby deciding to fit more than the true number of factors is less than the chosen significance level.


served, the four-factor solution is the best. The other alternatives, with one common factor or with two or three factors, are not sufficient. In this case, the fourth null hypothesis cannot be rejected.
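The significance levels in Table 41 can be recovered from the χ2 statistics and degrees of freedom; a sketch using the stdlib-only Wilson–Hilferty normal approximation to the χ2 upper tail (adequate at these degrees of freedom):

```python
from math import sqrt
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the Wilson-Hilferty approximation."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / sqrt(c)
    return 1.0 - NormalDist().cdf(z)

# (chi2, df) pairs for the 1-, 2-, 3- and 4-factor solutions of Table 41
for stat, df in [(209.20, 65), (139.64, 53), (76.68, 42), (36.81, 32)]:
    print(df, round(chi2_sf(stat, df), 3))
```

Only the four-factor model yields a p-value above 0.05 (about 0.26, close to the 0.258 reported in Table 41), so it is the first model whose null hypothesis cannot be rejected.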

Comparison of extraction methods and rotations in exploratory factor analysis (EFA)
In the next phase, in order to estimate the factor loadings reflecting the correlations of the items with the factors, we implemented various methods under two types of orthogonal rotation, i.e. varimax and quartimax. These methods were as follows:
– principal components (PC),
– unweighted least squares (ULS),
– minimum residuals (MINRES),
– principal axis factoring (PAF),
– centroid (C),
– alpha factoring (AF),
– maximum likelihood (ML),
– image factor analysis (IFA).
In the context of oblique rotations, e.g. direct oblimin [Clarkson and Jennrich 1988], we used the generalized least squares (GLS) method. The general dialog boxes of the Statistica and SPSS software differ slightly in their module configuration (in either the extraction methods or the rotational techniques). The basic rotation techniques available in SPSS and Statistica are varimax, quartimax and equamax. In contrast to SPSS, the rotation techniques available in Statistica appear to be more developed, because it offers both raw and normalized rotation solutions. SPSS, however, has the oblimin and promax rotations, meaning that the resulting factors can be correlated.
At first, we explored the factor loadings matrix before rotation, using just two methods: PCA and PAF. In both approaches, we searched for the smallest number of factors that could account for the common variance of the set of variables.
From Table 42 we notice that the raw, unrotated structure is difficult to interpret. The unrotated factor loadings may be misleading, because their values mix with one another. As our example shows, before rotation different loadings were crossed with different factors, and it was hard to determine which factor and set of items made up one consistent structure. It also meant that one item was described by a few factors at the same time.

362

Scale development for hedonic-consumerism values

Table 42. Factor loadings structure matrix before rotation by the PCA and PAF methods

Principal components method
Items   F1     F2     F3     F4
X6      0.68   0.24   0.01   0.10
X2      0.65   0.09   0.40   0.46
X3      0.64   0.13   0.41   0.48
X11     0.61   0.25   0.37   0.17
X12     0.61   0.28   0.12   0.17
X13     0.57   0.12   0.22   0.40
X4      0.55   0.53   0.06   0.01
X9      0.55   0.26   0.26   0.34
X5      0.47   0.29   0.28   0.30
X1      0.45   0.34   0.14   0.40
X8      0.45   0.59   0.34   0.19
X10     0.42   0.58   0.32   0.03
X7      0.24   0.48   0.58   0.37

Principal axis factoring
Items   F1     F2     F3     F4
X3      0.66   0.19   0.55   0.11
X2      0.65   0.13   0.46   0.08
X6      0.62   0.18   0.14   0.07
X11     0.56   0.16   0.35   0.07
X12     0.55   0.19   0.08   0.12
X13     0.51   0.08   0.06   0.33
X4      0.50   0.43   0.04   0.11
X9      0.48   0.16   0.30   0.05
X5      0.40   0.19   0.00   0.29
X1      0.39   0.23   0.07   0.28
X8      0.42   0.55   0.08   0.22
X10     0.39   0.44   0.25   0.21
X7      0.22   0.45   0.11   0.47

As anticipated, the first factor explained the largest amount of variance and, correspondingly, had the largest number of items (according to their factor loadings). The second, third and fourth factors were somewhat unclear, with only a few items having a high loading (a high loading was defined as greater than 0.50)12.
Because the structure of items in Table 42, with their respective factor loadings, is difficult to interpret and theoretically less meaningful, we proceeded to rotate the factor matrix in order to redistribute the variance from the earlier factors to the later ones. Rotation leads to a simpler and theoretically more meaningful structure of factors. For the orthogonal rotation
12 In the literature, one golden rule of thumb in factor analysis indicates that loadings should be 0.70 or higher to confirm that the variables being identified are represented by a given factor. The rationale is that the 0.70 level corresponds to about half of the variance in the indicator being explained by the factor. However, 0.70 is a high threshold, and real-life data may well not meet this criterion, which is why lower levels such as 0.50 or 0.30 are used for exploratory purposes. With a sample size of, say, 100 examinees, loadings of 0.30 or higher can be considered significant. With much larger samples, even smaller loadings could be considered salient, but in language research, researchers typically take note of loadings of 0.30 or higher [Kline 2002, pp. 52–53].


we chose Kaiser's varimax and the quartimax solution. Just to remind, varimax tends to produce multiple group factors, whereas quartimax tends to produce a general factor and additionally smaller multiple group factors [Gatnar 2003; Jennrich and Sampson 1966; Jennrich 2002].
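Both criteria belong to the orthomax family, which the sketch below implements in numpy under the stated assumption that the raw (unnormalized) criterion is used: gamma = 1 gives varimax and gamma = 0 gives quartimax. The loadings are synthetic, not the book's estimates.

```python
import numpy as np

def orthomax(A, gamma=1.0, tol=1e-8, max_iter=500):
    """Orthogonal rotation of a loadings matrix A (p x k) by the raw
    orthomax criterion via SVD iterations; gamma = 1 is varimax,
    gamma = 0 is quartimax."""
    p, k = A.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        # gradient of the orthomax criterion with respect to the rotation
        G = A.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return A @ R, R

# Simple-structure loadings, deliberately mixed by a 30-degree rotation;
# varimax should recover the simple structure (up to sign and order).
B = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = B @ Q
L_rot, R = orthomax(A, gamma=1.0)   # varimax
```

Note that this raw solution corresponds to the "raw" option mentioned above for Statistica; Kaiser normalization would first rescale the rows of A to unit length.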
Comparing the varimax and quartimax rotations (high loadings are indicated by bold entries in Tables 43–50), one would rather say that we obtained the same factor structure regardless of the rotation technique used. Each item loaded equally on its separate factor, and all items were correlated with their respective factors. Minor differences in loading size appeared only occasionally, when we changed the extraction method (that is, PCA, MINRES, ULS, C, PAF, ML, IFA). For example, as far as maximum likelihood (ML) and image factor analysis (IFA) are concerned, each invoked iteration as part of the extraction process. However, the results obtained from IFA differed somewhat from those of the other methods. IFA focuses on minimizing the influence of factors involving particular items that are not reflected in the images of the other measured items. It is based on the correlation matrix of predicted rather than actual variables, where each variable is predicted from the others using multiple regression.
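A minimal numpy sketch of the idea behind IFA, assuming Guttman's definition of the image covariance matrix; the correlation matrix below is illustrative, not taken from the study.

```python
import numpy as np

def image_covariance(R):
    """Guttman's image covariance matrix: each variable is replaced by
    its prediction (image) from all the other variables via multiple
    regression. G = R + D R^-1 D - 2D, with D the diagonal matrix of
    reciprocal diagonal elements of R^-1."""
    Rinv = np.linalg.inv(R)
    D = np.diag(1.0 / np.diag(Rinv))
    return R + D @ Rinv @ D - 2.0 * D

R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
G = image_covariance(R)

# The diagonal of G equals each variable's squared multiple correlation
# with the remaining variables.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
```

For mutually uncorrelated variables the images vanish, so the image covariance matrix of an identity correlation matrix is zero.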
Having applied the varimax and quartimax rotations, we simply searched for a simple structure, which by Bryant and Yarnold's definition [1995,
Table 43. Factor loadings structure matrix after varimax by the PCA and ULS methods

Principal components method
Items   F1     F2     F3     F4
X10     0.75   0.18   0.15   0.06
X11     0.74   0.17   0.05   0.16
X9      0.69   0.28   0.04   0.01
X6      0.58   0.31   0.32   0.01
X12     0.56   0.05   0.41   0.13
X1      0.09   0.69   0.01   0.12
X13     0.25   0.68   0.13   0.04
X5      0.04   0.67   0.15   0.04
X4      0.01   0.55   0.24   0.47
X3      0.16   0.13   0.88   0.04
X2      0.15   0.16   0.87   0.07
X7      0.08   0.10   0.01   0.86
X8      0.06   0.27   0.09   0.78

Unweighted least squares
Items   F1     F2     F3     F4
X10     0.64   0.13   0.12   0.03
X11     0.64   0.06   0.20   0.13
X6      0.54   0.23   0.31   0.02
X9      0.54   0.04   0.24   0.02
X12     0.50   0.28   0.14   0.10
X3      0.20   0.86   0.16   0.05
X2      0.22   0.74   0.21   0.08
X13     0.24   0.14   0.55   0.01
X4      0.04   0.18   0.53   0.36
X1      0.10   0.04   0.51   0.10
X5      0.09   0.12   0.51   0.06
X7      0.06   0.02   0.01   0.70
X8      0.06   0.08   0.32   0.65

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.


Table 44. Factor loadings structure matrix after varimax by the MINRES and PAF methods

MINimum RESiduals
Items   F1     F2     F3     F4
X10     0.64   0.12   0.13   0.03
X11     0.64   0.20   0.06   0.13
X6      0.54   0.31   0.23   0.02
X9      0.54   0.24   0.04   0.02
X12     0.50   0.14   0.28   0.10
X13     0.24   0.55   0.14   0.01
X2      0.22   0.21   0.74   0.08
X3      0.20   0.16   0.86   0.05
X1      0.10   0.51   0.04   0.10
X5      0.09   0.51   0.12   0.06
X8      0.06   0.32   0.08   0.65
X7      0.06   0.01   0.02   0.70
X4      0.04   0.53   0.18   0.36

Principal axis factoring
Items   F1     F2     F3     F4
X10     0.64   0.13   0.12   0.03
X11     0.64   0.06   0.20   0.13
X6      0.54   0.24   0.31   0.02
X9      0.54   0.04   0.24   0.02
X12     0.50   0.28   0.13   0.10
X3      0.20   0.85   0.16   0.05
X2      0.21   0.75   0.21   0.08
X13     0.24   0.14   0.55   0.01
X4      0.04   0.18   0.53   0.36
X1      0.10   0.04   0.51   0.10
X5      0.09   0.12   0.51   0.06
X7      0.06   0.02   0.01   0.69
X8      0.06   0.08   0.32   0.66

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.
Note: Bolded values represent cut-off level of factor loadings which was set on 0.50.

Table 45. Factor loadings structure matrix after varimax by the C and AF methods

Centroid
Items   F1     F2     F3     F4
X13     0.68   0.20   0.12   0.04
X1      0.47   0.09   0.05   0.17
X4      0.46   0.04   0.20   0.42
X5      0.45   0.10   0.14   0.08
X9      0.30   0.51   0.02   0.03
X6      0.27   0.54   0.26   0.07
X8      0.26   0.06   0.09   0.71
X2      0.20   0.21   0.73   0.10
X11     0.19   0.65   0.04   0.15
X3      0.18   0.20   0.84   0.04
X12     0.08   0.53   0.30   0.12
X7      0.01   0.05   0.01   0.62
X10     0.05   0.62   0.12   0.08

Alpha factoring
Items   F1     F2     F3     F4
X11     0.64   0.07   0.20   0.14
X10     0.64   0.13   0.11   0.03
X9      0.54   0.02   0.26   0.02
X6      0.52   0.27   0.32   0.02
X12     0.52   0.31   0.12   0.11
X3      0.20   0.82   0.16   0.04
X2      0.20   0.75   0.21   0.08
X13     0.24   0.11   0.57   0.01
X4      0.02   0.20   0.54   0.37
X1      0.10   0.04   0.51   0.11
X5      0.08   0.13   0.53   0.06
X7      0.06   0.01   0.01   0.68
X8      0.05   0.09   0.31   0.66

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.


Table 46. Factor loadings structure matrix after varimax by the ML and IFA methods

Maximum likelihood
Items   F1     F2     F3     F4
X10     0.65   0.12   0.13   0.02
X11     0.65   0.04   0.21   0.12
X6      0.55   0.21   0.32   0.01
X9      0.53   0.07   0.22   0.03
X12     0.51   0.26   0.16   0.09
X3      0.20   0.93   0.15   0.05
X2      0.24   0.69   0.22   0.08
X4      0.05   0.16   0.55   0.34
X5      0.10   0.10   0.52   0.04
X13     0.23   0.17   0.52   0.01
X1      0.11   0.03   0.51   0.09
X7      0.05   0.02   0.01   0.73
X8      0.06   0.08   0.34   0.63

Image factoring analysis
Items   F1     F2     F3     F4
X11     0.51   0.09   0.20   0.13
X10     0.48   0.13   0.02   0.03
X6      0.46   0.23   0.27   0.07
X9      0.44   0.07   0.22   0.05
X12     0.44   0.26   0.15   0.11
X3      0.24   0.63   0.19   0.07
X2      0.24   0.62   0.21   0.10
X13     0.23   0.15   0.42   0.06
X4      0.08   0.18   0.40   0.32
X1      0.11   0.07   0.39   0.14
X5      0.11   0.12   0.39   0.11
X8      0.07   0.09   0.27   0.46
X7      0.03   0.02   0.06   0.43

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.

Table 47. Factor loadings structure matrix after quartimax by the PCA and ULS methods

Principal components method
Items   F1     F2     F3     F4
X10     0.75   0.19   0.11   0.06
X11     0.74   0.17   0.00   0.15
X9      0.68   0.27   0.09   0.01
X6      0.60   0.31   0.28   0.01
X12     0.58   0.06   0.37   0.12
X1      0.10   0.69   0.04   0.09
X13     0.27   0.68   0.09   0.07
X5      0.06   0.67   0.13   0.01
X4      0.02   0.57   0.23   0.44
X3      0.22   0.16   0.87   0.03
X2      0.21   0.19   0.85   0.06
X7      0.08   0.06   0.01   0.87
X8      0.07   0.30   0.08   0.77

Unweighted least squares
Items   F1     F2     F3     F4
X10     0.65   0.12   0.08   0.03
X11     0.65   0.20   0.01   0.11
X6      0.56   0.32   0.17   0.01
X9      0.54   0.24   0.02   0.00
X12     0.53   0.15   0.23   0.09
X4      0.06   0.57   0.14   0.31
X13     0.25   0.55   0.08   0.06
X1      0.11   0.52   0.00   0.06
X5      0.11   0.52   0.07   0.02
X3      0.28   0.21   0.83   0.03
X2      0.28   0.25   0.70   0.06
X7      0.06   0.04   0.01   0.69
X8      0.08   0.37   0.05   0.63

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.


Table 48. Factor loadings structure matrix after quartimax by the MINRES and PAF methods

MINimum RESiduals
Items   F1     F2     F3     F4
X10     0.65   0.12   0.08   0.03
X11     0.65   0.20   0.01   0.11
X6      0.56   0.32   0.17   0.01
X9      0.54   0.24   0.02   0.00
X12     0.53   0.15   0.23   0.09
X2      0.28   0.25   0.70   0.06
X3      0.28   0.21   0.83   0.03
X13     0.25   0.55   0.08   0.06
X1      0.11   0.52   0.00   0.06
X5      0.11   0.52   0.07   0.01
X8      0.08   0.37   0.05   0.63
X7      0.06   0.04   0.01   0.70
X4      0.06   0.57   0.14   0.31

Principal axis factoring
Items   F1     F2     F3     F4
X10     0.65   0.12   0.08   0.03
X11     0.65   0.20   0.01   0.11
X6      0.56   0.32   0.17   0.01
X9      0.54   0.24   0.03   0.00
X12     0.53   0.15   0.23   0.09
X4      0.06   0.57   0.15   0.32
X13     0.25   0.55   0.08   0.06
X1      0.11   0.52   0.00   0.06
X5      0.11   0.52   0.08   0.02
X3      0.28   0.21   0.81   0.03
X2      0.28   0.25   0.72   0.06
X7      0.06   0.04   0.01   0.69
X8      0.08   0.37   0.05   0.63

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.

Table 49. Factor loadings structure matrix after quartimax by the C and AF methods

Centroid
Items   F1     F2     F3     F4
X13     0.67   0.23   0.06   0.09
X4      0.50   0.07   0.16   0.39
X1      0.48   0.11   0.01   0.14
X5      0.46   0.12   0.11   0.04
X8      0.31   0.08   0.06   0.69
X9      0.29   0.51   0.04   0.00
X6      0.27   0.57   0.19   0.04
X2      0.24   0.29   0.70   0.09
X3      0.22   0.28   0.81   0.03
X11     0.19   0.66   0.03   0.13
X12     0.09   0.56   0.25   0.11
X7      0.04   0.06   0.00   0.62
X10     0.07   0.63   0.07   0.08

Alpha factoring
Items   F1     F2     F3     F4
X11     0.65   0.20   0.00   0.11
X10     0.65   0.11   0.08   0.03
X6      0.55   0.33   0.20   0.01
X9      0.54   0.25   0.04   0.00
X12     0.52   0.14   0.26   0.09
X4      0.05   0.58   0.16   0.33
X13     0.26   0.57   0.05   0.06
X1      0.11   0.52   0.00   0.07
X5      0.10   0.50   0.09   0.02
X3      0.28   0.21   0.79   0.02
X2      0.27   0.26   0.71   0.05
X7      0.07   0.04   0.00   0.68
X8      0.07   0.36   0.07   0.63

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.


Table 50. Factor loadings structure matrix after quartimax by the ML and IFA methods

Maximum likelihood
Items   F1     F2     F3     F4
X10     0.66   0.14   0.07   0.02
X11     0.65   0.21   0.03   0.10
X6      0.57   0.32   0.14   0.02
X9      0.53   0.21   0.01   0.01
X12     0.53   0.17   0.20   0.07
X4      0.07   0.59   0.12   0.29
X5      0.11   0.53   0.06   0.00
X13     0.25   0.52   0.12   0.05
X1      0.12   0.52   0.01   0.05
X3      0.28   0.20   0.89   0.03
X2      0.30   0.26   0.65   0.05
X7      0.06   0.05   0.02   0.73
X8      0.08   0.39   0.05   0.60

Image factoring analysis
Items   F1     F2     F3     F4
X6      0.57   0.11   0.06   0.02
X11     0.55   0.03   0.08   0.09
X12     0.52   0.00   0.11   0.07
X9      0.49   0.07   0.08   0.01
X10     0.47   0.17   0.01   0.06
X13     0.38   0.33   0.04   0.04
X4      0.27   0.35   0.11   0.31
X1      0.24   0.34   0.01   0.12
X5      0.26   0.33   0.05   0.09
X3      0.47   0.11   0.51   0.05
X2      0.47   0.12   0.50   0.07
X8      0.21   0.23   0.03   0.45
X7      0.09   0.04   0.00   0.43

Note: Bolded values represent the cut-off level of factor loadings, which was set at 0.50.

pp. 132–133] meets the condition in which variables load at near 1 (in absolute value) or at near 0 on an eigenvector (factor). Variables that load near 1 are clearly important in the interpretation of the factor, and variables that load near 0 are clearly unimportant.
According to Thurstone's five criteria, a simple structure means that:
1) each variable should produce at least one zero loading on some factor13,
2) each factor should have at least as many zero loadings as there are factors,
3) each pair of factors should have variables with significant loadings on one and zero loadings on the other14,
4) each pair of factors should have a large proportion of zero loadings on both factors (if there are, say, four or more factors in total),
5) each pair of factors should have only a few complex variables [Thurstone 1947].

13 One rule of thumb for a zero loading [Gorsuch 1983, p. 180] is that zero loadings include any that fall between –0.10 and +0.10.
14 The process of identification of significant loadings should be started with the rule given, e.g., by Hair et al. [2010, p. 119]: "We start with the first variable on the first factor and, moving horizontally from left to right, look for the highest loading for that variable on any factor. When the highest loading is identified, it should be underlined [...]. Attention then focuses on the second variable and the process repeats. This procedure should continue for each variable until all variables have been reviewed for their highest loading on a factor."
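Two of these rules of thumb, the zero-loading counts and the highest-loading scan of Hair et al., can be checked mechanically. The sketch below uses an illustrative loadings matrix, not the book's estimates, and Gorsuch's |loading| < 0.10 definition of a zero loading.

```python
import numpy as np

def simple_structure_checks(L, zero=0.10):
    """Check two of Thurstone's criteria on a loadings matrix L (p x k):
    (1) every variable has at least one near-zero loading;
    (2) every factor has at least k near-zero loadings.
    Also returns each variable's highest-loading factor, following the
    left-to-right scanning rule of Hair et al."""
    p, k = L.shape
    near_zero = np.abs(L) < zero
    crit1 = bool(near_zero.any(axis=1).all())        # per variable
    crit2 = bool((near_zero.sum(axis=0) >= k).all()) # per factor
    highest = np.abs(L).argmax(axis=1)
    return crit1, crit2, highest

# Illustrative two-factor loadings with a clean simple structure
L = np.array([[0.80, 0.05],
              [0.05, 0.70],
              [0.60, 0.02],
              [0.03, 0.65]])
crit1, crit2, highest = simple_structure_checks(L)
```

Both criteria hold for this matrix, and the scan assigns the variables alternately to factors 1 and 2.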
The question now, however, is to what extent achieving such a simple structure should be important in the case of our data. There is no doubt that some experts in factor analysis think that an abbreviated version of simple structure is important. For example, Kline [2002, p. 66] said: "I am in agreement with Cattell [1978] and all serious workers in factor analysis that the attainment of simple structure is essential to factor analysis. Where this has not been done there is little reason to take the results seriously." However, Kline [2002, p. 65] also added a few more comments and was considerably more flexible when he said that "Thurstone proposed five criteria for deciding on simple structure, although two of these are of overriding importance, namely that each factor should have a few high loadings with the rest of the loadings being zero or close to zero. Certainly, the strict Thurstonian approach is no longer followed." To resolve the apparent contradiction in Kline's views, one need only realize that he is no doubt referring to the less strict definition of simple structure in both statements.
In the classic Thurstonian point of view, a factor structure is simple to the extent that each variable loads heavily on one and only one factor. That is why rotation is necessary to achieve this simple structure, if it can be achieved at all. However, in real-life data, most factor solutions do not result in a simple structure. In practice, a researcher may find that one or more variables have moderate-size loadings (cross-loadings) on several factors, all of which are significant. Such a case makes the task of factor interpretation even more difficult [Hair et al. 2010].
In our set of items (based on the varimax and quartimax rotations) we clustered the items into four different factors. These factors were defined as follows: the factor curiosity development and openness to change loaded with variables X1, X4, X5, X13; the factor self-enhancement loaded with 5 items (X6, X9, X10, X11, X12); the factor consumption style with items X2, X3; and the factor entertainment and fun with X7, X8. And though the factor loadings associated with those items differed to some extent, they made up a consistent configuration within a structure of orthogonal factors. Thus, they composed four distinct subscales of the hedonic-consumerism scale.
Next, after the orthogonal rotation, we analyzed these factors once again, but this time we used an oblique rotation, direct oblimin (based on the generalized least squares, GLS, method). Direct oblimin is the standard approach if one wants to attain a non-orthogonal (oblique) solution in which the factors are allowed to be correlated. It produces varimax-looking factors, but which


are oblique. (This will result in higher eigenvalues but diminished interpretability of the factors.)15 When applying direct oblimin, we also set the parameter δ (delta) to control the extent of obliqueness among the factors. Negative values decrease the factor correlations, 0 is the default, and positive values (which should not go over 0.80) permit additional factor correlation. For our items, we initially set the delta parameter at the 0 level. This inspiration came from Harman's [1976] work, as he recommended that delta should be either 0 or possibly negative. Later on, for comparison purposes, we set different levels of delta (see Tables 54–57). The results are presented below, and when we examined them, what explicitly appeared was that the factor
when we have examined them, what explicitly appeared, was that the factor
Table 51. Factor loadings structure matrix after direct oblimin (at delta 0) with Kaiser normalization by the GLS method

Generalized least squares
Items   F1     F2     F3     F4
X3      0.98   0.36   0.11   0.27
X2      0.75   0.37   0.15   0.33
X11     0.24   0.67   0.17   0.30
X6      0.39   0.63   0.08   0.42
X10     0.25   0.63   0.02   0.02
X9      0.24   0.57   0.08   0.30
X12     0.40   0.57   0.14   0.26
X7      0.08   0.08   0.74   0.08
X8      0.20   0.15   0.68   0.42
X4      0.29   0.17   0.43   0.61
X13     0.33   0.33   0.08   0.58
X5      0.22   0.19   0.13   0.55
X1      0.16   0.19   0.18   0.54

Legend: F1 consumption style, F2 self-enhancement, F3 entertainment and fun, F4 curiosity and change.
Note: the cut-off level of factor loadings was set at 0.50.
15 In general, oblique rotation does lead to simpler structures in most cases, but it is far more important to note that oblique rotations result in correlated factors, which are sometimes difficult to interpret. One downside of an oblique rotation method is that if the correlations among the factors are substantial, then it is sometimes difficult to distinguish among the factors by examining the factor loadings. In such situations, one should investigate the factor pattern matrix, which is a matrix of the standardized coefficients for the regression of the observed variables on the factors.


loadings structure across the different deltas remained rather unchanged (with the exception of delta at the 0.4 level). The factor intercorrelations increased as the delta values moved towards more positive values (Table 58).
The pattern matrix from the oblique rotation in Table 52 reveals the unique contribution of each oblique factor to accounting for each observed variable. For example, this matrix shows that only factor no. 1 has an appreciable unique contribution to reproducing X3. However, looking at the factor loadings (structure) matrix in Table 51 (which contains the simple correlations between variables
Table 52. Pattern matrix after direct oblimin (at delta 0) with Kaiser normalization by the GLS method

Generalized least squares
Items   F1     F2     F3     F4
X3      0.99   0.05   0.01   0.05
X2      0.70   0.06   0.02   0.07
X10     0.05   0.68   0.03   0.24
X11     0.08   0.67   0.09   0.11
X9      0.03   0.54   0.01   0.15
X6      0.09   0.52   0.05   0.24
X12     0.18   0.48   0.06   0.03
X7      0.00   0.05   0.77   0.14
X8      0.03   0.00   0.61   0.25
X5      0.04   0.01   0.02   0.53
X13     0.10   0.14   0.09   0.53
X1      0.04   0.04   0.04   0.53
X4      0.11   0.06   0.29   0.51

Note: Rotation converged in 10 iterations.

Table 53. Factor correlation matrix: direct oblimin (at delta 0) with Kaiser normalization by the GLS method

Factors   1      2      3      4
1         1.00
2         0.42   1.00
3         0.14   0.10   1.00
4         0.34   0.31   0.27   1.00

Legend: F1 consumption style, F2 self-enhancement, F3 entertainment and fun, F4 curiosity and change.


and factors, where the loadings contain both the variance shared between variables and factors and the correlations among factors), we are reminded that factor 2 has a non-zero correlation with X3 (0.36 is well above the cutoff of 0.30). A similar trend can be seen with variables X2, X6, X12, X13 or X4.
From the direct oblimin rotation (as compared to the previous orthogonal rotations), it appears that the factor loadings structure remains unchanged. The factor correlation matrix displays some positive correlation (0.42) between the group of items X2, X3 (represented by factor no. 1, entitled consumption style) and the items X6, X9, X10, X11, X12 loading on factor no. 2, self-enhancement16. A lower correlation (0.34) appears between factor no. 1, consumption style, and no. 4, curiosity development and openness to change, which included items X1, X4, X5, X13. Moreover, factor no. 2, self-enhancement, indicated a positive correlation (0.31) with factor no. 4, curiosity and change. The smallest correlations appeared between the factor entertainment and fun (X7, X8) and all the other factors considered: no. 1 (0.14), no. 2 (0.10) and no. 4 (0.27). This particular factor is thus largely independent of the others, although its correlations with them are not zero.
So when we ran the four-factor EFA model (followed by direct oblimin rotation), the resulting factor correlation matrix produced the highest correlation of 0.42 and then (in descending order): 0.34, 0.31, 0.27, 0.14, 0.10. Since the first two correlations exceeded Tabachnick and Fiddell's threshold of 0.32, the solution does not seem to be completely orthogonal for the factors consumption style, self-enhancement and curiosity and change17. The orthogonality appears to be rather strong between the factor entertainment and fun and all the other factors. On the other hand, in an oblique rotation, there
16 An r = 0.42 means r² = 0.18, i.e. factors 1 and 2 share about 18% of their variance, and 82% of each factor's variance is independent of the other.
17 Tabachnick and Fiddell [2007, p. 646] argued that "perhaps the best way to decide between orthogonal and oblique rotation is to request oblique rotation [e.g. direct oblimin or promax] with the desired number of factors [Brown 2009] and look at the correlations among factors. If factor correlations are not driven by the data, the solution remains nearly orthogonal." Tabachnick and Fiddell [2007] also claimed that if the factor correlation matrix contains correlations of around 0.32 and above, then there is 10% (or more) overlap in variance among the factors, enough variance to warrant oblique rotation unless there are compelling reasons for orthogonal rotation. Very large values suggest overfactoring, i.e. that there are more factors. However, as Kim and Mueller [1978, p. 50] put it: "even the issue of whether factors are correlated or not may not make much difference in the exploratory stages of analysis. It even can be argued that employing a method of orthogonal rotation (or maintaining the arbitrary imposition that the factors remain orthogonal) may be preferred over oblique rotation, if for no other reason than that the former is much simpler to understand and interpret."


Table 54. Factor loadings structure matrix after direct oblimin with Kaiser normalization by the GLS method at the 0 and -0.8 delta levels

Generalized least squares (at delta 0)
Items   F1     F2     F3     F4
X3      0.98   0.36   0.11   0.27
X2      0.75   0.37   0.15   0.33
X11     0.24   0.67   0.17   0.31
X6      0.39   0.63   0.08   0.42
X10     0.25   0.63   0.02   0.02
X9      0.24   0.57   0.08   0.30
X12     0.40   0.57   0.14   0.26
X7      0.08   0.08   0.74   0.08
X8      0.20   0.15   0.68   0.42
X4      0.29   0.17   0.43   0.61
X13     0.33   0.33   0.08   0.58
X5      0.22   0.19   0.13   0.55
X1      0.16   0.19   0.18   0.54

Generalized least squares (at delta -0.8)
Items   F1     F2     F3     F4
X3      0.98   0.31   0.11   0.23
X2      0.73   0.34   0.15   0.30
X11     0.22   0.67   0.18   0.29
X10     0.23   0.63   0.02   0.05
X6      0.37   0.62   0.09   0.40
X9      0.22   0.57   0.09   0.28
X12     0.38   0.56   0.14   0.23
X7      0.07   0.07   0.74   0.04
X8      0.18   0.13   0.68   0.39
X4      0.27   0.14   0.44   0.59
X13     0.31   0.31   0.09   0.57
X5      0.21   0.18   0.14   0.55
X1      0.15   0.17   0.18   0.53

Table 55. Factor loadings structure matrix after direct oblimin with Kaiser normalization by the GLS method at the -0.6 and -0.4 delta levels

Generalized least squares (at delta -0.6)
Items   F1     F2     F3     F4
X3      0.98   0.31   0.11   0.24
X2      0.73   0.34   0.15   0.30
X11     0.22   0.67   0.18   0.29
X10     0.23   0.63   0.02   0.04
X6      0.37   0.62   0.08   0.40
X9      0.22   0.57   0.08   0.28
X12     0.38   0.56   0.14   0.24
X7      0.07   0.07   0.74   0.05
X8      0.18   0.13   0.68   0.40
X4      0.27   0.14   0.43   0.59
X13     0.31   0.31   0.08   0.57
X5      0.21   0.18   0.13   0.55
X1      0.15   0.17   0.18   0.53

Generalized least squares (at delta -0.4)
Items   F1     F2     F3     F4
X3      0.98   0.32   0.11   0.25
X2      0.73   0.35   0.14   0.31
X11     0.22   0.67   0.17   0.29
X10     0.23   0.63   0.02   0.04
X6      0.37   0.62   0.08   0.40
X9      0.23   0.57   0.08   0.29
X12     0.38   0.56   0.14   0.24
X7      0.07   0.08   0.74   0.06
X8      0.18   0.13   0.68   0.40
X4      0.28   0.15   0.43   0.60
X13     0.31   0.32   0.08   0.58
X5      0.21   0.18   0.13   0.55
X1      0.15   0.18   0.18   0.53


Table 56. Factor loadings structure matrix after direct oblimin with Kaiser normalization by the GLS method at the -0.2 and 0.2 delta levels

Generalized least squares (at delta -0.2)
Items   F1     F2     F3     F4
X3      0.98   0.34   0.11   0.26
X2      0.73   0.36   0.14   0.32
X11     0.23   0.67   0.17   0.30
X10     0.24   0.63   0.02   0.03
X6      0.37   0.62   0.08   0.41
X9      0.23   0.57   0.08   0.29
X12     0.39   0.57   0.14   0.25
X7      0.07   0.08   0.74   0.07
X8      0.19   0.14   0.67   0.41
X4      0.28   0.16   0.43   0.60
X13     0.32   0.32   0.08   0.58
X5      0.22   0.19   0.13   0.55
X1      0.15   0.18   0.18   0.54

Generalized least squares (at delta 0.2)
Items   F1     F2     F3     F4
X3      0.98   0.41   0.13   0.31
X2      0.74   0.41   0.16   0.36
X11     0.29   0.67   0.19   0.33
X6      0.42   0.64   0.10   0.44
X10     0.27   0.62   0.01   0.00
X12     0.42   0.58   0.15   0.28
X9      0.28   0.57   0.09   0.31
X7      0.10   0.09   0.74   0.10
X8      0.23   0.17   0.68   0.44
X4      0.32   0.20   0.44   0.62
X13     0.35   0.35   0.10   0.59
X5      0.25   0.21   0.14   0.55
X1      0.19   0.21   0.19   0.54

Table 57. Factor loadings structure matrix after direct oblimin with Kaiser normalization by the GLS method at the 0.4 delta level

Generalized least squares (at delta 0.4)
Items   F1     F2     F3     F4
X3      0.95   0.53   0.39   0.45
X2      0.73   0.51   0.38   0.47
X11     0.40   0.67   0.36   0.42
X6      0.50   0.66   0.32   0.51
X12     0.48   0.60   0.34   0.38
X9      0.36   0.57   0.27   0.38
X10     0.30   0.56   0.14   0.11
X8      0.36   0.31   0.69   0.50
X7      0.22   0.20   0.66   0.20
X4      0.43   0.33   0.54   0.63
X13     0.43   0.42   0.28   0.59
X5      0.32   0.29   0.27   0.53
X1      0.28   0.28   0.30   0.52

Legend: Factors did not converge at the 0.6 and 0.8 delta levels.


Table 58. Factor correlation matrix: direct oblimin with Kaiser normalization by the GLS method at different delta levels

Delta level   Factor   1      2      3      4
-0.8          1        1.00
              2        0.33   1.00
              3        0.12   0.09   1.00
              4        0.27   0.24   0.24   1.00
-0.6          1        1.00
              2        0.34   1.00
              3        0.12   0.09   1.00
              4        0.28   0.25   0.24   1.00
-0.4          1        1.00
              2        0.35   1.00
              3        0.12   0.09   1.00
              4        0.29   0.26   0.24   1.00
-0.2          1        1.00
              2        0.37   1.00
              3        0.13   0.09   1.00
              4        0.30   0.28   0.25   1.00
0.2           1        1.00
              2        0.51   1.00
              3        0.19   0.14   1.00
              4        0.42   0.38   0.32   1.00
0.4           1        1.00
              2        0.76   1.00
              3        0.65   0.60   1.00
              4        0.72   0.68   0.67   1.00

Legend: Factors did not converge at the 0.6 and 0.8 delta levels.

may be evidence of discriminant validity if the correlation between factors is not too high (e.g. not > 0.85, which would lead one to think the two factors overlap conceptually) [Anderson and Gerbing 1988]. Such a case, however, is not present here. These correlations truly exist, but they are not high enough to question discriminant validity (which is discussed in more depth in a further section of the book, with the CFA models).


Hierarchical exploratory factor analysis (HEFA)

Oblique factors may sometimes be difficult to understand and interpret18. Therefore, we applied a procedure based on hierarchical exploratory factor analysis (HEFA), in which clusters of items are identified and the axes are then rotated through those clusters. Further, the correlations between the oblique factors are calculated, and the correlation matrix of the oblique factors is itself factor-analyzed in order to yield a set of orthogonal factors that divide the variability in the items into variance due to shared or common variance (secondary factors) and variance due to the clusters of similar variables (items) in the analysis (primary factors).
In this section, the HEFA procedure was used to re-examine all 13 items, and as it showed, there might be two secondary factors and four primary factors (see Tables 59–61). The first and fourth clusters correlate with the first secondary factor (SF1) at the levels of 0.64 and 0.66. The second cluster correlates with the second secondary factor (SF2) at 0.58. Cluster three correlates at nearly the same level with SF1 and SF2.
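The second-order step just described can be sketched in numpy: the correlation matrix of the oblique primary factors is itself factored. The sketch uses the primary-factor correlations of Table 59 and, as a simplifying assumption, a principal-component extraction in place of a full second-order factoring.

```python
import numpy as np

# Correlations among the four oblique primary factors (Table 59):
# PF1 self-enhancement, PF2 entertainment and fun,
# PF3 curiosity and change, PF4 consumption style.
Phi = np.array([[1.00, 0.18, 0.37, 0.45],
                [0.18, 1.00, 0.33, 0.14],
                [0.37, 0.33, 1.00, 0.38],
                [0.45, 0.14, 0.38, 1.00]])

# Eigendecomposition of the primary-factor correlation matrix,
# eigenvalues sorted in descending order.
eigval, eigvec = np.linalg.eigh(Phi)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Loadings of the four primary factors on two secondary factors
secondary = eigvec[:, :2] * np.sqrt(eigval[:2])
```

The first eigenvalue exceeds 1, consistent with the presence of at least one general factor above the four primary factors.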
Table 59. Correlations among oblique factors under the PCA method and varimax rotation

Factors                        PF1    PF2    PF3    PF4
Self-enhancement (PF1)         1.00
Entertainment and fun (PF2)    0.18   1.00
Curiosity and change (PF3)     0.37   0.33   1.00
Consumption style (PF4)        0.45   0.14   0.38   1.00

Table 60 presents the correlations of the group variables with the secondary and primary factors under the PCA method and varimax rotation. Their examination leads to the conclusion that there exist two secondary factors, SF1 and SF2. However,

18 For example, we may have four factors, but in some of them the items designed to measure the targeted factor(s) would be equally affected by the other factors. In other words, these items might have some cross-loadings. An oblique rotation will then likely produce correlated factors with less-than-obvious meaning, for example with many cross-loadings.


in terms of factor loadings and their size, they affect the measured items unequally. SF1 mainly has an impact on items X2, X3, X6, X9, X10, X11, X12, and it affects items X5 and X13 less. On the other hand, the secondary factor SF2 affects items X1, X4, X7, X8 and also, to a lesser extent, items X5 and X13. It appears therefore that we have two general areas of the hedonic-consumerism scale. On the one side, SF1 can be described by the factors self-enhancement and consumption style and their respective items (X2, X3, X5, X6, X9, X10, X11, X12, X13). On the other side, SF2 is covered by the primary factors entertainment and fun and curiosity and change (X1, X4, X5, X7, X8).
The primary factor entitled curiosity and change may, however, cause the largest problem in interpretation, because two of its items are simultaneously correlated with both secondary factors, i.e. SF1 and SF2. It simply has (as compared to the other factors) items with nearly equal cross-loadings, which appear in items X5 and X13. Owing to this lack of consistency with the other measures, items X5 and X13 should be eliminated. They had a clear relationship neither with SF1 nor with SF2.
Table 60. Correlations of group variables with secondary and primary factors under the PCA method and varimax rotation

Factors                     Cluster 1   Cluster 2   Cluster 3   Cluster 4
Secondary factor 1 (SF1)    0.64        0.11        0.44        0.66
Secondary factor 2 (SF2)    0.19        0.58        0.46        0.15
Primary factor 1 (PF1)      0.74        0.00        0.00        0.00
Primary factor 2 (PF2)      0.00        0.81        0.00        0.00
Primary factor 3 (PF3)      0.00        0.00        0.77        0.00
Primary factor 4 (PF4)      0.00        0.00        0.00        0.74

In general, when we consider the pros and cons of the primary four-factor model, we should approve of the following factor content:
Factor self-enhancement, with items:
X6 – I care more for myself than for others,
X9 – I strive to achieve success in my professional life,
X10 – I respect and believe in those people who possess lots of money,
X11 – I make choices in my life on my own,
X12 – I like it when I am praised and admired.
Factor entertainment and fun:
X7 – I spend time nicely and have a good time,
X8 – I search for an adventurous and exciting life.


Factor curiosity and change:
X1 – I always strive for new experiences,
X4 – I want to be creative and act with imagination.
Factor consumption style:
X2 – I like to earn more and spend more on consumption to enjoy myself,
X3 – Consumption itself is an enjoyable experience in my life.
However, when we consider the HEFA analysis (indicating the presence of two general factors), we should combine the secondary factor SF1 with the two primary factors self-enhancement and consumption style. This general factor represents a kind of internal, personal motivation that helps people feel good about themselves, where a strong preference for positive self-views over negative ones prevails. And among the variety of strategies that people use to enhance their sense of personal worth, one is particularly important, i.e. consumption. For example, people can spend more time and money on buying products, not only to enjoy themselves or to feel a new experience during consumption (e.g. in the shopping mall), but also in order to boost their self-ego. People (consumers) search for and buy products so that they can compare themselves with another class of people. They take great pains when they hunt for products, and
Table 61. Secondary and primary factor loadings at PCA method and varimax rotation

Items   SF1    SF2    PF1    PF2    PF3    PF4
X1      0.14   0.43   0.00   0.01   0.57   0.10
X2      0.58   0.14   0.04   0.02   0.02   0.67
X3      0.58   0.12   0.03   0.01   0.01   0.69
X4      0.28   0.46   0.12   0.32   0.39   0.13
X5      0.29   0.28   0.07   0.05   0.55   0.04
X6      0.54   0.18   0.39   0.06   0.16   0.14
X7      0.01   0.42   0.05   0.74   0.19   0.03
X8      0.15   0.52   0.02   0.62   0.12   0.02
X9      0.40   0.16   0.55   0.05   0.16   0.18
X10     0.40   0.03   0.62   0.06   0.25   0.02
X11     0.45   0.21   0.58   0.09   0.04   0.10
X12     0.51   0.15   0.39   0.07   0.08   0.24
X13     0.39   0.26   0.12   0.13   0.54   0.01

Note: In the original table, high factor loadings were set in bold and cross-loadings were underlined.


Scale development for hedonic-consumerism values

if they complete this task successfully, they feel a better state of satisfaction,
simultaneously increasing their self-esteem.
The other two primary factors, entertainment and fun and curiosity and
change, might be explained by the general factor SF2. Reviewing this factor's
meaning in the context of entertainment, fun, curiosity and openness to change,
one may fairly conclude that people with such values look for some endless
and sometimes purposeless activity that provides the best effect of pleasure, particularly in leisure activities. Although it is associated with recreation and play, it may also be encountered during work, social functions, and
even seemingly mundane activities of daily living. It is also an activity in
which one strives to sustain the present good state of pleasure. To achieve it,
a person must be open-minded to changes and look for new experiences
which stimulate a good feeling, that is, a final pleasure. In the context of shopping experiences and buying products, hedonic consumers will look mainly
for ongoing amusement, spontaneous fun, and playful and active events which
should be renewable.
In sum, in the theoretical sense, both secondary factors resemble a kind of
epicureanism, which commands man to believe only in the testimony
of the senses, to enjoy life and rejoice in his joy. As if by pure illusion, the good
is pleasure in itself, where no pain, sadness or bitterness of life exists at all.

Scale reliability in reference to exploratory factor analysis


Now we can focus on the reliability of the particular sub-scales (factors) by checking their internal consistency. For this purpose we computed reliability
estimates such as the standard Cronbach's alpha coefficient. We also performed an inter-item correlation analysis for the previous set of items, which contributed to
the construction of four primary factors (PF1, PF2, PF3, PF4) and two secondary
factors (SF1 and SF2).
Before beginning the reliability estimation we assumed that the items
loaded on the respective factors/subscales were homogeneous. In the internal consistency analysis, we used the following
configuration of items (based on the EFA models) composing the respective factors.
For the two secondary factors, reliability analysis was conducted
for the items: SF1 (X2, X3, X5, X6, X9, X10, X11, X12, X13) and SF2
(X1, X4, X5, X7, X8). For the primary factors, the items denoted in brackets were:
PF1 (X6, X9, X10, X11, X12), PF2 (X7, X8), PF3 (X1, X4), PF4 (X2, X3). Because SF1 and SF2 were equally loaded with items X5 and X13, these items, as


anticipated, were eliminated from the primary factors. As observed, they do not
appear in the primary factors. Still, these items do appear in the secondary
factors SF1 and SF2, as they were present in the hierarchical analysis.
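Cronbach's alpha for a subscale can be computed directly from the item-score matrix; a minimal numpy sketch on synthetic data (not the study's survey responses):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each single item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Synthetic illustration: two noisy measures of one underlying trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
data = np.column_stack([trait + rng.normal(size=200),
                        trait + rng.normal(size=200)])
alpha = cronbach_alpha(data)
```

With k items, alpha = k/(k − 1) · (1 − Σ item variances / variance of the sum); perfectly correlated items give alpha = 1, and more measurement error drives alpha down.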
According to the Cronbach's alpha reliability coefficients (Table 62), they
were all high, with the exception of factor PF3. The obtained values, set in
descending order, were: 0.837 for factor PF4; 0.783 for SF1; 0.738 for
PF1; 0.643 for SF2; 0.618 for PF2; 0.528 for PF3. The secondary factors SF1 and SF2
generally attained higher coefficients than their primary-factor counterparts. Each secondary factor, however, was the result of the joint combination of two primary factors; in consequence its coefficient increased
while the primary factors' coefficients decreased. The visible exception here is the primary factor PF4. Moreover, from Table 62
we also infer that the first general factor SF1 was much stronger than
factor SF2.
Following the literature guidelines, we should have required alpha
coefficients above the 0.70 level. In practice, however (as in our case),
it is fair to accept lower alphas (e.g. in the 0.60-0.69
range), especially for factors represented by only a handful (two) of items.
On the other hand, too high an alpha (e.g. greater than
0.90) would mean that the items were repetitious or that there were more items
in the scale than necessary.
If the alpha coefficients were equal to zero, there would be no true score but
hypothetically only errors in the analyzed sets of items corresponding to the factors [Kelly 1958; Peter 1979]. From the data, we can infer that this problem
stays far beyond the scope of our factors' values.
The standardized alphas presented in Table 62 make little difference here. They
would matter for interpretation if the items in a given subscale had
obtained quite different means and standard deviations.
Table 62. Reliability statistics for SF1, SF2, PF1, PF2, PF3 and PF4

Factor   Cronbach's alpha   Cronbach's alpha based on standardized items   Number of items
SF1      0.783              0.783                                          9
SF2      0.643              0.638                                          5
PF1      0.738              0.744                                          5
PF2      0.618              0.636                                          2
PF3      0.528              0.529                                          2
PF4      0.837              0.838                                          2


Table 63 reports summary item statistics. As we observe, the general
factor SF1 had a higher mean item variance (0.81) than factor SF2 (0.66). A slight
difference appears in the item means, obtained at the 3.35 level for
SF1 and at 4.01 for SF2. Almost no difference appears in the inter-item correlations (0.29 for scale SF1 and 0.26 for scale SF2).
The inter-item correlations are particularly interesting. Clark and Watson
[1995] argued that the inter-item correlation is a more useful index of internal consistency reliability than coefficient alpha. As they claimed, the appropriate range of
inter-item correlations depends upon the specificity of the construct being
measured. That is, for a broad, higher-order construct, Clark and Watson [1995] suggested a mean inter-item correlation
as low as 0.15 to 0.30. In contrast, a good measure of a construct with a much
narrower focus might require a higher inter-correlation range of perhaps
0.40 to 0.50. The former case applies exactly to our two secondary
factors, whose inter-item correlations were at comparable levels: 0.29 for SF1
and 0.26 for SF2. On the other hand, the primary factors PF2 and PF4 generated
higher correlations. Only factors PF1 (0.37) and PF3 (0.36) obtained values just below the lower limit of 0.40.
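Standardized alpha and the mean inter-item correlation are linked by alpha = k·r̄ / (1 + (k − 1)·r̄); inverting this relation for the standardized alphas of Table 62 reproduces the mean inter-item correlations cited in the text (e.g. 0.29 for SF1, 0.26 for SF2, 0.37 for PF1, 0.36 for PF3). A short sketch:

```python
# Mean inter-item correlation implied by standardized alpha for k items:
# alpha = k*r / (1 + (k-1)*r)  =>  r = alpha / (k - (k-1)*alpha)
def mean_inter_item_r(alpha_std: float, k: int) -> float:
    return alpha_std / (k - (k - 1) * alpha_std)

# Standardized alphas and item counts from Table 62.
for name, alpha, k in [("SF1", 0.783, 9), ("SF2", 0.638, 5),
                       ("PF1", 0.744, 5), ("PF2", 0.636, 2),
                       ("PF3", 0.529, 2), ("PF4", 0.838, 2)]:
    print(name, round(mean_inter_item_r(alpha, k), 2))
# prints: SF1 0.29, SF2 0.26, PF1 0.37, PF2 0.47, PF3 0.36, PF4 0.72
```

The two-item factors are the simplest case: with k = 2 the mean inter-item correlation is just the single correlation between the pair, so alpha = 2r/(1 + r).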
Finally, examining the item-total statistics (Table 64) and comparing
each item to its respective secondary factor, SF1 or SF2,
we were able to find items which did not fit these factors at all. That is,
we could check the internal consistency of the measured factors and the properties
of their respective items by analyzing the extent to which the items intercorrelated
with one another. If they failed to intercorrelate, such items would not
sufficiently represent a factor and would be candidates
for deletion. On the other hand, items with higher intercorrelations should
be retained19.
Spector [1992, p. 31] offered two general strategies for deciding which
items to retain or delete on the basis of item-to-total correlations. If
it is decided that the scale should have p items, then the p items with the largest
coefficients should be chosen. Alternatively, a criterion for the coefficients
(e.g. 0.40) can be set, and all items meeting it used together, that is, retaining
up to p items provided they have a minimum-sized coefficient. However,
19 The statistics included in Table 64 refer to the scale mean and variance (if the item were to be
deleted from the scale) and the correlation between the item and the scale composed of the other
items. Cronbach's alpha if the item were deleted from the scale is also calculated.
These statistics show to what extent the Cronbach's alpha reliability coefficients (for secondary
factors SF1 and SF2) would change by removing the i-th item. The most important
for the factors' interpretation, however, are the item-total correlations.


he also warned researchers against adding too many items [Spector 1992,
p. 31]: there is a trade-off between the number of items and the magnitude
of the item-to-total correlations. The more items, the lower the coefficients
can be while still yielding a good, internally consistent scale (factor).
Other researchers have indicated more specific rules on how and when
to use corrected item-to-total correlations. For example, Tian, Bearden and
Manning [2001], in the item refinement stages of developing their
three-factor consumer need for uniqueness scale, deleted items that did not
have corrected item-to-total correlations above 0.50 for each item's appropriate dimension. Obermiller and Spangenberg [1998] deleted items with
corrected item-to-total correlations below 0.50 in the development of their
Table 63. Summary item statistics for SF1, SF2, PF1, PF2, PF3 and PF4

Factor   No. of items   Mean of item means   Mean of item variances   Mean inter-item correlation
SF1      9              3.35                 0.81                     0.29
SF2      5              4.01                 0.66                     0.26
PF1      5                                                            0.37
PF2      2                                                            0.47
PF3      2                                                            0.36
PF4      2                                                            0.72

Note: The original table also reported the Min, Max, Range, Max/Min and Variance of each statistic; those cells are not reproduced here.


scale to measure consumer skepticism toward advertising. Bearden, Hardesty
and Rose [2001] used a decision rule of greater than 0.35 to retain items in
their first two developmental studies. Netemeyer, Boles and McMurrian [1991] likewise considered for retention items that showed initial item-to-total
correlations in the range from 0.50 to 0.80.
For the multidimensional HCV scale, items that did not have high correlations with the factor to which they were hypothesized to belong were
candidates for deletion20. This was the case for item X5, whose
value was a bit low for both SF1 and SF2 (Table 64). Alpha would increase up to
the 0.78 level (SF1) and up to 0.63 (SF2) if this item were deleted. The same
situation is repeated with item X13 (assuming it loads only on the secondary factor SF1). More importantly, their squared multiple correlations
were also low; hence both items should be deleted from the HCV scale.

Table 64. Item-total statistics for SF1 and SF2

SF1
Item   Scale mean       Scale variance   Corrected item-     Squared multiple   Cronbach's alpha
       if item deleted  if item deleted  total correlation   correlation        if item deleted
X2     26.93            18.91            0.53                0.54               0.75
X3     27.25            18.52            0.53                0.55               0.75
X5     26.06            21.68            0.28                0.17               0.78
X6     27.14            18.55            0.60                0.38               0.74
X9     26.09            20.48            0.46                0.28               0.76
X10    27.02            19.12            0.42                0.25               0.76
X11    26.68            19.16            0.51                0.33               0.76
X12    27.58            18.86            0.52                0.31               0.75
X13    26.41            20.45            0.38                0.13               0.78

SF2
X1     15.87            5.07             0.43                0.26               0.52
X4     16.10            4.42             0.52                0.29               0.53
X7     15.83            5.06             0.42                0.24               0.51
X8     16.47            3.74             0.51                0.33               0.53
X5     15.98            5.25             0.31                0.15               0.63

Note: In the original table, the values for the candidates for deletion (items X5 and X13) were set in bold.

20 Sometimes deleting bad items tends to raise the level of alpha, but reducing the number of items generally tends to lower it too. Deleting many weak items may or may not raise coefficient alpha, depending upon how many items are left and how weak the deleted items actually were.
Generally (with the exception of items X5 and X13), the results (Table 64) are
consistent with those obtained from the hierarchical exploratory factor analysis;
that is, items that did not load on their intended EFA factors also showed low
corrected item-to-total correlations. On the other hand, items which
had consistently high loadings also had corrected item-to-total correlations
above the 0.40 level.
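The corrected item-to-total correlation used above (each item correlated with the scale formed by the remaining items) can be sketched as follows; the data here are synthetic, purely to illustrate how a misfitting item stands out:

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the REMAINING items."""
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

# Synthetic demo: three items share a common trait; the fourth is pure noise
# and should show a clearly lower corrected item-total correlation.
rng = np.random.default_rng(42)
trait = rng.normal(size=500)
items = np.column_stack([trait + rng.normal(size=500) for _ in range(3)]
                        + [rng.normal(size=500)])
r = corrected_item_total(items)
```

In a real scale analysis, items whose corrected correlation falls well below the chosen cut-off (0.35-0.50 in the studies cited above) become deletion candidates, exactly as happened with X5 and X13 here.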
Reliability analysis if more items were added
If, for some reason, the initial level of internal consistency of a measured
factor (subscale) were insufficient, additional items would need
to be written or more data collected. This situation happens when
too few items were initially prepared, when many of them were of
poor quality, or when the intended construct and its underlying dimension were
too broad or too vaguely defined. For example, suppose a subscale contained four related items, but some of these items failed
to intercorrelate and the item analysis suggested retaining only
two items; then there are too few items to achieve an internally consistent subscale
[Spector 1992].
When a number of additional items is required to achieve a desirable level of alpha, we can use the Spearman-Brown prophecy formula21. This formula
indicates the effect on alpha of increasing or decreasing the number of
items:

    N = k·rtt / (1 + (k − 1)·rtt),    (7.0)

where:
N – the alpha reliability coefficient of the new length,
rtt – the reliability of the old length,
k – the factor by which the scale is increased or decreased.
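A quick numerical check of formula (7.0) for factor PF3 (two items, rtt = 0.528):

```python
def spearman_brown(r_tt: float, k: float) -> float:
    """Predicted alpha when scale length is changed by factor k (formula 7.0)."""
    return k * r_tt / (1 + (k - 1) * r_tt)

# PF3 has 2 items and alpha = 0.528; lengthening it to 4, 5 and 6 items
# corresponds to k = 2, 2.5 and 3.
for k in (2, 2.5, 3):
    print(k, round(spearman_brown(0.528, k), 3))
```

Doubling the scale (k = 2) already lifts the predicted alpha to 0.691; the k = 2.5 and k = 3 cases match Table 65 up to rounding.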
The question arises, however, under what conditions an added item will yield a stronger alpha and/or a tighter confidence interval, and when an overall weaker rtt will yield a lower alpha. Iacobucci, Coughlan and Duhachek [2005] found that the critical value c* that preserves alpha when making this addition is:

    c* = (1 + k + rtt(k − 1)) / (2(1 + k − rtt)) = 1/2 + k·rtt / (2(1 + k − rtt)),    where 1/2 < c* < 1.    (7.1)

Adding a (bad) item with a c value greater than c* causes alpha to increase; with a c value less than c*, alpha decreases.

21 In other words, this formula gives us an estimate of the level of coefficient alpha that will be achieved if the number of items is increased or decreased by a specific factor.

Table 65. Additional items for primary factor PF3

Number of added items   Alpha
2                       0.691
3                       0.736
4                       0.770

Turning once again back to the analysis of the set of items (but now with
a focus on the primary factor PF3, which included only two items
with alpha at the 0.528 level), we predicted hypothetically a higher level of reliability when more items were added.
Note how the reliability levels (Table 65) change every time
we add 2, 3 and 4 items. With two additional items, the reliability coefficient of factor PF3 would increase to 0.691; consequently, with three
more items it reaches 0.736, and with four additional items, 0.770. Note that in the
last two cases alpha did not increase significantly; therefore we
stopped the analysis at two additional items.

Confirmatory factor analysis CFA

When the EFA and item analyses were brought to an end, the next step was
confirmatory factor analysis (CFA), by which we construct alternative
models of the factor structure and statistically test their fit. Because CFA
is strongly dependent on theoretical assumptions, while EFA is grounded in an
empirical approach to data analysis, CFA models tend to be simpler than
EFA models. The EFA is sensitive to criteria pertaining to factor extraction, the type of
rotation, etc.22. The EFA may generate factors which underlie the observed

22 For more comments, please read [Fabrigar et al. 1999].


variables by statistical rather than logical means. Thus, factors are discovered and often difficult to interpret, since the observed variables cannot be
grouped a priori on substantive grounds but, instead, are related to all other
identified factors to varying degrees. As a result, in EFA we do not obtain
clear guidance about how many factors need to be retained for final interpretation [Mueller 1996].
In contrast, CFA is a much more conservative analysis compared to the more
liberal EFA. In CFA we simply need an a priori theoretical assumption
about the factor structure. On the basis of substantive knowledge, CFA allows
us to specify a set of items that are related theoretically to a set of underlying
factors23.
Before the CFA, we specify the measurement parameters, i.e. the characteristics of the observed variables to be estimated in the model, as well as the
number of factors. If the CFA results indicate an acceptable data-model fit, the
researcher proceeds with a more in-depth assessment of validity or reliability24. However, in order to talk about construct validity on the basis
of confirmatory analysis we need to consider two categories of definitions:
syntactic and semantic. As part of the syntactic definition and its rules, we
define relationships among theoretical constructs within a particular theoretical system. Semantic definitions define the rules of the relationships
of theoretical constructs with the domain of observable indicators
(items) of these constructs. Both definitions fit the model of confirmatory
factor analysis. In CFA, the syntactic dimension is represented by the covariance
matrix of factors, and the semantic dimension by the pattern matrix
or factorial structure matrix. Thus, factorial accuracy reflects theoretical
validity, i.e. accordance with the syntactic dimension. In other words, the validity of a measurement instrument is demonstrated if the theoretically expected pattern of correlations between the factors is also confirmed in the data
set. On the other hand, in terms of the semantic dimension, factorial validity
23 We need to specify the pattern of each item in the data set a priori. In other words, we need
to know how each item would load onto the hypothesized factor(s), such that each item
has its unique pattern of non-zero and zero factor loadings. This, virtually, is one of the critical differences between CFA and EFA, because in the latter all items may
load onto all factors [Matsunaga 2010, p. 104].
24 For instance, we can estimate the validity of a latent construct's indicator by the magnitude of the factor loading linking the observed and latent variable. Unstandardized factor
loadings can be used to compare validity results across different samples, while standardized
structural coefficients can be used when we compare the validity of different observed variables based on data from the same sample [Bollen 1989b, pp. 198-200].


of the measurement instrument is confirmed if the theoretically predicted
pattern of correlations between the indicators and the extracted factors is also
confirmed [Konarski 2009, pp. 233-234].

Structures and types of CFA models


In the process of CFA testing we used two models:
– confirmatory model no. 1, based solely on the theoretical assumptions
with regard to hedonic-consumerism values,
– confirmatory model no. 2, with factors based on both the theoretical
and the empirical findings obtained from the EFA analysis.
The structures of models 1 and 2 (with their respective indicators/items) are
presented in Figure 38. All calculations were conducted in the IBM Amos
graphics interface. Amos graphics follows the conventions of structural equation modeling diagrams.
As far as the estimation method was concerned, we chose mainly ML.
Although there are a number of estimators in the literature, such as GLS,
ULS and ADF, ML25 is the most standard and the one most often utilized in the vast
majority of published CFA studies [Fan, Thompson and Wang 1999; Thompson 2004; Kline 2010].
For each considered model (1 and 2) we scaled the factors and assigned scales
to the disturbances (errors) according to the following rule. We set a unit loading
identification constraint, so the path coefficient for the direct effect of the
measurement error, i.e. the unstandardized residual path coefficient, was
fixed to equal the constant 1.0. In Figure 39 (representing model 1), this
specification can be recognized by the number 1 that appears next to the
effect of the measurement error on the corresponding variable.
Next, we imposed unit loading identification constraints for all four factors. Thus, we fixed the unstandardized coefficient (loading) for the direct effect on a respective indicator to equal 1.0. In Amos this is done by finding
a single-headed arrow leading away from each unobserved variable in the
path diagram and fixing the corresponding regression weight to the value
1.0 [Arbuckle 2007].
On the other hand, we might have used the option known as the unit variance
identification constraint, which fixes the factor variance to 1.0 and standardizes
the factor; when a factor is scaled this way, all factor loadings are free parameters.
The choice between the unit variance identification constraint and the unit loading

25 For a discussion of the ML estimation method, please refer to the previous chapter.



Legend: Ellipses (circles) in grey represent factors. Squares in grey and, simultaneously, in white represent the indicators/items of the respective factor in theoretical model 1.
Squares in white only represent the deleted indicators/items (i.e. X5, X10 and X13) of the respective factor in empirical EFA model 2.
Measurement error variances are the single-arrowhead lines associated with the indicators/items.
Values of factor loadings on the single-arrowhead lines represent the relationship between a factor and its indicators/items.
Covariances (correlations) between factors are represented by a curved or straight line with two arrowheads.

Figure 38. Two models: CFA model 1 in reference to the theoretical construct of the HCV scale, and CFA model 2 derived from the exploratory factor analysis EFA



Figure 39. Identification (based on the unit loading identification constraint) of the CFA model

identification generally yields the same overall fit of the model. The choice
is therefore based on the researcher's preference for analyzing
factors in standardized versus unstandardized form. When the CFA
model is considered in a single sample, either method is acceptable. Fixing
the variance of a factor to 1.0 to standardize it has the advantage of simplicity
[Kline 2010].


Now we can examine the results of the analysis pertaining to model 1 and
model 2. First, however, we would like to know whether theoretical
model 1 is an identified, under-identified or over-identified solution.
If we were completely unable to tell whether such a model is identified, we
could try fitting the model using software (such as Amos). This empirical
approach works quite well in practice, although there are objections to it in
principle [McDonald and Krane 1977], and it is no substitute for an a priori
(theory-based) understanding of the identification status of a model.
Turning to Table 66, which summarizes the parameters, we notice 49 parameters: 17 are fixed at a chosen constant
value, while the unlabeled ones comprise 9 weights (factor loadings), 6 covariances and 17 variances. Unlabeled parameters, which are neither fixed nor labeled, are free to take
on any value. Most important are the 32 free parameters to be estimated,
which include 9 weights (factor loadings), 17 variances (of the four factors and
thirteen items) and, finally, 6 covariances between the four factors.
Table 66. Parameters summary CFA model 1

Parameters   Weights   Covariances   Variances   Means   Intercepts   Total
Fixed        17        0             0           0       0            17
Labeled      0         0             0           0       0            0
Unlabeled    9         6             17          0       0            32
Total        26        6             17          0       0            49

Table 67 presents the degrees of freedom, calculated as the difference between the number of distinct sample moments and the number of
distinct parameters to be estimated. The number of distinct sample moments26 includes variances and covariances. In counting the
number of distinct parameters to be estimated, several parameters that are
constrained to be equal to each other count as a single parameter. Parameters
Table 67. Computation of degrees of freedom CFA model 1

Characteristics                                   Df
Number of distinct sample moments                 91
Number of distinct parameters to be estimated     32
Degrees of freedom (91 − 32)                      59

26 They are also defined as the elements of the sample covariance matrix.


that are fixed at a constant value do not count at all. This is why the number
of distinct parameters to be estimated can be less than the total number of regression weights, variances, covariances, means and intercepts in the model.
When reviewing CFA model 1 (see Figure 40), we find the data points we
have to work with (i.e. how much information we have with respect to our
data). These constitute the variances and covariances of the observed variables, so with p variables there are p(p + 1)/2 such elements. Because we have
13 items, we obtain 13(13 + 1)/2 = 91 data points. Prior to this identification, we determined a total of 32 unknown parameters; assuming 91 data
points and 32 parameters to be estimated, we have an over-identified model.
The difference between the number of distinct sample moments and the number of
distinct parameters to be estimated is obtained at 59 degrees of freedom27.

Figure 40. Standardized estimates for CFA model 1
χ² = 116.77, degrees of freedom = 59, probability level = 0.00
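The moment and parameter counts above are simple arithmetic; a few lines reproduce the figures of Tables 66 and 67:

```python
# Parameter bookkeeping for CFA model 1 (13 items, 4 factors,
# unit-loading identification), reproducing Tables 66 and 67.
p_items, n_factors = 13, 4
sample_moments = p_items * (p_items + 1) // 2        # 13 * 14 / 2 = 91
free_loadings = p_items - n_factors                  # one loading per factor fixed at 1.0
free_variances = p_items + n_factors                 # 13 error + 4 factor variances
free_covariances = n_factors * (n_factors - 1) // 2  # covariances among the 4 factors
free_params = free_loadings + free_variances + free_covariances
df = sample_moments - free_params
print(sample_moments, free_params, df)               # prints: 91 32 59
```

Since the 91 sample moments exceed the 32 free parameters, df = 59 > 0 and the model is over-identified.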
The problem which may appear in model 1 is the low number of item
indicators per factor, especially for the two factors entertainment and
fun and consumption style. Hair et al. [2010, p. 698] explained that, in theory, there should be many items to fully represent a single factor.
However, these authors also argued that more items are not necessarily better. A large number of items requires larger samples and makes it difficult to
produce truly unidimensional factors. On the other hand, when too few items
fall on one factor, each item must be highly adequate to represent the
factor. It appears that using three items per factor, preferably four, is
the best option.
More specific rules on the number of required items per factor were provided by Kline [2010, pp. 137-145], who claimed that for standard CFA models specifying unidimensional measurement (that is, where
every indicator loads on just one factor and there are no error correlations)
with a single factor, we need at least three indicators in order to
identify the model. A standard CFA model with 2 or more factors and
2 or more indicators per factor is identified too. The first
heuristic judgment, for single-factor models, is known as the three-indicator
rule; the second, for models with multiple constructs,
is the two-indicator rule28. In general, two indicators per factor are the minimum, and this rule was fulfilled in both models 1 and 2.
Kenny, Kashy and Bolger [1988, p. 254] mentioned some other rules for
non-standard CFA models, which specify multidimensional measurement
where some items load on more than a single factor or some errors correlate.
These rules (Table 68) define requirements that must be satisfied by each
factor (rule 1a), pair of factors (rule 1b) and indicator (rule 1c) in order to

27 To interpret this result correctly we need to recall the three identification statuses of a
model. Because model 1 has fewer free parameters than observations/distinct sample
moments (dfM > 0), this model is over-identified. A just-identified model has the same number of free parameters as distinct sample moments (dfM = 0). An under-identified model is one for which it is not possible to uniquely estimate all of its parameters.
28 Still, CFA models whose factors have only two indicators are more prone to problems
in the analysis.


identify measurement models with error correlations. Rules 1, 2 and 3 assume
that all factor covariances are free parameters and that there are multiple
indicators of every factor.
Table 68. Identification rules

Rule 1. For non-standard CFA models with measurement errors
For a non-standard CFA model with measurement error correlations to be identified, all three of the conditions listed below must hold:
For each factor, at least one of the following must hold (rule 1a):
1. There are at least three indicators whose errors are uncorrelated with each other,
2. There are at least two indicators whose errors are uncorrelated and either: a) the errors of both indicators are not correlated with the error term of a third indicator for a different factor, or b) an equality constraint is imposed on the loadings of the two indicators.
For every pair of factors, there are at least two indicators, one from each factor, whose error terms are uncorrelated (rule 1b).
For every indicator, there is at least one other indicator (not necessarily of the same factor) with which its error term is not correlated (rule 1c).

Rule 2. For multiple loadings of complex indicators in non-standard CFA models
For every complex indicator in a non-standard CFA model, in order for the multiple factor loadings to be identified, both of the following must hold:
1. Each factor on which the complex indicator loads must satisfy rule 1a for a minimum number of indicators.
2. Every pair of those factors must satisfy rule 1b, i.e. each factor has an indicator that does not have an error correlation with a corresponding indicator on the other factor of that pair.

Rule 3. For error correlations of complex indicators
In order for error correlations that involve complex indicators to be identified, both of the following must hold:
1. Rule 2 is satisfied.
2. For each factor on which a complex indicator loads, there must be at least one indicator with a single loading that does not have an error correlation with the complex indicator.

Source: Kenny, Kashy and Bolger 1988, p. 254.

Kenny, Kashy and Bolger [1988, p. 254] also mentioned rules for indicators in non-standard models that load on two or more factors. Such indicators are usually defined as complex indicators.

Empirical analysis

393

CFA Model 1 fit evaluation and rejection


We now proceed to check the goodness of fit of model 1. The obtained χ² value (denoted in Amos as CMIN) equals 116.77 at 59 degrees of freedom, with an associated p value of 0.00 (see Table 70). This indicates a poor fit of the model to the data, so the theoretical model should be rejected: the hypothesis that model 1 is correct cannot be accepted here. Model 1 thus provides an insufficient fit to the data, as evidenced by χ². Another index, RMSEA (0.06), also did not indicate a close fit. However, Jöreskog and Sörbom's goodness-of-fit indices suggested a substantially better fit (especially GFI = 0.94 and its adjusted form, AGFI = 0.91). The GFI and AGFI indices pertain to the general absence of any model. By the lack of a model, we simply mean a model that assumes all elements of the variance-covariance matrix, including variances, are equal to zero. In terms of interpretation, the GFI is similar to the regression coefficient of determination, which tells us what percentage of the variability of the dependent variable is explained by the model.
Because the GFI increases when the model includes additional parameters, we also used two adjusted indices, AGFI and PGFI (parsimonious goodness-of-fit index). The AGFI adjusts the GFI for the number of degrees of freedom; as such, it addresses the issue of parsimony by incorporating a penalty for the inclusion of additional parameters. The PGFI, in turn, provides a more realistic evaluation of hypothesized model 1: it combines two interdependent pieces of information, the goodness of fit of the model (as measured by the GFI) and the parsimony of the model.
Finally, the RMR (root mean square residual) index, which represents the square root of the average squared amount by which the sample variances and covariances differ from their estimates obtained under the assumption that the model is correct, was smaller than 0.10. The smaller the RMR, the better the model; an RMR of zero indicates a perfect fit.
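The RMR computation just described can be sketched in a few lines (a generic illustration rather than the Amos implementation; the toy matrices below are hypothetical):

```python
import numpy as np

def rmr(sample_cov, implied_cov):
    """Root mean square residual: square root of the average squared
    difference between sample and model-implied (co)variances, taken
    over the non-redundant (lower-triangular, incl. diagonal) elements."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    idx = np.tril_indices(s.shape[0])        # non-redundant elements
    resid = (s - sigma)[idx]
    return float(np.sqrt(np.mean(resid ** 2)))

# Hypothetical 2-variable example: only the covariance is misfitted, by 0.1
S     = [[1.0, 0.5], [0.5, 1.0]]
Sigma = [[1.0, 0.4], [0.4, 1.0]]
value = rmr(S, Sigma)   # sqrt(0.01 / 3), roughly 0.058
```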
Table 71 presents statistics which provide information about the non-centrality parameter estimate (NCP). With the χ² statistic, we focused on the extent to which the model was tenable and could not be rejected. Now we can look at this situation more closely and check whether hypothesized model 1 is incorrect. In this circumstance, the χ² statistic has a non-central χ² distribution with a non-centrality parameter λ, that is, a fixed parameter with associated degrees of freedom, denoted as χ²(df, λ). Simply put, λ is regarded as a population badness-of-fit parameter which represents the discrepancy between Σ and Σ(θ).

394

Scale development for hedonic consumerism values

The central χ² distribution is a special case of the non-central χ² distribution when λ = 0.


Turning now to the interpretation of the results in Table 71, we observe that model 1 yielded a non-centrality parameter of 57.77, which represents χ² minus its degrees of freedom (116.77 - 59 = 57.77). The confidence interval indicates that we can be 90% sure that the population value of the non-centrality parameter lies between 30.86 and 92.46.
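The NCP point estimate and its 90% interval can be reproduced from χ² and df alone. A sketch using SciPy's non-central χ² distribution (the interval search follows the usual Steiger-type inversion; SciPy availability is assumed):

```python
from scipy.stats import ncx2
from scipy.optimize import brentq

chi2_obs, df = 116.77, 59          # model 1 values from Table 70

# Point estimate: chi-square minus degrees of freedom, floored at zero
ncp = max(chi2_obs - df, 0.0)       # 116.77 - 59 = 57.77

# 90% interval: find the lambda values at which the observed chi-square
# sits at the 95th / 5th percentile of the non-central distribution
lo = brentq(lambda lam: ncx2.cdf(chi2_obs, df, lam) - 0.95, 1e-9, 500.0)
hi = brentq(lambda lam: ncx2.cdf(chi2_obs, df, lam) - 0.05, 1e-9, 500.0)
# lo, hi land near the 30.86 and 92.46 reported in Table 71
```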
Because no statistic can be universally accepted as a perfect index of model adequacy, a few additional measures of fit were considered. Indices in this group usually compare a hypothesized model with the independence (baseline) model. Indices such as NFI (the Bentler-Bonett normed fit index), RFI (Bollen's relative fit index) and PNFI (the parsimony normed fit index, obtained by applying the James, Mulaik and Brett parsimony adjustment to the NFI) are calculated in a similar way to GFI, AGFI and PGFI. The TLI (Tucker-Lewis index) is calculated similarly to the RFI, and the IFI (incremental fit index) similarly to the NFI, but in such a manner that the statistics are less sensitive to the sample size and to the degrees of freedom taken into account. The CFI (Bentler's comparative fit index) and PCFI (parsimonious comparative fit index) are calculated similarly to the NFI and PNFI [Bedyńska and Książek 2012, pp. 187–188]²⁹.
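Using the χ² values reported for model 1 and the independence model (Table 70), these incremental indices can be reproduced directly; a sketch based on their standard formulas:

```python
# chi-square and df for the hypothesized (default) and independence models
chi2_m, df_m = 116.77, 59
chi2_b, df_b = 957.42, 78

nfi = (chi2_b - chi2_m) / chi2_b                                  # normed fit
rfi = 1 - (chi2_m / df_m) / (chi2_b / df_b)                       # relative fit
ifi = (chi2_b - chi2_m) / (chi2_b - df_m)                         # incremental fit
tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)  # Tucker-Lewis
cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0)
# rounded: NFI 0.88, RFI 0.84, IFI 0.94, TLI 0.91, CFI 0.93 (cf. Table 75)
```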
Table 69. Model fit summary – FMIN – CFA model 1 with all items

Model                 FMIN    F0      LO 90   HI 90
Default model         0.41    0.20    0.11    0.33
Saturated model       0.00    0.00    0.00    0.00
Independence model    3.37    3.10    2.76    3.46

Table 70. Model fit summary – CMIN – CFA model 1 with all items

Model                 NPAR   CMIN     DF   P      CMIN/DF
Default model         32     116.77   59   0.00   1.98
Saturated model       91     0.00     0
Independence model    13     957.42   78   0.00   12.27

Table 71. Model fit summary – NCP – CFA model 1 with all items

Model                 NCP      LO 90    HI 90
Default model         57.77    30.86    92.46
Saturated model       0.00     0.00     0.00
Independence model    879.42   783.40   982.87

²⁹ Bentler [1990] suggested that researchers choose the CFI rather than the NFI, because the NFI shows a tendency to underestimate fit in small samples.


Reviewing the values of the indices in model 1 (NFI = 0.88; CFI = 0.93; TLI = 0.91; RFI = 0.84; IFI = 0.94), we can say that they lie near the upper limit of each index. The minimum values of the TLI, NFI, RFI, IFI and CFI entitle us to accept the model only initially. For example, the values of CFI, TLI and IFI are close to 0.95, but not close enough: they remain below that level, although CFI = 0.93 indicates that the model fits the data reasonably well, in the sense that the hypothesized model adequately describes the sample data. The IFI (0.94) is nearly consistent with the CFI. In contrast, the NFI and RFI are even below the 0.90 level; here the NFI value of 0.88 suggests that the model fit is inadequate. Only if this model truly exhibited an adequate fit with regard to all indices (assuming NFI/CFI/TLI/RFI/IFI values greater than 0.95) would we have the right to claim that it represents the latent variable factor structure underlying the data well.
Finally, turning to the indices which account for model complexity (i.e. for the number of estimated parameters): PGFI = 0.61, PNFI = 0.66 and PCFI = 0.71 take lower values than their unadjusted counterparts. Mulaik et al. [1989, p. 439] explained that parsimony-based indices have lower values than the threshold level generally perceived as acceptable for other normed indices of fit; for the PGFI, PNFI and PCFI this level usually equals 0.85.
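The parsimony adjustment itself is just multiplication by the parsimony ratio df/df_independence; the model 1 values in Table 76 follow (a sketch, using the index values reported above):

```python
df_m, df_b = 59, 78                # model df and independence-model df
nfi, cfi = 0.878, 0.934            # unadjusted model 1 values (unrounded)

pratio = df_m / df_b               # parsimony ratio, about 0.76
pnfi = pratio * nfi                # parsimony normed fit index, about 0.66
pcfi = pratio * cfi                # parsimony comparative fit index, about 0.71
```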
Table 72. Model fit summary – RMSEA – CFA model 1 with all items

Model                 RMSEA   LO 90   HI 90   PCLOSE
Default model         0.06    0.04    0.07    0.17
Independence model    0.20    0.19    0.21    0.00

Table 73. Model fit summary – HOELTER – CFA model 1 with all items

Model                 Hoelter's critical N (0.05)   Hoelter's critical N (0.01)
Default model         190                           213
Independence model    30                            33

Table 74. Model fit summary – RMR, GFI, AGFI, PGFI – CFA model 1 with all items

Model                 RMR    GFI    AGFI   PGFI
Default model         0.04   0.94   0.91   0.61
Saturated model       0.00   1.00
Independence model    0.19   0.55   0.48   0.47


Table 75. Model fit summary – baseline comparisons – CFA model 1 with all items

Model                 NFI delta1   RFI rho1   IFI delta2   TLI rho2   CFI
Default model         0.88         0.84       0.94         0.91       0.93
Saturated model       1.00                    1.00                    1.00
Independence model    0.00         0.00       0.00         0.00       0.00

Table 76. Model fit summary – parsimony-adjusted measures – CFA model 1 with all items

Model                 PRATIO   PNFI   PCFI
Default model         0.76     0.66   0.71
Saturated model       0.00     0.00   0.00
Independence model    1.00     0.00   0.00

Due to the fact that model 1 has to be rejected, we can undertake one of the following three alternative actions [Arbuckle 2007]:
– firstly, we can point out that statistical hypothesis testing is a poor tool for choosing such a model³⁰;
– secondly, we can start from scratch, that is, construct a new model to substitute the rejected one;
– thirdly, we can try to modify the rejected model in small ways so that it fits the data better.
The last tactic seems reasonable for our study. As Kline [2010, p. 240] explained, in CFA it often happens that an initial model does not fit the data very well. However, the respecification of a CFA model is even more challenging, because there are more possibilities for change: the number of factors, their relations to the indicators, and the patterns of error correlations are all candidates for modification.
Modification indices (labeled in Amos as M.I.) allow us to evaluate potential modifications in a single analysis. They provide suggestions for the model that are likely to pay off in smaller χ² values. However, as Arbuckle [2007, p. 234] noted, even in trying to improve a model the researcher should not be guided exclusively by modification indices: a modification should be considered only if it makes theoretical or common sense.

³⁰ Jöreskog [1967] explained that a model can only be an approximation at best and, fortunately, a model can be useful without being true. In this view, any model is bound to be rejected on statistical grounds if it is tested with a big enough sample. In that case, the rejection of a model on purely statistical grounds is not necessarily a condemnation.


Therefore, a slavish reliance on modification indices, without restraint, amounts to sorting through a very large number of potential modifications in search of one that provides a big improvement in fit. Such a strategy is prone, through capitalization on chance, to produce an incorrect and absurd model that has an acceptable χ² value. This issue is broadly discussed by MacCallum [1986] and MacCallum, Roznowski and Necowitz [1992].
Byrne [2010, p. 89] also explained that once a hypothesized CFA model has been rejected, this spells the end of the confirmatory factor analytic approach in its true sense. Although CFA procedures continue to be used in any respecification and re-estimation of the model, these analyses are exploratory in the sense that they focus on the detection of misfitting parameters in the originally hypothesized model. Technically speaking, modification indices propose some kind of improvement in a model by increasing the number of parameters in such a way that the χ² statistic falls faster than its degrees of freedom [Arbuckle 2007]. Although this option can be misused, it has a legitimate place in exploratory studies. Changes made to the model on the basis of M.I. are data-driven and, as such, move the researcher from the realm of confirmatory factor analysis to exploratory analysis [Harrington 2009].
For model 1, modifications are reported in Tables 77 and 78. The largest ones³¹ were obtained for the error covariance between item X10 and the factor curiosity, and between the errors of items X5 <-> X6 and X13 <-> X9. If we repeat the analysis allowing the covariance between X10 (e10) and the factor curiosity, the discrepancy should fall by at least 12.26 and its estimate (see the par change column³²) will become smaller by approximately 0.1. Likewise, when we repeat the analysis with the covariance between items X5 (e5) and X6 (e6), the discrepancy will fall by at least 10.58, but its estimate will become larger by approximately 0.09. The same situation repeats with the error covariance between items X13 (e13) and X9 (e9). However, item X10 cannot be reasonably associated with the construct curiosity, because it has a completely different theoretical meaning and relation; it would be better for it to load on the theoretical factor self-enhancement than on curiosity. Therefore this item will be subject to elimination rather than modification, which has been taken into account in respecified model 2a.
The modification indices for error covariances show that the χ² value will decrease if we add an error covariance between items X5 <-> X6 (10.58, i.e. about 10 points off the total χ²) or items X13 <-> X9 (13.61, the largest modification index in model 1). In general, these last two modifications would reduce the χ² value by about 10-13 points from 116.77, so they would do us only a little favor from the perspective of model improvement.

³¹ Usually, a modification index is expressed by excluding a path or, for example, by excluding a covariance in the model. The modification index can also be expressed as a statistical test which checks whether the constraint imposed on the parameter holds in the population. If the value of the index exceeds 4, the null hypothesis should be rejected; that means that when we release the respective parameter, the model fit will be significantly better. The modification index sometimes takes values much greater than 4 for some parameters, and that is exactly what happened in model 1. According to literature sources, it is not worth taking into account all the parameters, so we need to consider only those with the highest values of the modification index [Bedyńska and Książek 2012, pp. 192–193]. In the case of model 1, we decided to accept an upper level of this index equal to 10.
³² Par change is the estimated parameter change that would be obtained if this change were made to the model.
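The footnote's rule of thumb, that an M.I. above 4 suggests freeing the parameter, corresponds to the 5% critical value of a 1-df χ² test; a quick check (SciPy assumed):

```python
from scipy.stats import chi2

# A modification index behaves like a 1-df chi-square test of the imposed
# constraint; the alpha = 0.05 critical value is just under 4
critical = chi2.ppf(0.95, df=1)    # approximately 3.84
```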
Table 77. Modification indices – error covariances – CFA model 1 with all items

Error covariance         M.I.    Par change
e10 <-> Curiosity        12.26   0.10
e10 <-> e8               4.11    0.10
e11 <-> Consumption      4.27    0.07
e11 <-> e3               5.07    0.07
e11 <-> e9               4.36    0.06
e12 <-> Consumption      4.75    0.08
e4  <-> Entertainment    5.43    0.08
e4  <-> e7               4.54    0.06
e4  <-> e10              8.41    0.11
e5  <-> e6               10.58   0.09
e13 <-> e9               13.61   0.10
e13 <-> e4               5.10    0.06

Probably one of the best modifications worth considering lies between item X13 ("I constantly learn something new") and item X9 ("I strive to achieve success in my life"). These two variables might plausibly be correlated: striving for success entails openness to learning and experiencing. However, as Harrington [2009, p. 71] explained, it is important to note that one can always come up with such a justification, hence one should be cautious about stretching this logic too far.
At this point, it is worth referring back to the results obtained from the EFA and reliability analysis, which showed explicitly that


the factor loadings of items X5 and X13 loaded on different factors: they had strong cross-loadings. We then decided to delete them from the HCV scale, and we should now proceed with them in the same way.
We also claim that items X5, X10 and X13 are disturbing from yet another point of view (see Table 78). M.I. proposes adding paths between observed variables, such as X5 <- X6, X13 <- X9, X10 <- X8, X10 <- X4 or X4 <- X10, which in fact belong to other (contrary) factors. According to the previously conducted EFA analyses, the self-enhancement factor was more related to consumption style.
M.I. also proposes adding suspicious paths from the factors entertainment and curiosity to item X10, which is part of the construct self-enhancement.
The other modification indices (as can be observed from Tables 77-78) are significantly below 10 points for the respective items and factors, and it does not look as though much would be gained by allowing them to be correlated.
In conclusion, items such as X5, X10 and X13 were too complex, and we should try to omit these variables from model 1. However, in order to
Table 78. Modification indices – regression weights – CFA model 1 with all items

Regression weight         M.I.    Par change
X6  <- X5                 11.39   0.21
X9  <- X13                9.91    0.15
X10 <- Entertainment      8.22    0.13
X10 <- Curiosity          12.60   0.52
X10 <- X8                 8.85    0.17
X10 <- X4                 15.52   0.29
X10 <- X5                 5.96    0.23
X11 <- X3                 4.75    0.10
X12 <- X3                 4.13    0.10
X4  <- Entertainment      4.25    0.07
X4  <- X7                 5.68    0.16
X4  <- X8                 4.69    0.08
X4  <- X10                8.50    0.11
X5  <- X6                 4.92    0.09
X13 <- X7                 5.13    0.13
X13 <- X9                 14.40   0.22

Table 79. Standardized residual covariances – CFA model 1 with all items

       X2     X3     X7     X8     X6     X9     X10    X11    X12    X1     X4     X5     X13
X2     0.00
X3     0.00   0.00
X7     0.26   0.02   0.00
X8     0.01   0.03   0.00   0.00
X6     0.48   0.68   0.59   0.43   0.00
X9     1.02   0.51   0.63   0.35   0.58   0.00
X10    0.36   0.05   0.14   2.42   0.19   0.76   0.00
X11    0.73   1.41   0.77   0.76   0.50   1.11   0.96   0.00
X12    0.89   1.46   0.27   0.59   0.32   0.86   0.11   0.38   0.00
X1     0.48   1.38   0.17   0.84   0.01   0.62   1.78   0.03   0.67   0.00
X4     0.40   0.10   1.84   1.43   0.18   0.92   3.55   0.07   0.07   0.47   0.00
X5     0.01   0.23   1.14   0.37   2.24   0.25   2.24   0.23   0.69   0.17   0.08   0.00
X13    0.23   1.05   1.73   0.91   1.68   3.13   0.79   0.80   0.15   0.99   1.16   0.23   0.00


confirm this decision, we will also analyze the standardized residual covariances, which capture the discrepancy between the restricted covariance matrix implied by the hypothesized model and the sample covariance matrix. We did this for all 13 items, and then, once again, we analyzed the residuals after excluding items X5, X10 and X13 in the forthcoming model 2b.
Standardized residuals are fitted residuals divided by their asymptotic (large-sample) standard errors [Jöreskog and Sörbom 1993]. As such, they are analogous to z scores and are therefore easier to interpret. In essence, they represent estimates of the number of standard deviations by which the observed residuals depart from the zero residuals that would exist if the model fit were perfect. According to Jöreskog and Sörbom [1993], values greater than 2.58 should be considered large.
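The 2.58 cutoff quoted from Jöreskog and Sörbom is simply the two-tailed z critical value at α = 0.01, consistent with reading standardized residuals as z scores (SciPy assumed):

```python
from scipy.stats import norm

# Standardized residuals are read like z scores; |z| > 2.58 marks the
# two-tailed alpha = 0.01 boundary of the standard normal distribution
cutoff = norm.ppf(1 - 0.01 / 2)    # approximately 2.58
```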
In Table 79 (with the standardized residual covariances for CFA model 1), we observe only two pairs of items, X13-X9 and X4-X10, which exceed the suggested value of 2.58: the residual covariance reaches 3.13 between X13 and X9 and 3.55 between items X4 and X10. Some other suspicious pairs with high residual values that exceed 2.00 (despite remaining below 2.58) are X10-X8 (2.42), X5-X6 (2.24) and X5-X10 (2.24); in fact, they approach 2.58. More importantly, all these pairs involve the specific variables (X5, X10, X13) which spoil model 1 the most. Therefore, if we eliminate them (see final model 2b), we obtain standardized residual covariances that not only stay below 2.58 but also do not exceed the level of 2.00 (see Table 80).
Table 80. Standardized residual covariances – CFA model 2b (items X5, X10, X13 deleted)

       X2     X3     X7     X8     X6     X9     X11    X12    X1     X4
X2     0.00
X3     0.00   0.00
X7     0.22   0.41   0.00
X8     0.00   0.08   0.00   0.00
X6     0.29   0.59   1.27   0.01   0.00
X9     0.94   0.36   0.17   0.56   0.24   0.00
X11    0.73   1.33   0.18   0.45   0.24   1.64   0.00
X12    0.59   1.26   0.40   0.12   0.16   0.68   0.30   0.00
X1     0.42   0.48   0.64   0.41   0.89   1.41   0.94   0.06   0.00
X4     0.15   0.26   0.37   0.08   0.25   0.80   0.13   0.09   0.00   0.00


CFA model 2 with further split into model 2a and 2b comparison


Although CFA model 1 was based on well-defined items (i.e. they were initially face- and content-validated with regard to the HCV concept), the model alone did not fit well statistically. For this reason, we decided to reject it in favor of CFA model 2, which was constructed on the basis of the empirical findings from EFA. This model was further split into two additional models: model 2a (with items X5 and X13 deleted) and model 2b, where we removed one more item, X10. As proved later, this choice was a good one. If we now re-examine the two items (X5, X13) and verify them on the basis of the modification indices (Tables 81-82 for model 2a), we find that deleting items X5 and X13 from the scale significantly improves the χ² value and, simultaneously, the overall model fit along with the other indices. Admittedly, the modification indices still suggest adding a covariance between the factor curiosity and X10 (10.03), an error covariance between items X4 (e4) and X10 (e10) at the 9.00 level, and paths to X10 from quite opposite factors such as entertainment and curiosity, as well as paths from items X8 (9.38) and X4 (15.04). However, these
Table 81. Modification indices – error covariances – CFA model 2a with items X5, X13 deleted

Error covariance       M.I.    Par change
e10 <-> Curiosity      10.03   0.12
e10 <-> e8             4.61    0.10
e11 <-> Consumption    4.92    0.08
e11 <-> e3             5.14    0.07
e11 <-> e9             4.91    0.06
e4  <-> e10            9.00    0.11

Table 82. Modification indices – regression weights – CFA model 2a with items X5, X13 deleted

Regression weight       M.I.    Par change
X10 <- Entertainment    9.78    0.21
X10 <- Curiosity        14.69   0.41
X10 <- X8               9.38    0.17
X10 <- X4               15.04   0.28
X11 <- X3               4.82    0.10
X4  <- X10              7.43    0.10


additional covariances and paths would be misleading for the HCV scale, so we will ignore these hints.
By removing items X5, X13 and X10 in model 2b, we obtained a significant improvement: the probability associated with the χ² statistic rose from p = 0.02 in model 2a to p = 0.36 in model 2b.
Returning to the methodological description, we come back to the information obtained from CFA models 2a and 2b. As noted in Tables 83 and 84, both models are over-identified; however, the number of degrees of freedom obtained in model 2b (df = 29) was slightly lower than in model 2a (df = 38). Quite interesting are the χ² values: for model 1, χ² = 116.77 at df = 59, p = 0.00; for model 2a, χ² = 58.99 at df = 38, p = 0.02; and for model 2b, χ² = 31.14 at df = 29 and p = 0.36, which is greater than 0.05. In consequence, the last model, 2b, cannot be rejected.
Before we choose between models 2a and 2b, let us take a look at the FMIN discrepancy function, which measures the discrepancy between the observed variance-covariance matrix and the theoretical variance-covariance
Table 83. Parameters summary – CFA model 2

Items X5, X13 deleted (model 2a)
Parameters   Weights   Covariances   Variances   Means   Intercepts   Total
Fixed        15        0             0           0       0            15
Labeled      0         0             0           0       0            0
Unlabeled    7         6             15          0       0            28
Total        22        6             15          0       0            43

Items X5, X10, X13 deleted (model 2b)
Parameters   Weights   Covariances   Variances   Means   Intercepts   Total
Fixed        14        0             0           0       0            14
Labeled      0         0             0           0       0            0
Unlabeled    6         6             14          0       0            26
Total        20        6             14          0       0            40

Table 84. Computation of degrees of freedom – CFA model 2

Characteristic                                  Model 2a             Model 2b
                                                (X5, X13 deleted)    (X5, X10, X13 deleted)
Number of distinct sample moments               66                   55
Number of distinct parameters to be estimated   28                   26
Degrees of freedom                              66 - 28 = 38         55 - 26 = 29
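The arithmetic in Table 84 generalizes: with p observed variables there are p(p+1)/2 distinct sample moments (variances and covariances), and df is that count minus the number of free parameters. A sketch of the bookkeeping:

```python
def cfa_df(n_items, n_free_params):
    """Degrees of freedom of a covariance-structure model:
    distinct sample moments minus distinct estimated parameters."""
    moments = n_items * (n_items + 1) // 2
    return moments - n_free_params

df_2a = cfa_df(11, 28)   # model 2a: 11 items -> 66 moments, df = 38
df_2b = cfa_df(10, 26)   # model 2b: 10 items -> 55 moments, df = 29
```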


matrix, which is derived from the model and the estimated parameter values. Because the FMIN measure represents a kind of misfit, the smaller its value, the better the model. In Table 85, the values of FMIN (obtained after estimation) are much lower for model 2b (0.11) than for model 2a (0.21); in the saturated model it even reaches 0.00. On the other hand, the values in the CMIN/DF column do not exceed the marginal critical value (in model 2a CMIN/DF equals 1.55 and in model 2b it is 1.07), where the typical critical values are usually set at 2 or 5.
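FMIN and CMIN are tied together by the sample size, since CMIN = (N - 1)·FMIN. The sample size is not restated in this excerpt, but N = 285 reproduces the reported values; treat it as inferred from the CMIN/FMIN ratios, not as quoted:

```python
n = 285                            # inferred, not stated in this excerpt

def fmin(chi2_value):
    """Discrepancy function value: chi-square divided by N - 1."""
    return chi2_value / (n - 1)

fmin_2a = fmin(58.99)              # about 0.21 (Table 85)
fmin_2b = fmin(31.14)              # about 0.11
cmin_df_2a = 58.99 / 38            # about 1.55
cmin_df_2b = 31.14 / 29            # about 1.07
```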
Table 85. Model fit summary – FMIN – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 FMIN   F0     LO 90   HI 90
Default model         0.21   0.07   0.01    0.16
Saturated model       0.00   0.00   0.00    0.00
Independence model    2.78   2.59   2.28    2.92

Items X5, X10, X13 deleted (model 2b)
Model                 FMIN   F0     LO 90   HI 90
Default model         0.11   0.01   0.00    0.07
Saturated model       0.00   0.00   0.00    0.00
Independence model    2.46   2.30   2.01    2.61

If model 2b is indeed correct, the associated parameter estimates should be preferred over those obtained under model 1 and also under model 2a. Model 2a is located somewhat on the borderline of the χ² test and its probability: it has no acceptable fit, but it has a higher probability level than model 1 overall.
The CMIN (χ²) statistic, as already mentioned, measures how well a given model describes the relationships within a sample, whereas the purpose of CFA modeling is often to describe the relationships in the population. The value of the discrepancy function between the model variance-covariance matrix and the population variance-covariance matrix is given in Amos as F0. In an ideal model, these matrices should be equal. If the confidence interval for F0 contains 0, it can be inferred that the model reproduces the population variance-covariance matrix well, which means it describes well the true relationships between the variables (Table 86).
As we can observe from Table 85, model 2a obtained an F0 statistic of 0.07, with a confidence interval of 0.01-0.16. In turn, in model 2b, F0 was calculated at the level of 0.01, with the 90% confidence interval being equal to


Table 86. Model fit summary – CMIN – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 NPAR   CMIN     DF   P      CMIN/DF
Default model         28     58.99    38   0.02   1.55
Saturated model       66     0.00     0
Independence model    11     790.36   55   0.00   14.37

Items X5, X10, X13 deleted (model 2b)
Model                 NPAR   CMIN     DF   P      CMIN/DF
Default model         26     31.14    29   0.36   1.07
Saturated model       55     0.00     0
Independence model    10     697.38   45   0.00   15.50

0.00-0.07. Both intervals contain the desired value of 0; however, in model 2b the result appears much better than in model 2a.
A drawback of the CMIN statistic, namely its dependence on sample size, can be overcome if we take into account Hoelter's critical N statistic, which tells us the sample size at which, assuming an adequate model fit, there would be no reason to reject the null hypothesis. A model is regarded as adequately matched when this statistic exceeds 200, although many studies have argued that the threshold should be much higher [Bedyńska and Książek 2012]. In our case, this statistic exceeded the level of 200 (at both the 0.05 and the 0.01 significance level) – see Table 87.
Finally, we examined the other indices. Drawing once again on the literature³³, we noticed that both model 2a and model 2b fit well. For instance, when we report a confidence interval around the RMSEA value, we may
Table 87. Model fit summary – HOELTER – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 Hoelter's critical N (0.05)   Hoelter's critical N (0.01)
Default model         257                           295
Independence model    27                            30

Items X5, X10, X13 deleted (model 2b)
Model                 Hoelter's critical N (0.05)   Hoelter's critical N (0.01)
Default model         389                           453
Independence model    26                            29

³³ These are, for example: RMSEA close to 0.05 or less [Browne and Cudeck 1993]; CFI close to 0.95 or greater; and TLI close to 0.95 or greater.
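Hoelter's critical N is the largest sample size at which the observed discrepancy would still be non-significant. A sketch using the common formula CN = χ²crit·(N - 1)/χ² + 1 (N = 285 is inferred from the CMIN/FMIN ratios, and Amos's exact rounding may differ by 1):

```python
from math import floor
from scipy.stats import chi2

def hoelter(chi2_obs, df, n, alpha):
    """Hoelter's critical N: largest sample size at which the model's
    discrepancy would remain non-significant at the given alpha."""
    crit = chi2.ppf(1 - alpha, df)
    return floor(crit * (n - 1) / chi2_obs + 1)

cn_05 = hoelter(58.99, 38, 285, 0.05)   # model 2a: near 257 (Table 87)
cn_01 = hoelter(58.99, 38, 285, 0.01)   # model 2a: near 295
```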


check the closeness of fit with PCLOSE (p of close fit – Table 88), which gives a p value for testing the null hypothesis that the population RMSEA is no greater than 0.05, i.e. H0: RMSEA ≤ 0.05, in contrast to the exact-fit hypothesis that the population RMSEA is zero. Because an RMSEA value of 0.05 or less indicates a close fit, when we set this level PCLOSE provides us with a test of close fit. Jöreskog and Sörbom [1993] suggested that the p value for this test should be greater than 0.50.
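PCLOSE can be reproduced as the probability of the observed χ² under a non-central χ² whose non-centrality corresponds to RMSEA = 0.05 (a sketch; SciPy and the inferred N = 285 are assumed):

```python
from scipy.stats import ncx2

n = 285                                  # inferred sample size

def pclose(chi2_obs, df, rmsea0=0.05):
    """P(chi2 >= observed) under the NCP implied by RMSEA = rmsea0."""
    lam0 = rmsea0 ** 2 * df * (n - 1)
    return float(ncx2.sf(chi2_obs, df, lam0))

p_2a = pclose(58.99, 38)                 # near 0.65 (Table 88)
p_2b = pclose(31.14, 29)                 # near 0.96
```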
In Table 88, the RMSEA value is slightly lower in model 2b (0.02) than in model 2a (0.04). In model 2b, the 90% confidence interval for this statistic lies between 0.00 and 0.05, and the p value for the test of closeness of fit equals 0.96, compared with 0.65 for model 2a. In consequence, the close fit of model 2b seems much better; a zero value is also visible in the lower bound of its NCP interval (Table 89).
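The RMSEA values in Table 88 follow from the NCP values in Table 89 via RMSEA = sqrt(max(χ² - df, 0)/(df·(N - 1))), with N = 285 inferred from the CMIN/FMIN ratios:

```python
from math import sqrt

n = 285                       # inferred sample size

def rmsea(chi2_obs, df):
    """Point estimate of the root mean square error of approximation."""
    return sqrt(max(chi2_obs - df, 0.0) / (df * (n - 1)))

rmsea_2a = rmsea(58.99, 38)   # about 0.04 (Table 88)
rmsea_2b = rmsea(31.14, 29)   # about 0.02
```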
More good news comes from the RMR, which is derived from fitting the variance-covariance matrix of the hypothesized model to the variance-covariance matrix of the sample data. For model 2a the RMR was very low (0.04), and for model 2b it was even lower (0.03) (see Table 90). Besides, model 2a has
Table 88. Model fit summary – RMSEA – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 RMSEA   LO 90   HI 90   PCLOSE
Default model         0.04    0.02    0.07    0.65
Independence model    0.22    0.20    0.23    0.00

Items X5, X10, X13 deleted (model 2b)
Model                 RMSEA   LO 90   HI 90   PCLOSE
Default model         0.02    0.00    0.05    0.96
Independence model    0.23    0.21    0.24    0.00

Table 89. Model fit summary – NCP – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 NCP      LO 90    HI 90
Default model         20.99    4.06     45.86
Saturated model       0.00     0.00     0.00
Independence model    735.36   648.18   829.98

Items X5, X10, X13 deleted (model 2b)
Model                 NCP      LO 90    HI 90
Default model         2.14     0.00     19.85
Saturated model       0.00     0.00     0.00
Independence model    652.38   570.60   741.60


Table 90. Model fit summary – RMR, GFI – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 RMR    GFI    AGFI   PGFI
Default model         0.04   0.96   0.94   0.55
Saturated model       0.00   1.00
Independence model    0.21   0.58   0.50   0.48

Items X5, X10, X13 deleted (model 2b)
Model                 RMR    GFI    AGFI   PGFI
Default model         0.03   0.98   0.96   0.52
Saturated model       0.00   1.00
Independence model    0.20   0.60   0.51   0.49

Table 91. Model fit summary – baseline comparisons – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 NFI delta1   RFI rho1   IFI delta2   TLI rho2   CFI
Default model         0.93         0.89       0.97         0.96       0.97
Saturated model       1.00                    1.00                    1.00
Independence model    0.00         0.00       0.00         0.00       0.00

Items X5, X10, X13 deleted (model 2b)
Model                 NFI delta1   RFI rho1   IFI delta2   TLI rho2   CFI
Default model         0.96         0.93       0.99         0.99       1.00
Saturated model       1.00                    1.00                    1.00
Independence model    0.00         0.00       0.00         0.00       0.00

yielded a non-centrality parameter of 20.99 (with a 90% confidence interval between 4.06 and 45.86), while model 2b obtained an NCP of 2.14 (with a confidence interval of 0.00-19.85) (Table 89).
On the basis of the GFI, one can say that model 2a explains 96% of the variability of the variance-covariance matrix. Model 2b explains this variability at a slightly higher level, i.e. 98%. The values of both indices exceed the threshold of acceptability. The AGFI yielded a value of 0.94 for model 2a and 0.96 for model 2b, which also exceeds the acceptability threshold.
Indices such as NFI, IFI and CFI ranged between 0.96 and 0.99 for model 2b, and for model 2a they oscillated between 0.93 and 0.97. In other words, they also exceeded the suggested cutoff values (Table 91). The further imposition of adjustments intended to account for model complexity made the RFI index fall to 0.89 (in model 2a) and 0.93 (in model 2b). However,

Table 92. Model fit summary – parsimony-adjusted measures – CFA model 2

Items X5, X13 deleted (model 2a)
Model                 PRATIO   PNFI   PCFI
Default model         0.69     0.64   0.67
Saturated model       0.00     0.00   0.00
Independence model    1.00     0.00   0.00

Items X5, X10, X13 deleted (model 2b)
Model                 PRATIO   PNFI   PCFI
Default model         0.64     0.62   0.64
Saturated model       0.00     0.00   0.00
Independence model    1.00     0.00   0.00

for model 2b this was still an acceptable level, whereas model 2a remained on the borderline of adequacy: its RFI was lower than 0.90.
Finally, with regard to parsimony-adjusted indices such as the PGFI, we observed a minor decrease from 0.55 (model 2a) to 0.52 (model 2b). A decrease was also present in the PNFI index, from 0.64 (model 2a) to 0.62 (model 2b). The same happened with the PCFI index, whose values decreased from 0.67 in model 2a to 0.64 in model 2b (Table 92).

CFA models (2a and 2b) parameter estimates


When constructing models 2a and 2b, we also reviewed the parameter estimates with reference to the following criteria: the feasibility of the parameter estimates, the appropriateness of the standard errors, and the statistical significance of the parameter estimates. Good model fit alone is insufficient to support a proposed model; the researcher must also examine the individual parameter estimates. In particular, models should be judged by the extent to which the parameters are statistically significant, lie in the predicted direction and are non-trivial. The last characteristic can be checked, for example, by using standardized loading estimates [Hair et al. 2010].
The feasibility of parameter estimates concerns whether any estimate exhibits an incorrect sign or size, or falls outside the admissible range. Such an abnormality means that we have obtained a bad model; it may also signal that the input matrix lacks sufficient information.
Byrne [2010] mentions some classic examples of unreasonable estimates, such as correlations greater than 1.00 or negative variances. These problems, however, do not occur in models 2a and 2b.
With the standard errors, in turn, we check the precision with which a parameter has been estimated (labeled in Amos as S.E.). Values that are too small or excessively large often suggest a poor model fit. For example, if a standard error approaches zero, the test statistic for its related parameter cannot be defined; likewise, extremely large standard errors indicate parameters that cannot be determined [Jöreskog and Sörbom 1989]. Because standard errors are influenced by the units of measurement of the observed variables and factors, as well as by the magnitude of the parameter estimate itself, no definite criteria for "small" and "large" have been established so far [Byrne 2010]. Small standard errors, however, suggest accurate estimation, which is reflected in the precision with which the parameter is established.
The factor loadings (reported in Tables 93-94) estimate the direct effects of factors on items and are interpreted as regression coefficients. For example, for the unstandardized factor loading between item X11 and the factor self-enhancement in model 2a, if this factor goes up by 1, item X11 goes up by 0.99. In model 2b, for the loading between item X1 and the factor curiosity, when curiosity goes up by 1, X1 goes up by 0.55. Loadings fixed to 1.0 to scale the corresponding factor remain so in the unstandardized solution and are not tested for statistical significance because they have no standard errors.
Table 93. Regression weights – CFA model 2a

Regression weights
between items and factors    Estimate   S.E.   C.R.   P     Label
X4  <- Curiosity               1.00a
X1  <- Curiosity               0.54      0.12   4.61   ***   par_1
X12 <- Self-enhancement        1.00a
X11 <- Self-enhancement        0.99      0.13   7.84   ***   par_2
X10 <- Self-enhancement        0.95      0.14   6.87   ***   par_3
X9  <- Self-enhancement        0.64      0.09   6.77   ***   par_4
X6  <- Self-enhancement        0.98      0.12   8.44   ***   par_5
X8  <- Entertainment           1.00a
X7  <- Entertainment           0.42      0.09   4.50   ***   par_6
X3  <- Consumption             1.00a
X2  <- Consumption             0.99      0.10   9.64   ***   par_7

Legend: *** – significantly different from zero at the 0.001 level; a – not estimated when the loading is set to a fixed value (i.e. 1.0)


Table 94. Regression weights – CFA model 2b (items X5, X10, X13 deleted)

Regression weights
between items and factors    Estimate   S.E.   C.R.   P     Label
X4  <- Curiosity               1.00a
X1  <- Curiosity               0.55      0.12   4.76   ***   par_1
X12 <- Self-enhancement        1.00a
X11 <- Self-enhancement        0.94      0.13   7.46   ***   par_2
X9  <- Self-enhancement        0.61      0.09   6.41   ***   par_3
X6  <- Self-enhancement        0.96      0.13   8.17   ***   par_4
X8  <- Entertainment           1.00a
X7  <- Entertainment           0.41      0.09   4.42   ***   par_5
X3  <- Consumption             1.00a
X2  <- Consumption             0.99      0.10   9.57   ***   par_6

Legend: *** – significantly different from zero at the 0.001 level (two-tailed); a – not estimated when the loading is set to a fixed value (i.e. 1.0)

On the other hand, standardized factor loadings are the estimated correlations between a given item and its factor, and the squared standardized loadings express the proportion of explained variance, i.e. the squared multiple correlation (SMC, R²). For example, the standardized loading of 0.68 between item X6 and the factor self-enhancement in model 2a (Table 95) means that this factor explains 0.68² = 0.46, or 46%, of the variance of the item. Ideally, a CFA model should explain the majority of the variance (R² > 0.50) of each item [Kline 2010]. In practice, however, this is rare, as shown by items X1, X6, X7, X9, X10, X11 and X12 (almost 3/4 of the observed items) in model 2a (see Table 96 with the squared multiple correlations, SMC). A similar situation is present in model 2b.
The squared standardized factor loadings for models 2a and 2b are also shown in Figures 41 and 42, where the values displayed above the observed variables are exactly the squared multiple correlations of the individual items, and the standardized factor loadings are displayed on the single-arrowhead lines.

Table 95. Standardized regression weights – CFA model 2

Items X5, X13 deleted (model 2a)
X4  <- Curiosity            0.80
X1  <- Curiosity            0.45
X12 <- Self-enhancement     0.63
X11 <- Self-enhancement     0.65
X10 <- Self-enhancement     0.53
X9  <- Self-enhancement     0.53
X6  <- Self-enhancement     0.68
X8  <- Entertainment        0.91
X7  <- Entertainment        0.51
X3  <- Consumption          0.82
X2  <- Consumption          0.88

Items X5, X10, X13 deleted (model 2b)
X4  <- Curiosity            0.79
X1  <- Curiosity            0.45
X12 <- Self-enhancement     0.64
X11 <- Self-enhancement     0.63
X9  <- Self-enhancement     0.51
X6  <- Self-enhancement     0.69
X8  <- Entertainment        0.92
X7  <- Entertainment        0.51
X3  <- Consumption          0.82
X2  <- Consumption          0.88

Table 96. Squared multiple correlations – CFA model 2

Items X5, X13 deleted (model 2a)
Item   Estimate
X2     0.77
X3     0.68
X7     0.26
X8     0.82
X6     0.46
X9     0.28
X10    0.28
X11    0.43
X12    0.40
X1     0.20
X4     0.64

Items X5, X10, X13 deleted (model 2b)
Item   Estimate
X2     0.77
X3     0.68
X7     0.26
X8     0.84
X6     0.47
X9     0.26
X11    0.39
X12    0.41
X1     0.21
X4     0.63
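The squared-loading arithmetic linking Tables 95 and 96 can be sketched in a few lines. The loadings below are the rounded model 2b values from Table 95, so the resulting SMCs may differ from Table 96 in the second decimal:

```python
# Squared multiple correlations (SMC) from standardized loadings and the
# R^2 > 0.50 rule of thumb [Kline 2010]; model 2b loadings from Table 95.
loadings_2b = {
    "X1": 0.45, "X4": 0.79,                              # curiosity
    "X12": 0.64, "X11": 0.63, "X9": 0.51, "X6": 0.69,    # self-enhancement
    "X8": 0.92, "X7": 0.51,                              # entertainment
    "X3": 0.82, "X2": 0.88,                              # consumption
}

smc = {item: round(l ** 2, 2) for item, l in loadings_2b.items()}
well_explained = [item for item, r2 in smc.items() if r2 > 0.50]

print(smc["X6"])       # 0.48 (Table 96 reports 0.47 from unrounded loadings)
print(well_explained)  # ['X4', 'X8', 'X3', 'X2'] pass the R^2 > 0.50 rule
```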
In the end, we considered the statistical significance of the parameter estimates, where the test statistic was based on the critical ratio (denoted in Amos as C.R., which represents the parameter estimate divided by its standard error)34. Testing the significance of the parameters provides guidelines as to which particular path, covariance, or bad indicator should be removed. Non-significant parameters, with the exception of error variances, can be considered unimportant to the model. A closer look at the results presented in Tables 93-94 (regression weights), Table 97 (covariances between factors) and Table 99 (variances of factors and measurement errors) shows that most of the values were statistically significant, although at different levels for the respective items and factors.

34 This statistic operates as a z-statistic testing whether the estimate is statistically different from zero. At the probability level of 0.05, for instance, the test statistic needs to be greater than 1.96 in absolute value before the hypothesis that the estimate equals 0.0 can be rejected.
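A small sketch of the critical ratio described above, using the rounded estimates from Table 93 (the C.R. therefore differs slightly from the one Amos computes on unrounded values):

```python
import math

def critical_ratio(estimate, se):
    """C.R. = unstandardized estimate / its standard error (a z-statistic)."""
    return estimate / se

def two_tailed_p(z):
    """Two-tailed p-value for a z-statistic, via the normal CDF."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

cr = critical_ratio(0.99, 0.13)   # X11 <- self-enhancement, model 2a (Table 93)
print(round(cr, 2))               # 7.62 (Amos reports 7.84 from unrounded input)
print(two_tailed_p(cr) < 0.001)   # True -> significant at the 0.001 level
print(two_tailed_p(1.90) < 0.05)  # False -> |C.R.| must exceed 1.96
```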


χ²(CMIN) = 58.99, df = 38, probability level = 0.02

Figure 41. Standardized estimates – CFA model 2a without items X5 and X13 (path diagram: standardized factor loadings on the single-arrowhead lines, squared multiple correlations above the observed variables, factor correlations on the double-arrowhead lines)

Three stars (***) in the column labeled P mean that an estimate is significant at the 0.001 level. For example, in model 2a (Table 93), for the factor loading between item X11 and the factor self-enhancement, the probability of obtaining a critical ratio as large as 7.84 in absolute value was less than 0.001. In other words, the regression weight for self-enhancement in the prediction of X11 was significantly different from zero at the 0.001 level (two-tailed). Likewise, for the covariance between the factors curiosity and self-enhancement (Table 97, model 2a), the probability of obtaining a critical ratio as large as 3.953 in absolute value was less than 0.001; that is, this covariance was significantly different from zero at the 0.001 level (two-tailed). Minor differences appeared for the covariance between self-enhancement and entertainment, which was significant only at the 0.01 level (two-tailed); the probability of obtaining a critical ratio as large as 3.11 in absolute value was 0.0019. A similar situation occurred for the covariance between the factors entertainment and consumption, also significant at the 0.01 level (two-tailed); here, the probability of obtaining a critical ratio as large as 2.904 in absolute value was 0.0037.

χ²(CMIN) = 31.14, df = 29, probability level = 0.36

Figure 42. Standardized estimates – CFA model 2b without items X5, X10 and X13 (path diagram: standardized factor loadings on the single-arrowhead lines, squared multiple correlations above the observed variables, factor correlations on the double-arrowhead lines)


Table 97. Covariances – CFA model 2

Items X5, X13 deleted (model 2a)
Factor covariances                      Estimate   S.E.   C.R.   P        Label
Self-enhancement <-> Curiosity            0.14     0.03   3.95   ***      par_2
Curiosity        <-> Entertainment        0.32     0.05   6.27   ***      par_3
Curiosity        <-> Consumption          0.21     0.04   4.77   ***      par_4
Self-enhancement <-> Entertainment        0.14     0.04   3.11   0.0019a  par_5
Self-enhancement <-> Consumption          0.26     0.05   5.21   ***      par_6
Entertainment    <-> Consumption          0.16     0.06   2.90   0.0037a  par_7

Items X5, X10, X13 deleted (model 2b)
Factor covariances                      Estimate   S.E.   C.R.   P        Label
Self-enhancement <-> Curiosity            0.16     0.04   4.54   ***      par_1
Curiosity        <-> Entertainment        0.32     0.05   6.26   ***      par_2
Curiosity        <-> Consumption          0.21     0.04   4.76   ***      par_3
Self-enhancement <-> Entertainment        0.16     0.05   3.56   ***      par_4
Self-enhancement <-> Consumption          0.27     0.05   5.25   ***      par_5
Entertainment    <-> Consumption          0.16     0.06   2.91   0.0036a  par_6

Legend: *** – significantly different from zero at the 0.001 level (two-tailed); a – significantly different from zero at the 0.01 level (two-tailed)

The results for the factor covariances are of greater interest when the respective relationships are shown in standardized form, i.e. when the particular covariances are expressed as correlations. The correlation values between all factors, reported in Table 98, are therefore much easier to interpret. Factor correlations at the level of about 0.50 suggest, according to Kline [2010], moderate discriminant validity (neither strong nor weak); such correlations are present between the factors curiosity <-> entertainment and self-enhancement <-> consumption.
Table 98. Correlations – CFA model 2

Factor correlations                  Items X5, X13 deleted   Items X5, X10, X13 deleted
                                          (model 2a)               (model 2b)
Self-enhancement <-> Curiosity              0.37                     0.44
Curiosity        <-> Entertainment          0.55                     0.55
Curiosity        <-> Consumption            0.41                     0.41
Self-enhancement <-> Entertainment          0.25                     0.29
Self-enhancement <-> Consumption            0.53                     0.55
Entertainment    <-> Consumption            0.21                     0.21
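The step from Table 97 to Table 98 is the usual standardization: each covariance is divided by the product of the two factors' standard deviations (the square roots of the variances reported in Table 99). A minimal sketch with the model 2a values:

```python
import math

# Correlation between two factors from their covariance (Table 97) and
# variances (Table 99): r = cov / sqrt(var_a * var_b).
def factor_correlation(cov, var_a, var_b):
    return cov / math.sqrt(var_a * var_b)

# Model 2a: cov(self-enhancement, curiosity) = 0.14, variances 0.36 and 0.39.
print(round(factor_correlation(0.14, 0.36, 0.39), 2))  # 0.37, as in Table 98
# Model 2a: cov(self-enhancement, consumption) = 0.26, variances 0.36 and 0.67.
print(round(factor_correlation(0.26, 0.36, 0.67), 2))  # 0.53, as in Table 98
```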


Table 99. Variances – CFA model 2

Items X5, X13 deleted (model 2a)
Items and factors    Estimate   S.E.    C.R.   P        Label
Self-enhancement       0.36     0.07    5.16   ***      par_14
Curiosity              0.39     0.09    4.31   ***      par_15
Entertainment          0.84     0.19    4.43   ***      par_16
Consumption            0.67     0.10    6.70   ***      par_17
e4                     0.22     0.08    2.75   0.0059a  par_18
e1                     0.46     0.04   10.25   ***      par_19
e12                    0.54     0.06    9.41   ***      par_20
e11                    0.48     0.05    9.02   ***      par_21
e10                    0.85     0.08   10.45   ***      par_22
e9                     0.38     0.04   10.43   ***      par_23
e6                     0.45     0.05    8.81   ***      par_24
e8                     0.22     0.07    3.14   0.0024a  par_25
e7                     0.41     0.05    9.00   ***      par_26
e3                     0.32     0.07    4.73   ***      par_27
e2                     0.20     0.06    3.14   0.0017a  par_28

Items X5, X10, X13 deleted (model 2b)
Items and factors    Estimate   S.E.    C.R.   P        Label
Self-enhancement       0.37     0.07    5.11   ***      par_13
Curiosity              0.38     0.09    4.39   ***      par_14
Entertainment          0.86     0.20    4.39   ***      par_15
Consumption            0.67     0.10    6.68   ***      par_16
e4                     0.23     0.08    3.00   0.0027a  par_17
e1                     0.45     0.04   10.27   ***      par_18
e12                    0.53     0.06    9.01   ***      par_19
e11                    0.51     0.06    9.07   ***      par_20
e9                     0.39     0.04   10.41   ***      par_21
e6                     0.44     0.05    8.23   ***      par_22
e8                     0.22     0.07    3.05   0.0023a  par_23
e7                     0.42     0.05    9.07   ***      par_24
e3                     0.32     0.07    4.68   ***      par_25
e2                     0.20     0.06    3.12   0.0018a  par_26

Legend: *** – significantly different from zero at the 0.001 level (two-tailed); a – significantly different from zero at the 0.01 level (two-tailed)

On the other hand, lower levels (i.e. correlations below 0.50) suggest stronger discriminant validity between the factors: self-enhancement <-> curiosity, curiosity <-> consumption, self-enhancement <-> entertainment, and entertainment <-> consumption.


Ultimate CFA model in the context of reliability and validity


Having finished the process of model fit evaluation and parameter estimation, we proceeded with the construct reliability and validity analysis. For this purpose we selected the best solution, i.e. CFA model 2b, which met all criteria of adequate fit. As a reminder, construct validity represents the extent to which a set of measured items corresponds to the theoretical latent construct. We validated the hedonic-consumerism values, by means of the CFA model, on the basis of convergent, discriminant and nomological validity.
As regards convergent validity, a set of items presumed to measure the same construct shows convergent validity if their intercorrelations are at least moderate in magnitude. Poor convergent validity within the set of items of the same factor suggests that the model may have too few factors [Kline 2010]. Hair et al. [2010] classified a few ways to estimate the relative amount of convergent validity among items; the most important refer to the size of the factor loadings and the average variance extracted (AVE). As for the former, a finding that items have high loadings on the predicted factors indicates convergent validity, so the size of the factor loadings is one important consideration. At a minimum, all factor loadings should be statistically significant35. A good rule of thumb is that standardized loading estimates should be 0.50 or higher (ideally 0.70 or even higher).
As regards the latter, the AVE coefficient represents the mean variance extracted by the items loading on a construct. This value can be obtained with the following formula [Hair et al. 2010, p. 709]:

AVE = (Σ_{i=1}^{p} λ_i²) / p,   (7.2)

where λ_i is the standardized factor loading of the i-th item.
For p items, AVE is thus computed as the total of all squared standardized factor loadings (squared multiple correlations) divided by the number of items. An AVE of 0.50 or higher indicates good convergence. An AVE below 0.50 indicates that, on average, more error remains in the items than variance explained by the latent factor structure imposed on the measure.
35 We assess convergent validity in order to determine whether each item's estimated maximum likelihood loading on the underlying construct is significant [Anderson and Gerbing 1988].


Table 100. Standardized factor loadings, average variance extracted and reliability estimates – CFA model 2b

Items         Curiosity   Self-enhancement   Entertainment   Consumption
X4              0.79
X1              0.45
X12                             0.64
X11                             0.63
X9                              0.51
X6                              0.69
X8                                                0.92
X7                                                0.51
X3                                                               0.82
X2                                                               0.88
AVE              42%            38%               55%            72%
Reliability      0.57           0.71              0.73           0.84

As illustrated in Table 100, most of the factor loadings exceeded the level of 0.50, with the exception of item X1; in this respect we have good evidence of convergent validity. The calculated AVE coefficients, however, make this conclusion somewhat misleading. Two factors, curiosity and self-enhancement, obtained values below 0.50, indicating that, on average, the items reflected by these two factors did not converge adequately. In contrast, the two other factors, entertainment and consumption, obtained AVE values above 0.50: the factor entertainment was explained in 55% by its two indicators, and the factor consumption proved even stronger, with its items accounting for 72% of the variance, well above the 50% rule of thumb.
Considering all these facts, we cannot explicitly claim strong evidence of the convergent validity of model 2b. Nor, however, can we say that the model is completely bad, especially in view of the size and significance of some factor loadings. There is no doubt about the convergent validity of half of the measured construct, i.e. the factors entertainment and consumption, while more doubts arise for self-enhancement and curiosity. A further question, arising for all factors except self-enhancement, concerns the small number of items per factor.


Convergent validity may additionally be examined in the light of a reliability index. The literature offers a number of alternative reliability estimates for CFA models [Bacon, Sauer and Young 1995]. As Hair et al. [2010, p. 709] commented on the different reliability coefficients: they do not produce dramatically different estimates, but a slightly different construct reliability value is often used in conjunction with SEM models.
For the reliability estimation of CFA model 2b, we used one of two commonly applied reliability coefficients36, namely the factor rho coefficient: the ratio of explained variance to total variance, expressed in terms of the CFA parameters [Raykov 1997, 2004]. For models with no error covariances among their items it can be calculated as:

ρ_XX = φ (Σ_{i=1}^{p} λ_i)² / [φ (Σ_{i=1}^{p} λ_i)² + Σ_{i=1}^{p} θ_ii],   (7.3)

where:
θ_ii – the (uncorrelated) error variances,
φ – the factor variance.
For correlated errors, this coefficient becomes respectively:

ρ_XX = φ (Σ_{i=1}^{p} λ_i)² / [φ (Σ_{i=1}^{p} λ_i)² + Σ_{i=1}^{p} Σ_{j=1}^{p} θ_ij],   (7.4)

where θ_ij (for i ≠ j) are the error covariances.
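For a standardized solution (factor variance φ = 1 and error variances θ_ii = 1 − λ_i²), formula (7.3) reduces to a function of the standardized loadings alone. A minimal sketch using the rounded Table 100 loadings; it reproduces the reported reliabilities of consumption and curiosity exactly, while the other two factors differ slightly because the published values were computed on unrounded estimates:

```python
# Factor rho (formula 7.3) with a standardized factor (phi = 1):
# rho = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
# where each standardized error variance is 1 - lambda_i^2.
def factor_rho(loadings, phi=1.0):
    """Raykov's rho for one factor with uncorrelated errors."""
    num = phi * sum(loadings) ** 2
    theta = sum(1 - l ** 2 for l in loadings)  # standardized error variances
    return num / (num + theta)

# Model 2b (Table 100): consumption loads 0.82 and 0.88; curiosity 0.79 and 0.45.
print(round(factor_rho([0.82, 0.88]), 2))  # 0.84, matching Table 100
print(round(factor_rho([0.79, 0.45]), 2))  # 0.57, matching Table 100
```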


36 Other reliability coefficients can be applied when the factor is measured as a weighted sum of the observed variables (assuming that the factor is standardized and the errors are correlated or uncorrelated). These indices are mainly used when the observed variables are congeneric, that is, when they measure one latent variable [Bacon, Sauer and Young 1995; Raykov 1997, 2004; Kozyra 2004]. If the observed variables are affected by more than one latent variable (factor), we can use the proposition suggested by Bollen [1989b].


As Table 100 shows, all reliability coefficients, with the exception of the curiosity factor, exceeded the level of 0.70. In general, model 2b is neither bad nor very good; it lies somewhere in the middle.
Although hedonic-consumerism values and their facets are conceptually related within a multidimensional construct, they are also expected to exhibit discriminant validity: a set of items presumed to measure different factors should show intercorrelations that are not too high, as poor discriminant validity is evidenced by high factor correlations.
For CFA model 2b, comparing the AVE estimates (Table 100) for each factor with the squared interconstruct correlations associated with that factor, we noticed that all AVE estimates37 were greater than the corresponding squared interconstruct correlations (Table 101, above the diagonal). There are therefore no problems with discriminant validity.
Table 101. The HCV construct correlation matrix (standardized) – CFA model 2b

                     Curiosity   Self-enhancement   Entertainment   Consumption
Curiosity              1.00           0.19              0.30           0.17
Self-enhancement       0.44           1.00              0.08           0.30
Entertainment          0.55           0.29              1.00           0.04
Consumption            0.41           0.55              0.21           1.00

Legend: values below the diagonal are correlation estimates among the four factors; values above the diagonal are squared correlations.
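This comparison (often called the Fornell-Larcker criterion) can be sketched as follows, with the AVE values from Table 100 and the model 2b factor correlations whose squares appear above the diagonal of Table 101:

```python
# Each factor's AVE (Table 100) should exceed its squared correlation with
# every other factor (Table 101, above the diagonal) for discriminant validity.
ave = {"curiosity": 0.42, "self-enhancement": 0.38,
       "entertainment": 0.55, "consumption": 0.72}

corr = {("curiosity", "self-enhancement"): 0.44,
        ("curiosity", "entertainment"): 0.55,
        ("curiosity", "consumption"): 0.41,
        ("self-enhancement", "entertainment"): 0.29,
        ("self-enhancement", "consumption"): 0.55,
        ("entertainment", "consumption"): 0.21}

ok = all(ave[a] > r ** 2 and ave[b] > r ** 2 for (a, b), r in corr.items())
print(ok)  # True -> no discriminant validity problems in model 2b
```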

Finally, as regards nomological validity, we tried to find out whether the correlations between the extracted factors and other measures made logical sense in a wider context. Nomological validity was used to demonstrate that all four factors were to some extent related to other items which had not been included in the CFA model38.
37 When we compare the AVE values for any two constructs with the square of the correlation estimate between these two constructs, we expect the AVE values to be greater than the squared correlation estimate. The logic is that a latent construct should explain more of the variance in its item measures than it shares with another construct. Passing this test provides good evidence of discriminant validity.
38 The term nomology is derived from the Greek word for lawful or, in philosophy-of-science terms, lawlike; the notion goes back to Cronbach and Meehl's view of construct validity. Nomological validity is the degree to which a construct behaves as it should within a system of related constructs (the nomological network). The elements of a nomological network are [Cronbach and Meehl 1955]: 1) theoretical propositions specifying linkages between constructs, 2) correspondence rules allowing a construct to be measured, 3) operationalizations, 4) empirical constructs or variables that can actually be measured, and 5) empirical linkages, i.e. hypotheses set before data collection and empirical generalizations made after data collection.


Nomological validity was assessed by correlating the scale responses with selected additional relevant measures. These measures referred to the leisure activities of young consumers: 1) going to the theater, 2) attending additional educational courses to increase knowledge, 3) watching TV, 4) going out with friends to the city, and 5) going to parties. The activities were evaluated on a 5-point scale ranging from 1 = totally disagree to 5 = totally agree. All correlation estimates are summarized in Table 102.
Table 102. Correlations between factors and additional relevant measures – CFA model 2b

No.  Item                                       Curiosity  Self-enhancement  Entertainment  Consumption
1    Going to the theater                         0.17*        0.49**            0.17*         0.06a
2    Attending additional educational courses     0.69**       0.09a             0.04a         0.02a
3    Watching TV                                  0.34**       0.37**            0.21**        0.14*
4    Going out with friends to the city           0.13*        0.08a             0.72**        0.23**
5    Going to parties                             0.08a        0.19*             0.55**        0.02a

Legend: * – correlation significant at the 0.05 level (two-tailed); ** – correlation significant at the 0.01 level (two-tailed); a – correlation not significant.
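The coefficients in Table 102 are ordinary Pearson correlations between factor scores and the activity ratings. A self-contained sketch on a made-up six-person sample (the scores and ratings below are illustrative, not the study's data):

```python
# Plain Pearson correlation, as used to relate a factor to a 5-point
# leisure-activity rating. The sample values are hypothetical.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

entertainment_score = [2.0, 3.5, 4.0, 2.5, 5.0, 3.0]  # hypothetical factor scores
going_out = [2, 4, 4, 2, 5, 3]                        # 5-point ratings
print(round(pearson(entertainment_score, going_out), 2))  # 0.97 for this toy sample
```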

On theoretical grounds, we expected that young people who pay more attention to the factor curiosity and openness to change would be more inclined towards new experiences through participation in additional educational courses that increase their level of knowledge. This factor is represented here through the lens of knowledge accumulation, e.g. additional courses or participation in lectures and classes at the academic level. The more knowledge a young person accumulates during this time, the better the prospects for a good living with a higher socio-economic status and, finally, the greater the satisfaction from life, as well as from hedonic consumption in general.
On the other hand, as we observe from Table 102, curiosity is moderately related to the leisure activity of watching TV, which in its unique way enables young people to escape from reality. By watching movies,
documentary programs or entertainment shows, viewers safely peep into dangerous events, feel the uncomfortable position of other people and experience some fun at a distance. Viewers can also project an imaginary world (based on the fantasy and fictitious events that TV watching often supplies) and sometimes adopt it uncritically into their own internal world of values. We therefore expected a positive relationship between the factor curiosity and watching TV (0.34), although this relationship is weaker than that with young people's participation in educational courses (0.69).
In contrast, the relationship between going to the theater and curiosity (0.17) is not so clear. For this activity, the factor self-enhancement appears to be more important (0.49), as going to the theater may boost the personal ego. Usually, when someone in a youth group decides to go to the theater, it looks as if that young person wanted to break the rules commonly established by the group: going to the cinema, visiting pubs, or meeting friends at the shopping mall are the most popular leisure activities among youth, not the theater. So when young people participate in such cultural events, they probably want to manifest a kind of extraordinariness among their friends; this activity reflects purposely snobbish behavior.
Neither curiosity nor self-enhancement is meaningfully related to going to parties, and going to parties is not related to consumption at all (0.02). On the other hand, going out with friends to the city, which young people define in the broader context of various leisure activities, ranging from regular meetings at restaurants or pubs to simply dining out, means that youth may have good fun while eating, drinking and talking with friends. Thus, this activity is associated primarily with entertainment (0.72) and only then with the factor consumption (0.23). Indeed, going out with friends to the city has an indirect impact on consumption, because going to the city and meeting friends at a restaurant involves two things at the same time: entertainment (fun) and the consumption of products. An activity such as going to parties, in contrast, is based mainly on the search for sensory experiences evoked by sounds, lights, pleasant scents, etc.
In the end, comparing all the correlations between the five variables describing leisure activities and the four factors, we notice that the correlations are broadly consistent with the theoretical expectations. The analysis of the correlations thus supported the nomological validity of the model.

Hedonic-consumerism values (HCV) – ultimate scale and its implications for marketing
Considering now the pros and cons of CFA model 2b, we can confirm its ultimate validity with reference to the scale describing hedonic-consumerism values (HCV). This four-factor multidimensional construct has the following underlying structure. The factor self-enhancement is loaded with the items: X6: I care more for myself than others; X9: I strive to achieve success in my professional life; X11: I make choices in my life on my own; X12: I like it when I am praised and admired. The factor entertainment and fun covers the observed variables: X7: I spend my time nicely and have a good time; X8: I search for an adventurous and exciting life.
The last two factors in CFA model 2b cover the items X1: I always strive for new experiences, and X4: I want to be creative and act with imagination (the factor curiosity and openness to change), as well as, in the factor consumption style, X2: I like to earn more and spend more on consumption to enjoy myself, and X3: Consumption itself is an enjoyable experience in my life.
As for the number of indicators attributable to a given factor, the factors containing only two indicators will probably cause the most controversy: entertainment and fun, curiosity and openness to change, and consumption style. In the previous studies (conducted by means of EFA), the factor curiosity and openness to change had two additional indicators (X5: I explore new things and aspects of life; X13: I constantly learn something new). However, they were dropped in the course of the EFA and item analyses: items X5 and X13 had cross-loadings and simply mixed with other factors, a finding further confirmed in the CFA models. Similarly, item X10: I respect and believe in those people who possess lots of money, loaded suspiciously on the factor self-enhancement.
In general, the measurement model for the HCV scale (based on theoretical assumptions) has been confirmed, but only partially. It is worth noting that the subsequent acceptance of CFA model 2 (after the rejection of CFA model 1), which was further divided into the two submodels 2a and 2b, resulted more from exploratory analysis than from a confirmation of the studied phenomenon. When we used modification indices and residuals analysis, we moved from the confirmatory to the exploratory area. Thus, although we approve CFA model 2b (which was verified empirically), this model is just an adaptation of its rejected version, i.e. model 1, even though the two models differ in their underlying structure by only a few items. Faced with this difficulty, we can only be sure that the HCV construct needs further refinement and revision, especially in terms of item content, which might be investigated once more in exploratory studies on the basis of a wider sample, i.e. taking into account more diverse groups of examinees, including respondents from other cities and academic centers. It is also worth considering additional indicators to complement the factors consisting of few (only two) items. These issues, although important, are beyond the scope of this book, and the author leaves them open to other researchers who would like to pursue this topic more profoundly.
In the marketing context, the HCV scale should be useful for marketers and managers whose work is closely related to markets, market segments, or specific groups of consumers. The HCV scale can be creatively combined with other useful information (e.g. on consumers' market behavior), which may reinforce the marketing activities undertaken by companies. The relationships between the marketing-mix (4P) elements and consumers depend strongly on familiarity with personal values. Therefore, exploring consumers' hedonic values provides one more opportunity to trace specific market segments effectively. Knowledge about hedonic values is likely to determine the future state of companies [Laurent and Kapferer 1985]. Based on a deeper understanding of hedonic aspects and the recognition of prospective segments, marketing experts may elicit more sales from their target consumers by adequately addressing their perceptions of, and attitudes toward, certain products39. The scale will be useful both in market segmentation and in market positioning, and will thus enhance the efficiency of marketing efforts from the perspective of better product sales. In market segmentation, clustering consumers according to personal values (such as hedonism) will indicate distinct market segments to which
39 In the strategic planning of product categories, brand marketers should rely not only on socio-demographic aspects but also on the benefits of the dimensions proposed in the HCV scale. With regard to the financial, functional, individual and social dimensions, marketers will be able to improve purchase value for different segments of consumers, who differ in their hedonic-consumerism orientations.


different sets of products are appealing, or for which suitable advertising strategies could be implemented.
In conclusion, the HCV scale helps to understand contemporary consumers' behavior in relation to emerging market trends, as well as their style of living and way of consuming products. It is well suited to the current era of mass culture and mass products. It constitutes not only a measurement instrument but also a source of marketing information about the respective segments and their consumers.
The HCV scale offers many marketing advantages: it can support new concepts of marketing strategy and effective actions aimed at different segments in the areas of:
1) promotion (e.g. the wording of catchy advertising slogans or the selection of communication media such as TV, press, radio or the internet);
2) the choice of appropriate distribution channels, based on consumers' preference for indirect or direct contact;
3) improved product attributes and methods of launching products into the market;
4) setting price levels adjusted to consumers' personal preferences.

Measurement and scales development in the marketing field

Conclusions
In both theory and practice, measurement issues are of great importance. A perfectly reliable40 and valid measurement instrument, and thus the scale developed with it, should reflect the researcher's original intentions and remain permanent, or fixed, over time. No doubt, the same measure applied to the same person should yield the same or an approximate value each time, provided that the studied phenomenon has not itself changed. As has often been shown, the numbers ascribed do not guarantee that the meaning and the measurement level will always be the same. There is virtually no simple way to look at the numbers and say whether they express any real value or whether they were drawn from a top hat. This is because some measurement instruments come close to the conditions of accidental measurement. Hence, the conclusions drawn may vary substantially, depending on whether or not we know that the measurements are highly accurate. This is why the issues of measurement deserve much attention from any researcher who cares about and endeavors to solve scientific problems.

40 As a reminder, according to Symonds [1928] there are several factors which considerably affect the reliability level of a measurement; they are also general considerations in scale construction: 1) the number of items: reliability increases as the number of items in the scale increases; 2) the range of item difficulty: the narrower the range of item difficulty, the greater the reliability of the scale; items that are answered correctly (or incorrectly) by all examinees do not contribute to variability within a test and decrease the number of functional items; 3) evenness in scaling: developing a scale with items at the same level of difficulty is equivalent to reducing the number of items, since all items of equal difficulty should be answered either correctly or incorrectly; 4) interdependence of measured items: lower estimates of reliability will be obtained if the response to one item is suggested by another item, or if the meaning of one item depends on a previous item; 5) guessing: scale reliability decreases as the likelihood of examinees guessing the correct answer increases.
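The first of Symonds's factors, that reliability grows with the number of items, can be illustrated with Cronbach's alpha. The sketch below uses simulated parallel items; the sample size, item count and error variance are illustrative assumptions, not values from the study. Alpha computed from a 4-item subset comes out visibly lower than alpha for the full 12-item version of the same simulated scale.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of sum scores
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
n, k = 500, 12
trait = rng.normal(size=(n, 1))                      # latent "true score"
scores = trait + rng.normal(scale=1.0, size=(n, k))  # k parallel items

short = cronbach_alpha(scores[:, :4])   # 4-item version of the scale
long_ = cronbach_alpha(scores)          # all 12 items
print(round(short, 2), round(long_, 2))
```

Under these assumptions each item has reliability 0.5, so the Spearman-Brown logic predicts alpha near 0.80 for four items and above 0.90 for twelve; the simulated values land close to that.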
Because theories in the social sciences evolve, there arises a natural need to test them more accurately and objectively. These theories often require empirical operationalization of the constructs under study, as is the case with consumers' personal values. When constructs are measured well (i.e. reliably and validly), theory testing is enhanced. Furthermore, it is often found that a once-used scale needs to be updated or refined to better reflect the construct of interest. Many articles on measurement present new scales, derived from already existing measures, that are felt to reflect constructs of interest more accurately or more efficiently. The advancement of computer and software technology has also greatly helped researchers to develop measures and design new scales. Statistical packages such as SPSS, Statistica and Amos, but also the less known Mplus, EQS and Lisrel, have made it easier and quicker to perform most of the basic and many of the more advanced analyses recommended in scale development. More importantly, one can observe trends not only towards the development of more scales, but also towards refining the procedures used to develop and validate such scales [Shavelson and Webb 1991; Marcoulides 1998; Bearden and Netemeyer 1999; Netemeyer, Bearden and Sharma 2003].
In the literature, two principal approaches to measurement and test theory are distinguished. One is based on the classical theory of measurement; the other is called item response theory. The choice of an appropriate approach depends on the research requirements and objectives (in a particular field of science). As one can easily infer, each area of science develops its own set of methodological procedures, tools and methods. In the social sciences, measurement methods are mostly designed and applied to study people's lives and, in general, their traits. In many social fields (such as business administration, economics, political science, sociology, international relations, communication, etc.), the core of research investigations (e.g. those based on people's activities, behavior, etc.) draws on knowledge derived from psychology, which indirectly affects all the above social fields41.
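To make the item response theory alternative concrete, here is a small sketch of the two-parameter logistic (2PL) model, in which the probability of a positive response to an item depends on the respondent's latent trait level, the item's difficulty, and its discrimination. All parameter values below are illustrative assumptions, not estimates from any data set.

```python
import math

def p_2pl(theta, a, b):
    """2PL model: probability of a positive response given latent trait
    theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An easy, weakly discriminating item vs. a hard, sharply discriminating
# one, evaluated at low, average, and high trait levels.
easy = [round(p_2pl(t, a=0.8, b=-1.0), 2) for t in (-2, 0, 2)]
hard = [round(p_2pl(t, a=2.0, b=1.0), 2) for t in (-2, 0, 2)]
print(easy, hard)
```

The easy item is endorsed fairly often even at low trait levels, while the hard item's response curve rises steeply only once the trait level exceeds its difficulty; classical test theory, by contrast, characterizes the scale as a whole rather than each item's response function.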
Psychometrics (the subspecialty concerned with measuring psychological and social phenomena) has emerged as a methodological paradigm in its own right. Its growth and development were mainly due to:
– the widespread use of psychometric definitions,
– the popularity of factor analysis in social science research,
– the adoption of psychometric methods for developing scales measuring an array of various subjects.
For these reasons, psychometric measurement was quickly adopted in consumer research. Up to this moment it has remained one of the most commonly applied solutions in consumer measurement and scale development. This assertion was borne out by Bearden and Netemeyer [1999] in their book Marketing scales, in which they classified various types of multi-item scales, developed measures, evaluation procedures and reliability estimation approaches to be used in consumer studies. Bearden and Netemeyer [1999] mentioned, for example, the following marketing scales, which can be related to:
– consumers' traits and individual differences, e.g. 1) scales related to interpersonal orientation, needs/preferences, and self-concept; 2) scales related to consumer compulsiveness and impulsiveness; 3) scales related to country image and affiliation; 4) scales related to consumer opinion leadership and opinion seeking; 5) scales related to innovativeness; 6) scales related to consumer social influence,
– consumers' personal values, e.g. 1) scales exploring general values; 2) scales related to environmentalism and socially responsible consumption; 3) scales related to values on materialism and possessions/objects,
– consumers' involvement, information processing, and price perceptions, e.g. 1) scales on involvement with a specific class of product; 2) scales on involvement general to several products; 3) scales related to purchasing involvement,
– consumers' reactions to advertising stimuli, e.g. 1) scales related to ad emotions and ad content; 2) scales related to ad believability/credibility; 3) scales related to children's advertising,
– consumers' attitudes about satisfaction and post-purchase behavior, e.g. 1) scales measuring consumers' attitudes toward business practices and marketing; 2) scales related to post-purchase behavior/discontent; 3) scales toward product/service satisfaction.

41 In marketing research methodology one often monitors consumers' beliefs, motivational states, personal value systems, expectancies, needs, emotions, or perceptions.
Marketing has also profited from early statistical achievements, especially in the field of factor analysis, which has played a significant role in scale development and, simultaneously, in the measurement of theoretical constructs. For example, exploratory factor analysis (EFA) supported marketing in exploring dimensionality and in refining the components of scales under development. In contrast, confirmatory factor analysis (CFA) was primarily grounded in scale validation, confirming a priori hypotheses about the relationships of a set of measurement items (indicators) to their respective factors. In particular, CFA has become a useful tool for the later stages of scale development, when the internal consistency and validity of the theoretical construct are considered.
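The dimensionality-exploration role of EFA can be sketched as follows. This is a simplified stand-in rather than a full EFA workflow (no rotation or loading inspection): it applies Kaiser's eigenvalue-greater-than-1 rule to the item correlation matrix of simulated data, where six items are generated from two latent factors, so two eigenvalues stand clearly above 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
f1 = rng.normal(size=(n, 1))    # latent factor 1 (e.g. hedonism)
f2 = rng.normal(size=(n, 1))    # latent factor 2 (e.g. consumerism)
noise = rng.normal(scale=0.6, size=(n, 6))

# Six simulated items: three loading on f1, three on f2.
items = np.hstack([f1, f1, f1, f2, f2, f2]) + noise

# Eigenvalues of the item correlation matrix, largest first.
corr = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser's rule: retain factors whose eigenvalue exceeds 1.
n_factors = int((eigvals > 1.0).sum())
print(n_factors, [round(v, 1) for v in eigvals[:3]])
```

In practice an analyst would follow this screening step with a proper factor extraction and rotation, and then a CFA on a fresh sample to confirm the hypothesized item-factor structure.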
More recently, marketing has arrived at its own concept of measurement and scale development. For instance, Rossiter invented the C-OAR-SE method (as a sort of direct response to psychometrics). In C-OAR-SE, Rossiter proposed a brand new procedure for developing scales that measure latent variables. He offered a new perspective on measurement and also indicated when to use single-item vs. multi-item scales and when to use an index of essential items rather than selecting unidimensional items [Rossiter 2002].

APPENDIX - QUESTIONNAIRE
FOREWORD

Survey: Statistical analysis and exploration of youth values


POZNAŃ UNIVERSITY OF ECONOMICS

al. Niepodległości 10
61-875 Poznań
Ph: +48 061 854-33-12
Fax: +48 061 854-36-10

About survey
Dear Participant,

We at the Poznań University of Economics are conducting a survey on people's values. We would like to include your point of view. As you know, human values are changing rapidly in today's world because of new cultures, technologies, the economy and the social aspects of living. These changes are having important effects on you and your family. We would like to gain insights into these changes and their impact from your perspective. We think you will find the survey interesting.
You are one of the randomly selected examinees. Because the success of the survey depends upon the cooperation of all the people who were selected, we would especially appreciate your willingness to help us in this study.
The information obtained from the survey will in no way reveal the identities of the people participating. Your cooperation and attitudes are very important to the success of the study and will be kept strictly confidential. Your responses will only be used when grouped with those of the other people taking part in the study.
We realize that many of us receive a lot of material through different forms of communication which we classify as junk and not important to respond to, but please do not consider this particular study as junk. The study is strictly focused on a scientific problem.
The questionnaire includes all the important directions and instructions necessary to complete the survey without the assistance of an interviewer. You will find that the survey will take only about 20 minutes of your time. Please take your time in responding to each question. Your honest responses are what we are looking for in this study.
Thank you in advance. We deeply appreciate your cooperation in taking part in our study.
Sincerely,
If you have any comments, concerns or questions about this survey, please send us a message via e-mail: pt.badania@ue.poznan.pl


Self-administered questionnaire

Q.1. Below is a list of things that some people look for or want out of life. Please study the list carefully and then rate each thing on how important it is in your daily life.
Answers range from: 1 = totally unimportant value, 2 = partially unimportant value, 3 = neither unimportant nor important value, 4 = partially important value, 5 = totally important value.
Please circle your answer on the rating scale
Table 1
1. I always strive for new experiences
2. I like to earn more and spend more on consumption to enjoy myself
3. I spend time pleasantly and have a good time
4. I search for an adventurous and exciting life
5. I explore new things and aspects of life
6. Consumption itself is an enjoyable experience in my life
7. I strive to achieve success in my life
8. I make choices in my life on my own
9. I constantly learn something new
10. I want to be creative and act with imagination
11. I like it when I am praised and admired
12. I respect and believe in people who possess lots of money
13. I care more for myself than for others

Q.2. Now, please additionally characterize your favourite activities, using for this purpose a 5-point scale ranging from: 1 = totally disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, to 5 = totally agree.
Table 2
Do you often…?
1. Go to the theater
2. Play computer games
3. Go for a walk in the park or forest
4. Go to parties
5. Read magazines
6. Go to art exhibits
7. Watch TV
8. Read books
9. Go out with friends to the city
11. Read newspapers
12. Visit relatives
13. Listen to the radio
14. Go to the movies
15. Go to live music shows
16. Dine out
18. Do sports
19. Go to the pub
20. Go to the music club to dance
21. Listen to music at home
24. Attend educational courses

Basic metrics of the questionnaire


M1. Are you? please circle the answer
Female

Male

M2. What is the exact year of your birth?

Year:
______________________/
M3. How many people, in total, live in your household, including yourself (your parents, sisters and brothers)?
Number of occupants;
______________________/
M4. Place or the kind of permanent residence of your parents:
please circle the answer
House

Flat

Other______________

M5. Please indicate below your temporary place of residence during studies (in your university days):
please circle the answer
Dormitory

Rented room

Rented flat

Rented house

Parents' own flat

Own flat

Other

M6. Outside of learning hours and school, do you work?


please circle the answer
Full time work

Part time work

Don't work; only learning and studying


References
Abell, P., 1971, Model building in sociology, Weidenfeld and Nicolson, London.
Abelson, R.P., Tukey, J.W., 1959, Efficient conversion of non-metric information into metric
information, Proceedings of the Social Statistics Section, American Statistical Association,
pp.226230.
Abelson, R.P., Tukey, J.W., 1963, Efficient utilisation of non-numerical information in quantitative analysis: general theory and the case of simple order, Annals of Mathematical Statistics, vol. 34, pp. 1347–1369.
Achenreiner, G.B., 1997, Materialistic values and susceptibility to influence in children, Advances in Consumer Research, vol.24, pp.8288.
Adams, E.W., 1966, On the nature and purpose of measurement, Synthese, vol.16, pp.125169.
Adams, E.W., Fagot, R.F., Robinson, R.E., 1965, A theory of appropriate statistics, Psychometrika, vol.30, pp.99127.
Ajzen, I., 2001, Nature and operation of attitudes, Annual Review of Psychology, vol. 52,
pp.2758.
Akaike, H., 1987, Factor analysis and AIC, Psychometrika, vol. 52, pp. 317–332.
Albaum, G., Murphy, B.D., 1988, Extreme response on a Likert scale, Psychological Reports,
vol.63, October, pp.501502.
Allen, M.J., Yen, W.M., 1979, Introduction to measurement theory, Waveland Press Illinois.
Allport, G., 1935, Attitudes, in: Murchison, C. (ed.), A handbook of social psychology, Clark University Press, Worcester, MA, pp. 789–844.
Allport, G.W., Vernon, P.E., Lindzey, G., 1960, Study of values: manual and test booklet, 3rd ed., Houghton Mifflin, Boston.
Alper, T.M., 1984, Groups of homeomorphisms on the real line, BSc Thesis, Harvard University,
Cambridge.
Alper, T.M., 1985, A note on real measurement structures of scale type (m, m + 1), Journal of
Mathematical Psychology, vol.29, pp.7381.
Alper, T.M., 1987, A classification of all order-preserving homeomorphism groups of the reals
that satisfy finite uniqueness, Journal of Mathematical Psychology, vol.31, pp.135154.
Anastasi, A., Urbina, S., 1997, Psychological testing, 7th ed., Prentice Hall, Upper Saddle River.
Anderson, J.C., Gerbing, D.W., 1988, Structural equation modeling in practice Areview and
recommended two-step approach, Psychological Bulletin, vol.103, pp.411423.
Anderson, R.D., Acito, F., 1980, A Monte Carlo comparison of factor analytic methods, Journal
of Marketing Research, vol.17, May, pp.228236.
Anderson, R.D., Rubin, H., 1956, Statistical inference in factor analysis, Proceedings of the
3rd Berkeley Symposium of Mathematical Statistics and Probability, vol.5, pp.111150.
Andrich, D., 1988, Rasch models for measurement, Sage Publications, Newbury Park, CA.
Aneshensel, C.S., Clark, V.A., Frerichs, R.R., 1983, Race, ethnicity and depression Aconfirmatory analysis, Journal of Personality and Social Psychology, vol.44, February, pp.385398.
Angoff, W., 1982, Summary and derivation of equating methods used at ETS, in: Holland, P.,
Rubin, D. (eds.), Test equating, Academic Press, New York, pp.5579.

Antil, J.H., Bennett, P.D., 1979, Construction and validation of a scale to measure socially responsible consumption behavior, in: Henion, K.E., Kinnear. T.C. (eds.), The conserver society, American Marketing Association, Chicago. pp.5168.
Aranowska, E., 2005, Pomiar ilościowy w psychologii, Scholar, Warszawa.
Arbuckle, J.L., 2007, Amos 16 users guide, SPSS, Chicago.
Armor, D.J., 1974, Theta reliability and factor scaling, in: Costner, H.L. (ed.), Sociological methodology, Jossey-Bass, San Francisco, pp.1750.
Arrindell, W.A., Ende, J. van der, 1985, An empirical test of the utility of the observations-to-variables ratio in factor and components analysis, Applied Psychological Measurement, vol. 9, no. 2, pp. 165–178.
Babakus, E., Ferguson, J.R., Jreskog, K.G., 1987, The sensitivity of confirmatory maximum
likelihood factor analysis to violations of measurement scale and distributional assumptions,
Journal of Marketing Research, vol.39, February, pp.3146.
Babin, B.J., Darden, W.R., Griffin, M., 1994, Work and/or fun measuring hedonic and utilitarian shopping value, Journal of Consumer Research, vol.20, no.4, March, pp.644656.
Bachman, J.G., OMalley, P.M., 1984, Yea-saying, nay-saying, and going to extremes blackwhite differences in response styles, Public Opinion Quarterly, vol.48, Summer, pp.491509.
Bacon, D.R., Sauer, P.I., Young, M., 1995, Composite reliability in structural equation modelling, Educational and Psychological Measurement, vol.55, June, pp.394406.
Bagozzi, R.P., 1982, The role of measurement in theory construction and hypothesis testing:
Toward a holistic model, in: Fornell, C. (ed.), A second generation of multivariate analysis,
vol.1, Praeger, New York, pp.523.
Bagozzi, R.P., 1983, Issues in the application of covariance structure analysis A further comment, Journal of Consumer Research, vol.9, March, pp.449450.
Baker, B.O., Hardyck, C.D., Petrinovich, L.F., 1966, Weak measurements vs. strong statistics: an empirical critique of S.S. Stevens's proscriptions on statistics, Educational and Psychological Measurement, vol. 26, pp. 291–309.
Baker, F.B., 1992, Item response theory parameter estimation techniques, Marcel Dekker, New
York.
Baker, F.B., 2001, The basics of item response theory, 2nd ed., Eric Clearinghouse.
Bales, R., Couch, A., 1969, The value profile A factor analytic study of value statements, Sociological Inquiry, vol.39, pp.317.
Balicki, A., 2009, Statystyczna analiza wielowymiarowa i jej zastosowania społeczno-ekonomiczne, Uniwersytet Gdański, Gdańsk, pp. 131–189.
Ball, A.D., Tasaki, L., 1992, The role and measurement of attachment in consumer behavior,
Journal of Consumer Psychology, vol.1, no.2, pp.155172.
Ball-Rokeach, S.J., Rokeach, M., Grube, J.W., 1984, The great American values test, Free Press,
New York.
Bandalos, D.L., 2002, The effects of item parceling on goodnessof-fit and parameter estimate bias
in structural equation modeling, Structural Equation Modeling, vol.9, pp.78102.
Bardi, A., Schwartz, S.H., 2003, Values and behavior strength and structure of relations, Personality Social Psychological Bulletin, vol.29, pp.12071220.
Bart, W.M., Krus, D.J., 1973, An ordering theoretic method to determine hierarchies among
items, Educational and Psychological Measurement, vol.33, pp.291300.
Bartlett, M.S., 1937, The statistical conception of mental factors, British Journal of Psychology,
vol.28, pp.97104.

Bartlett, M.S., 1950, Tests of significance in factor analysis, British Journal of Statistical Psychology, vol.3, pp.7785.
Bartlett, M.S., 1951, A further note on tests of significance in factor analysis, British Journal of
Statistical Psychology, vol.4, pp.12.
Bass, F.M., Tigert, D.S., Lonsdale, R.T., 1968, Market segmentation: group versus individual behavior, Journal of Marketing Research, vol. 5, August, pp. 264–270.
Batra, R., Ahtola, T., 1990, Measuring the hedonic and utilitarian sources of consumer attitudes,
Marketing Letters, vol.2, no.2, pp.159170.
Batson, D.C., 1989, Personal values, moral principles, and a three-path model of prosocial motivation, in: Eisenberg, N., Reykowski, J., Staub, E. (eds.), Social and moral values individual and societal perspectives, Lawrence Erlbaum, Hillsdale, NJ, pp.213228.
Baumgartner, H., Homburg, C., 1996, Applications of structural equation modeling in marketing and consumer research A review, International Journal of Research in Marketing,
vol.13, April, pp.139161.
Bąk, A., Walesiak, M., 2000, Conjoint Analysis w badaniach marketingowych, Wydawnictwo Akademii Ekonomicznej we Wrocławiu, Wrocław, pp. 18–19.
Bearden, W.O., Mason, B.J., 1980, Determinants of physician and pharmacist support of generic
drugs, Journal of Consumer Research, vol.7, September, pp.121130.
Bearden, W.O., Netemeyer, R.G., 1999, Handbook of marketing scales mutli-item measures
for marketing and consumer behavior research, Sage Publication, London.
Bearden, W.O., Hardesty, D., Rose, R., 2001, Consumer self-confidence refinements in conceptualization and measurement, Journal of Consumer Research, vol.28, June, pp.121134.
Beatty, S.E., Kahle, L.R., 1984, Beyond demographics how Madison Avenue knows who you
are and what you want, The Atlantic, vol.254, October, pp.4958.
Beatty, S.E., Kahle, L.R., Homer, P., Misra, S., 1985, Alternative measurement approaches to
consumers values The list of values and Rokeach values survey, Psychology and Marketing, vol.2, Fall, pp.181200.
Bechtoldt, H.P., 1951, Selection, in: Stevens, S.S. (ed.), Handbook of experimental psychology,
Wiley, New York, pp.12371267.
Bechtoldt, H.P., 1959, Construct validity A critique, American Psychologist, vol. 14,
pp.619629.
Bedyńska, S., Książek, M., 2012, Statystyczny drogowskaz 3: praktyczny przewodnik wykorzystania modeli regresji oraz równań strukturalnych, Wydawnictwo Akademickie Sedno, Warszawa.
Belk, R.W., 1984, Three scales to measure constructs related to materialism reliability, validity
and relationships to measures of happiness, Advances in Consumer Research, vol.11, Association for Consumer Research, pp.291297.
Bell, W., Robinson, R.V., 1978, An index of evaluated equality measuring conceptions of social
justice in England and the United States, in: Tomason, R.F. (ed.), Comparative studies in sociology An annual compilation of research, vol.1, JAI Press, Greenwich, CT, pp.235270.
Bem, D., 1970, Beliefs, attitudes, and human affairs, Brooks-Cole, Monterey, CA.
Bentler, P.M., 1968, Alpha-maximized factor analysis (alphamax) its relation to alpha and
canonical factor analysis, Psychometrika, vol.33, pp.335345.
Bentler, P.M., 1982, Confirmatory factor analysis via noniterative estimation A fast inexpensive method, Journal of Marketing Research, vol.19, pp.417517.
Bentler, P.M., 1983, Some contribution to efficient statistics in structural models specifications
and estimation of moment structures, Psychometrika, vol.48, pp.493517.

Bentler, P.M., 1990, Comparative fit indexes in structural models, Psychological Bulletin, vol. 107, no. 2, pp. 238–246.
Bentler, P.M., 1995, EQS structural equations program manual, Multivariate Software, Encino,
CA.
Bentler, P.M., Bonett, D.G., 1980, Significance tests and goodness-of-fit in the analysis of covariance structures, Psychological Bulletin, vol.88, pp.588606.
Bentler, P.M., Chan, W., 1993, The covariance structure analysis using an ipsative data, Sociological Methods and Research, vol.22, pp.214247.
Bentler, P.M., Chan, W., 1996, Covariance structure analysis of partially additive ipsative
data using restricted maximum likelihood, Multivariate Behavioral Research, vol. 31,
pp.289312.
Bentler, P.M., Chan, W., 1998, Covariance structure analysis of ordinal ipsative data, Psychometrika, vol.63, no.4, pp.369399.
Bentler, P.M., Kano, Y., 1990, On the equivalence of factors and components, Multivariate Behavioral Research, vol.25, no.1, pp.6774.
Berge, J.M.F., Zegers, E.F., 1978, A series of lower bounds to the reliability of a test, Psychometrika, vol.43, no.4, December, pp.575579.
Bergkvist, L., Rossiter, J.R., 2007, The predictive validity of multiple-item versus single-item measures of the same constructs, Journal of Marketing Research, vol. 19, May, pp. 175–184.
Bernstein, I.H., Teng, G., 1989, Factoring items and factoring scales are different spurious evidence for multidimensionality due to item categorization, Psychological Bulletin, vol.105,
pp.467477.
Best, R.J., 1978, Validity and reliability of criterion-based preferences, Journal of Marketing
Research, vol.15, February, pp.154160.
Białynicka-Birula, J., 2011, Wartość w filozofii, in: Sagan, A. (red.), Wartość dla klienta w układach rynkowych: aspekty metodologiczne, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków, pp. 11–12.
Biemer, P., Groves, R.M., Lyberg, L.E., Mathiowetz, N.A., Sudman, S. (eds.), 1991, Measurement errors in surveys, John Wiley and Sons, New York.
Biernat, M., Vescio, T.K., Theno, S.A., Crandall, C.S., 1996, Values and prejudice toward
understanding the impact of American values on outgroup attitudes, in: Seligman, C.,
Olson,J.M., Zanna, M.P. (eds.), The Ontario symposium the psychology of values, Lawrence Erlbaum, Mahwah, NJ, pp.153189.
Biggs, J.B., Das, J.P., 1973, Extreme response set, intemality-extemality, and performance, British Journal of Social and Clinical Psychology, vol.12, June, pp.199210.
Billiet, J., 2002, Cross-cultural equivalence with structural equation modeling, in: Mohler, P.P.
(ed.), Cross-cultural survey methods, John Wiley and Sons, New Jersey, pp.247264.
Birnbaum, A., 1968, Some latent trait models and their uses in inferring an examinees ability, in: Lord, F.M., Novick, M.R. (eds.), Statistical theories of mental test scores, AddisonWesley, Reading, MA, pp.397479.
Blalock, H.M., 1968, The measurement problem, in: Blalock, H.M., Blalock A. (eds.), Methodology in social research, McGraw-Hill, New York.
Bloch, P.H., Richins, M.L., 1983, Shopping without purchase An investigation of consumer
browsing behavior, Advances in Consumer Research, vol.10, Association for Consumer
Research, pp.389393.
Bloch, P.H., Bruce, G.D., 1984, Product involvement as leisure behavior, Advances in Consumer Research, vol.11, Association for Consumer Research, pp.197202.

Bloch, P.H., Sherrell, D.L. Ridgway, N.M., 1986, Consumer search An extended framework,
Journal of Consumer Research, vol.13, June, pp.119126.
Bock, R.D., 1997, A brief history of item response theory, Educational Measurement Issues
and Practice, vol.16, no.4, pp.2133.
Bohrnstedt, G.W., 1970, Reliability and validity assessment in attitude measurement, in: Summers, G.F. (ed.), Attitude measurement, Rand McNally, Chicago, pp.8199.
Bollen, K.A., 1984, Multiple indicators: internal consistency or no necessary relationship?, Quality and Quantity, vol. 18, pp. 377–385.
Bollen, K.A., 1986, Sample size and Bentler-Bonett's nonnormed fit index, Psychometrika, vol. 51, pp. 375–377.
Bollen, K.A., 1989a, A new incremental fit index for general structure equation models, Sociological Methods and Research, vol.17, pp.303316.
Bollen, K.A., 1989b, Structural equations with latent variables, Wiley, New York.
Bollen, K.A., Arminger, G., 1991, Observational residuals in factor analysis and structural
equation models, Sociological Methodology, vol.21, pp.235262.
Bollen, K.A, Lennox, R., 1991, Conventional wisdom in measurement A structural equation
perspective, Psychological Bulletin, vol.110, no.2, pp.305314.
Bollen, K.A., Long, J.S., 1993, Testing structural equation models, Sage Publications, Newbury
Park.
Bollen, K.A, Ting, K.F., 2000, A tetrad test for causal indicators, Psychological Methods, vol.5,
no.1, pp.322.
Bond, E.J., 1983, Reason and value, Cambridge University Press, Cambridge.
Boomsma, A., 2000, Reporting analyses of covariance structures, Structural Equation Modeling, vol.7, pp.461483.
Borgatta, E.F., 1955, An error ratio for scalogram analysis, Public Opinion Quarterly, vol.19,
pp.96100.
Borgatta, E.F., Bohrnstedt. G.W., 1980, Level of measurement once over again, Sociological
Methods and Research, vol.9, no.2, pp.148160.
Borowicz, R., 1991, Wykształcenie jako wartość: casus Polski lat siedemdziesiątych, Edukacja, nr 4.
Borsboom, D., Mellenbergh, G.J., Heerden, J. van, 2003, The theoretical status of latent variables, Psychological Review, vol.110, no.2, pp.203219.
Borsboom, D., Mellenbergh, G.J., Heerden, J. van, 2004, The concept of validity, Psychological
Review, vol.111, no.4, pp.10611071.
Bourdie, P., 1984, Distinction A social critique of the judgement of a taste, Routledge and
Kegan Paul, London.
Boyle, G.J., 1985, Self-report measures of depression some psychometric considerations, British
Journal of Clinical Psychology, vol.24, pp.4559.
Braithwaite, V.A., Law, H.G., 1985, Structure of human values testing the adequacy of the
Rokeach value survey, Journal of Personality and Social Psychology, vol.49, pp.250263.
Brandstaetter, H., 1996, Saving, income and emotional climate for households related to personality structure, Center for Economic Research, Tilburgh University, Progress Report, no.38.
Brannick, M.T., 1995, Critical comments on applying covariance structure modeling, Journal of
Organizational Behavior, vol.16, pp.201214.
Breckler, S.J., Wiggins, E.C., 1992, On defining attitude and attitude theory: once more with feeling, in: Pratkanis, A.R., Breckler, S.J., Greenwald, A.C. (eds.), Attitude structure and function, Erlbaum, Hillsdale, NJ, pp. 407–427.

Brehm, J.W., 1956, Post-decision changes in desirability of choice alternatives, Journal of Abnormal and Social Psychology, vol.52, pp.384389.
Bridgman, P.W., 1927, The logic of modern physics, Macmillan, New York.
Brown, J.D., 2009, Statistics corner questions and answers about language testing statistics
principal components analysis and exploratory factor analysis definitions, differences, and
choices, Shiken: JALT Testing and Evaluation SIG Newsletter, vol.13, no.1, pp.2630.
Brown, R.L., 1994, Efficacy of the indirect approach for estimating structural equation models
with missing data A comparison of five methods, Structural Equation Modeling, vol.1,
pp.287316.
Brown, T.A., 2006, Confirmatory factor analysis for applied research, Guilford, New York.
Browne, M.W., 1974, Generalised least squares estimators in the analysis of covariance structures, South African Statistical Journal, vol.8, pp.124.
Browne, M.W., 1982, Covariance structures, in: Hawkins, D.M. (ed.), Topics in applied multivariate analysis, Cambridge University Press, pp.72141.
Browne, M.W., 1984, Asymptotic distribution free methods in the analysis of covariance structures, British Journal of Mathematical and Statistical Psychology, vol.37, pp.127141.
Browne, M.W., 2001, An overview of analytic rotation in exploratory factor analysis, Multi
variate Behavioral Research, vol. 36, pp.111120.
Browne, M.W., Cudeck, R., 1989, Single sample cross-validation indices for covariance structures, Multivariate Behavioral Research, vol.24, pp.445455.
Browne, M.W., Cudeck, R., 1993, Alternative ways of assesing model fit, in: Bollen, K.A.,
Long,J.S., Testing structural equation models, Sage, Newbury Park, CA.
Bryant, F.B., Yarnold, P.R., 1995, Principal-components analysis and confirmatory factor analysis, in: Grimm, L.G., Yarnold, P.R. (eds.), Reading and understanding multivariate statistics,
American Psychological Association, Washington, D.C., pp.99136.
Brzeziński, J., 1978, Elementy metodologii badań psychologicznych, PWN, Warszawa.
Brzozowski, P., 1989, Skala wartości (SW): polska adaptacja Value Survey M. Rokeacha, Wydział Psychologii Uniwersytetu Warszawskiego, Warszawa.
Buczyńska-Garewicz, H., 1975, Znak, znaczenie, wartość, KiW, Warszawa.
Burnham, K.P., Anderson, D.R., 2004, Multimodel inference understanding AIC and BIC in
model selection, Sociological Methods and Research, vol.33, pp.261304.
Byrne, B.M., 1989, A primer of LISREL basic applications and programming for confirmatory
factor analytic models, Springer-Verlag, New York.
Byrne, B.M., 2010, Structural equation modeling with Amos basic concepts, applications and
programming, 2nd ed., Multivariate Applications Series, Routledge.
Byrne, B.M., Shavelson, R.J., Muthén, B., 1987, Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance, Psychological Bulletin, vol.105, pp.456-466.
Campbell, C., 1987, The romantic ethic and the spirit of modern consumerism, Blackwell,
Oxford.
Campbell, D.T., Fiske, D.W., 1959, Convergent and discriminant validation by the multitrait-multimethod matrix, Psychological Bulletin, vol.56, pp.81-105.
Carman, J.M., 1977, Values and consumption patterns: A closed loop, Advances in Consumer Research, vol.5, Association for Consumer Research, pp.403-407.
Carroll, J.B., 1945, The effect of difficulty and chance success on correlations between items or
between tests, Psychometrika, vol.10, pp.119.

References


Carroll, J.B., 1953, An analytical solution for approximating simple structure in factor analysis,
Psychometrika, vol.18, pp.2328.
Cattell, R.B., 1946, Description and measurement of personality, World Book Company, New
York.
Cattell, R.B., 1978, The scientific use of factor analysis in behavioral and life sciences, Plenum,
New York.
Cattell, R.B., Muerle, J.L., 1960, The maxplane program for factor rotation to oblique simple
structure, Educational and Psychological Measurement, vol.20, pp.569590.
Chevan, A., 1972, Minimum-error scalogram analysis, Public Opinion Quarterly, vol. 36,
pp.379387.
Chilton, R., 1969, A review and comparison of simple statistical tests for scalogram analysis,
American Sociological Review, vol.34, pp.238244.
Chomsky, N., Fodor, J., 1980, The inductivist fallacy, in: Piattelli-Palmarini, M. (ed.), Language
and learning The debate between Jean Piaget and Noam Chomsky, Harvard University
Press, Cambridge, Mass.
Chou, C.P., Bentler, P.M., 1995, Estimates and tests in structural equation modeling, in:
Hoyle,R.H. (ed.), Structural equation modeling concepts, issues, and applications, Sage,
Thousand Oaks, CA, pp.3754.
Chou, C.P., Bentler, P.M., Satorra, A., 1991, Scaled test statistics and robust standard errors for
nonnormal data in covariance structure analysis: A Monte Carlo study, British Journal of
Mathematical and Statistical Psychology, vol.44, pp.347357.
Choynowski, M., 1971, Pomiar w psychologii, in: Kozielecki, R. (red.), Problemy psychologii
matematycznej, PWN, Warszawa.
Churchill, G.A., 1979, A paradigm for developing better measures of marketing constructs,
Journal of Marketing Research, vol.16, no.1, pp.6473.
Churchill, G.A., Iacobucci, D., 2002, Marketing research methodological foundations, 8th ed.,
Harcourt College Publishers, Fort Worth, TX.
Churchill, G.A., Moschis, G.P., 1979, Television and interpersonal influences on adolescent consumer learning, Journal of Consumer Research, vol.6, pp.2335.
Churchill, G.A., Peter, J.P., 1984, Research design effects on the reliability of rating scales: A meta-analysis, Journal of Marketing Research, vol.21, no.4, pp.360-375.
Clark, L.A., Watson, D., 1995, Construct validity basic issues in scale development, Psychological Assessment, vol.7, no.3, pp.309319.
Clarkson, D.B., Jennrich, R.I., 1988, Quartic rotation criteria and algorithms, Psychometrika,
vol.53, pp.251259.
Clawson, C.J., Vinson, D.E., 1978, Human values: A historical and interdisciplinary analysis, Proceedings of Association for Consumer Research, pp.396-402.
Cohen, J., 1988, Statistical power analysis for the behavioral sciences, 2nd ed., Lawrence Erlbaum, Hillsdale, NJ.
Cole, D.A., Maxwell, S.E., 1985, Multitrait-multimethod comparisons across populations: A confirmatory factor analytic approach, Multivariate Behavioral Research, vol.20, pp.389-417.
Coltman, T., Devinney, T.M., Midgley, D.F., Venaik, S., 2008, Formative versus reflective
measurement models two applications of formative measurement, Journal of Business
Research, vol.61, pp.12501262.
Comrey, A.L., 1978, Common methodological problems in factor analytic studies, Journal of
Consulting and Clinical Psychology, vol.46, pp.648659.
Comrey, A.L., Lee, H.B., 1992. A first course in factor analysis, 2nd ed., Erlbaum, Hillsdale, NJ.

Cook, T.D., Campbell, D.T., 1979, Quasi-experimentation design and analysis issues for field
settings, Houghton Mifflin, Boston.
Coombs, C.H., 1950, Psychological scaling without a unit of measurement, Psychological Review, no.57, pp.145158.
Coombs, C.H., Dawes, R.M., Tversky, A., 1977, Wprowadzenie do psychologii matematycznej,
PWN, Warszawa.
Coppin, G., Delplanque, S., Cayeux, I., Porcherot, C., Sander, D., 2010, I'm no longer torn after choice: how explicit choices can implicitly shape preferences for odors, Psychological Science, vol.21, pp.489-493.
Cortina, J.M., 1993, What is coefficient alpha? An examination of theory and applications, Journal of Applied Psychology, vol.78, February, pp.98104.
Costello, A.B., Osborne, J.W., 2005, Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis, Practical Assessment, Research and Evaluation, vol.7, no.7, July, pp.1-9.
Couch, A.S., Keniston, K., 1960, Yeasayers and naysayers: agreeing response set as a personality variable, Journal of Abnormal and Social Psychology, vol.60, March, pp.151-174.
Cowles, D., Crosby, L.A., 1986, Measure validation in consumer research A confirmatory
factor analysis of the voluntary simplicity lifestyle scale, in: Lutz, R.J. (ed.), Advances in Consumer Research, vol.13, Association for Consumer Research, pp.392397.
Cox, E.P., 1980, The optimal number of response alternatives for a scale: a review, Journal of
Marketing Research, vol.17, November, pp.407422.
Cox, T.F., Cox, M.A.A., 1994, Multidimensional scaling, Chapman and Hall, London.
Crandall, J.E., 1982, Social interest, extreme response style, and implications for adjustment,
Journal of Research in Personality, vol.16, March, pp.8289.
Crawford, C.B., Ferguson, G.A., 1970, A general rotation criterion and its use in orthogonal
rotation, Psychometrika, vol.35, pp.321332.
Crissman, P., 1942, Temporal changes and sexual difference in moral judgements, Journal of
Social Psychology, vol.52, pp.117118.
Crocker, L., Algina, J., 2008, Introduction to classical and modern test theory, Cengage Learning, Mason.
Cronbach, L.J., 1947, Test reliability: its meaning and determination, Psychometrika, vol.12, no.1, March, pp.1-16.
Cronbach, L.J., 1951, Coefficient Alpha and the internal structure of tests, Psychometrika,
vol.16, pp.297334.
Cronbach, L.J., 1970, Review of on the theory of achievement test items, Psychometrika, vol.35,
pp.509551.
Cronbach, L.J., Gleser, G.C., Nanda, H., Rajaratnam, N., 1972, The dependability of behavioral
measurements theory of generalizability for scores and profiles, John Wiley and Sons, New
York.
Cronbach, L.J., Meehl, P.E., 1955, Construct validity in psychological tests, Psychological Bulletin, vol.52, pp.281302.
Cronbach, L.J., Rajaratnam, N., Gleser, G.C., 1963, Theory of generalizability: A liberalization
of reliability theory, British Journal of Statistical Psychology, vol.16, pp. 137163.
Crosby, L.A., Bitner, M.J., Gill, J.D., 1990, Organizational structure of values, Journal of Business Research, vol.20, pp.123134.

Cudeck, R., 2000, Exploratory factor analysis, in: Tinsley, H.E.A., Brown, S.D. (eds.), Handbook of applied multivariate statistics and mathematical modeling, Academic Press, San
Diego CA, pp.265296.
Cudeck, R., Browne, M.W., 1983, Cross-validation of covariance structures, Multivariate Behavioral Research, vol.18, pp.147167.
Curran, P.J., West, S.G., Finch, J.F., 1996, The robustness of test statistics to nonnormality and
specification error in confirmatory factor analysis, Psychological Methods, vol.1, pp.1629.
Das, J.P., Dutta, T., 1969, Some correlates of extreme response set, Acta Psychologica, vol.29,
no.1, pp.8592.
Davidov, E., Depner, F., 2011, Testing for measurement equivalence of human values across
online and paper-and-pencil surveys, Quality and Quantity, vol. 45, no. 2, pp.375390.
Davidov, E., Schmidt, P., Schwartz, S.H., 2008, Bringing values back in: A multiple group comparison with 20 countries using the European Social Survey, Public Opinion Quarterly,
pp.234245.
Dayton, C.M., MacReady, G.B., 1980, A scaling model with response errors and intrinsically
unscalable examinees, Psychometrika, vol.45, pp.343356.
Deaux, K., 1996, Social identification, in: Higgins, E.T., Kruglanski, A.W. (eds.), Social psychology handbook of basic principles, Guilford Press, New York, pp.777798.
DeCarlo, L., 1997, On the meaning and use of kurtosis, Psychological Methods, vol.2, no.3, pp.292-307.
Dempsey, P., Dukes, W., 1966, Judging complex value stimuli: An examination and revision of Morris's paths of life, Educational and Psychological Measurement, vol.26, pp.871-882.
DeVellis, R.F., 2003, Scale development theory and applications, Sage Publications, London.
Dewey, J., 1939, Theory of valuation, University Chicago Press, Chicago, IL.
Diamantopoulos, A., 2005, The C-OAR-SE procedure for scale development in marketing
Acomment, International Journal of Research in Marketing, vol.22, pp.19.
Diamantopoulos, A., 2006, The error term in formative measurement models interpretation
and modeling implications, Journal Model Management, vol.1, no.1, pp.717.
Diamantopoulos, A., Siguaw, J.A., 2006, Formative versus reflective indicators in organizational measure development A comparison and empirical illustration, British Journal of
Management, vol.17, no.4, pp.263282.
Diamantopoulos, A., Winklhofer, H.M., 2001, Index construction with formative indicators An alternative to scale development, Journal of Marketing Research, vol.38, no.5,
pp.269277.
DiBello, L.V., Stout, W.F., Roussos, L., 1995, Unified cognitive psychometric assessment: likelihood-based classification techniques, in: Nichols, P.D., Chipman, S.F., Brennan, R.L. (eds.), Cognitively diagnostic assessment, Lawrence Erlbaum Associates, Hillsdale.
Dichter, E., 1960, The strategy of desire, Doubleday, New York.
Diesing, P., 1962, Reason in society five types of decisions and their social conditions, University of Illinois Press, Urbana, IL.
Ding, L., Velicer, W.F., Harlow, L.L., 1995, Effects of estimation methods, number of indicators
per factor, and improper solutions on structural equation modeling, Structural Equation
Modeling, vol.2, no.2, pp.11291144.
Dingle, H., 1950, A theory of measurement, British Journal Philosophical Science, vol. 1,
pp.526.

Dodds, W.B., Monroe, K.B., 1985, The effect of brand and price information on subjective product evaluations, Advances in Consumer Research, vol.12, Association for Consumer Research, pp.8590.
Dolan, C.V., 1994, Factor analysis of variables with 2, 3, 5 and 7 response categories A comparison of categorical variable estimators using simulated data, British Journal of Mathematical and Statistical Psychology, vol.47, pp.309326.
Doniec, R., 1991, Stanisława Brzozowskiego filozofia pracy jako wartości i tworzywa kultury, in: Adamski, F. (red.), Spór o wartości w kulturze i wychowaniu, Uniwersytet Jagielloński, Kraków, pp.57-59.
Dressel, P.L., 1940, Some remarks on the Kuder-Richardson reliability coefficient, Psychometrika, vol.5, no.4, December, pp.305310.
Drolet, A.L., Morrison, D.G. , 2001, Do we really need multiple-item measures in service research?, Journal of Service Research, vol.3, February, pp.196204.
DuBois, P.H., 1970, A history of psychological testing, Allyn and Bacon, Boston.
Dubois, B., Laurent, G., 1994, Attitudes toward the concept of luxury An exploratory analysis,
Asia Pacific Advances in Consumer Research, vol.1, Association for Consumer Research,
pp.273278.
Duliniec, E., 1994, Badania marketingowe w przedsiębiorstwie, PWE, Warszawa.
Duncan, O.D., 1984, Notes on social measurement historical and critical, Russell Sage, New
York.
Eagly, A.H., Chaiken, S., 1998, Attitude structure and function, in: Gilbert, D.T., Fiske, S.T., Lindzey, G. (eds.), Handbook of social psychology, McGraw-Hill, New York, pp.269-322.
Ebel, R.L., 1982, Proposed solutions to two problems of test construction, Journal of Educational
Measurement, vol.19, pp.267278.
Edwards, A.L., 1957, Techniques of attitude scale construction, Appleton-Century-Crofts, New
York.
Edwards, J.R., Bagozzi, R.P., 2000, On the nature and direction of relationships between constructs and measures, Psychological Methods, vol.5, no.2, pp.155174.
Eisenhart, C., 1947, The assumptions underlying the analysis of variance, Biometrics, vol.3,
pp.121.
Embretson, S.E., 1984, A general latent trait model for response processes, Psychometrika,
vol.49, pp.175186.
Embretson, S.E., 1991, A multidimensional latent trait model for measuring learning and
change, Psychometrika, vol.56, pp.495516.
Embretson, S.E., 1996, The new rules of measurement, Psychological Assessment, vol.8, no.4,
pp.341349.
Embretson, S.E., 1997, Structured ability models in tests designed from cognitive theory, in:
Wilson, M., Engelhard, G., Draney, K. (eds.), Objective measurement III, Ablex, Norwood,
pp.223236.
Embretson, S.E. (ed.), 2010, Measuring psychological constructs advances in model-based
approaches, American Psychological Association, Washington, D.C.
Embretson, S.E., DeBoeck, P., 1994, Latent trait theory, in: Sternberg, R.J. (ed.), Encyclopedia
of human intelligence, Macmillan, New York, pp.40214017.
Embretson, S.E., Reise, S.P., 2000, Item response theory for psychologists, Lawrence Erlbaum
Associates, New Jersey.
Enders, C.K., Bandalos, D.L., 1999, The effects of heterogeneous item distributions on reliability,
Applied Measurement in Education, vol.12, no.2, pp.133150.

Enders, C.K., Bandalos, D.L., 2001, The relative performance of full information maximum
likelihood estimation for missing data in structural equation models, Structural Equation
Modeling, vol.8, pp.430457.
England, G.W., 1967, Personal value systems of American managers, Academy of Management
Journal, vol.10, pp.5368.
Erickson, R.J., 1995, The importance of authenticity for self and society, Symbolic Interaction,
vol.18, pp.121144.
Everitt, B.S., 1975, Multivariate analysis the need for data, and other problems, British Journal of Psychiatry, vol.126, pp.237240.
Eysenck, H.J., 1953, The logical basis for factor analysis, American Psychologist, vol.8, no.3,
March, pp.105114.
Faber, R.J., O'Guinn, T.C., 1989, Classifying compulsive consumers: Advances in the development of a diagnostic tool, Advances in Consumer Research, vol.16, Association for Consumer Research, pp.97-109.
Fabrigar, L.R., MacCallum, R.C., Wegener, D.T., Strahan, E.J., 1999, Evaluating the use of exploratory factor analysis in psychological research, Psychological Methods, vol. 4, no. 3.
pp.272299.
Fan, X., Thompson, B., Wang, L., 1999, The effects of sample size, estimation methods, and
model specification on SEM fit indices, Structural Equation Modeling, vol.6, pp.5683.
Fan, X., Wang, L., 1998, Effects of potential confounding factors on fit indices and parameter
estimates for true andmisspecified SEM models, Educational and Psychological Measurement, vol.58, pp.701735.
Fava, J.L., Velicer, W.F., 1992, The effects of overextraction on factor and component analysis,
Multivariate Behavioral Research, vol.27, pp.387415.
Feather, N.T., 1975, Values and income level, Australian Journal of Psychology, vol. 27,
pp.2329.
Feather, N.T., 1995, Values, valences, and choice the influence of values on the perceived
attractiveness and choice of alternatives, Journal of Personality and Social Psychology,
vol.86, pp.11351151.
Feather, N.T., Peay, E.R., 1975, The structure of terminal and instrumental values dimensions
and clusters, Australian Journal of Psychology, vol.27, pp.151164.
Fechner, G.T., 1860, Elemente der Psychophysik, Breitkopf and Hrtel, Leipzig.
Feinberg, F.M., Kahn, B.E., McAlister, L., 1992, Market share response when consumers seek
variety, Journal of Marketing Research, vol.29, no.2, pp.227237.
Feldt, L.S., 1993, The relationship between the distribution of item difficulties and test reliability,
Applied Measurement in Education, vol.6, pp.3748.
Ferguson, G.A., Takane, Y., 1999, Analiza statystyczna w psychologii i pedagogice, Wydawnictwo Naukowe PWN, Warszawa.
Finney, D.J., 1977, Dimensions of statistics, Applied Statistician, vol.26, pp.285289.
Firth, R., 1953, The study of values by social anthropology, Man, vol.20.
Fischer, E., Arnold, S.J., 1990, More than a labor of love gender roles and Christmas gift shopping, Journal of Consumer Research, vol.17, no.3, pp.333345.
Fitzsimmons, G.W., Macnab, D., Casserly, C., 1985, Technical manual for the life roles inventory scale and the salience inventory, PsiCan Consulting Ltd., Edmonton, Alberta, Canada.
Flora, D.V., Curran, P.J., 2004, An empirical evaluation of alternative methods of estimation
for confirmatory factor analysis with ordinal data, Psychological Methods, vol. 9, no. 4,
pp.466491.

Floyd, F.J., Widaman, K., 1995, Factor analysis in the development and refinement of clinical
assessment instruments, Psychological Assessment, vol.7, no.3, pp.286299.
Flynn, L.R., Pearcy, D., 2001, Four subtle sins in scale development some suggestions for
strengthening the current paradigm, International Journal of Market Research, vol. 43,
Quarter 4, pp.429423.
Ford, J.K., MacCallum, R.C., Tait, M., 1986, The application of exploratory factor-analysis in
applied-psychology A critical-review and analysis, Personnel Psychology, vol.39, no.2,
pp.291314.
Fornell, C., 1983, Issues in the application of covariance structure analysis A comment, Journal of Consumer Research, vol.9, March, pp.443447.
Frankena, W., 1967, Value and Valuation, in: Edwards, P. (ed.), The encyclopedia of philosophy,
vol.8, The Macmillan Company, New York, pp.229232.
Frederiksen, N., 1981, The real test bias, Research Report, no. 8140, Educational Testing
Service, Princeton, N.J.
Freud, S., 1955, Beyond the pleasure principle, in: Strackey, J. (ed.), The standard edition of the
complete psychological works of Sigmund Freud, vol.18, Ho-Garth Press, London.
Friedman, H.H., Amoo, T., 1999, Rating the rating scales, Journal of Marketing Management,
vol.9, no.3, pp.114123.
Friedman, I.A., 1996, Deliberation and resolution in decision-making processes Aself-report
scale for adolescents, Educational and Psychological Measurement, vol.56, pp.881890.
Furnham, A., Argyle, M., 1998, The psychology of money, Routledge, London.
Gaito, J., 1960, Scale classification and statistics, Psychological Review, vol.67, no.4, pp.277278.
Gaito, J., 1980, Measurement scales and statistics: resurgence of an old misconception, Psychological Bulletin, vol.87, pp.564567.
Galbraith, J.K., 1991, Ekonomia w perspektywie. Krytyka historyczna, PWE, Warszawa.
Galewicz, W., 1987, N. Hartmann, Wiedza Powszechna, Warszawa.
Ganter, B., Wille, R., 1999, Formal concept analysis: mathematical foundations, Springer-Verlag, Berlin.
Garbarski, L., Rutkowski, I., Wrzosek, W., 2001, Marketing. Punkt zwrotny nowoczesnej firmy, PWE, Warszawa.
Gatnar, E., 2003, Statystyczne modele struktury przyczynowej zjawisk ekonomicznych, Prace
Naukowe Akademii Ekonomicznej w Katowicach, Katowice.
Gecas, V., 2000, Value identities, self-motives and social movements, in: Stryker, S., Owens, T.J., White, R.W. (eds.), Self, identity, and social movements, University of Minnesota Press, Minneapolis, pp.93-109.
Gerbing, D.W., Anderson, J.C., 1984, On the meaning of within-factor correlated measurement
errors, Journal of Consumer Research, vol.11, no.1, June, pp.572580.
Gerbing, D.W., Hamilton, J.G., 1996, Viability of exploratory factor analysis as a precursor to
confirmatory factor analysis, Structural Equation Modeling, vol.3, no.1, pp.6272.
Gifi, A., 1990, Nonlinear multivariate analysis, Wiley, Chichester.
Gilgen, A.R., Cho, J.H., 1979, Questionnaire to measure eastern and western thought, Psychological Reports, vol.44, pp.835841.
Givon, M.M., Shapira, Z., 1984, Response to rating scales A theoretical model and its application to the number of categories problem, Journal of Marketing Research, vol. 21, November, pp.410419.
Goldberg, M.E., Gorn, G.J., Peracchio, L.A., Bamosy, G., 2003, Understanding youth, Journal
of Consumer Psychology, vol. 13, pp.278288.

Goodenough, W.A., 1944, A technique for scale analysis, Educational and Psychological Measurement, vol.7, pp.247279.
Gordon, L.V., 1960, Survey of interpersonal values, Science Research Associates, Chicago.
Gorlow, L., Noll, G.A., 1967, A study of empirically derived values, Journal of Social Psychology, vol.73, pp.261269.
Górniak, J., 2008, Pierwsze kroki w analizie danych. SPSS for Windows, SPSS Polska, Kraków.
Górniak, J., 2010, Wprowadzenie do analizy danych w marketingu: analiza rynku przy użyciu technik wielowymiarowych, SPSS Polska, Kraków.
Gorsuch, R.L., 1970, A comparison of biquartimin, maxplane, promax and varimax, Educational and Psychological Measurement, vol. 30, pp.861872.
Gorsuch, R.L., 1974, Factor analysis, W.B. Saunders, Philadelphia.
Gorsuch, R.L., 1983, Factor analysis, 2nd ed., Lawrence Erlbaum Associates, Hillsdale, NJ.
Gorsuch, R.L., 1990, Common factor-analysis versus component analysis some well and little
known facts, Multivariate Behavioral Research, vol.25, no.1, pp.3339.
Gorsuch, R.L., 1997, New procedures for extension analysis in exploratory factor analysis, Educational and Psychological Measurement, vol.57, pp.725740.
Gorsuch, R.L., McFarland, S.G., 1972, Single versus multiple-item scales for measuring religious
values, Journal for the Scientific Study of Religion, vol.11, no.1, pp.5364.
Goude, G., 1962, On fundamental measurement in psychology, Almqvist och Wiksell, Stockholm.
Gould, S.J., 1988, Consumer attitudes toward health and health care: A differential perspective,
Journal of Consumer Affairs, vol.22, pp.96118.
Gower, J.C., Hand, D.J., 1996, Biplots, Chapman and Hall, London.
Graeber, D., 2001, Toward an anthropological theory of value the false coin of our own dreams,
Palgrave, New York.
Green, B.F., Jr., 1950, A test of the equality of standard errors of measurement, Psychometrika,
vol.15., no.3, September, pp.251257.
Green, P.E., Rao, V.R., 1970, Rating scales and information recovery How many scales and
response categories to use?, Journal of Marketing, vol. 34 (July), pp.3339.
Greene, V.L., Carmines, E.G., 1979, Assessing the reliability of linear composites, in: Schuessler,
K.F. (ed.), Sociological methodology, Jossey-Bass, San Francisco, pp.160175.
Grice, J.W., 2001, Computing and evaluating factor scores, Psychological Methods, vol. 6,
pp.430450.
Griffin, M., Babin, B.J., Modianos, D., 2000, Shopping value of Russian consumers: The impact of habituation in a developing economy, Journal of Retailing, vol.76, no.1, pp.33-52.
Grzegorczyk, A., 1970, O pojęciu wartości w antropologii kultury, Studia Socjologiczne, nr 1.
Guadagnoli, E., Velicer, W.F., 1988, Relation of sample-size to the stability of component patterns, Psychological Bulletin, vol.103, no.2, pp.265275.
Guilford, J.P., 1950, Psychometric methods, 2nd ed., McGraw-Hill, New York.
Guion, R.M., 1980, On Trinitarian doctrines of validity, Professional Psychology, vol.11, pp.385-398.
Guion, R.M., 2005, Trafność i rzetelność testów psychologicznych, in: Brzeziński, J. (red.), Wybór tekstów, Gdańskie Wydawnictwo Psychologiczne, Gdańsk.
Guion, R.M., Ironson, G.H., 1983, Latent trait for organizational research, Organizational Behavior and Performance, vol.31, pp.5487.
Gulliksen, H., 1950, Theory of mental tests, Wiley, New York.

Guttman, L., 1944, A basis for scaling qualitative data, American Sociological Review, no.9,
pp.139150.
Guttman, L., 1945, A basis for analyzing test-retest reliability, Psychometrika, vol. 10,
pp.255282.
Guttman, L., 1947, The Cornell technique for scale and intensity analysis, Educational and Psychological Measurement, vol.7, pp.247279.
Guttman, L., 1953, Image theory for the structure of quantitative variates, Psychometrika,
vol.18, pp.277296.
Guttman, L., 1954, Some necessary conditions for common-factor analysis, Psychometrika,
vol.19, pp.149161.
Guttman, L., 1955, The determinacy of factor score matrices with implications for five other basic problems of common-factor theory, British Journal of Statistical Psychology, vol.8, pp.65-81.
Guttman, L., 1968, A general nonmetric technique for finding the smallest coordinate space for
a configuration of points, Psychometrika, vol.33, pp.469506.
Guttman, L., 1977, What is not what in statistics, The Statistician, vol.26, pp.81-107.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., 2010, Multivariate data analysis global
perspective, Pearson, London.
Halaby, C.N., 2003, Where job values come from family and schooling background, cognitive
ability, and gender, American Sociological Review, vol.68, pp.251278.
Hambleton, R.K., Swaminathan, H., Rogers, H.J., 1991, Fundamentals of item response theory,
Kluwer, Nijhoff, Hingham, MA.
Hamilton, D.L., 1968, Personality attributes associated with extreme response style, Psychological Bulletin, vol.69, no.3, pp.192203.
Hammersley, M., 1987, Some notes on the terms validity and reliability, British Educational
Research Journal, vol.13, no.1, pp.7381.
Hand, D.J., 1996, Statistics and the theory of measurement, Journal of The Royal Statistical
Society, series A, vol.159, no.3, pp.445492
Harding, S., Philips, D., 1986, Contrasting values in Western Europe unity, diversity and
change, Macmillan, London.
Harman, H.H., 1976, Modern factor analysis, 3rd ed., University of Chicago Press, Chicago.
Harrington, D., 2009, Confirmatory factor analysis, Oxford University Press, New York.
Harris, C.W., Kaiser, H.F., 1964, Oblique factor analytic solutions by orthogonal transformations, Psychometrika, vol.29, pp.347362.
Harris, M.M., Sackett, R.P., 1987, A factor analysis and item response theory analysis of an
employee honesty test, Journal of Business and Psychology, vol.2, pp.122135.
Hartman, R.S., 1967, The structure of values, Southern Illinois University Press, Carbondale, IL.
Harvey, R.J., Billings, R., Nilan, K.J., 1985, Confirmatory factor analysis of the job diagnostic survey good news and bad news, Journal of Applied Psychology, vol.70, no.3, pp.461468.
Hattie, J., 1985, Methodology review assessing unidimensionality of tests and items, Applied
Psychological Measurement, vol.9, pp.139164.
Haynes, S., Richard, D.C., Kubany, E.S., 1995, Content validity in psychological assessment: A functional approach to concepts and methods, Psychological Assessment, vol.7, pp.238-247.
Hechter, M., Nadel, L., Michod, R.E. (eds.), 1993, The origin of values, de Gruyter, New York.
Heeler, R.M., Ray, M.L., 1972, Measure validation in marketing, Journal of Marketing Research, vol.9, November, pp.361370.

Heise, D.R., 1972, Employing nominal variables, induced variables, and block variables in path
analysis, Sociological Methods and Research, vol.1, pp.147173.
Heise, D.R., Bohrnstedt, G.W., 1970, Validity, invalidity and reliability, in: Borgatta,E.F., Bohrnstedt, G.W. (eds.), Sociological methodology, Jossey-Bass, San Francisco, pp.104129.
Hempel, C.G., 1953, Methods of concept formation in science, in: International encyclopedia of
unified science, University of Chicago Press, Chicago, pp.2338.
Hempel, C.G., 1956, A logical appraisal of operationism The validation of science theories,
Beacon Press, Boston, pp.5258.
Hendrickson, A.E., White, P.O., 1964, Promax A quick method for rotation to oblique simple
structure, The British Journal of Statistical Psychology, vol.17, pp.6570.
Henry, W., 1976, Cultural values do correlate with consumer behavior, Journal of Marketing
Research, vol.13, no.2, May, pp.121127.
Herche, J., 1994, Measuring social values A multi-item adaptation to the list of values
(MILOV), Working Paper Report, no.94101, Marketing Science Institute, Cambridge,
MA.
Hewitt, J.P., 1997, The self and society: A symbolic interactionist social psychology, 7th ed., Allyn and Bacon, Boston.
Heywood, H.B., 1931, On finite sequences of real numbers, Proceedings of the Royal Society,
Series A, vol.134, pp.486501.
Hill, T., Lewicki, P., 2006, Statistics methods and applications, Statsoft, Tulsa.
Hilliard, A.L., 1950, The forms of value the extension of hedonistic axiology, Columbia University Press, New York.
Hirschman, E.C., 1980, Innovativeness, novelty seeking, and consumer creativity, Journal of
Consumer Research, vol.7, no.3, pp.288295.
Hirschman, E.C., 1982, Ethnic variation in hedonic consumption, Journal of Social Psychology,
vol.118, no.2, pp.225234.
Hirschman, E.C., 1983, Predictors of self-projection, fantasy fulfillment, and escapism, Journal
of Social Psychology, vol.120, June, pp.6376.
Hirschman, E.C., Holbrook, M.B., 1982, Hedonic consumption emerging concepts, methods,
and propositions, Journal of Marketing, vol.46, July, pp.92101.
Hitlin, S., 2003, Values as the core of personal identity drawing links between two theories of
self, Social Psychology Quarterly, vol.66, no.2, pp.118137.
Hitlin, S., Piliavin, J.A., 2004, Values: reviving a dormant concept, Annual Review of Sociology, vol.30, pp.359-393.
Hoffman, P., 1963, Test reliability and practice effects, Psychometrika, vol.28, no.3, September,
pp.273288.
Hofstede, G., 1980, Culture's consequences: International differences in work-related values, Sage, Beverly Hills, CA.
Holbrook, M.B., 1980, Some preliminary notes on research in consumer esthetics, Advances in
Consumer Research, vol.VII, Association for Consumer Research, pp.104108.
Holbrook, M.B., 1984, Situation-specific ideal points and usage of multiple dissimilar brands,
Research in Marketing, vol.7, pp.93131.
Holbrook, M.B., 1993, Nostalgia and consumption preferences some emerging patterns in
consumer tastes, Journal of Consumer Research, vol.20, pp.245256.
Holbrook, M.B., Corfman, K.P., 1985, Quality and value in the consumption experience phaedrus rides again, in: Jacoby, J., Olson, J.C. (eds.), Perceived quality how consumers view
stores and merchandise, Heath, D.C. Company, Lexington, MA, pp.3157.

Holbrook, M.B., Hirschman, E.C., 1982, The experiential aspects of consumption consumer
fantasies, feelings, and fun, Journal of Consumer Research, vol.9, September, pp.132140.
Holzinger, K.J., Harman, H.H., 1941, Factor analysis A synthesis of factorial methods, University of Chicago Press, Chicago.
Homans, G., 1951, The human group, Routledge, London.
Homer, P.M., Kahle, L.R., 1988, A structural equation analysis of the value-attitude-behavior hierarchy, Journal of Personality and Social Psychology, vol.54, pp.638-646.
Horn, J.L., 1969, On the internal consistency reliability of factors, Multivariate Behavioral Research, vol.4, pp.115125.
Hornowska, E., 1989, Operacjonalizacja wielkoci psychologicznych zaoenia, struktura,
konsekwencje, Ossolineum, Wrocaw.
Hornowska, E., 2000, Operacjonalizacja terminw teoretycznych czynniki, wielko i cecha,
in: Strelau, W.J. (red.), Psychologia podrcznik akademicki, t.1, Gdaskie Wydawnictwo
Psychologiczne, Gdask, pp.389400.
Hornowska, E., 2001, Testy psychologiczne teoria i praktyka, Scholar, Warszawa.
Hotelling, H., 1933, Analysis of a complex of statistical variables into principal components,
Journal of Educational Psychology, vol. 24, pp.417441.
Hoyle, R.H. (ed.), 1995, Structural equation modeling concepts, issues, and applications, Sage,
Thousand Oaks, CA.
Hoyle, R.H., 2000, Confirmatory factor analysis, in: Tinsley, H.E.A., Brown, S.D. (eds.), Handbook of applied multivariate statistics and mathematical modeling, Academic Press, New York.
Hoyle, R.H., Panter, A.T., 1995, Writing about structural equation models, in: Hoyle, R.H.
(ed.), Structural equation modeling concepts, issues, and applications, Sage, Thousand
Oaks, CA, pp.158176.
Hoyt, C., 1941, Test reliability estimated by analysis of variance, Psychometrica, vol.6, pp.153
160.
Hsee, C.K., 1996, Elastic justification how unjustifiable factors influence judgments, Organizational Behavior and Human Decision Processes, vol.66, no.1, pp.122129.
Hu, L.T., Bentler, P.M., 1995, Evaluation model fit, in: Hoyle, R. (ed.), Structural equation modeling concepts, issues, and applications, Sage, Thousand Oaks, CA.
Hu, L.T., Bentler, P.M., 1998, Fit indices in covariance structure modeling sensitivity to underparameterized model misspecification, Psychological Methods, vol.3, pp.424453.
Hu, L.T., Bentler, P.M., 1999, Cutoff criteria for fit indexes in covariance structure analysis conventional criteria versus new alternatives, Structural Equation Modeling, vol.6, pp.155.
Hu, L.T., Bentler, P. M., Kano, Y., 1992, Can test statistics in covariance structure analysis be
trusted?, Psychological Bulletin, vol.112, pp.351362.
Hughes, G.D., 1968, Measurement the neglected half of marketing theory, in: King, R.L. (ed.),
Marketing and the new science of planning, American Marketing Association, Chicago,
pp.151153.
Hui, C.H., Triandis, H.C, 1985, The instability of response sets, Public Opinion Quarterly,
vol.49, Summer, pp.253260.
Hulin, C.L., Drasgow, F., Parsons, C.K., 1983, Item response theory application to psychological measurement, Dow-Jones-Irwin, Homewood.
Humphreys, L.G., Montanelli, R.G., 1975, An investigation of the parallel analysis criterion
for determining the number of common factors, Multivariate Behavioral Research, vol.10,
pp.193205.

Hurley, A.E., Scandura, T.E., Schriesheim, C.A., Brannick, M.T., Seers, A., Vandenberg, R.J., Williams, L.J., 1997, Exploratory and confirmatory factor analysis: guidelines, issues, and alternatives, Journal of Organizational Behavior, vol.18, iss. 6, pp.667–683.
Hurley, J.R., Cattell, R.B., 1962, The Procrustes program: producing direct rotation to test a hypothesized factor structure, Behavioral Science, vol.7, pp.258–262.
Hutchinson, S.R., Olmos, A., 1998, Behavior of descriptive fit indexes in confirmatory factor analysis using ordered categorical data, Structural Equation Modeling, vol.5, pp.344–364.
Iacobucci, D., Coughlan, A.T., Duhachek, A., 2005, Results on the standard error of the coefficient alpha index of reliability, Marketing Science, vol.24, no.2, Spring, pp.294–301.
Ihara, Y., Berkane, M., Bentler, P.M., 1990, Covariance structure analysis with heterogeneous kurtosis parameters, Biometrika, vol.77, pp.575–585.
Ingarden, R., 1966, Przeżycie, dzieło, wartość, Wydawnictwo Literackie, Kraków.
Inglehart, R., 1977, The silent revolution, Princeton University Press, Princeton, NJ.
Inglehart, R., 1979, Value priorities and socioeconomic change, in: Barnes, S.H., Kaase, M. (eds.), Political action: mass participation in five Western democracies, Sage, London, pp.305–342.
Inglehart, R., 1981, Post-materialism in an environment of insecurity, American Political Science Review, vol.75, pp.880–900.
Inglehart, R., 1997, Modernization and postmodernization: cultural, economic, and political change in 43 societies, Princeton University Press, Princeton, NJ.
Inglehart, R., Baker, W.E., 2000, Modernization, cultural change, and the persistence of traditional values, American Sociological Review, vol.65, pp.19–51.
Iwawaki, S., Zax, Z., 1969, Personality dimensions and extreme response tendency, Psychological Reports, vol.25, no.1, pp.31–34.
Izard, C.E., Buechler, S., 1980, Aspects of consciousness and personality in terms of differential emotions theory, in: Plutchik, R., Kellerman, H. (eds.), Emotion: theory, research and experience, Academic Press, New York.
Jackson, D.L., 2007, The effect of the number of observations per parameter in misspecified confirmatory factor analytic models, Structural Equation Modeling, vol.14, pp.48–76.
Jackson, D.L., Gillaspy, J.A., Stephenson, R., 2009, Reporting practices in confirmatory factor analysis: An overview and some recommendations, Psychological Methods, vol.14, no.1, pp.6–23.
Jackson, P., 1979, A note on the relation between coefficient Alpha and Guttman's split-half lower bounds, Psychometrika, vol.44, no.2, June, pp.251–252.
Jackson, R.W.B., 1942, Note on the relationship between internal consistency and test-retest estimates of the reliability of tests, Psychometrika, vol.7, no.3, September, pp.157–164.
James, L.R., Mulaik, S.A., Brett, J.M., 1982, Causal analysis: assumptions, models, and data, Sage, Beverly Hills, CA.
Jarvis, C.B., Mackenzie, S.B., Podsakoff, P.M., 2003, A critical review of construct indicators and measurement model misspecification in marketing and consumer research, Journal of Consumer Research, vol.30, no.3, pp.199–218.
Jennrich, R.I., 2002, A simple general procedure for oblique rotation, Psychometrika, vol.66, pp.289–306.
Jennrich, R.I., Sampson, P.F., 1966, Rotation for simple loadings, Psychometrika, vol.31, pp.313–323.
Joas, H., 2000, The genesis of values, Polity, Cambridge.

Johnson, M.K., 2002, Social origins, adolescent experiences, and work value trajectories during the transition to adulthood, Social Forces, vol.80, pp.1307–1341.
Johnson, R.M., 1971, Market segmentation: A strategic management tool, Journal of Marketing Research, vol.8, February, pp.13–18.
Jones, R.R., Rorer, L.G., 1973, Response biases and trait descriptive adjectives, Multivariate Behavioral Research, vol.8, July, pp.313–330.
Jöreskog, K.G., 1967, Some contributions to maximum likelihood factor analysis, Psychometrika, vol.32, pp.443–482.
Jöreskog, K.G., 1971a, Simultaneous factor analysis in several populations, Psychometrika, vol.36, pp.409–426.
Jöreskog, K.G., 1971b, Statistical analysis of sets of congeneric tests, Psychometrika, vol.36, pp.109–133.
Jöreskog, K.G., 1974, Analyzing psychological data by structural analysis of covariance matrices, in: Krantz, D.H., Atkinson, R.C., Luce, R.D., Suppes, P. (eds.), Contemporary developments in mathematical psychology, vol.2, Freeman, San Francisco, pp.1–56.
Jöreskog, K.G., 1994, On the estimation of polychoric correlations and their asymptotic covariance matrix, Psychometrika, vol.59, pp.381–389.
Jöreskog, K.G., Goldberger, A.S., 1972, Factor analysis by generalized least squares, Psychometrika, vol.37, no.3, pp.243–260.
Jöreskog, K.G., Reyment, R., 1996, Applied factor analysis in the natural sciences, Cambridge University Press, Cambridge.
Jöreskog, K.G., Sörbom, D., 1981, Analysis of linear structural relationships by maximum likelihood and least squares methods, Research Report, University of Uppsala, Sweden, pp.81–88.
Jöreskog, K.G., Sörbom, D., 1984, LISREL 6: Analysis of linear structural relationships by the method of maximum likelihood, National Educational Resources, Chicago.
Jöreskog, K.G., Sörbom, D., 1989, LISREL 7: A guide to the program and applications, SPSS Inc., Chicago.
Jöreskog, K.G., Sörbom, D., 1993, LISREL 8: Structural equation modeling with the SIMPLIS command language, SSI, Chicago.
Jöreskog, K.G., Sörbom, D., 1996, LISREL 8: User's reference guide, Scientific Software, Chicago.
Kahl, J.A., 1968, The measurement of modernism: A study of values in Brazil and Mexico, University of Texas Press, Austin.
Kahle, L.R., 1980, Stimulus condition self-selection by males in the interaction of locus of control and skill-chance situations, Journal of Personality and Social Psychology, vol.38, pp.50–56.
Kahle, L.R. (ed.), 1983, Social values and social change: adaptation to life in America, Praeger, New York.
Kahle, L.R., 1986, The nine nations of North America and the value basis of geographic segmentation, Journal of Marketing, vol.50, pp.37–47.
Kahneman, D., Tversky, A., 1979, Prospect theory: An analysis of decision under risk, Econometrica, vol.47, pp.263–291.
Kaiser, H.F., 1958, The varimax criterion for analytic rotation in factor analysis, Psychometrika, vol.23, pp.187–200.
Kaiser, H.F., 1960, The application of electronic computers to factor analysis, Educational and Psychological Measurement, vol.20, pp.141–151.

Kaiser, H.F., Caffrey, J., 1965, Alpha factor analysis, Psychometrika, vol.30, no.1, March, pp.1–14.
Kaiser, H.F., Madow, W.G., 1974, The KD method for the transformation problem in exploratory factor analysis, paper presented to the Psychometric Society, March, Palo Alto, CA.
Kamakura, W.A., Balasubramanian, S.K., 1989, Tailored interviewing: An application of item response theory for personality measurement, Journal of Personality Assessment, vol.53, pp.502–519.
Kanter, D.L., 1978, The Europeanizing of America: A study in changing values, Proceedings of Association for Consumer Research, pp.408–410.
Kaplan, D., 1995, Statistical power in structural equation modeling, in: Hoyle, R.H. (ed.), Structural equation modeling: concepts, issues, and applications, Sage, Thousand Oaks, CA, pp.100–117.
Kaplan, D., 2000, Structural equation modeling: foundations and extensions, Sage, Thousand Oaks, CA.
Katz, D., 1960, The functional approach to the study of attitudes, Public Opinion Quarterly, vol.24, pp.163–204.
Katz, D., Stotland, E., 1959, A preliminary statement to a theory of attitude structure and change, in: Koch, S. (ed.), Psychology: A study of a science, vol.3, McGraw-Hill, New York.
Katz, E., Lazarsfeld, P.F., 1955, Personal influence: the part played by people in the flow of mass communications, The Free Press, New York.
Kaydos, W.J., 1999, Operational performance measurement, CRC Press, Florida.
Kelley, T.L., 1927, Interpretation of educational measurements, Macmillan, New York.
Kelley, T.L., 1940, Comment on Wilson and Worcester's note on factor analysis, Psychometrika, vol.5, pp.117–120.
Kelloway, K.E., 1995, Structural equation modeling in perspective, Journal of Organizational Behavior, vol.16, pp.215–224.
Kelly, G.A., 1958, The theory and technique of assessment, Annual Review of Psychology, vol.9, February, pp.323–352.
Kendall, M.G., 1950, Factor analysis, Journal of the Royal Statistical Society, vol.12, pp.60–73.
Kenny, D.A., 1979, Correlation and causality, Wiley-Interscience, New York.
Kenny, D.A., Kashy, D.A., Bolger, N., 1988, Data analysis in social psychology, in: Gilbert, D., Fiske, S.T., Lindzey, G. (eds.), The handbook of social psychology, vol.1, 4th ed., McGraw-Hill, Boston, MA.
Kerlinger, F., 1964, Foundations of behavioural research, Holt, Rinehart and Winston, New York.
Kiesler, S., Sproull, L.S., 1986, Response effects in the electronic survey, Public Opinion Quarterly, vol.50, Fall, pp.402–413.
Kilmann, R.H., 1975, A scale-projective measure of interpersonal values, Journal of Personality Assessment, vol.39, pp.34–40.
Kim, J.O., Mueller, C.W., 1978, Introduction to factor analysis: what it is and how to do it, Sage Publications, Newbury Park.
Kivetz, R., Simonson, I., 2002a, Earning the right to indulge: effort as a determinant of customer preferences toward frequency program rewards, Journal of Marketing Research, vol.39, May, pp.155–170.
Kivetz, R., Simonson, I., 2002b, Self-control for the righteous: toward a theory of precommitment to indulge, Journal of Consumer Research, vol.29, September, pp.199–217.
Kline, P., 2000, Handbook of psychological testing, 2nd ed., Routledge, London.

Kline, P., 2002, An easy guide to factor analysis, Routledge, London.
Kline, R.B., 2010, Principles and practice of structural equation modeling, 3rd ed., Guilford Press, New York.
Kluckhohn, C.K.M., 1951, Value and value orientations in the theory of action, in: Parsons, T., Shils, E.A. (eds.), Toward a general theory of action, Harvard University Press, Cambridge, MA.
Kluckhohn, C.K.M., 1962, Value and value orientation in the theory of action: An exploration in definition and classification, in: Parsons, T., Shils, E. (eds.), Towards a general theory of action, Harvard University Press, Cambridge, MA.
Kluckhohn, C.K.M., Strodtbeck, F., 1961, Variations in value orientations, Row, Peterson, Evanston, IL.
Kluegel, J.R., Singleton, R., Starnes, C.E., 1977, Subjective class identification: A multiple indicators approach, American Sociological Review, vol.42, pp.599–611.
Kohn, M.L., 1959, Social class and parental values, American Journal of Sociology, vol.64, pp.223–228.
Kohn, M.L., 1969, Class and conformity: A study in values, Dorsey, Homewood, IL.
Kohn, M.L., 1976, Social class and parental values: another confirmation of the relationship, American Sociological Review, vol.41, pp.538–545.
Kohn, M.L., Schooler, C., 1982, Job conditions and personality: A longitudinal assessment of their reciprocal effects, American Journal of Sociology, vol.87, pp.1257–1286.
Konarski, R., 2009, Modele równań strukturalnych: teoria i praktyka, Wydawnictwo Naukowe PWN, Warszawa.
Koopmans, T.C., Rubin, H., Leipnik, R.B., 1950, Measuring the equation systems of dynamic economics, in: Koopmans, T.C. (ed.), Statistical inference in dynamic economic models, Wiley, New York, pp.53–237.
Kotler, P.J., 1991, Marketing management, 7th ed., Prentice-Hall, Englewood Cliffs, NJ.
Kowal, J., 1998, Metody statystyczne w badaniach sondażowych rynku, Wydawnictwo Naukowe PWN, Warszawa–Wrocław.
Kozyra, C., 2004, Metody analizy i oceny jakości usług, PhD thesis, Wroclaw University of Economics, pp.68–101.
Krantz, D.H., Luce, R.D., Suppes, P., Tversky, A., 1971, Foundations of measurement, vol.1, Additive and polynomial representations, Academic Press, New York.
Kristiansen, C.M., Zanna, M.P., 1991, Value relevance and the value-attitude relation: value expressiveness versus halo effects, Basic and Applied Social Psychology, vol.12, no.4, pp.471–483.
Krosnick, J.A., Alwin, D.F., 1985, The measurement of values in surveys: A comparison of ratings and rankings, Public Opinion Quarterly, vol.49, pp.535–552.
Krosnick, J.A., Alwin, D.F., 1988, A test of the form-resistant correlations hypothesis: ratings, rankings, and the measurement of values, Public Opinion Quarterly, vol.52, no.4, pp.526–538.
Kuder, G.F., Richardson, M.W., 1937, The theory of the estimation of test reliability, Psychometrika, vol.2, September, pp.151–160.
Kumar, A., Dillon, W.R., 1987, Some further remarks on measurement-structure interaction and the unidimensionality of constructs, Journal of Marketing Research, vol.24, November, pp.438–444.
Kyburg, H., 1984, Theory and measurement, Cambridge University Press, Cambridge.
Lamont, W.D., 1955, The value judgment, Greenwood Press, Westport, CT.

Landau, S., Everitt, B.S., 2004, Handbook of statistical analyses using SPSS for Windows, Chapman and Hall, Washington, D.C.
Laurent, G., Kapferer, J.N., 1985, Measuring consumer involvement profiles, Journal of Marketing Research, vol.22, no.1, pp.41–53.
Lawley, D.N., 1940, The estimation of factor loadings by the method of maximum likelihood, Proceedings of the Royal Society of Edinburgh, vol.60, pp.64–82.
Lawley, D.N., 1943, On problems connected with item selection and test construction, Proceedings of the Royal Society of Edinburgh, Series A, vol.23, pp.273–287.
Lawshe, C.H., 1975, A quantitative approach to content validity, Personnel Psychology, vol.28, pp.563–575.
Leavitt, C., Walton, J., 1975, Development of a scale for innovativeness, Advances in Consumer Research, vol.2, Association for Consumer Research, pp.545–554.
Leclerc, F., Schmitt, B.H., Dube, L., 1994, Foreign branding and its effects on product perceptions and attitudes, Journal of Marketing Research, vol.31, no.2, pp.263–270.
Lee, H.B., Comrey, A.L., 1979, Distortions in a commonly used factor analytic procedure, Multivariate Behavioral Research, vol.14, pp.301–321.
Leech, L.N., Barrett, C.K., 2005, SPSS for intermediate statistics: use and interpretation, 2nd ed., Lawrence Erlbaum Associates, New Jersey.
Leech, L.N., Barrett, C.K., Morgan, G.E., Gloeckner, G.W., 2004, SPSS for introductory statistics, 2nd ed., Lawrence Erlbaum Associates, New Jersey.
Lehmann, D.R., Hulbert, J., 1972, Are three-point scales always good enough?, Journal of Marketing Research, vol.9, November, pp.444–446.
Lambkin, M., Foxall, G., Raaij, F. van, Heilbrunn, B. (eds.), 2001, Zachowanie konsumenta: koncepcje i badania europejskie, Wydawnictwo Naukowe PWN, Warszawa.
Leonard-Barton, D., 1981, Voluntary simplicity lifestyles and energy conservation, Journal of Consumer Research, vol.8, pp.243–252.
Lesthaeghe, R., Moors, G., 2000, Life course transitions and value orientations: selection and adaptation, presented at Contact Forum: Values orientations and life cycle decisions, Results from Longitudinal Studies, Brussels.
Levitin, T., 1968, Values, in: Robinson, J.P., Shaver, P.R. (eds.), Measures of social psychological attitudes, Institute for Social Research, Ann Arbor, MI, pp.489–585.
Levy, S.J., 1959, Symbols for sale, Harvard Business Review, vol.37, July-August, pp.117–119.
Levy, S.J., 1963, Symbolism and life style, in: Greyser, S.A. (ed.), Toward scientific marketing, American Marketing Association, Chicago.
Lichtenstein, S., Slovic, P., 2006, The construction of preference, Cambridge University Press, New York.
Likert, R., 1932, A technique for the measurement of attitudes, Archives of Psychology, vol.140, pp.44–53.
Linden, W.J. van der, Hambleton, R.K. (eds.), 1997, Handbook of modern item response theory, Springer, New York.
Lindquist, E.F., 1936, The theory of test construction, in: Hawkes, H.E., Lindquist, E.F., Mann, C. (eds.), The construction and use of achievement examinations, Houghton Mifflin, Boston.
Little, T.D., Slegers, D.W., Card, N.A., 2006, A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models, Structural Equation Modeling, vol.13, no.1, pp.59–72.
Loehlin, J.C., 1990, Component analysis versus common factor-analysis: A case of disputed authorship, Multivariate Behavioral Research, vol.25, no.1, pp.29–31.

Loehlin, J.C., 2004, Latent variables: An introduction to factor, path and structural equation analysis, Lawrence Erlbaum Associates, New Jersey–London.
Loevinger, J., 1948, The technique of homogeneous tests compared with some aspects of scale analysis and factor analysis, Psychological Bulletin, vol.45, pp.507–530.
Loevinger, J., 1957, Objective tests as instruments of psychological theory, Psychological Reports, vol.3, pp.635–694.
Loewenstein, G., Prelec, D., 1992, Anomalies in intertemporal choice: evidence and interpretation, Quarterly Journal of Economics, vol.107, no.2, pp.573–597.
Lofman, B., 1991, Elements of experiential consumption: An exploratory study, Advances in Consumer Research, vol.18, Association for Consumer Research, pp.729–735.
Lomax, R., Algina, J., 1979, Comparison of two procedures for analyzing multitrait multimethod matrices, Journal of Educational Measurement, vol.16, pp.177–186.
Lord, F.M., 1952, A theory of test scores, Psychometric Monograph, no.7.
Lord, F.M., 1953, On the statistical treatment of football numbers, American Psychologist, vol.8, pp.750–751.
Lord, F.M., 1958, Some relations between Guttman's principal components of scale analysis and other psychometric theory, Psychometrika, vol.23, pp.291–296.
Lord, F.M., 1980, Applications of item response theory to practical testing problems, Erlbaum, Hillsdale.
Lord, F.M., Novick, M.R., 1968, Statistical theories of mental test scores, Lawrence Erlbaum Associates, California.
Lorr, M., Suziedelis, A., Tonesk, X., 1973, The structure of values: conceptions of the desirable, Journal of Research in Personality, vol.7, pp.137–147.
Lortie-Lussier, M., Fellers, G.L., Kleinplatz, P.J., 1986, Value orientations of English, French and Italian Canadian children: continuity of the ethnic mosaic?, Journal of Cross-Cultural Psychology, vol.16, pp.283–299.
Luce, R.D., 1959, On the possible psychophysical laws, Psychological Review, vol.66, pp.81–95.
Luce, R.D., Krantz, D.H., Suppes, P., Tversky, A., 1990, Foundations of measurement, vol.3, Representation, axiomatization, and invariance, Academic Press, San Diego.
Lumsden, J., 1976, Test theory, Annual Review of Psychology, vol.27, pp.251–280.
Lunt, P., Livingstone, S., 1991, Psychological, social and economic determinants of savings: comparing recurrent and total savings, Journal of Economic Psychology, vol.12, pp.621–641.
Luszniewicz, A., Słaby, T., 2008, Statystyka z pakietem komputerowym STATISTICA: teoria i zastosowania, 3rd ed., C.H. Beck, Warszawa.
MacCallum, R.C., 1986, Specification searches in covariance structure modeling, Psychological Bulletin, vol.100, pp.107–120.
MacCallum, R.C., Austin, J.T., 2000, Applications of structural equation modeling in psychological research, Annual Review of Psychology, vol.51, pp.201–226.
MacCallum, R.C., Browne, M.W., Sugawara, H.M., 1996, Power analysis and determination of sample size for covariance structure modeling, Psychological Methods, vol.1, pp.130–149.
MacCallum, R.C., Roznowski, M., Necowitz, L.B., 1992, Model modifications in covariance structure analysis: the problem of capitalization on chance, Psychological Bulletin, vol.111, pp.490–504.
MacCallum, R.C., Tucker, L.R., 1991, Representing sources of error in the common-factor model: implications for theory and practice, Psychological Bulletin, vol.109, no.3, pp.502–511.
MacCallum, R.C., Widaman, K.F., Zhang, S., Hong, S., 1999, Sample size in factor analysis, Psychological Methods, vol.4, pp.84–99.

MacKay, D.B., Summers, J.O., 1977, On establishing convergent validity: A reply to Wilkes and Wilcox, Journal of Marketing Research, vol.14, May, pp.263–265.
MacKenzie, S.B., Podsakoff, P.M., Jarvis, C.B., 2005, The problem of measurement model specification in behavioral and organizational research and some recommended solutions, Journal of Applied Psychology, vol.90, pp.710–730.
Madansky, A., 1965, On admissible communalities in factor analysis, Psychometrika, vol.30, no.4, December, pp.455–458.
Magnusson, D., 1981, Wprowadzenie do teorii testów, PWN, Warszawa.
Mahoney, J., Katz, G.M., 1976, Value structures and orientations to social institutions, Journal of Psychology, vol.93, pp.203–221.
Maio, G.R., Olson, J.M., 1994, Value-attitude-behaviour relations: the moderating role of attitude functions, British Journal of Social Psychology, vol.33, pp.301–312.
Maio, G.R., Olson, J.M., 1998, Values as truisms: evidence and implications, Journal of Personality and Social Psychology, vol.74, pp.294–311.
Maio, G.R., Olson, J.M., 2000, What is a value-expressive attitude?, in: Maio, G.R., Olson, J.M. (eds.), Why we evaluate: functions of attitudes, Lawrence Erlbaum Associates, New Jersey, pp.249–269.
Maio, G.R., Olson, J.M., Allen, L., Bernard, M.M., 2001, Addressing discrepancies between values and behavior: the motivating effect of reasons, Journal of Experimental Social Psychology, vol.37, pp.104–117.
Maio, G.R., Olson, J.M., Bernard, M.M., Luke, M.A., 2003, Ideologies, values, attitudes, and behavior, in: DeLamater, J. (ed.), Handbook of social psychology, Plenum, New York, pp.283–308.
Malhotra, N.K., 2009, Basic marketing research: a decision-making approach, 3rd ed., Pearson, London.
Mandler, G., 1993, Approaches to a psychology of value, in: Hechter, M., Nadel, L., Michod, R.E. (eds.), The origin of values, de Gruyter, New York, pp.228–258.
Mano, H., Oliver, R.L., 1993, Assessing the dimensionality and structure of the consumption experience: evaluation, feeling, and satisfaction, Journal of Consumer Research, vol.20, December, pp.451–466.
Marcoulides, G.A., 1998, Applied generalizability theory models, in: Marcoulides, G.A. (ed.), Modern methods for business research, Lawrence Erlbaum Associates, New Jersey, pp.1–28.
Mariański, J., 1989, Wprowadzenie do socjologii moralności, KUL, Lublin.
Marini, M.M., 2000, Social values and norms, in: Borgatta, E.F., Montgomery, R.J.V. (eds.), Encyclopedia of sociology, 2nd ed., vol.4, pp.2828–2840.
Markin, R.J., Lillis, C.M., Narayana, C.L., 1976, Social-psychological significance of store space, Journal of Retailing, vol.52, Spring, pp.43–55.
Marsh, H.W., 1995, Confirmatory factor analysis models of factorial invariance: A multifaceted approach, Structural Equation Modeling, vol.1, pp.5–34.
Marsh, H.W., Hocevar, D., 1983, Confirmatory factor analysis of multitrait-multimethod matrices, Journal of Educational Measurement, vol.20, pp.231–248.
Marsh, H.W., Hocevar, D., 1985, The application of confirmatory factor analysis to the study of self-concept: first and higher order factor structures and their invariance across age groups, Psychological Bulletin, vol.97, pp.562–582.

Marsh, H.W., Balla, J.R., McDonald, R.P., 1988, Goodness-of-fit indices in confirmatory factor analysis: the effect of sample size, Psychological Bulletin, vol.103, pp.391–410.
Marsh, H.W., Balla, J.R., Hau, K.-T., 1996, An evaluation of incremental fit indices: A clarification of mathematical and empirical properties, in: Marcoulides, G.A., Schumacker, R.E. (eds.), Advanced structural equation modeling: issues and techniques, Lawrence Erlbaum Associates, New Jersey.
Marsh, H.W., Hau, K.-T., Balla, J.R., Grayson, D., 1998, Is more ever too much? The number of indicators per factor in confirmatory factor analysis, Multivariate Behavioral Research, vol.33, pp.181–220.
Martin, I.M., Eroglu, S., 1993, Measuring a multi-dimensional construct: country image, Journal of Business Research, vol.28, pp.191–210.
Maslow, A., 1954, Motivation and personality, Harper, New York.
Matsunaga, M., 2010, How to factor-analyze your data right: do's, don'ts, and how-to's, International Journal of Psychological Research, vol.3, no.1, pp.97–110.
Mazurek-Łopacińska, K., 2002, Badania marketingowe: podstawowe metody i obszary zastosowań, Wydawnictwo Akademii Ekonomicznej we Wrocławiu, Wrocław.
McAlister, L., Pessemier, E., 1982, Variety seeking behavior: An interdisciplinary review, Journal of Consumer Research, vol.9, no.3, pp.311–322.
McCammon, R.B., 1966, Principal component analysis and its application in large-scale correlation studies, Journal of Geology, vol.74, pp.721–733.
McCarty, J.A., Shrum, L.J., 2000, The measurement of personal values in survey research: A test of alternative rating procedures, Public Opinion Quarterly, vol.64, pp.271–298.
McDonald, R.P., 1967, Nonlinear factor analysis, Psychometric Monographs, no.15.
McDonald, R.P., 1985, Factor analysis and related methods, Lawrence Erlbaum Associates, New Jersey.
McDonald, R.P., 1989, An index of goodness-of-fit based on noncentrality, Journal of Classification, vol.6, pp.97–103.
McDonald, R.P., Ho, M.H.R., 2002, Principles and practice in reporting structural equation analyses, Psychological Methods, vol.7, pp.64–82.
McDonald, R.P., Krane, W.R., 1977, A note on local identifiability and degrees of freedom in the asymptotic likelihood ratio test, British Journal of Mathematical and Statistical Psychology, vol.30, pp.198–203.
McDonald, R.P., Marsh, H.W., 1990, Choosing a multivariate model: noncentrality and goodness of fit, Psychological Bulletin, vol.107, pp.247–255.
McKeon, J.J., 1968, Rotation for maximum association between factors and tests, unpublished manuscript, Biometric Laboratory, George Washington University.
McKnight, P.E., McKnight, K.M., Sidani, S., Figueredo, A.J., 2007, Missing data: A gentle introduction, Guilford Press, New York.
McNemar, Q., 1942, On the number of factors, Psychometrika, vol.7, no.1, March, pp.9–18.
Mendras, H., 1997, Elementy socjologii, Siedmioróg, Wrocław.
Meredith, W., 1993, Measurement invariance, factor analysis and factorial invariance, Psychometrika, vol.58, pp.525–543.
Merrens, M.R., 1970, Generality and stability of extreme response style, Psychological Reports, vol.27, December.
Messick, S., 1989, Validity, in: Linn, R.L. (ed.), Educational measurement, American Council on Education and National Council on Measurement in Education, Washington, D.C., pp.12–103.

Messick, S., 1998, Test validity: A matter of consequence, Social Indicators Research, vol.45, pp.35–44.
Meyers, L.S., Gamst, G., Guarino, A.J., 2006, Applied multivariate research: design and interpretation, Sage Publications, Thousand Oaks.
Micceri, T., 1989, The unicorn, the normal curve, and other improbable creatures, Psychological Bulletin, vol.105, pp.156–166.
Michell, J., 1986, Measurement scales and statistics: A clash of paradigms, Psychological Bulletin, vol.100, pp.398–407.
Michell, J., 1990, An introduction to the logic of psychological measurement, Erlbaum, Hillsdale.
Miller, G.A., 1956, The magical number seven, plus or minus two: some limits on our capacity for processing information, Psychological Review, vol.63, pp.81–97.
Millsap, R.E., 1994, Psychometrics, in: Sternberg, R.J. (ed.), Encyclopedia of intelligence, vol.2, Macmillan, New York, pp.866–868.
Millsap, R.E., Everson, H., 1991, Confirmatory measurement model comparisons using latent means, Multivariate Behavioral Research, vol.26, no.3, pp.479–497.
Misztal, M., 1980, Problemy wartości w socjologii, PWN, Warszawa, pp.18–19.
Mokken, R.J., 1971, A theory and procedure of scale analysis with applications in political research, De Gruyter, New York.
Mokken, R.J., 1997, Nonparametric models for dichotomous responses, in: Linden, W.J. van der, Hambleton, R.K. (eds.), Handbook of modern item response theory, Springer-Verlag, New York, pp.351–367.
Mokken, R.J., Lewis, C., 1982, Nonparametric approach to the analysis of dichotomous item responses, Applied Psychological Measurement, vol.6, pp.417–430.
Molenaar, I.W., 1973, Simple approximations to the Poisson, binomial and hypergeometric distributions, Biometrics, vol.29, pp.403–407.
Molenaar, I.W., 1997, Lenient or strict application of IRT with an eye on practical consequences, in: Rost, J., Langeheine, R. (eds.), Applications of latent trait and latent class models in the social sciences, Waxmann, Münster, pp.38–49.
Monroe, K.B., Chapman, J.D., 1987, Framing effects on buyers' subjective product evaluations, Advances in Consumer Research, vol.14, Association for Consumer Research, pp.193–197.
Montei, M.S., Adams, G.A., Eggers, L.M., 1996, Validity of scores on the attitudes toward diversity scale, Educational and Psychological Measurement, vol.56, pp.293–303.
Morris, C.W., 1956, Varieties of human value, University of Chicago Press, Chicago.
Moschis, G.P., Churchill, G.A., 1978, Consumer socialization: A theoretical and empirical analysis, Journal of Marketing Research, vol.15, pp.599–609.
Mosier, C.I., 1943, On the reliability of a weighted composite, Psychometrika, vol.8, pp.161–168.
Mueller, R.O., 1996, Basic principles of structural equation modeling: An introduction to LISREL and EQS, Springer, New York.
Mulaik, S.A., 1987, A brief history of the philosophical foundations of exploratory factor analysis, Multivariate Behavioral Research, vol.22, pp.267–305.
Mulaik, S.A., 1990, Blurring the distinctions between component analysis and common factor analysis, Multivariate Behavioral Research, vol.25, no.1, pp.53–59.
Mulaik, S.A., James, L.R., Alstine, J. van, Bennett, N., Lind, S., Stilwell, C.D., 1989, Evaluation of goodness-of-fit indexes for structural equation models, Psychological Bulletin, vol.105, no.3, pp.430–445.

Mulaik, S.A., McDonald, R.P., 1978, The effect of additional variables on factor indeterminacy
in models with a single common factor, Psychometrika, vol.43, pp.117192.
Munson, J.M., 1984, Personal values considerations on their measurement and application to
five areas of research inquiry, in: Pitts, R.E. Jr., Woodside, A.G. (eds.), Personal values and
consumer psychology, Lexington Books, Lexington, MA.
Muthén, B., 1983, Latent variable structural equation modeling with categorical data, Journal
of Econometrics, vol.22, pp.4865.
Muthén, B., 1984, A general structural equation model with dichotomous, ordered categorical,
and continuous latent variable indicators, Psychometrika, vol.49, pp.115132.
Muthén, B., 1989, Latent variable modeling in heterogeneous populations, Psychometrika,
vol.54, no.4, pp.557585.
Muthén, B., Toit, S.H.C. du, Spisic, D., 1997, Robust inference using weighted least squares and
quadratic estimating equations in latent variable modeling with categorical and continuous
outcomes, unpublished manuscript.
Muthén, B., Satorra, A., 1995, Technical aspects of Muthén's LISCOMP approach to estimation of latent variable relations with a comprehensive measurement model, Psychometrika,
vol.60, pp.489503.
Muthén, L.K., Muthén, B., 1998, Mplus user's guide, Muthén and Muthén, Los Angeles.
Muthén, L.K., Muthén, B., 2002, How to use a Monte Carlo study to decide on sample size and
determine power, Structural Equation Modeling, vol.9, pp.599620.
Myers, J.H., 1996, Segmentation and positioning for strategic marketing decision, American
Marketing Association, Chicago.
Mysłakowski, Z., 1965, Wychowanie człowieka w zmiennej społeczności, Książka i Wiedza,
Warszawa.
Nachmias, D., Nachmias, C., 1976, Research methods in the social sciences, St. Martin's Press,
London.
Najder, Z., 1971, Wartości i oceny, PWN, Warszawa.
Narens, L., 1974, Measurement without Archimedean axioms, Philosophical Science, vol.41,
pp.374393.
Narens, L., 1981a, A general theory of ratio scalability with remarks about the measurement-theoretic concept of meaningfulness, Theory and Decision, vol.13, pp.296322.
Narens, L., 1981b, On the scales of measurement, Journal of Mathematical Psychology, vol.24,
pp.249275.
Narens, L., Luce, R.D., 1986, Measurement the theory of numerical assignments, Psychological Bulletin, vol.99, pp.166180.
Nelson, E., 1939, Their nature and development, Journal of General Psychology, vol. 21,
pp.367399.
Netemeyer, R.G., Bearden, W.O., Sharma, S., 2003, Scaling procedures issues and application,
Sage Publications, London.
Netemeyer, R.G., Boles, J.S., McMurrian, R.C., 1996, Development and validation of work-family conflict and family-work conflict scales, Journal of Applied Psychology, vol.81, no.4,
pp.400410.
Netemeyer, R.G., Johnston, R.G., Burton, M.W., 1990, Analysis of role conflict and role ambiguity in structural equations, Journal of Applied Psychology, vol.75, no.2, pp.148158.
Neumann, J. von, Morgenstern, O., 1947, Theory of games and economic behaviour, Princeton
University Press, Princeton, NJ.
Nevitt, J., Hancock, G.R., 2001, Performance of bootstrapping approaches to model test statistics
and parameter standard error estimation in structural equation modeling, Structural Equation Modeling, vol.8, pp.353377.
Niederee, R., 1994, There is more to measurement than just measurement measurement theory, symmetry, and substantive theorizing, Journal of Mathematical Psychology, vol.38,
pp.527594.
Nietzsche, F., 2004, Tako rzecze Zaratustra, Antyk, Kty.
Nisbet, R., Elder, J., Miner, G., 2009, Handbook of statistical analysis and data mining applications, Academic Press, Canada.
Norman, R.P., 1969, Extreme response tendency as a function of emotional adjustment and stimulus ambiguity, Journal of Consulting and Clinical Psychology, vol.33, no.4, pp.406410.
Nowakowska, M., 1975, Psychologia ilościowa z elementami naukometrii, Polska Akademia
Nauk, Warszawa.
Nunnally, J.C., 1967, Psychometric theory, 1st ed., McGraw-Hill, New York.
Nunnally, J.C., 1978, Psychometric theory, 2nd ed., McGraw-Hill, New York.
Nunnally, J.C., Bernstein, I.H., 1994, Psychometric theory, McGraw-Hill, New York.
O'Guinn, T.C., Wells, W.D., 1989, Subjective discretionary income, Marketing Research Magazine of Application and Practice, vol.1, pp.3241.
O'Shaughnessy, J., O'Shaughnessy, N.J., 2002, Marketing, the consumer society and hedonism,
European Journal of Marketing, vol.36, no.5/6, pp.524547.
Obermiller, C., Spangenberg, E.R., 1998, Development of a scale to measure consumer skepticism toward advertising, Journal of Consumer Psychology, vol.7, no.2, pp.159186.
Okoń, J., 1964, Analiza czynnikowa w psychologii, PWN, Warszawa.
Omoto, A.M., Snyder, M., 1995, Sustained helping without obligation: motivation, longevity of
service, and perceived attitude change among AIDS volunteers, Journal of Personality and
Social Psychology, vol.68, pp.671686.
Ossowski, S., 1967, Konflikty niewspółmiernych skal wartości, in: Ossowski, S. (ed.), Z zagadnień psychologii społecznej, Dzieła, t. 3, PWN, Warszawa.
Ostasiewicz, W., 2003, Pomiar statystyczny, Wydawnictwo Akademii Ekonomicznej we Wrocławiu, Wrocław.
Ostini, R., Nering, L.M., 2006, Polytomous item response theory models, Sage Publications,
Thousand Oaks.
Ovadia, S., 2004, Ratings and rankings reconsidering the structure of values and their measurement, International Journal of Social Research Methodology, vol.7, no.5, pp.403414.
Parameswaran, R., Greenberg, B.A., Bellenger, D.N., Robertson, D.H., 1979, Measuring reliability A comparison of alternative techniques, Journal of Marketing Research, vol.16,
February, pp.1825.
Park, C.W., Jaworski, B.J., MacInnis, D.J., 1986, Strategic brand concept image management,
Journal of Marketing, no.4, pp.135145.
Park, C.W., Mothersbaugh, D.L., Feick, L., 1994, Consumer knowledge assessment, Journal of
Consumer Research, vol.21, June, pp.4255.
Parsons, T., 1937, The structure of social action, vol.1, Free Press, New York.
Payne, S., 1951, The art of asking questions, Princeton University Press, Princeton, NJ.
Pecheux, C., Derbaix, C., 1999, Children and attitude toward the brand A new measurement
scale, Journal of Advertising Research, July-August, pp.1927.
Peter, J.P., 1979, Reliability A review of psychometric basics and recent marketing practices,
Journal of Marketing Research, vol.16, February, pp.617.
Piers, S., König, C., 2006, Integrating theories of motivation, Academy of Management Review,
vol.31, pp.889913.
Ployhart, R.E., Oswald, F.L., 2004, Applications of mean and covariance structure analysis:
integrating correlational and experimental approaches, Organizational Research Methods,
vol.7, no.1, pp.2765.
Plummer, J.T., 1974, The concept and application of life style segmentation, Journal of Marketing,
vol.38, January, pp.3337.
Pociecha, J., 1996, Metody statystyczne w badaniach marketingowych, Wydawnictwo Naukowe
PWN, Warszawa.
Podsakoff, P.M., MacKenzie, S.B., Podsakoff, N.P., Lee, J.Y., 2003, The mismeasure of man(agement) and its implications for leadership research, The Leadership Quarterly, vol.14, pp.615656.
Popham, W.J., 1981, Modern educational measurement, Prentice Hall, Englewood Cliffs, NJ.
Potthast, M.J., 1993, Confirmatory factor analysis of ordered categorical variables with large
models, British Journal of Mathematical and Statistical Psychology, vol.46, pp.273286.
Powell, D.A., Schafer, W.D., 2001, The robustness of the likelihood ratio chi-square test for structural equation models A meta-analysis, Journal of Educational and Behavioral Statistics,
vol.26, pp.105132.
Prelec, D., Loewenstein, G., 1998, The red and the black mental accounting of savings and
debt, Marketing Science, vol.17, no.1, pp.428.
Prentice, D.A., 2001, The individual self, relational self and collective self A commentary, in:
Sedikides, C., Brewer, M.B. (eds.), The individual self, relational self and collective self, Taylor
and Francis, Ann Arbor, pp.315326.
Pritchard, R., Ashwood, E., 2008, Managing motivation, Taylor and Francis Group, New York.
Prymon, M., 2009, Badania marketingowe w aspektach menederskich, Wydawnictwo Uniwersytetu Ekonomicznego we Wrocawiu, Wrocaw.
Raju, N.S., 1988, The area between two item characteristic curves, Psychometrika, vol. 53,
pp.495502.
Raju, P.S., 1980, Optimum stimulation level its relationship to personality, demographics, and
exploratory behavior, Journal of Consumer Research, vol.7, no.3, pp.272282.
Ramsey, P.H., Gibson, W.A., 2006, Improved communality estimation in factor analysis, Journal
of Statistical Computation and Simulation, vol.76, no.2, February, pp.93101.
Rao, C.R., 1955, Estimation and tests of significance in factor analysis, Psychometrika, vol.20,
pp.93111.
Rasch, G., 1960, Probabilistic models for some intelligence and attainment tests, University of
Chicago Press, Chicago.
Rasch, G., 1977, On specific objectivity: An attempt at formalizing the request for generality and
validity of scientific statements, in: Blegvad, M. (ed.), The Danish yearbook of philosophy,
Munksgaard, Copenhagen, pp.5894.
Dhar, R., Wertenbroch, K., 2000, Consumer choice between hedonic and utilitarian goods, Journal of Marketing Research, vol.37, February, pp.6071.
Raykov, T., 1997, Estimation of composite reliability for congeneric measures, Applied Psychological Measurement, vol.21, pp.173184.
Raykov, T., 2004, Behavioral scale reliability and measurement invariance evaluation using latent variable modeling, Behavior Therapy, vol. 35, pp.299331.
Raykov, T., 2011, Evaluation of convergent and discriminant validity with multitrait-multimethod correlations, British Journal of Mathematical and Statistical Psychology, vol.64,
pp.3852.
Raykov, T., Widaman, K.F., 1995, Issues in applied structural equation modeling research,
Structural Equation Modeling, vol.2, no.4, pp.289318.
Reckase, M.D., 1979, Unifactor latent trait models applied to multifactor tests results and
implications, Journal of Educational Statistics, vol.4, pp.207230.
Reckase, M.D., 1997, The past and future of multidimensional item response theory, Applied
Psychological Measurement, vol.27, pp.2536.
Reckase, M.D., McKinley, R.L., 1982, The discriminating power of items that measure more
than one dimension, Applied Psychological Measurement, vol.15, pp.361373.
Reilly, M.D., 1982, Working wives and convenience consumption, Journal of Consumer Research, vol.8, March, pp.407418.
Reilly, T., 1995, A necessary and sufficient condition for identification of confirmatory factor
analysis model of factor complexity one, Sociological Methods and Research, vol.23, no.4,
pp.421441.
Reilly, T., OBrien, R.M., 1996, Identification of confirmatory factor analysis models of arbitrary complexity the side-by-side rule, Sociological Methods and Research, vol.24, no.4,
pp.473491.
Rest, J.R., 1972, Defining issues test, Moral Research Projects, University of Minnesota, Minneapolis.
Rettig, S., Pasamanick, B., 1959, Changes in moral values among college students A factorial
study, American Sociological Review, vol.24, pp.856863.
Revelle, W., 1979, Hierarchical clustering and the internal structure of tests, Multivariate Behavioral Research, vol.14, no.1, pp.5774.
Reynolds, T.J., Jolly, J.P., 1980, Measuring personal values. An evaluation of alternative methods, Journal of Marketing Research, vol.17, November, pp.531536.
Richardson, M.W., 1936, Notes on the rationale of item analysis, Psychometrika, vol.1, no.1,
pp.6975.
Richins, M.L., 1987, Media, materialism and human happiness, Advances in Consumer Research, vol.14, Association for Consumer Research, pp.352356.
Roberts, F.S., Luce, R.D., 1968, Axiomatic thermodynamics and extensive measurement, Synthese, vol.18, pp.311326.
Robertson, T.S., 1967, The process of innovation and the diffusion of innovation, Journal of
Marketing, vol.31, no.1, pp.1419.
Robinson, J.P., 1973, Toward a more appropriate use of Guttman scaling, Public Opinion Quarterly, vol.37, pp.260267.
Robinson, J.P., Shaver, P.R., Wrightsman, L.S., 1991, Measures of personality and social psychological attitudes, Elsevier, New York.
Rohan, M.J., 2000, A rose by any name? The values construct, Personality and Social Psychology
Review, vol.4, pp.255277.
Rohatyn, D., 1990, The (mis)information society An analysis of the role of propaganda in shaping consciousness, Bulletin of Science, Technology and Society, vol.10, no.2, pp.7785.
Rokeach, M., 1968, Beliefs, attitudes, and values, Jossey-Bass, San Francisco.
Rokeach, M., 1973, The nature of human values, Macmillan, New York.
Romal, J.B., Kaplan, B.J., 1995, Differences in self-control among spenders and savers, Psychology A Journal of Human Behavior, vol.32, pp.817.
Rook, D.W., 1987, The buying impulse, Journal of Consumer Research, vol.14, September,
pp.189199.
Rorer, B.A., Ziller, R.C., 1982, Iconic communication of values among American and Polish
students, Journal of Cross-Cultural Psychology, vol.13, pp.352361.
Rosenbaum, P.R., 1987, Comparing item characteristic curves, Psychometrika, vol. 52,
pp.217233.
Rosenberg, M., 1965, Society and the adolescent self-image, Princeton University Press,
Princeton, NJ.
Rossiter, J., 2002, The C-OAR-SE procedure for scale development in marketing, International
Journal of Research in Marketing, vol.19, no.4, pp.131.
Rossiter, J., 2011, Measurement for social sciences The C-OAR-SE method and why it must
replace psychometrics, Springer, New York.
Rost, J., 1990, Rasch models in latent classes An integration of two approaches to item analysis,
Applied Psychological Measurement, vol.12, pp.271282.
Rószkiewicz, M., 2011, Analiza klienta, Predictive Solutions, SPSS Polska, pp.2627.
Rulon, P.J., 1939, A simplified procedure for determining the reliability of a test by split-halves,
Harvard Educational Review, vol.9, pp.99103.
Rumiński, A., 1996, System wartości rodziców i dzieci, in: Sareło, Z. (red.), Moralność i etyka
w ponowoczesności, ATK, Warszawa.
Rummel, R.J., 1970, Applied factor analysis, Northwestern University Press, Evanston, IL.
Rushton, J.P., Chrisjohn, R.D., Fekken, G.C., 1981, The altruistic personality and the self-report
altruism scale, Personality and Individual Differences, vol.2, pp.293302.
Rusnak, Z., 1999, Analiza czynnikowa, in: Ostasiewicz, W. (ed.), Statystyczne metody analizy
danych, Wydawnictwo Akademii Ekonomicznej we Wrocławiu, Wrocław, pp.286300.
Sagan, A., 2000, Wybrane problemy identyfikacji i pomiaru struktur ukrytych, Zeszyty Naukowe,
no.543, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków, pp.5464.
Sagan, A., 2003, Skale i indeksy jako narzędzia pomiaru w badaniach marketingowych, Zeszyty
Naukowe, no.640, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków,
pp.2136.
Sagan, A., 2004, Badania marketingowe podstawowe kierunki, Wydawnictwo Uniwersytetu
Ekonomicznego w Krakowie, Kraków.
Sagan, A., 2005, Ocena ekwiwalencji skal pomiarowych w badaniach międzykulturowych,
Zeszyty Naukowe, no.659, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie,
Kraków, pp.5973.
Sagan, A., 2011, Wartość w antropologii kulturowej, in: Sagan, A. (red.), Wartość dla klienta
w układach rynkowych aspekty metodologiczne, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków, pp.1215.
Salzberger, T., 1999, How the Rasch model may shift our perspective of measurement in marketing research, University of Economics and Business Administration in Vienna.
Sanders, K.R., Atwood, L.E., 1979, Value change initiated by the mass media, in: Rokeach, M. (ed.),
Understanding human values individual and societal, Free Press, New York, pp.226240.
Saris, W.E., Satorra, A., 1993, Power evaluations in structural equation models, in: Bollen, K.A.,
Long, J.S. (eds.), Testing structural equation models, Sage, Newbury Park, CA, pp.181204.
Satorra, A., 1992, Asymptotic robust inferences in the analysis of mean and covariance structures, in: Marsden, P. (ed.), Sociological methodology, American Sociological Association,
Washington, D.C.
Satorra, A., Bentler, P.M., 1986, Some robustness properties of goodness of fit statistics in covariance structure analysis, ASA Proceedings of the Business and Economic Section,
pp.549554.
Satorra, A., Bentler, P.M., 1988, Scaling corrections for chi-square statistics in covariance structure analysis, ASA Proceedings of the Business and Economic Section, pp.308313.
Satorra, A., Bentler, P.M., 1990, Model conditions for asymptotic robustness in the analysis of
linear relations, Computational Statistics and Data Analysis, vol.10, pp.235249.
Satorra, A., Bentler, P.M., 1994, Corrections to test statistics and standard errors in covariance structure analysis, Proceedings of the Business and Economic Statistics Section of the
American Statistical Association, pp.308313.
Satterthwaite, F.E., 1941, Synthesis of variance, Psychometrika, vol.6, pp.309316.
Schachter, S., Singer, J.E., 1962, Cognitive, social and physiological determinants of emotional
states, Psychological Review, vol.69, July, pp.379399.
Schafer, J.L., Graham, J.W., 2002, Missing data our view of the state of the art, Psychological
Methods, vol.7, pp.147177.
Schacter, D., 2011, Psychology, Worth Publishers, New York.
Scherer, K.R., 2005, What are emotions? And how can they be measured?, Social Science Information, vol.44, pp.695729.
Schmidt, F.L., Hunter, J.E., 1999, Theory testing and measurement error, Intelligence, vol.27,
pp.183198.
Schneeweiss, H., Mathes, H., 1995, Factor analysis and principal components, Journal of
Multivariate Analysis, vol.55, pp.105124.
Schooler, C., 1968, A note of extreme caution on the use of Guttman scales, American Journal
of Sociology, vol.74, pp.296301.
Schuman, H., 1995, Attitudes, beliefs and behavior, in: Cook, K.S., Fine, G.A., House, J.S.
(eds.), Sociological perspectives on social psychology, Allyn and Bacon, Needham Heights, MA,
pp.6889.
Schuman, H., Presser, S., 1981, Questions and answers in attitude surveys experiments on
question form, wording and context, Academic Press, New York.
Schuur, W.H. van, 2011, Ordinal item response theory Mokken scale analysis, Sage Publications, Thousand Oaks.
Schwarz, N., Clore, G.L., 1983, Mood, misattribution and judgements of well-being informative and directive functions of affective states, Journal of Personality and Social Psychology,
vol.45, September, pp.513523.
Schwartz, S.H., 1992, Universals in the content and structure of values theoretical advances
and empirical tests in 20 countries, in: Zanna, M. (ed.), Advances in experimental social
psychology, vol.25, Academic Press, Orlando, FL, pp.165.
Schwartz, S.H., 1994, Are there universal aspects in the structure and contents of human values?,
Journal of Social Issues, vol.50, no.4, pp.1945.
Schwartz, S.H., 1996, Value priorities and behavior applying a theory of integrated value
systems, in: Seligman, C., Olson, J.M., Zanna, M.P. (eds.), The Ontario symposium the
psychology of values, Lawrence Erlbaum Associates, New Jersey, pp.124.
Schwartz, S.H., Bilsky, W., 1987, Toward a psychological structure of human values, Journal of
Personality and Social Psychology, vol.53, pp.550562.
Schwartz, S.H., Bilsky, W., 1990, Toward a theory of the universal content and structure of values extensions and cross-cultural replications, Journal of Personality and Social Psychology, vol.58, pp.878891.
Schwartz, S.H., Boehnke, K., 2004, Evaluating the structure of human values with confirmatory
factor analysis, Journal of Research in Personality, vol.38, pp.230255.
Scott, W.A., 1965, Values and organizations A study of fraternities and sororities, Rand
McNally, Chicago.
Scott, J.E., Lamont, L.M., 1974, Relating consumer values to consumer research A model and
method for investigation, in: Greer, T.V. (ed.), Increasing marketing productivity, American
Marketing Association, Chicago, pp.283288.
Sego, T., Stout, P.A., 1994, Anxiety associated with social issues The development of a scale to
measure an antecedent construct, Advances in Consumer Research, vol.21, Association for
Consumer Research, pp.601606.
Shafir, E., Simonson, I., Tversky, A., 1993, Reason-based choice, Cognition, vol. 49, no. 1,
pp.1136.
Sharot, T., De Martino, B., Dolan, R.J., 2009, How choice reveals and shapes expected hedonic
outcome, Journal of Neuroscience, vol.29, pp.37603765.
Shavelson, R.J., Webb, N.M., 1991, Generalizability theory A primer, Sage, Newbury Park.
Sheth, J.N., Newman, B.I., Gross, B.L., 1991, Why we buy what we buy A theory of consumption values, Journal of Business Research, no.22, pp.159170.
Siegel, S., 1956, Nonparametric statistics for the behavioral sciences, McGraw-Hill, New York.
Sijtsma, K., Molenaar, I.W., 2002, Introduction to nonparametric item response theory, Sage,
Thousand Oaks, CA.
Simmons, C.B., Wade, W.B., 1985, A comparative study of young people's ideals in five countries, Adolescence, pp.889898.
Simmons, C.B., Wade, W.B., 1993, You and your family survey booklet, Simmons Market Research Bureau, New York.
Simonson, I., 1989, Choice based on reasons The case of the attraction and compromise effects,
Journal of Consumer Research, vol.16, no. 2, pp.158174.
Skinner, B.F., 1971, Beyond freedom and dignity, Knopf, New York.
Smart, R.C., Smart, M.S., 1975, Group values shown in preadolescents' drawings in five English
speaking countries, Journal of Social Psychology, vol.97, pp.2337.
Smith, J.B., Colgate, M., 2007, Customer value creation A practical framework, Journal of
Marketing Theory and Practice, vol.15, no.1, pp.723.
Smith, K.W., 1974, On estimating the reliability of composite indexes through factor analysis,
Sociological Methods and Research, vol.2, pp.4551.
Smith, M.B., 1991, Values, self, and society toward a humanist social psychology, Transaction,
New Brunswick, NJ.
Snook, S.C., Gorsuch, R.L., 1989, Component analysis versus common factor-analysis
A Monte Carlo study, Psychological Bulletin, vol.106, no.1, pp.148154.
Sobel, M.E., Bohrnstedt, G.W., 1985, Use of null models in evaluating the fit of covariance structure models, in: Tuma, B.B. (ed.), Sociological Methodology, Jossey-Bass, San Francisco,
pp.152178.
Sörbom, D., 1974, A general method for studying differences in factor means and factor structure between groups, British Journal of Mathematical and Statistical Psychology, vol.27,
pp.229239.
Sörbom, D., 1975, Detection of correlated errors in longitudinal data, British Journal of Mathematical and Statistical Psychology, vol.28, pp.138151.
Spearman, C., 1904a, General intelligence objectively determined and measured, American
Journal of Psychology, vol.15, pp.201293.
Spearman, C., 1904b, The proof and measurement of association between two things, American
Journal of Psychology, vol.15, pp.72101.
Spearman, C., 1907, Demonstration of formulae for true measurement of correlation, American
Journal of Psychology, vol.18, pp.161169.
Spearman, C., 1910, Correlation calculated from faulty data, British Journal of Psychology,
vol.3, pp.271295.
Spearman, C., 1927, The abilities of man, Macmillan, London.
Spector, P.E., 1992, Summated rating scale construction, Sage, Thousand Oaks, CA.
Stanisz, A., 2006, Przystępny kurs statystyki z zastosowaniem Statistica na przykładach z medycyny, t.13, Statsoft Polska, Kraków.
Staub, E., 1989, Individual and societal (group) values in a motivational perspective and their
role in benevolence and harmdoing, in: Eisenberg, N., Reykowski, J., Staub, E. (eds.), Social
and moral values: individual and societal perspectives, Lawrence Erlbaum Associates, New
Jersey, pp.4561.
Steenkamp, J.E.M., Baumgartner, H., 1992, The role of optimum stimulation level in exploratory consumer behavior, Journal of Consumer Research, vol.19, December, pp.434448.
Stegelmann, W., 1983, Expanding the Rasch model to a general model having more than one
dimension, Psychometrika, vol.26, pp.261271.
Steiger, J.H., 1989, EzPath: A supplementary module for SYSTAT and SYGRAPH, SYSTAT,
Evanston, IL.
Steiger, J.H., 1990a, Some additional thoughts on components, factors, and factor-indeterminacy, Multivariate Behavioral Research, vol.25, no.1, pp.4145.
Steiger, J.H., 1990b, Structural model evaluation and modification an interval estimation approach, Multivariate Behavioral Research, vol.25, pp.173180.
Steiger, J.H., 2000, Point estimation, hypothesis testing and interval estimation using the RMSEA some comments and a reply to Hayduk and Glaser, Structural Equation Modeling,
vol.7, no.2, pp.149162.
Steiger, J.H., Lind, J., 1980, Statistically-based tests for the number of common factors, Annual
Spring Meeting of the Psychometric Society, Iowa City.
Stevens, S.S., 1946, On the theory of scales of measurement, Science, vol.103, pp.677680.
Stevens, S.S., 1951, Mathematics, measurement, and psychophysics, in: Stevens, S.S. (ed.),
Handbook of experimental psychology, Wiley, New York.
Stevens, S.S., 1966, A metric for social consensus, Science, no.151.
Stewart, D.W., 1981, Application and misapplication of factor analysis in marketing research,
Journal of Marketing Research, vol.18, pp.5162.
Stine, W.W., 1989, Meaningful inference The role of measurement in statistics, Psychological
Bulletin, vol.105, pp.147155.
Stobiecka, J., 2010, Modele pomiaru jakości marketingowej produktów, Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków.
Stouffer, S.A., 1950, Measurement and prediction, Princeton University Press, New Jersey.
Strahilevitz, M., Myers, J.G., 1998, Donations to charity as purchase incentives how well they
work may depend on what you are trying to sell, Journal of Consumer Research, vol.24,
no.4, pp.434446.
Sudman, S., Bradburn, N.M., 1974, Response effects in surveys, Aldine, Chicago.
Sudman, S., Bradburn, N.M., 1982, Asking questions A practical guide to questionnaire design, Jossey-Bass, San Francisco.
Suppes, P., 1951, A set of independent axioms for extensive quantities, Portugaliae Mathematica,
vol.10, no.2, pp.163172.
Suppes, P., Krantz, D.H., Luce, R.D., Tversky, A., 1989, Foundations of measurement, vol.2
Geometrical, threshold, and probabilistic representations, Academic Press, San Diego.
Suppes, P., Zinnes, J.L., 1963, Basic measurement theory, in: Luce, R.D., Bush, R.R., Galanter,
E. (eds.), Handbook of mathematical psychology, vol.1, Wiley, New York, pp.176.
Sweeney, J.C., Soutar, G.N., 2001, Consumer perceived value the development of a multiple
item scale, Journal of Retailing, no.77, pp.203220.
Symonds, P.M., 1928, Factors influencing test reliability, Journal of Educational Psychology,
vol.19, February, pp.7387.
Sztemberg-Lewandowska, M., 2008, Analiza czynnikowa w badaniach marketingowych,
Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu, Wrocław.
Szymański, M.J., 1998, Młodzież wobec wartości próba diagnozy, IBE, Warszawa.
Tabachnick, B.G., Fidell, L.S., 2007, Using multivariate statistics, Pearson Allyn and Bacon,
Upper Saddle River, NJ.
Tajfel, H., Turner, J.C., 1986, An integrated theory of intergroup conflict, in: Austin, W.J.,
Worchel, S. (eds.), Psychology of intergroup relations, Nelson-Hall, Chicago, pp.724.
Tanaka, J.S., 1987, How big is big enough? Sample size and goodness-of-fit in structural equation
models with latent variables, Child Development, vol.58, pp.253263.
Takane, Y., Leeuw, J. de, 1987, On the relationship between item response theory and factor
analysis of discretized variables, Psychometrika, vol.52, pp.393408.
Tanur, J.M. (ed.), 1992, Questions about questions inquiries into the cognitive bases of surveys,
Russell Sage Foundation, New York.
Tarka, P., 2008, From ranking (Rokeach RVS) to rating scales evaluation some empirical
observations on multidimensional scaling Polish and Dutch youths' values, Innovative Management Journal, vol.12, pp.2442.
Tarka, P., 2010a, Statistical choice between rating or ranking methods of scaling consumers values, Statistics in Transition, vol.11, no.1, pp.177187.
Tarka, P., 2010b, Subjective methods of exploration and data visualization in factor analysis,
Conference paper presented at 33rd Scientific Conference on Multivariate Statistical
Analysis MSA 2014, 1719 November, d, Poland.
Tarka, P., 2011, Measurement, reliability estimation in a view of classical true-score theory, Argumenta Oeconomica, vol.27, no.2, pp.6599.
Tarka, P., 2012, Equivalence measurement in factor analysis, Statistics in Transition, vol.12,
no.1, March, pp.143158.
Tarka, P., 2013a, Construction of the measurement scale for consumers attitudes in the frame of
one-parametric Rasch model, Folia Oeconomica Lodziensis, no.286, pp.333339.
Tarka, P., 2013b, Geometrical perspective on rotation and data structure diagnosis in factor
analysis, Ekonometria, vol.1, no.39, pp.198209.
Tarka, P., Kaczmarek, M., 2013a, Analiza porównawcza 5- i 7-stopniowych skal Likerta
w świetle osiągniętego poziomu rzetelności wyników pomiaru, Wiadomości Statystyczne,
nr 8, pp.3747.
Tarka, P., Kaczmarek, M., 2013b, Metoda gromadzenia danych a ekwiwalencja wyników pomiaru systemu wartości z 5- i 7-stopniowych skal ratingowych Likerta, Handel Wewnętrzny,
nr 5, pp.4256.
Tarkkonen, L., 1987, On reliability of composite scales, Statistical studies, 7th ed., Finnish Statistical Society.
Tarkkonen, L., Vehkalahti, K., 2005, Measurement errors in multivariate measurement scales,
Journal of Multivariate Analysis, vol.96, September, pp.172189.
Taschian, A., Slama, M.E., Taschian, R., 1984, Measuring attitudes towards energy conservation cynicism, belief in material growth, and faith in technology, Journal of Public Policy
and Marketing, vol.3, no.2, pp.134148.
Tatarkiewicz, W., 1985, Wartości i cele osobiste młodzieży, Psychologia Wychowawcza, nr 4.
Taylor, P.W., 1961, Normative discourse, Prentice-Hall, Englewood Cliffs, NJ.
Thaler, R.H., 1980, Toward a positive theory of consumer choice, Journal of Economic Behavior
and Organization, vol.1, pp.3960.
Thaler, R.H., Johnson, E.J., 1990, Gambling with the house money and trying to break even the
effects of prior out-comes on risky choice, Management Science, vol.36, no.6, pp.643660.
Thoits, P.A., Virshup, L.K., 1997, Me's and we's forms and functions of social identities, in:
Ashmore, R., Jussim, L. (eds.), Self and identity fundamental issues, Oxford University
Press, New York, pp.106133.
Thomas, W.I., Znaniecki, F., 1918, The Polish peasant in Europe and America, vol.1, Badger, Boston.
Thompson, B., 2004, Exploratory and confirmatory factor analysis understanding concepts
and applications, American Psychological Association, Washington, D.C.
Thompson, M.S., Green, S.B., 2006, Evaluating between-group differences in latent means, in:
Hancock, G.R., Mueller, R.O. (eds.), Structural equation modeling Asecond course, Information Age, Greenwich, pp.119169.
Thorndike, E.L., 1904, An introduction to the theory of mental and social measurements, The
Science Press, New York.
Thorndike, R.L., Hagen, E., 1969, Measurement and evaluation in psychology and education,
John Wiley and Sons, New York.
Thurstone, L.L., 1927, A law of comparative judgement, Psychological Review, vol.34,
pp.273286.
Thurstone, L.L., 1928, Attitudes can be measured, American Journal of Sociology, vol. 33,
pp.529554.
Thurstone, L.L., 1929, Theory of attitude measurement, Psychological Review, vol. 36,
pp.222241.
Thurstone, L.L., 1931, The measurement of social attitudes, Journal of Abnormal and Social
Psychology, vol.26, pp.249269.
Thurstone, L.L., 1935, The vectors of mind, University of Chicago Press, Chicago.
Thurstone, L.L., 1937, Current misuse of the factorial methods, Psychometrika, vol.2, pp.7376.
Thurstone, L.L., 1947, Multiple-factor analysis A development and expansion of the vectors of
the mind, University of Chicago Press, Chicago.
Thurstone, L.L., Chave, E.J., 1929, The measurement of attitude, University of Chicago Press,
Chicago.
Tian, K.T., Bearden, W.O., Manning, K.C., 2001, Consumers' need for uniqueness scale development and validation, Journal of Consumer Research, vol.28, June, pp.5066.
Townsend, J.T., Ashby, F.G., 1984, Measurement scales and statistics the misconception misconceived, Psychological Bulletin, vol.96, pp.394401.
Triandis, H.C., Kilty, K.M., Shanmugam, A.V., Tanaka, Y., Vassiliou, V., 1972, Cognitive structures and the analysis of values, in: Triandis, H.C. (ed.), The analysis of subjective culture,
Wiley, New York, pp.181263.
Triandis, H.C., Bontempo, R., Betancourt, H., Bond, M., Leung, K., Brenes, A., Georgas, J.,
Hui, C.H., Marin, G., Setiadi, B., Sinha, J.B.P., Verma, J., Spangenberg, J., Touzard, H.,
Montmollin, G. de, 1986, The measurement of the etic aspects of individualism and collectivism across cultures, Australian Journal of Psychology, vol.38, pp.257267.
Tucker, L.R., 1949, A note on the estimation of test reliability by the Kuder-Richardson formula (20), Psychometrika, vol. 14, no. 2, June, pp. 117–119.
Tucker, L.R., Lewis, C., 1973, A reliability coefficient for maximum likelihood factor analysis, Psychometrika, vol. 38, pp. 1–10.
Tukey, J.W., 1961, Data analysis and behavioral science or learning to bear the quantitative man's burden by shunning badmandments, in: Jones, L.V. (ed.), The collected works of John W. Tukey, vol. 4, Philosophy and Principles of Data Analysis 1986, Wadsworth, Belmont, CA, pp. 391–484.
Tversky, A., 1977, Features of similarity, Psychological Review, vol. 84, pp. 327–352.
Unger, L., Kernan, J.B., 1983, On the meaning of leisure: an investigation of some determinants of the subjective experience, Journal of Consumer Research, vol. 9, pp. 381–392.
Ven, A.H.G.S. van der, 1980, Introduction to scaling, Wiley, Chichester.
Vijver, F.J.R. van de, Leung, K., 1997, Methods and data analysis for cross-cultural research, Sage, Newbury Park.
Vandenberg, R.J., Lance, C.E., 2000, A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research, Organizational Research Methods, vol. 3, no. 1, pp. 4–69.
Veblen, T., 1899, The theory of the leisure class, Macmillan, New York.
Velicer, W.F., 1974, A comparison of the stability of factor analysis, principal component analysis, and rescaled image analysis, Educational and Psychological Measurement, vol. 34, pp. 563–572.
Velicer, W.F., 1976, The relation between factor score estimates, image scores, and principal component scores, Educational and Psychological Measurement, vol. 36, pp. 149–159.
Velicer, W.F., 1977, An empirical comparison of the similarity of principal component, image, and factor patterns, Multivariate Behavioral Research, vol. 12, pp. 3–22.
Velicer, W.F., Fava, J.L., 1987, An evaluation of the effects of variable sampling on component, image, and factor analysis, Multivariate Behavioral Research, vol. 22, pp. 193–209.
Velicer, W.F., Fava, J.L., 1998, Effects of variable and subject sampling on factor pattern recovery, Psychological Methods, vol. 3, no. 2, pp. 231–251.
Velicer, W.F., Jackson, D.N., 1990, Component analysis versus common factor analysis: some further observations, Multivariate Behavioral Research, vol. 25, no. 1, pp. 97–114.
Velicer, W.F., Peacock, A.C., Jackson, D.N., 1982, A comparison of component and factor patterns: A Monte Carlo approach, Multivariate Behavioral Research, vol. 17, pp. 371–388.
Velleman, P.F., Wilkinson, L., 1993, Nominal, ordinal, interval, and ratio typologies are misleading, The American Statistician, vol. 47, no. 1, February, pp. 65–72.
Veroff, J., Douvan, E., Kulka, R.A., 1981, The inner American, The Basic Books, New York.
Verplanken, B., Holland, R.W., 2002, Motivated decision making: effects of activation and self-centrality of values on choices and behavior, Journal of Personality and Social Psychology, vol. 82, no. 3, pp. 434–447.
Vigneron, F., Johnson, L.W., 2004, Measuring perceptions of brand luxury, Journal of Brand Management, vol. 11, pp. 484–506.
Vinson, D.E., 1977, Personal values as a dimension of consumer discontent, in: Greenberg, B., Bellinger, D. (eds.), Contemporary marketing thought, American Marketing Association, Chicago.


Vinson, D.E., Munson, J.M., 1976, Personal values: An approach to market segmentation, in: Bernhardt, K.L. (ed.), Marketing and beyond, American Marketing Association, Chicago, pp. 313–317.
Vinson, D.E., Scott, J.E., Lamont, L.M., 1977, The role of personal values in marketing and consumer behavior, Journal of Marketing, April, vol. 41, no. 2, pp. 44–50.
Vispoel, W.P., 1996, The development and validation of the arts self-perception inventory for adults, Educational and Psychological Measurement, vol. 56, pp. 719–735.
Voss, K.E., Spangenberg, E.R., Grohmann, B., 2003, Measuring the hedonic and utilitarian dimensions of consumer attitude, Journal of Marketing Research, vol. 40, no. 3, August, pp. 310–320.
Vyncke, P., 2002, Lifestyle segmentation: from attitudes, interests and opinions to values, aesthetic styles, life visions and media preferences, European Journal of Communication, vol. 17, no. 4, pp. 445–463.
Wainer, H., 1982, Robust statistics: A survey and some prescriptions, in: Keren, G. (ed.), Statistical and methodological issues in psychology and social sciences research, Lawrence Erlbaum, Hillsdale, NJ, pp. 187–214.
Wakefield, K.L., Barnes, J.H., 1996, Retailing hedonic consumption: A model of sales promotion of a leisure service, Journal of Retailing, vol. 72, no. 4, pp. 409–427.
Walesiak, M., 1996, Metody analizy danych marketingowych [Methods of marketing data analysis], Wydawnictwo Naukowe PWN, Warszawa.
Wang, C.-L., Chen, Z.-X., Chan, A.K.K., Zheng, Z.-C., 2000, The influence of hedonic values on consumer behaviors, Journal of Global Marketing, vol. 14, no. 1/2, pp. 169–186.
Wang, W.C., Wilson, M., Adams, R.J., 1997, Rasch models for multidimensionality between and within items, in: Wilson, M., Engelhard, G. (eds.), Objective measurement: theory into practice, vol. 4, Ablex Publishing Corporation, Greenwich, CT, pp. 139–156.
Warr, P.B., Coffman, T.L., 1970, Personality, involvement, and extremity of judgement, British Journal of Social and Clinical Psychology, vol. 9, no. 2, pp. 108–121.
Waters, M.C., 1990, Ethnic options: choosing identities in America, University of California Press, Berkeley.
Weathers, D., Sharma, S., Niedrich, R.W., 2005, The impact of the number of scale points, dispositional factors, and the status quo decision heuristic on scale reliability and response accuracy, Journal of Business Research, vol. 58, pp. 1516–1524.
Weiss, D.J., 1983, Introduction, in: Weiss, D.J. (ed.), New horizons in testing: latent trait test theory and computerized adaptive testing, Academic Press, New York, pp. 1–8.
Werts, C.E., Breland, H.M., Grandy, J., Rock, D.R., 1980, Using longitudinal data to estimate reliability in the presence of correlated measurement errors, Educational and Psychological Measurement, vol. 40, Spring, pp. 19–29.
Wesman, A.G., 1971, Writing the test item, in: Thorndike, R.L. (ed.), Educational measurement, 2nd ed., American Council on Education, Washington, D.C.
West, S.G., Finch, J.F., Curran, P.J., 1995, Structural equation models with non-normal variables: problems and remedies, in: Hoyle, R. (ed.), Structural equation modeling: concepts, issues, and applications, Sage, Thousand Oaks, pp. 56–75.
Westbrook, R.A., Oliver, R.L., 1991, The dimensionality of consumption emotion patterns and consumer satisfaction, Journal of Consumer Research, vol. 18, pp. 84–91.
Whitely, S.E., 1980, Multicomponent latent trait models for ability tests, Psychometrika, vol. 45, pp. 479–494.


Widaman, K.F., 1990, Bias in pattern loadings represented by common factor analysis and component analysis, Multivariate Behavioral Research, vol. 25, no. 1, pp. 89–95.
Widaman, K.F., 1993, Common factor analysis versus principal component analysis: differential bias in representing model parameters, Multivariate Behavioral Research, vol. 28, no. 3, pp. 263–311.
Wilcox, J.B., Howell, R.D., Breivik, E., 2008, Questions about formative measurement, Journal of Business Research, vol. 61, pp. 1219–1228.
Williams, L.J., 1995, Covariance structure modeling in organizational research: problems with the method versus applications of the method, Journal of Organizational Behavior, vol. 16, pp. 225–234.
Williams, R.M., 1968, International encyclopedia of the social sciences, Macmillan, New York.
Williams, R.M., 1979, Change and stability in values and value system: A sociological perspective, in: Rokeach, M. (ed.), Understanding human values: individual and societal, Free Press, New York, pp. 15–46.
Wilson, E.B., Worcester, J., 1939, Note on factor analysis, Psychometrika, vol. 4, pp. 133–148.
Wilson, M., 1985, Measuring stages of growth: A psychometric model of hierarchical development, Occasional paper no. 19, Australian Council for Educational Research, Hawthorn, Victoria.
Wilson, M., 2005, Constructing measures: An item response modeling approach, Lawrence Erlbaum Associates, New York.
Wittgenstein, L., 1961, Entry, in: Anscombe, G.E.M., Von Wright, G.H. (eds.), Notebooks 1914–1916, Basil Blackwell, London, pp. 7–8.
Wojciszke, B., 1989, The system of personal values and behavior, in: Eisenberg, N., Reykowski, J., Staub, E. (eds.), Social and moral values: Individual and societal perspectives, Lawrence Erlbaum, Hillsdale, NJ, pp. 229–251.
Wood, J.M., Tataryn, D.J., Gorsuch, R.L., 1996, Effects of under- and overextraction on principal axis factor analysis with varimax rotation, Psychological Methods, vol. 1, pp. 354–365.
Wood, W., 2000, Attitude change: persuasion and social influence, Annual Review of Psychology, vol. 51, pp. 539–570.
Woodall, T., 2003, Conceptualizing value for the customer: An attributional, structural and dispositional analysis, Academy of Marketing Science Review, no. 12.
Woodruffe-Burton, H., Eccles, S., Elliott, R., 2002, Towards a theory of shopping: A holistic framework, Journal of Consumer Behaviour, vol. 1, no. 3, pp. 256–266.
Wright, B.D., 1997, A history of social science measurement, Educational Measurement: Issues and Practice, vol. 16, no. 4, pp. 33–45.
Wright, B.D., Stone, M.A., 1979, Best test design, Mesa Press, Chicago.
Wyer, R.S. Jr., 1969, The effects of general response style on measurement of own attitude and the interpretation of attitude-relevant messages, British Journal of Social and Clinical Psychology, vol. 8, no. 2, pp. 105–115.
Yamauchi, K.T., Templer, D.I., 1982, The development of a money attitude scale, Journal of Personality Assessment, vol. 46, pp. 522–528.
Yang, Y., 2005, Can the strengths of AIC and BIC be shared?, Biometrika, vol. 92, pp. 937–950.
Yankelovich, D., 1964, New criteria for market segmentation, Harvard Business Review, March, vol. 42, pp. 83–90.
Yankelovich, D., Meer, D., 2006, Rediscovering market segmentation, Harvard Business Review, February, pp. 1–11.


Yates, A., 1987, Multivariate exploratory data analysis: A perspective on exploratory factor analysis, State University of New York Press, Albany.
Yuan, K.H., 2005, Fit indices versus test statistics, Multivariate Behavioral Research, vol. 40, pp. 115–148.
Yuan, K.H., Bentler, P.M., 2001, Effect of outliers on estimators and tests in covariance structure analysis, British Journal of Mathematical and Statistical Psychology, vol. 54, pp. 161–175.
Yule, G.U., 1897, On the theory of correlation, Journal of the Royal Statistical Society, vol. 60, pp. 812–851.
Zagóra-Jonszta, U., 2004, Ewolucja teorii wartości [The evolution of the theory of value], in: Zadora, H. (red.), Wartość w naukach ekonomicznych, Politechnika Śląska, Gliwice.
Zaichkowsky, J.L., 1985, Measuring the involvement construct, Journal of Consumer Research, vol. 12, pp. 341–352.
Zeller, R.A., Carmines, E.G., 1979, Reliability and validity assessment, Sage Publications, New York.
Zeller, R.A., Carmines, E.G., 1980, Measurement in the social sciences: The link between theory and data, Cambridge University Press, Cambridge.
Zhang, W., 2008, A comparison of four estimators of a population measure of model fit in covariance structure analysis, Structural Equation Modeling, vol. 15, pp. 301–326.
Zuckerman, M., 1979, Sensation seeking: beyond the optimal level of arousal, Lawrence Erlbaum, Hillsdale, NJ.
Zuse, H., 1994, Complexity metrics, in: Marciniak, J. (ed.), Encyclopedia of software engineering, Wiley, New York.

List of Figures
1. Theoretical model of relations among motivational types of values, higher
order values, and bipolar value dimension .......................................................20
2. Market segmentation from various perspectives .............................................42
3. Measurement process by Krizs ..........................................................................51
4. The relationship between ability and item response on the item characteristic
curve (ICC) ..........................................................................................................75
5. A step function ICC .............................................................................................76
6. The IRT model for measurement of latent variable .........................................78
7. Illustration of items with different discrimination and difficulty parameters
on ICC ...................................................................................................................81
8. Four item response functions of monotone homogeneous items that conform
to the 2PL model ..................................................................................................84
9. Category boundary response functions for a five-category polytomous
item ........................................................................................................................88
10. Summated scale ....................................................................................................111
11. Example of answers by importance according to two examinees: rating and
ranking scale .........................................................................................................137
12. Steps in scale development .................................................................................142
13. Stages of operational procedure .........................................................................144
14. Effect model with reflective indicators ..............................................................146
15. Causal model with formative indicators ...........................................................146
16. The OAR structure of measurement .................................................................157
17. Reliability and validity of measurement instrument .......................................164
18. Distributions of true scores T, error scores E and observed scores X in the
same data set .........................................................................................................166
19. Reliability coefficient as the square of the correlation between observed
scores and true scores ..........................................................................................173
20. Difference between classical test theory and factor analysis model ..............214
21. Selected group of techniques using analysis of interdependence and dependence ........................................................................................................................216
22. Path diagrams of two correlated factors as modeled using exploratory factor analysis (EFA) (on the left, with cross-loadings) and confirmatory factor analysis (CFA) (on the right, with oblique rotation) .................................221


23. Confirmatory factor analysis with oblique rotation (correlated two factors)
and correlated measurement errors ...................................................................223
24. Decomposition of the original variable variance .............................................230
25. Communality between three measures observed variables X1, X2 and X3
composing factor F ..............................................................................................232
26. Vectors configuration defined in the space of two common factors F1 and F2 .258
27. A path diagram for higher-order factor analysis model ...............................267
28. Under-identified CFA model (df = −1) ...........................................................284
29. Just-identified CFA model (df = 0) ....................................................................285
30. Under-identified CFA model (df = −1) ...........................................................286
31. Over-identified CFA model (df = 2) ..................................................................287
32. Over-identified CFA model (df = 1) ..................................................................288
33. Empirically under-identified CFA model (df = 0) ...........................................289
34. Empirically under-identified CFA model (df = 1) ...........................................290
35. Scale development process .................................................................................341
36. Graphical display of 13 items' distribution by their categories and examinees' answers ..........................................................................................................350
37. Scree plot for 13 items as components ..............................................................358
38. Two models: CFA model 1 in reference to theoretical construct of HVS scale
and CFA model 2 derived from exploratory factor analysis EFA ...............387
39. Identification (based on unit loading identification constraint) of CFA
model .....................................................................................................................388
40. Standardized estimates for CFA model 1 – χ² = 116.77, degrees of freedom
= 59, probability level = 0.00 ..............................................................................390
41. Standardized estimates – CFA model 2a – without items X5 and X13 ...........412
42. Standardized estimates – CFA model 2b – without items X5, X10 and X13 ...413

List of Tables
1. Motivational types of values ...............................................................................19
2. Typology of selected consumers' values ............................................................40
3. Consumers' segmentation information .............................................................45
4. General classification of the measurement approaches by reference and method ..................................................................................................................53
5. Rules of the measurement in classical theory of measurement and item response theory ........................................................................................................94
6. Types of measurement scales according to Stevens .........................................100
7. The concepts of the measurement levels ...........................................................100
8. Summated scale with statements for measuring values on 5-point scale ......108
9. Binary codes for Guttman's scalogram ..............................................................112
10. RVS scale ...............................................................................................................116
11. List of values – LOV scale ...................................................................................118
12. Typology and characteristics of scales for values measurement ....................123
13. Health conscious scale: HCS ..............................................................................129
14. Subjective leisure scale: SLS ................................................................................130
15. Belief in material growth scale: BIMG ..............................................................131
16. Attachment to possessions scale ........................................................................132
17. A framework for assessing reflective and formative measurement scales ....148
18. The dichotomous agree-disagree format for declarative statements .............153
19. Various methods of test-scale reliability estimation – some differences ......179
20. Guidelines for identifying significant factor loadings based on sample size.234
21. Number of factors and number of observed variables ....................................251
22. Number of factors at the defined level of observed variables ........................292
23. Basic characteristics of selected fit indices demonstrating goodness-of-fit across different model situations ........................................................................310
24. Dimensions along which fit indices can vary ...................................................311
25. The CFA reporting guidelines check-list ..........................................................323
26. General categories for intercultural and cross-groups invariance measurements .....................................................................................................................326
27. Theoretically-based multidimensional construct of hedonic consumerism values with respective facets ...............................................................................346
28. Sample characteristics – gender ........................................................................348


29. Sample characteristics – examinees' parents' residence (place of living) ......348
30. Sample characteristics – residence of examinees during studying ................348
31. Sample characteristics – involvement of examinees in additional activities
other than learning ..............................................................................................348
32. Analysis of 13 items according to categories and examinees' responses ......349
33. Descriptive statistics 13 items .............................................................................350
34. Anti-image matrices of 13 items ........................................................................352
35. Correlation matrix of 13 items ...........................................................................353
36. Bartletts test of sphericity ...................................................................................354
37. Total variance explained in extraction: PCA before varimax rotation .........356
38. Total variance explained in extraction: PAF, GLS and ML methods before
varimax rotation ...................................................................................................357
39. Total variance explained after varimax rotation and PCA extraction ..........357
40. Communalities by PCA extraction method .....................................................359
41. Goodness-of-fit test by GLS method .................................................................360
42. Factor loadings structure matrix before rotation by PCA and PAF method 362
43. Factor loadings structure matrix after varimax by PCA and ULS method...363
44. Factor loadings structure matrix after varimax by MINRES and PAF
method ..................................................................................................................364
45. Factor loadings structure matrix after varimax by C and AF method ..........364
46. Factor loadings structure matrix after varimax by ML and IFA method .....365
47. Factor loadings structure matrix after quartimax by PCA and ULS
method...................................................................................................................365
48. Factor loadings structure matrix after quartimax by MINRES and PAF
method ..................................................................................................................366
49. Factor loadings structure matrix after quartimax by C and AF method ......366
50. Factor loadings structure matrix after quartimax by ML and IFA method .367
51. Factor loadings structure matrix after direct oblimin (at delta 0) with Kaiser normalization by GLS method .....................................................................369
52. Pattern matrix after direct oblimin (at delta 0) with Kaiser normalization
by GLS method .....................................................................................................370
53. Factor correlation matrix – direct oblimin (at delta 0) with Kaiser normalization by GLS method .........................................................................................370
54. Factor loadings structure matrix after direct oblimin with Kaiser normalization by GLS method at 0 and 0.8 delta levels ..........................................372
55. Factor loadings structure matrix after direct oblimin with Kaiser normalization by GLS method at 0.6 and 0.4 delta levels .....................................372
56. Factor loadings structure matrix after direct oblimin with Kaiser normalization by GLS method at 0.2 and −0.2 delta levels ......................................373

474

List of Tables

57. Factor loadings structure matrix after direct oblimin with Kaiser normalization by GLS method at −0.4 delta level ........................................................373
58. Factor correlation matrix – direct oblimin with Kaiser normalization by
GLS method – at different delta levels ..............................................................374
59. Correlations among oblique factors at PCA method and varimax rotation.375
60. Correlations of group variables with secondary and primary factors at PCA
method and varimax rotation ............................................................................376
61. Secondary and primary factor loadings at PCA method and varimax rotation .........................................................................................................................377
62. Reliability statistics for SF1, SF2, PF1, PF2, PF3 and PF4 ..................................379
63. Summary item statistics for SF1, SF2, PF1, PF2, PF3 and PF4 ..........................381
64. Item-total statistics for SF1 and SF2 ...................................................................382
65. Additional items for primary factor PF3 ...........................................................384
66. Parameters summary – CFA model 1 ...............................................................390
67. Computation of degrees of freedom – CFA model 1 ......................................391
68. Identification rules ...............................................................................................392
69. Model fit summary – FMIN – CFA model 1 with all items ...........................394
70. Model fit summary – CMIN – CFA model 1 with all items ..........................394
71. Model fit summary – NCP – CFA model 2 with all items .............................394
72. Model fit summary – RMSEA – CFA model 1 with all items ........................395
73. Model fit summary – HOELTER – CFA model 1 with all items ...................395
74. Model fit summary – RMR, GFI, AGFI, PGFI – CFA model 1 with all items .......................................................................................................................395
75. Model fit summary – baseline comparisons – CFA model 1 with all items .396
76. Model fit summary – parsimony-adjusted measures – CFA model 1 with all items .................................................................................................................396
77. Modification indices – error covariances – CFA model 1 with all items ......398
78. Modification indices – regression weights – CFA model 1 with items X5, X10, X13 deleted .....................................................................................................399
79. Standardized residual covariances – CFA model 1 with all items .................400
80. Standardized residual covariances – CFA model 2b with all items ...............401
81. Modification indices – error covariances – CFA model 2a with items X5, X13 deleted .............................................................................................................402
82. Modification indices – regression weights – CFA model 2a with items X5, X13 deleted .............................................................................................................402
83. Parameters summary – CFA model 2 ...............................................................403
84. Computation of degrees of freedom – CFA model 2 ......................................403
85. Model fit summary – FMIN – CFA model 2 ....................................................404
86. Model fit summary – CMIN – CFA model 2 ...................................................405


87. Model fit summary – HOELTER – CFA model 2 ...........................................405
88. Model fit summary – RMSEA – CFA model 2 ................................................406
89. Model fit summary – NCP – CFA model 2 ......................................................406
90. Model fit summary – RMR, GFI – CFA model 2 ............................................407
91. Model fit summary – baseline comparisons – CFA model 2 .........................407
92. Model fit summary – parsimony-adjusted measures – CFA model 2 ..........408
93. Regression weights – CFA model 2a .................................................................409
94. Regression weights – CFA model 2b with items X5, X10, X13 deleted ...........410
95. Standardized regression weights – CFA model 2 ............................................410
96. Squared multiple correlations – CFA model 2 ................................................411
97. Covariances – CFA model 2 ...............................................................................414
98. Correlations – CFA model 2 ..............................................................................414
99. Variances – CFA model 2 ...................................................................................415
100. Standardized factor loadings, average variance extracted and reliability estimates – CFA model 2b ......................................................................................417
101. The HCV construct correlation matrix (standardized) – CFA model 2b .....419
102. Correlations between factors and additional relevant measures – CFA
model 2b ................................................................................................................420

Subject and author index

Abell, P., 144


Achenreiner, G.B., 343
Acito, F., 244
Adequacy for factorial model, 351
Ahtola, T., 337, 340
Akaike, H., 307, 310
Albaum, G., 136
Algina, J., 63, 76, 94, 152, 208, 209
Allen, M.J., 68, 180, 188
Allport, G., 119
Alwin, D.F., 135
Amoo, T., 118
Anderson, J.C., 145, 215
Anderson, R.D., 244
Andrich, D., 76, 93
Aneshensel, C.S., 224
Aranowska, E., 64, 203, 262
Arbuckle, J.L., 386, 396
Argyle, M., 25
Arminger, G., 322
Armor, D.J., 199
Arnold, S.J., 337
Arrindell, W.A., 235
Ashwood, E., 28
Attitude, 29
Attitudes scales, 105
definition, 105
types
graphical scale, 106
itemized scales, 106
numerical scale, 106
rating scale, 105
summated scale, 106
Atwood, L.E., 31
Austin, J.T., 276, 325

Babin, B.J., 340


Bachman, J.G., 136
Bacon, D.R., 418
Bagozzi, R.P., 224
Baker, B.O., 103
Baker, F.B., 74, 82
Bales, R., 119
Balicki, A., 233, 240
Ball, A.D., 122
Balla, J.R., 299, 300, 325
Bandalos, D.L., 187, 324
Barnes, J.H., 337, 339
Batra, R., 337, 340
Batson, D.C., 29
Baumgartner, H., 327
Bąk, A., 35
Bearden, W.O., 142, 203, 224, 299, 329, 425,
426
Bechtoldt, H.P., 202, 204, 205
Bentler, P.M., 198, 199, 235, 293, 294, 296–298, 300–305, 309, 311, 312, 321, 323–325, 394
Berge, J.M.F., 180
Berkane, M., 312
Bernstein, I.H., 186
Białynicka-Birula, J., 24, 27
Biemer, P., 133
Biernat, M., 30
Billiet, J., 326
Birnbaum, A., 80, 84
Biserial correlation, 211
Bitner, M.J., 18
Bloch, P.H., 336, 340, 344
Bock, R.D., 73
Bohrnstedt, G.W., 103, 311



Bolger, N., 391, 392
Bollen, K.A., 145, 147–149, 267, 283, 291, 300, 304, 309, 311, 312, 316–318, 322, 325, 327, 329, 330, 385
Bonett, D.G., 300, 302, 303, 304, 305, 311,
323, 325, 394
Borgatta, E.F., 103, 113
Borowicz, R., 23
Borsboom, D., 150
Bourdieu, P., 333
Boyle, G.J., 186
Bradburn, N.M., 133
Braithwaite, V.A., 18, 119, 123
Brandstaetter, H., 25
Brannick, M.T., 225
Brehm, J.W., 21
Breivik, E., 146
Brett, J.M., 394
Brown, T.A., 223
Browne, M.W., 272, 295, 299, 306, 312, 313
Brzeziński, J., 109, 184, 205
Buechler, S., 334
Burton, S., 339
Byrne, B.M., 303, 307, 327, 347, 397, 409

Caffrey, J., 198, 244
Campbell, C., 333, 337
Carman, J.M., 35
Carmines, E.G., 162, 165, 199
Casserly, C., 119
Cattell, R.B., 264, 273
Chave, E.J., 105
Chevan, A., 113
Chilton, R., 113
Cho, J.H., 119
Chomsky, N., 274
Chou, C.P., 296, 324
Choynowski, M., 49
Chrisjohn, R.D., 120
Churchill, G., 106, 127, 149, 341, 342, 343
Clark, L.A., 141
Clark, V.A., 224
Classical test theory (CTT), 65
conditions, 68
criticism, 71


Classical test theory and factor analysis, 213


Classical test theory vs. item response
theory
establishing meaningful scale scores, 92
establishing scale properties, 93
interchangeable test forms, 91
standard error of measurement, 91
test length and reliability, 92
unbiased estimation of item profiles, 92
Clawson, C.J., 133
Clore, G.L., 136
C-OAR-SE
classification of the focal object, 158
classification of the attribute of construct, 159
conceptual definition of construct, 158
identification of the rater entity, 159
reliability, 160
scale enumeration rules, 160
validity, 160
Coffman, T.L., 136
Cognitive system, 115
Cole, D.A., 330
Colgate, M., 41
Coltman, T., 146
Comrey, A.L., 235, 250
Confirmatory factor analysis CFA
alternative models, 316
correlated measurement errors, 316
independent model, 316
one-factor model, 316
uncorrelated factors model, 317
uncorrelated measurement errors, 316
decision rule, 220
error covariances, 223
errors correlation, 221
factor covariance, 223
fitting function
continuous vs. categorical observed
variables, 293
normal vs. non-normal observed variables, 293
type of the method
asymptotically distribution free, 295
continuous/categorical variable, 295
generalized least squares, 294
maximum likelihood, 293
robust weighted least squares, 296
weighted least squares, 294
goodness of fit indexes
absolute indexes, 298
comparative or incremental indexes,
298
parsimony-based fit indexes, 298
identification
types of models
empirically under-identified, 289
just-identified, 283
number of degrees of freedom, 283
under-identified, 283
identification rules
three-indicator rule, 291
t-rule, 291
two-indicator rule, 291
lack of factors rotation, 222
latent factor
identification of scale, 282
model identification
parameters summary
number of distinct parameters, 389
number of distinct sample moments,
389
rules, 392
unit loading identification, 386
unit variance identification, 386
model in structural factor analysis, 219
model parameters
interpretation, 313
nonsignificant
error variance, 315
factor covariance, 315
factor loading, 314
factor variance, 315
size and statistical significance, 313
standard error, 314
unstandardized parameter, 314
z score, 314
model reliability, 418
model respecification, 219
model validation
convergent validity, 416
average variance extracted AVE, 416
discriminant validity, 419
nomological validity, 419

other term
restricted factor analysis, 219
parameter estimates
appropriateness of the standard errors, 408
feasibility, 408
statistical significance, 408
parameters
assigned, 280
constrained, 280
free, 280
potential threat to dimensionality, 220
process, 276
data fit analysis, 276
identification of model, 276
model specification, 276
optional modification, 276
parameters estimation, 276
respecification of model
empirical procedures, 319
exploratory tests
Lagrange multiplier, 317
likelihood ratio, 317
Wald, 317
theory and substantive knowledge
revision, 317
sample size
distributional properties of indicators, 319
estimation strategy, 319
model complexity, 319
scale of indicators, 319
scaling latent factor
identification
reference indicator, 282
marker / reference indicator, 281
unit loading / variance identification, 282
solution
analysis of covariance structure, 280
means structure, 280
standardized, 280
unstandardized, 280
statistical power of individual parameter
estimates, 313
statistical power of the test, 313
structure confirmation, 220
the invariance of the factor structure, 220
types of models
over-identified, 283
Construct validity, 204–207
Consumerism, 334
Coombs, C.H., 99
Coppin, G., 21
Costello, A.B., 236
Couch, A., 119, 136
Cox, M.A.A., 57
Cox, T.F., 57
Crandall, J.E., 136
Crawford, C.B., 260
Crissman, P., 119
Crocker, L., 94, 152
Cronbach, L.J., 178, 330
Crosby, L.A., 18
Cudeck, R., 299, 309, 310
Curran, P.J., 294, 296, 322, 324

D
Darden, W.R., 335, 340, 344
Davidov, E., 329
Das, J.P., 136
Dawes, R.M., 99
Dayton, C.M., 73
De Leeuw, J., 77
DeBoeck, P., 92
DeCarlo, L., 321
Dempsey, P., 119
Depner, F., 329
Derbaix, C., 219
DeVellis, R.F., 141, 145, 189
Dewey, J., 17
Diamantopoulos, A., 145
DiBello, L.V., 77
Dillon, W.R., 145
Ding, L., 312
Dingle, H., 56
Discrepancy and function
approximation, 279
error, 279
estimation, 279
Dolan, R.J., 21
Doniec, S., 25
Drasgow, F., 90, 91
Dressel, P.L., 192
Drolet, A.L., 141
Dubois, B., 340
Dukes, W., 119
Duliniec, E., 42
Duncan, O.D., 50
Dutta, T., 136

E
Ebel, R.L., 152
Eccles, S., 336
Edwards, A.L., 109
Eisenhart, C., 103
Embretson, S.E., 49, 72, 78, 91, 92
Ende, J. van der, 235
Enders, C.K., 186, 324
England, G.W., 120
Erickson, R.J., 30
Eroglu, S., 219
Everson, H., 328
Exploratory factor analysis EFA, 341, 375,
378
adequacy of factorial model
anti-image correlation matrix, 351
anti-image covariance matrix, 351
Bartlett's test, 354
KMO index Kaiser, Meyer and Olkin, 354
partial correlations, 351
item deletion, 219
necessity of factors rotation, 222
scale identification, 218
Exploratory factor analysis EFA vs.
confirmatory factor analysis CFA, 218

F
Fabrigar, L.R., 242, 252, 272, 384
Factor analysis FA
common factor model
conditions of model construction, 229
communality estimation
average correlation, 247
highest correlation of a variable, 247
iterative improvement of the estimate,
247
squared multiple correlation SMC, 247
triads, 247
data collection, 226
dependence and interdependence, 216
factor scores
classificatory scheme, 268
computation
coarse factor scores, 268
refined factor scores, 268
computation methods
Anderson-Rubin method, 268
Bartlett's method, 268
Thurstone's method, 268
definition, 268
evaluation criteria, 269
indeterminacy, 268
plots of the factors scores, 268
purpose, 267
factor scores vs. summated scale, 268
factor term, 217
general factor, 214
Heywood case, 249
methods of factor loadings estimation
alpha factor analysis, 244
canonical factor analysis, 245
centroid method, 240
centroid of points, 240
generalized least squares, 243
image factor analysis, 246
maximum likelihood, 242
function, 243
Wishart distribution, 243
minres, 241
principal axis factoring, 240
model
factor loadings, 226, 227
and path coefficients, 226
and regression coefficients, 226
factorization, 233
latent variable
general factor, 227
group factor, 227
matrices
matrix of common factors, 228
matrix of factor loadings, 228
matrix of unique elements, 228
matrix of variables, 228
reduced correlation matrix, 233

unique factor
uniqueness, 227
number of factors
Bartlett's test, 255
eigenvalue method, 252
explained variance value method, 252
half method, 252
Jolliffe method, 252
methods of extraction
objective, 251
subjective, 251
overextraction, 250
parallel analysis, 252
parsimony vs. plausibility, 250
reduced matrix method, 254
scree plot method, 252
underextraction, 250
partial correlations of variables, 225
place in the field of statistics, 216
random measurement error, 215
rotation
correlation coefficients
n-dimensional space, 257
vectors, 257
effectiveness of rotation, 260
geometrical identification of factors, 256
simple structure, 256
techniques
KD transformation, 264
maxplane, 264
oblique
oblimin, 262
orthoblique, 263
promax, 263
quartimin, 263
orthogonal
biquartimax, 262
equimax, 262
quartimax, 261
varimax, 261
sample size
communalities level of observed variables, 271
nature of the data, 272
number of variables, 271
the extent of factors determination, 271
scale freeness, 243
scale invariance, 243
soundness of observed variables in study,
272
multiple measured variables, 272
number of observed variables for factor,
272
psychometric properties of measures,
273
specific-systematic error, 215
three general aims, 216
uniqueness, 215
variance configuration
measurement error variance, 230
specific error variance, 230
unique variance, 230
Factor loadings, 410
squared, 410
standardized, 410
Factors indeterminacy
interpretation, 273
creative generalizations, 274
data of experience, 274
hypothesis, 274
inductive vs. deductive methods, 273
Fava, J.L., 235, 250, 271, 273
Feather, N.T., 17, 18, 117
Feick, L., 203
Feinberg, F.M., 337, 339
Fekken, G.C., 120
Feldt, L.S., 187
Fellers, G.L., 121
Ferguson, G.A., 260
Fiddell, L.S., 371
Finch, J.F., 294, 296, 322, 324
Fischer, E., 337
Fit indexes list
adjusted goodness-of-fit AGFI, 301
Akaike information criterion AIC, 307
Bayesian information criterion BIC, 307
chi-square index, 298
goodness-of-fit GFI, 301
incremental fit index IFI, 304
nonnormed fit index NNFI, 302
normed chi-square index, 300
normed comparative fit index CFI, 304
normed fit index NFI, 302
parsimony goodness-of-fit index PGFI, 305
parsimony normed fit index PNFI, 305
relative noncentrality index RNI, 301
root mean square error of approximation
RMSEA, 306
root mean square residual RMSR, 306
standardized root mean residual SRMR,
306
Tucker-Lewis index TLI, 302
Fitzsimmons, G.W., 119
Floor effects in item responses, 293
Floyd, F.J., 145
Flynn, L.R., 221
Fodor, J., 274
Formative and reflective indicators
aspects
empirical
indicator intercorrelation, 146
indicator relationships with construct,
146
measurement error and collinearity,
146
theoretical
causation, 146
nature of the construct, 146
scale and index, 150
Fornell, C., 224
Frederiksen, N., 153
Frerichs, R.R., 224
Freud, S., 334
Friedman, H.H., 133
Furnham, A., 25

G
Gaito, J., 98
Galbraith, J.K., 34
Gamst, G., 322
Ganter, B., 73
Garbarski, L., 42
Gatnar, E., 240, 363
Gecas, V., 32
Generalizability theory, 71
decision study, 71
Gerbing, D.W., 223, 342, 374
Gibson, W.A., 247
Gifi, A., 57
Gilgen, A.R., 119
Gill, J.D., 18
Givon, M.M., 107
Gleser, G.C., 197
Goldberg, M.E., 243
Goldberger, A.S., 343
Gordon, L.V., 119
Gorlow, L., 119
Gorsuch, R.L., 235, 253, 269, 271
Goude, G., 63
Gould, S.J., 122
Gower, J.C., 57
Graham, J.W., 324
Green, P.E., 109
Green, S.B., 327
Grice, J.W., 268
Griffin, M., 335, 336, 340, 344
Grohmann, B., 344
Gross, B.L., 37, 340
Grzegorczyk, A., 25
Guadagnoli, E., 235
Guarino, A.J., 322
Guilford, J.P., 66, 71
Guion, R.M., 91
Gulliksen, H., 66
Guttman, L., 103

H
Hagen, E., 65
Halaby, C.N., 29
Hambleton, R.K., 83, 112
Hamilton, D.L., 136
Hancock, G.R., 324
Hand, D., 53, 57
Harding, S., 119, 120
Hardyck, C.D., 103
Harlow, L.L., 312
Harman, H.H., 215, 217, 241, 268
Harrington, D., 297, 321, 397, 398
Harris, C.W., 259, 263
Harris, M.M., 91
Hartmann, N., 22
Harvey, R.J., 303
Hattie, J., 145, 186
Hechter, M., 138
Hedonism, 19, 206, 277, 332, 334, 339, 368
consumers
brand-consciousness, 340
foreign brands, 340
luxury goods consumption, 340
novelty seeking, 339
products' symbolic meaning, 339
responsiveness to promotion stimuli,
339
egoism, 333
ethical hedonism, 333
experiencing pleasure, 332
Hedone, 332
hedonic consumption, 334
materialistic satisfaction, 332
selfish gratification, 332
universal hedonism, 333
Hedonism in marketing
emotional arousal, 334
emotive response, 334
feelings, 334
hedonic response, 335
sensory aspects, 334
scents, 334
sounds, 334
tactile impressions, 334
tastes, 334
visual images, 334
Heeler, R.M., 161
Heilbrunn, B., 44
Hempel, C.G., 159
Hendrickson, A.E., 263
Henry, W., 26
Hewitt, J.P., 32
Hierarchical Exploratory Factor Analysis
HEFA, 375
Hilliard, A.L., 23
Hirschman, E.C., 334, 335, 336, 337, 339,
340, 344
Hitlin, S., 30, 32, 115
Hocevar, D., 300
Hoffman, P., 181
Hofstede, G., 18
Holbrook, M.B., 39, 40, 334, 335, 337, 339, 340, 344
Holland, R.W., 17, 31
Holzinger, K.J., 217
Homans, G., 26
Homer, P.M., 30
Homoscedasticity, 320, 351
Hotelling, H., 237
Howell, R.D., 146
Hoyle, R.H., 277, 313
Hoyt, C., 193, 194
Hu, L.T., 294, 296, 325
Hui, C.H., 136
Hulbert, J., 108
Hulin, C.L., 83, 90

I
Iacobucci, D., 383
Ihara, Y., 312
Inglehart, R., 33, 120, 122
Ironson, G.H., 91
Item response theory IRT, 72
dichotomous models, 76
multidimensional, 77
unidimensional, 77
one-parametric logistic model, 79
three-parameter logistic model, 81
two-parametric logistic model, 80
Guttman perfect scale model GPSM, 73
item characteristic curve, 74
Mokken model of double monotonicity,
84
invariant item, 84
ordinal specific objectivity, 84
Mokken model of monotone
homogeneity, 84
polytomous models, 88
non-determinate nature, 88
non-monotonic functions, 89
unidimensional models
guessing parameter, 82
item difficulty, 79
item discrimination, 80
log odds, 79
probability, 79
Items analysis
inter-item correlations, 380
item-difficulty index, 210
item-discrimination index, 211
item-to-total correlations, 380
Items development
general guidelines
content analysis, 151
critical incidents, 151
direct observations, 151
expert judgment, 151
instruction objectives, 151
review of research, 151
items construction, 151
appropriate item format, 151
monitoring
item writers and quality of the obtained items, 152
selection of item writers, 152
specification, 152
structure and format, 152
essay, 152
responses with possible selection, 152
short-answer open-ended questions, 152
training the item writers, 152
writing the items, 152
items identification, 150
items review
experts, 155
criteria
accuracy, 155
ambiguity, 155
grammar, 155
technical flaws, 155
wording, 155
pilot tests of items, 156
examination of descriptive statistics,
156
examinees' reactions, 156
number of examinees, 156
personal judgment, 156
Items preparation, 345
declarative expressions of statements, 345
items semantic modification, 345
reading level of the examinees, 345
reverse of the items, 345
Iwawaki, S., 136
Izard, C.E., 334

J
Jackson, D.L., 323, 325
Jackson, D.N., 235
Jackson, P., 187
Jackson, R.W.B., 180
Jarvis, C.B., 147
Jaworski, B.J., 37
Jennrich, R.I., 361, 363
Johnson, M.K., 29
Jolly, J.P., 118
Jones, R.R., 136
Jöreskog, K.G., 216, 229, 234, 235, 242–244, 256, 268–270, 295, 296, 300, 301, 305, 307, 310, 320, 325, 327, 393, 396, 406, 409

K
Kahle, L.R., 30, 117, 343, 344
Kahneman, D., 337
Kaiser, H.F., 197, 198, 244–246, 252, 253, 259, 261
Kano, Y., 235, 294, 324
Kanter, D.L., 35
Kaplan, B.J., 25
Kaplan, D., 313, 324
Kashy, D.A., 391, 392
Katz, D., 30
Katz, E., 339
Kaydos, W.J., 53
Kelley, T.L., 163, 216
Kelloway, K.E., 225
Kelly, G.A., 48, 379
Kendall, M.G., 216
Keniston, K., 136
Kenny, D.A., 289, 391, 392
Kerlinger, F., 96
Kernan, J.B., 122
Kiesler, S., 136
Kim, J.O., 239
Kivetz, R., 338
Kleinplatz, P.J., 121
Kline, P., 49
Kline, R.B., 219, 248, 325, 386, 388, 391, 416
Kluckhohn, C.K.M., 15, 25, 26, 119, 121, 125
Kluegel, J.R., 146
Kohn, M.L., 33

Konarski, R., 306, 386


König, C., 28
Koopmans, T.C., 242
Kotler, P., 37
Kowal, J., 105
Kozyra, C., 418
Krantz, D.H., 54
Kristiansen, C.M., 30
Krosnick, J.A., 133, 135
Krus, D.J., 73
Kuder, G.F., 180, 194
Kumar, A., 145
Kyburg, H., 58

L
Lamont, L.M., 22, 35
Lance, C.E., 327
Laurent, G., 340
Law, H.G., 119
Lawley, D.N., 242
Lawshe, C.H., 201
Lazarsfeld, P.F., 339
Leavitt, C., 339
Lee, H.B., 235
Lehmann, D.R., 108
Lembkin, M., 44
Lennox, R., 145
Leipnik, R.B., 242
Leung, K., 326
Levy, S.J., 335, 340
Lewis, C., 84, 299
Lichtenstein, S., 21
Likert, R., 105
Lillis, C.M., 336, 344
Lind, J., 306, 309, 325
Linden, W.J. van der, 112
Lindquist, E.F., 151
Linearity, 320
Livingstone, S., 25
Loehlin, J.C., 235, 247, 248, 250
Loevinger, J., 86
Loewenstein, G., 338
Lomax, R., 209
Lonsdale, R.T., 43
Lord, F.M., 53, 64, 73
Lorr, M., 119

Subject and author index


Luce, R.D., 47, 104
Lumsden, J., 91
Lunt, P., 25

M
MacCallum, R.C., 235, 271, 272, 273, 276,
306, 313, 324, 325, 397
MacKay, D.B., 204
MacKenzie, S.B., 147
Macnab, D., 119
MacReady, G.B., 73
Madansky, A., 247
Madow, W.G., 264
Magnusson, D., 66, 166, 173, 174, 176, 181,
182, 190, 205
Mahoney, J., 18
Maio, G.R., 17, 29, 30, 31
Malhotra, N.K., 142
Mandler, G., 17
Mano, H., 337
Mariański, J., 25
Marini, M.M., 17
Marketing scales, 426
consumers
attitudes about satisfaction, 427
information processing, 426
involvement, 426
personal values, 426
reactions to advertising stimuli, 426
traits and individual differences, 426
Markin, R.J., 336, 344
Marsh, H.W., 209, 299, 325
Martin, I.M., 219
Maslow, A., 22, 44, 117, 122
Mathes, H., 236
Maxwell, S.E., 330
Mazurek-Łopacińska, K., 43
McAlister, L., 339
McCammon, R.B., 260
McCarty, J.A., 134
McDonald, R.P., 77, 225, 227, 256, 273, 274, 275, 299, 300, 309, 322, 325, 360
McKeon, J.J., 260
McKinley, R.L., 77
McKnight, P.E., 324
McNemar, Q., 250
Measurement and mathematical model, 49
psychological measurement tests, 48
Measurement definition, 50
Measurement history, 47
Measurement invariance
full vs. partial invariance, 327
model
configural invariance, 329
factor variance-covariance invariance,
330
latent means invariance, 331
measurement errors invariance, 331
metric invariance, 329
scalar invariance, 330
notion, 325
testing
more vs. less restricted models, 327
setting cross-group constraints, 327
testing conditions
hierarchy, 328
within-society research, 327
Measurement invariance and cross-cultural
studies, 326
Measurement theory, 47
Measurement types
arbitrary, 52
based on indicators, 52
derivative, 52
extensive, 51
fundamental, 52
intensive, 51
relative, 52
standardized, 52
Meehl, P.E., 143, 163, 164, 202, 205, 330
Meer, D., 44
Mendras, H., 25
Meredith, W., 329
Merrens, M.R., 136
Messick, S., 163
Meyers, L.S., 322
Micceri, T., 320
Michell, J., 53, 58, 59, 94
Millsap, R.E., 49, 328
Missing data
expectation maximization, 324
listwise deletion, 324
mean substitution, 324
multiple imputation, 324
pairwise deletion, 324
Modianos, D., 336, 344
Mokken, R.J., 64, 73, 74, 83, 84, 85, 86, 91
Molenaar, I.W., 87
Monroe, K.B., 340
Montanelli, R.G., 253
Morgenstern, J., 56
Morris, C.W., 119
Morrison, D.G., 141
Moschis, G.P., 127, 343
Mosier, C.I., 199
Mothersbaugh, D.L., 203
Motivation, 28
Mueller, C.W., 239
Mueller, R.O., 385
Muerle, J.L., 264
Mulaik, S.A., 220, 225, 226, 235, 272, 273, 274, 275, 301, 305, 394
Multi-group confirmatory factor analysis, 325
Multivariate normality, 320
kurtosis, 320
outliers, 321
Mahalanobis distance, 322
univariate vs. multivariate, 322
skewness, 320
skewness and kurtosis index, 321
Munson, J.M., 45
Murphy, B.D., 136
Muthén, B., 295, 296, 324, 327
Myers, J.G., 338
Myers, J.H., 44
Mysłakowski, Z., 25

N
Narayana, C.L., 336, 344
Narens, L., 47, 54, 55
Need, 28
Nelson, E., 105
Nering, L.M., 89
Netemeyer, R.G., 73, 101, 141, 142, 195, 196, 203, 214, 299, 320, 329, 339
Neumann, J. von, 56
Nevitt, J., 324
Newman, B.I., 37, 340

Niederee, R., 56
Niedrich, R.W., 107
Noll, G.A., 119
Norman, R.P., 136
Novick, M.R., 64, 66, 71
Nowakowska, M., 50, 67, 70, 71, 97, 169
Nunnally, J.C., 49, 139, 183, 185, 186, 271

O
O'Guinn, T.C., 122
Ok, J., 240, 255
Oliver, R.L., 337, 340
Olson, J.M., 29, 30
O'Malley, P.M., 136
Omoto, A.M., 29
Osborne, J.W., 236
O'Shaughnessy, N., 332
Ossowski, S., 26
Ostasiewicz, W., 49, 51, 63, 107
Ostini, R., 89
Oswald, F.L., 327
Ovadia, S., 132

P
Parameswaran, R., 161
Park, C.W., 203
Parsons, C.K., 90, 91
Parsons, T., 39
Pasamanick, B., 119
Payne, S., 133
Peacock, A.C., 235
Pearcy, D., 219
Peay, E.R., 18
Pecheux, C., 219
Pessemier, E., 339
Peter, J.P., 133, 139, 195, 379
Petrinovich, L.F., 103
Philips, D., 119, 120
Piers, S., 28
Ployhart, R.E., 327
Plumer, J.T., 44
Pociecha, J., 42
Podsakoff, P.M., 146
Point-biserial correlation, 211
Popham, W.J., 152
Potthast, M.J., 296

Subject and author index


Powell, D.A., 324
Preferences, 21
Preferences scales
notion, 106
types
comparative scale with representative
point, 106
constant sum scale, 106
ranking scale, 106
Prelec, D., 338
Prentice, D.A., 33
Presser, S., 133
Principal component analysis PCA
components extraction, 236
eigenvalue-eigenvector decomposition, 236
maximum variance, 235
proportion of explained variance on component, 239
unities in the diagonal, 236
variance-oriented, 235
Principal component analysis PCA vs.
factor analysis FA, 234
Pritchard, R., 28
Prymon, M., 42
Psychometrics, 426

R
Rajaratnam, N., 71, 197
Raju, N.S., 75
Raju, P.S., 339
Ramsey, P.H., 247
Rao, C.R., 197, 245, 246
Rao, V.R., 109
Rasch, G., 59, 73, 77, 79, 80, 83, 84, 86, 88,
93, 94
Rating-ranking scales in values
measurement
comparison
attitudes and preferences, 134
differentiation and non-differentiation,
135
independence, 133
order of importance, 133
simplicity in response system, 135
statistical analysis, 135
Ravi, D., 337
Raykov, T., 209, 320, 418
Ray, M.L., 161
Reckase, M.D., 77, 91
Reilly, M.D., 224
Reilly, T., 291
Reise, S.P., 72, 78, 80, 82
Reliability
comparison of methods
alternate test forms, 179
internal consistency, 180
parallel tests, 178
test-retest, 178
estimation
alpha reliability coefficient, 171
coefficients
equivalence, 178
stability, 178
stability and equivalence, 179
correction for attenuation, 168
errors
standard error of estimation, 178
standard error of measurement, 176
homogeneity and heterogeneity of
group, 173
linear squared correlation coefficient,
167
methods
KR-20 and KR-21, 191
parallel-test, 184
split-half
balanced and random halves, 190
first-half and last-half, 189
odd-even form, 189
test-retest, 181
more items added, 383
notion, 161
random measurement error, 162
repeated measurements, 162
Reliability level, 424
evenness in scaling, 424
guessing, 424
interdependence of measured items, 424
number of items, 424
range of item difficulty, 424
Reliability of the measurement instruments
in marketing, 161
Rettig, S., 119
Revelle, W., 140
Reyment, R., 229, 234, 235, 256, 268, 269
Reynolds, T.J., 118
Richardson, M.W., 180, 191, 192, 194
Richins, M.L., 126, 336, 344
Robertson, D.H., 161
Robertson, T.S., 339
Robinson, J.P., 53, 113, 114, 119
Rogers, H.J., 83
Rohan, M.J., 138
Rohatyn, D., 333
Rokeach, M., 12, 15, 16, 17, 18, 22, 29, 31, 36, 115, 344
Rokeach's values typology, 18
Roman, J.B., 25
Rook, D.W., 336
Rorer, J.G., 121, 136
Rosenbaum, P.R., 75
Rosenberg, M., 203
Rossiter, J., 49, 139, 140, 146, 157, 427
Rost, J., 77
Roussos, L., 77
Roznowski, M., 397
Rószkiewicz, M., 43, 62, 144
Rubin, H., 242
Rumiński, A., 25
Rummel, R.J., 245
Rushton, J.P., 120
Rusnak, Z., 247
Rutkowski, I., 42

S
Sackett, R.P., 91
Sagan, A., 23, 34, 61, 326
Salzberger, T., 73
Sampling error, 314
Sampson, P.F., 363
Sanders, K.R., 31
Saris, W.E., 313
Satorra, A., 294, 295, 296, 313, 324
Satterthwaite, F.E., 297
Sauer, P.I., 418
Scale
application in measurement of constructs
single vs. multi-items, 139

characteristics
absolute zero, 98
distinctiveness, 98
equal intervals, 98
ordering in magnitude, 98
definition, 98
levels of measurement, 98
absolute, 100
interval, 99
nominal, 98
ordinal, 98
ratio, 99
multi-items
process of development, 141
dimensionality
factor analysis, 140
Revelle's coefficient beta, 140
unidimensional vs. multidimensional scales, 143
empirical indicators/items, 143
measurement system, 144
operationalisation, 143
theory and construct definition, 143
transformation
linear, 101
monotonic, 101
multiplication, 101
Scaling
general rule, 96
technique and type of scale, 97
theory, 96
types of scales
comparative scale
preference and similarity, 114
Thurstone, 114
cumulative scale, 111
AIDA scheme, 112
conditions, 111
Guttman's scalogram, 111
reproducibility coefficient, 112
response categories, 112
summated scale, 107
balanced categories, 109
even number of alternatives, 109
forced-choice, 109
guidelines, 108
Likert, 110

Subject and author index


non-forced, 109
odd number of alternatives, 109
scale points, 108
stages, 110
statements, 107
unbalanced categories, 109
Schachter, S., 334
Schaefer, J.L., 324
Schafer, W.D., 324
Schater, D., 28
Scherer, K.R., 21
Schneeweiss, H., 236
Schooler, C., 33, 114
Schuman, H., 30, 133
Schwartz, S.H., 12, 16, 18, 19, 21, 22, 28, 29, 30, 31, 33, 64, 115, 117, 332
Schwartz's values typology, 18
Scott, J.E., 22
Scott, W.A., 119
Shafir, E., 338
Shapira, Z., 107
Sharma, S., 101, 107, 142, 195, 196, 203, 214, 299, 329
Sharot, T., 21
Shaughnessy, J., 332
Shavelson, R.J., 327
Sheth, J.N., 37, 336, 340
Shrum, L.J., 134
Schuur, W.H. van, 73, 84, 87, 88
Siegel, S., 103
Sijtsma, K., 87
Simmons, C.B., 343
Simonson, I., 338, 339
Singer, J.E., 334
Skinner, B.F., 25, 31
Smart, M.S., 121
Smart, R.C., 121
Smith, J.B., 41
Smith, K.W., 199
Smith, M.B., 30
Snook, S.C., 235
Snyder, M., 29
Sobel, M.E., 311
Social sciences
psychology and marketing research, 61
research problems, 61
versus natural and technical sciences, 61
Social sciences measurement model
theoretical and technical knowledge, 61
Social sciences measurement strategy
direct methods, 63
indirect method, 62
Software, 347
AMOS, 347
SPSS, 347
Statistica, 347
Sörbom, D., 224, 300, 301, 305, 307, 310, 320, 325, 331, 393, 401, 406, 409
Soutar, G.N., 40
Spangenberg, E.R., 344
Spearman, C., 48, 65, 92, 171, 180, 187, 188,
190, 203, 213, 217, 383
Spector, P.E., 110, 111, 141, 143, 145, 218,
380, 383
Sproull, L.S., 136
Standard errors, 314
Statistical theory of estimation, 177
Staub, E., 29
Steenkamp, J.E.M., 327
Stegelmann, W., 77
Steiger, J.H., 235, 301, 306, 309, 325
Stevens, S.S., 48, 50, 53, 55, 63, 97, 98, 102
Stewart, D.W., 239
Stine, W.W., 53
Stobiecka, J., 49, 50, 51, 52, 95, 106
Stone, M.A., 83
Stouffer, S.A., 113
Stout, W.F., 77
Structural equations modeling SEM,
275
endogenous variable, 276
exogenous variable, 276
LInear Structural RELationships
LISREL, 276
Sudman, S., 133
Sugawara, H.M., 272, 306, 313, 324
Summers, J., 204
Suppes, P., 57, 100
Suziedelis, A., 119
Swaminathan, H., 83
Sweeney, J.C., 40
Symonds, P.M., 424
Sztemberg-Lewandowska, M., 236
Szymański, M.J., 22

T
Tabachnick, B.G., 371
Tajfel, H., 32
Tanaka, J.S., 319
Tanaka, Y., 77
Tanur, J.M., 133
Tarkkonen, L., 162, 215
Tasaki, L., 122
Tatarkiewicz, W., 23
Tataryn, D.J., 250
Taylor, P.W., 23
Teng, G., 186
Test
conditions, 68
criteria, 64
definition, 64
essentially-tau-equivalent, 70
measurement model, 66
observed score, 66
true score, 66
parallel, 69
tau-equivalent, 70
usefulness, 65
Thaler, R.H., 337, 338
Theories of measurement
classical theory of measurement, 58
role of error in measurement, 58
operational measurement theory
operational abstracts, 56
operationalism, 56
underlying reality, 56
representational measurement theory
automorphisms, 55
homomorphism, 54
isomorphism, 54
Theory of measurement vs. theory of
measure, 49
Thoits, P.A., 32
Thomas, W.J., 105
Thompson, B., 277, 316, 355
Thompson, M.S., 327
Thorndike, R.L., 65
Thurstone, L.L., 90, 97, 105
Tonesk, X., 119
Triandis, H.C., 121
Tucker, L.R., 299
Tukey, J.W., 104
Turner, J.C., 32
Tversky, A., 99, 337, 338

U
Unger, L., 122
Unstandardized alpha coefficient, 196
Utilitarian values, 335, 336, 337

V
Validation
content validation, 344
exploratory face validation, 344
Validity
argumentation, 163
notion, 163
reasons, 163
theory, 163
types
construct validity, 204
content validity, 200
adequate selection of items, 201
exploratory research, 201
face validity, 200
logical validity, 200
convergent validity, 203
discriminant validity, 203
pragmatic-criterion validity, 200
concurrent validity, 200
predictive validity, 200
validation methods and
change of scores in time, 206
correlation between construct and designated, 207
factorial validity, 207
group differences, 206
multi-trait-multi-method, 208
Validity of the measurement instruments in
marketing, 161
Value systems, 115
Values
beliefs, 16
cognitive system, 15
hierarchy, 22
central-peripheral dimension, 22
market segmentation
AIO, 44
models, 43
objective information, 43
personal values, 44
process, 42
subjective information, 43
VALS, 44
motivational goal, 16
personal values vs. product, 35
system of values, 15
Values abstraction levels
domain-specific values, 36
evaluation of product attributes, 36
global values, 36
Values and
attitudes, 29
behavior, 30
motivation, 28
needs, 28
personal identity, 32
value-identities, 32
Values and theory of value utility, 34
Values comparison
axiology, 23
consumers' personal vs. product values, 35
cultural anthropology, 26
economics, 27
objective theory, 27
subjective theory, 27
psychology, 24
sociology, 24
structure of relations, 225
Values instrumental, 16
Values studies in marketing
cognitive structure, 36
consumer dissatisfaction, 36
cross-cultured consumption patterns, 36
differential product preferences, 36
life style, 36
market segmentation potential, 36
Values terminal, 16
Vandenberg, R.J., 327
Variable vs. trait, 96
Veblen, T., 340
Vehkalahti, K., 162
Velicer, W.F., 235, 250, 271, 272, 312
Velleman, P.F., 103
Verplanken, B., 17, 31
Vijver, F.J.R. van de, 326
Vigneron, F., 340
Vinson, D.E., 22, 35, 36, 45, 133
Virshup, L.K., 32
Vispoel, V.P., 186
Voss, K.E., 344
Vyncke, P., 44

W
Wade, W.B., 343
Wainer, H., 90
Wakefield, K.L., 337, 339
Walesiak, M., 35, 62
Walesiak, W., 106
Walton, J., 339
Warr, P.B., 136
Waters, M.C., 138
Watson, D., 140, 141, 380
Weathers, D., 107
Wegener, D.T., 242
Weiss, D.J., 73
Wells, W.D., 122
Wertenbroch, K., 337
Werts, C.E., 224
Wesman, A.G., 150
Westbrook, R.A., 340
West, S.G., 294, 296, 322, 324
White, P.O., 263
Whitely, S.E., 77
Widaman, K.F., 145, 221, 235, 272
Wilcox, J.B., 146
Wilkinson, L., 103
Wille, R., 73
Williams, L.J., 29
Williams, R.M., 22
Wilson, E.B., 240
Wilson, M., 77, 179
Winklhofer, H.M., 145, 147, 149
Wojciszke, B., 31
Wood, J.M., 250
Wood, W., 29
Woodall, T., 37
Woodruffe, H., 336
Worcester, J., 240
Wright, B.D., 47, 83
Wrzosek, W., 42
Wyer, R.S. Jr., 136

Y
Yankelovich, D., 43
Yen, W.M., 68, 172, 188
Young, M., 418
Youth market
excessive style of consumption, 342
leisure activities, 420
modern individuality, 342
narcissism, 342
prospects
frequency of shopping, 343
in-store promotions, 344
interest in commercials, 344
interest in new products, 344

product expertise, 343


tv commercials and direct selling, 344
Yuan, K.H., 321, 324
Yule, G.U., 114, 256

Z
Zagóra-Jonszta, U., 34
Zaichkowsky, J.L., 141
Zanna, M.P., 30
Zax, Z., 136
Zegers, E.F., 180
Zeller, R.A., 162, 163, 165, 180
Zhang, W., 297
Zinnes, J.L., 54, 100
Znaniecki, F., 105
Zuckerman, M., 339
Zuse, H., 51