
Sample size in factor analysis: why size matters

Helen C Lingard and Steve Rowlinson

Abstract
Factor analysis is a powerful and often-used technique in construction management and
real estate research. Although the process is relatively straightforward, there are certain
rules in relation to data and sample size which must be considered in the analysis.
Small samples and low N:p ratios (sample size to number of items) can lead to erroneous
conclusions being drawn, and the strength of the data should be considered in such
circumstances. The use of factor analysis should be carefully justified in order that
research can be considered to be rigorous, replicable and of high quality.

Introduction
Factor analysis is one of the most commonly used methods for data reduction in social
science research. Factor analysis assumes that underlying dimensions or factors can be
used to explain complex phenomena. The goal of factor analysis is to identify factors
that are not directly observable, based on a larger set of observable or measurable
indicators (variables). Norusis (1993) describes the process of factor analysis as follows:
1. The first step in factor analysis is to produce a correlation matrix for all variables.
Variables that do not appear to be related to other variables can be identified from
this matrix.
2. The number of factors necessary to represent the data and the method for
calculating them must then be determined. Principal component analysis (PCA; see
footnote 1) is the most widely used method of extracting factors. In PCA, linear
combinations of variables are formed. The first principal component is that which
accounts for the largest amount of variance in the sample, the second principal
component is that which accounts for the next largest amount of variance and is
uncorrelated with the first, and so on. In order to ascertain how well the model
(the factor structure) fits the data, coefficients called factor loadings, which
relate variables to identified factors, are calculated.
3. Factor models are then often rotated to ensure that each factor has non-zero
loadings for only some of the variables. Rotation makes the factor matrix more
interpretable.
4. Following rotation, scores for each factor can be computed for each case in a
sample. These scores are often used in further data analysis.
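
The steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used in any of the studies reviewed: it assumes NumPy is available, uses PCA extraction on the correlation matrix, omits the rotation step, and runs on simulated data. The function name pca_factor_sketch, the loading pattern and the noise level are hypothetical.

```python
import numpy as np

def pca_factor_sketch(data, n_factors):
    """Steps 1, 2 and 4 above with PCA extraction (rotation, step 3, is omitted)."""
    # Step 1: correlation matrix of the variables (one variable per column).
    corr = np.corrcoef(data, rowvar=False)
    # Step 2: principal components are eigenvectors of the correlation matrix,
    # ordered by the amount of variance (eigenvalue) each accounts for.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings relate each variable to each retained component.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    # Step 4: component scores for each case, computed from standardized data.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    scores = z @ eigvecs[:, :n_factors]
    return corr, eigvals, loadings, scores

# Hypothetical data: 200 cases, 6 items driven by 2 latent dimensions plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
data = latent @ pattern.T + 0.5 * rng.normal(size=(200, 6))

corr, eigvals, loadings, scores = pca_factor_sketch(data, n_factors=2)
explained = eigvals / eigvals.sum()  # share of total variance per component
print(np.round(explained, 3))
print(np.round(loadings, 2))
```

Because the components are extracted from a correlation matrix, the eigenvalues sum to the number of items, and the scores on different components are uncorrelated within the sample.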

Footnote 1: Factor analysis is used to discover patterns in the relationships amongst
variables and enables reduction of the number of variables into factors combined from
these variables. Principal component analysis (PCA) is a statistical technique which is
used to replace a large set of variables by a smaller set of variables which is the best
representation of the larger set. PCA is the most commonly used method for extracting
factors in factor analysis. The critique presented in this note applies to PCA but should
generalize to other extraction methods used in factor analysis.

For it to be robust, factor analysis requires the factor pattern to be stable, i.e. the
solution can be reproduced in different samples and accurately reflects the true
population structure. This is a very important issue and the reason for this note.

In recent years, the number of examples of factor analysis in construction management
research has grown. However, in the social sciences there is continued discussion about
how large a sample is needed for meaningful factor analysis and the issue is by no means
resolved. Because it is based upon correlation matrices, factor analysis is subject to
the sampling error associated with small samples. Hence, it is important that the
construction management research community understands the importance of sample size
when using factor analysis, so that researchers do not draw erroneous conclusions as a
result of failing to recognize the problems associated with using factor analysis in small
samples.

In this note we highlight some of the problems associated with using factor analysis in
small sample studies and present an analysis of sample sizes in research published in
construction management journals, in which researchers have utilized factor analysis. In
doing so we draw attention to the fact that, in many instances, construction management
researchers use factor analysis without giving due consideration to the question of
whether their samples are sufficiently large to yield meaningful results. Lastly, we make
some recommendations for the design and planning of construction management research
to ensure factor analysis is used appropriately.

Sample size recommendations


A wide range of recommendations regarding sample size in factor analysis has been
made. These are usually stated in terms of either the minimum sample size (N) for a
particular analysis or the minimum ratio of N to the number of variables (p), i.e. the
number of survey items being subjected to factor analysis (MacCallum et al 1999).
Gorsuch (1983) recommended five subjects per item, with a minimum of 100 subjects,
regardless of the number of items. Guilford (1954) argued that N should be at least 200,
while Cattell (1978) recommended three to six subjects per item, with a minimum of 250.
Comrey and Lee (1992) provided the following guidance in determining the adequacy of
sample size: 100 = poor, 200 = fair, 300 = good, 500 = very good, 1,000 or more =
excellent. More demanding recommendations for sample size require a minimum of 10
subjects per item (Everitt 1975) or just a large sample, ideally several hundred (Cureton
& D'Agostino 1983). Before going further, it is useful to discuss the effect of size.
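
As an illustration, the rules of thumb above can be written down as a small check for a planned study. This is a sketch only: the function names are hypothetical, the Comrey and Lee labels are treated as lower bounds of each band (an assumption, since the original guidance gives single values), and Cattell's "three to six subjects per item" is applied at its lower bound of three.

```python
def comrey_lee_rating(n):
    """Comrey and Lee (1992) label, treating each value as the band's lower bound (assumption)."""
    bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor")]
    for threshold, label in bands:
        if n >= threshold:
            return label
    return "below the scale"

def check_rules_of_thumb(n, p):
    """Compare a sample of n subjects and p items against the cited recommendations."""
    ratio = n / p
    return {
        "n_to_p": ratio,
        "gorsuch_1983": ratio >= 5 and n >= 100,   # 5 per item, minimum 100
        "guilford_1954": n >= 200,                 # N at least 200
        "cattell_1978": ratio >= 3 and n >= 250,   # lower bound of 3 to 6 per item, minimum 250
        "everitt_1975": ratio >= 10,               # 10 per item
        "comrey_lee_1992": comrey_lee_rating(n),
    }

small = check_rules_of_thumb(n=80, p=20)
large = check_rules_of_thumb(n=300, p=20)
print(small)
print(large)
```

A study with 80 subjects and 20 items (N:p of 4:1) fails every rule listed, whereas 300 subjects on the same 20 items (15:1) satisfies all of them, which shows how far apart the recommendations place the bar.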

Problems arising as a result of small samples in factor analysis


Small samples present problems due to various forms of sampling error, which can
manifest itself in factors that are specific to one data set. This bias limits the extent to
which data is representative of a larger population and generates factor structures which
elude replication. These rogue factors can, for example, occur as a result of unique
patterns of responding to a single survey question. Another problem associated with
small samples in factor analysis is the splintering of factors into smaller groupings of
items that really constitute a larger factor. Costello and Osborne (2005) empirically tested
the effect of sample size on the results of factor analysis, reporting that larger samples
tend to produce more accurate solutions. Only 10% of samples with the smallest N:p
ratio (2:1) produced correct solutions. A solution was deemed to be correct if it was
identical to the solution derived from the total population. In contrast, 70% of the samples
with the largest N:p ratio (20:1) produced correct solutions. Costello and Osborne (2005)
also report that the number of misclassified items was significantly affected by the
size of a sample. In the smallest samples, almost two out of thirteen items on average
were misclassified (i.e. found to belong to the wrong factor). Lastly, Costello and
Osborne (2005) report that two extreme problems in factor analysis, i.e. the Heywood
effect (in which the impossible outcome of factor loadings greater than 1.0 emerges) and
the failure to produce a solution, were only observed in small samples. The failure to
produce a solution occurred in almost one third of analyses in the smallest sample size
category.

The problems associated with rogue factors, splintered factors and/or misclassified items
usually only become evident when data collected from a sufficiently large and
representative sample is factor analysed. However, in many cases these problems are
either never discovered or are only discovered once an initial factor analysis has
produced misleading results.

MacCallum et al (1999) suggest that increasing the sample size is one means of
overcoming these problems. They argue that, as the sample size increases, sampling error
is reduced, factor analysis solutions become more stable and more reliably produce the
factorial structure of the population (MacCallum et al 1999).
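
The link between sample size and sampling error can be illustrated with a small simulation of the correlations on which factor analysis is built. This is a hedged sketch, not a replication of any cited study: it assumes a single pair of normally distributed items with a hypothetical population correlation of .50, and compares N = 20 with N = 200, which would correspond to N:p ratios of 2:1 and 20:1 for a hypothetical 10-item questionnaire. Only the Python standard library is used.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sample_r(n, rho, rng):
    """Draw n cases from a bivariate normal population with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)

def spread_of_r(n, rho=0.5, replications=500, seed=1):
    """Standard deviation of the sample correlation across repeated samples of size n."""
    rng = random.Random(seed)
    return statistics.stdev([sample_r(n, rho, rng) for _ in range(replications)])

spread_small = spread_of_r(20)    # e.g. N:p of 2:1 with 10 items
spread_large = spread_of_r(200)   # e.g. N:p of 20:1 with 10 items
print(round(spread_small, 3), round(spread_large, 3))
```

The sample correlations scatter much more widely around the population value in the smaller sample, and every entry in the correlation matrix is subject to this scatter, which is why factor solutions derived from small samples are less stable.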

Method
A search was performed for research studies in the construction management literature,
published between 2000 and 2005, that reported using some form of factor analysis or
principal components analysis. Journals searched and search terms are provided in
Appendix 1. A total of 31 published articles were identified. Only studies in which both
the number of subjects and the number of items were reported were analyzed (31 in total).
For each study, the subject to item ratio (N:p) was calculated. Results are presented in
Table 1. The N:p ratio was used rather than the absolute sample size because Osborne and
Costello (2004) report that the N:p ratio is a consistent predictor of stability in factor
structures, the occurrence of Type I errors and the correctness of factor structures.
Further, Osborne and Costello (2004) report a relative lack of unique impact of the
absolute number of subjects (N) after the N:p ratio was accounted for. Table 1 indicates
that nearly 60% of the studies had an N:p ratio of 5:1 or less and Table 2 indicates that
about 70% of studies had N less than 100.

Table 1: Current practice in factor analysis in real estate & construction management research

Subject to item ratio   % of studies   Cumulative %   No. of articles
2:1 or less                    35.48          35.48                11
>2:1 to 5:1                    22.58          58.06                 7
>5:1 to 10:1                   29.03          87.09                 9
>10:1 to 20:1                   9.68          96.77                 3
>20:1 to 100:1                  3.23         100.00                 1
>100:1                          0.00         100.00                 0
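
The percentages in Table 1 follow directly from the article counts. The sketch below shows one way the banding and percentages could be computed; the function names are hypothetical and the band boundaries are read from the table. Cumulative figures computed from the raw counts can differ from the table in the second decimal place (87.10 rather than 87.09), which would be consistent with the table summing already-rounded percentages, although that is an assumption.

```python
# Upper bound of each N:p band in Table 1, with its label.
BANDS = [(2, "2:1 or less"), (5, ">2:1 to 5:1"), (10, ">5:1 to 10:1"),
         (20, ">10:1 to 20:1"), (100, ">20:1 to 100:1"), (float("inf"), ">100:1")]

def np_band(n, p):
    """Return the Table 1 band for a study with n subjects and p items."""
    ratio = n / p
    for upper, label in BANDS:
        if ratio <= upper:
            return label

def percentage_table(counts):
    """Rows of (band, % of studies, cumulative %, count) from a dict of article counts."""
    total = sum(counts.values())
    rows, running = [], 0
    for band, k in counts.items():
        running += k
        rows.append((band, round(100 * k / total, 2), round(100 * running / total, 2), k))
    return rows

# Article counts as reported in Table 1.
table1_counts = {"2:1 or less": 11, ">2:1 to 5:1": 7, ">5:1 to 10:1": 9,
                 ">10:1 to 20:1": 3, ">20:1 to 100:1": 1, ">100:1": 0}
rows = percentage_table(table1_counts)
for row in rows:
    print(row)
```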

Table 2: Sample size and the number of articles reviewed

N = sample size   No. of articles   % of studies   Cumulative %
30 or less                      1           3.23           3.23
31 to 60                        8          25.81          29.04
61 to 99                       13          41.93          70.97
100 to 200                      6          19.35          90.32
201 to 300                      2           6.45          96.77
301 or above                    1           3.23         100.00
Total                          31

Discussion

The widely varying rules of thumb relating to sample size in factor analysis present a
problem for researchers who want simple and definitive guidelines about how big a
sample must be to produce meaningful factor analysis results. It is fair to say that no
absolute rules can exist. MacCallum et al (1999) suggest that definitive recommendations
regarding sample size in factor analysis are based upon the misconception that the
minimum sample or N:p ratio for meaningful factor analysis is invariant across studies.
Rather, MacCallum and his colleagues suggest that the minimum sample size depends
upon the nature of the data itself, most notably its strength. Strong data is data in which
item communalities (see footnote 2) are consistently high (in the order of .80 or above),
factors exhibit high loadings on a substantial number of items (at least three or four)
and the number of factors is small.
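
To make the notion of data strength concrete, the sketch below computes item communalities from a loading matrix and applies the criteria described above. It is an illustration under stated assumptions: factors are assumed orthogonal, so that an item's communality is the sum of its squared loadings; the cut-off of .60 for a "high" loading is an assumption chosen for the example; and both loading matrices are hypothetical.

```python
def communalities(loadings):
    """Sum of squared loadings for each item (assumes orthogonal factors)."""
    return [sum(value ** 2 for value in row) for row in loadings]

def looks_strong(loadings, min_communality=0.80, high_loading=0.60, min_items=3):
    """Rough check: high communalities and at least min_items high loadings per factor."""
    if any(h < min_communality for h in communalities(loadings)):
        return False
    n_factors = len(loadings[0])
    for j in range(n_factors):
        high = sum(1 for row in loadings if abs(row[j]) >= high_loading)
        if high < min_items:
            return False
    return True

# Hypothetical loading matrices: 6 items on 2 factors.
strong = [[.92, .05], [.90, .10], [.91, .02], [.08, .93], [.04, .90], [.10, .91]]
typical = [[.60, .10], [.55, .20], [.50, .15], [.10, .65], [.20, .55], [.15, .45]]

print([round(h, 2) for h in communalities(strong)], looks_strong(strong))
print([round(h, 2) for h in communalities(typical)], looks_strong(typical))
```

The second matrix produces communalities far below .80, so by the criteria above a small sample would be difficult to justify for data of that kind.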

Empirical evidence supports the argument that sample size is less important where data
are sufficiently strong. For example, in an empirical analysis of data originally
published by Guadagnoli and Velicer (1988), Osborne and Costello (2004) found that
sample size had less of an impact in factor analysis when there were fewer variables
(items) and that both N and N:p had a larger effect on the goodness of a factor analysis
when item loadings were small. Similarly, MacCallum et al (1999) report that, when data
are strong, the impact of sample size is greatly reduced. Under these conditions,
MacCallum et al (1999) conclude that factor analysis can produce correct solutions, even
with samples that would traditionally have been determined to be too small for
meaningful factor analysis.

Footnote 2: An item's communality is the proportion of its variance that is accounted
for by the extracted factors.

However, one caveat to this assertion is that, as Costello and Osborne (2005) note,
uniformly high item communalities are unlikely to occur in real data and that more
common magnitudes in social science research are in the order of .40 to .70. As
communalities become lower, the size of the sample has a greater impact upon factor
analysis outcomes. Also, when dealing with empirical data, it is rare to observe item
loadings of 0 or .60. In social science research, moderate and weak item loadings ranging
from .30 to .50 are the norm. Thus, in construction management research, it would be rare
for data to be of sufficient strength to justify the use of factor analysis in small samples.

Conclusions
The general implication of this note is that construction management researchers need to
be more conscious of the impact of sample size when using factor analysis. Our analysis
reveals that researchers in the construction management discipline frequently apply factor
analysis to small sample datasets, without considering the consequences. The likely result
is that frustrating, confusing and misleading results emerge and erroneous conclusions are
drawn. It is critically important that the construction management research community
becomes mindful of when factor analysis should be used and under which circumstances
it is permissible to use factor analysis in small samples. While definitive rules of thumb
for sample size in factor analysis are probably inappropriate, construction management
researchers need to carefully consider expectations about the strength of their data when
determining the size of their sample. Datasets with large numbers of variables (i.e. survey
questions) and/or the expectation that a large number of factors will emerge should be
avoided unless an extremely large sample is likely to be achieved. Conversely, the use of
factor analysis in small samples must be carefully considered and explicitly defended in
terms of the strength of the data.

References
Cattell, R. B., (1978), The Scientific Use of Factor Analysis, New York: Plenum.
Comrey, A. L. & Lee, H. B., (1992), A first course in factor analysis, Hillsdale, NJ: Erlbaum.
Costello, A. B. & Osborne, J. W., (2005), Best practices in exploratory factor analysis: four
recommendations for getting the most from your analysis, Practical Assessment, Research &
Evaluation, 10 (7). http://pareonline.net/getvn.asp?v=10&n=7
Cureton, E. E. & D'Agostino, R. B., (1983), Factor Analysis: An Applied Approach, Hillsdale, NJ: Erlbaum.
Everitt, B. S., (1975), Multivariate analysis: the need for data and other problems, British Journal of
Psychiatry, 126, 237-240.
Gorsuch, R. L., (1983), Factor Analysis, 2nd edition, Hillsdale, NJ: Erlbaum.
Guadagnoli, E. & Velicer, W. F., (1988), Relation of sample size to the stability of component patterns,
Psychological Bulletin, 103, 265-275.
Guilford, J. P., (1954), Psychometric methods, 2nd edition, New York: McGraw Hill.
MacCallum, R. C., Widaman, K. F., Zhang, S. & Hong, S., (1999), Sample size in factor analysis,
Psychological Methods, 4, 84-99.
Osborne, J. W. & Costello, A. B., (2004), Sample size and subject to item ratio in principal components
analysis, Practical Assessment, Research & Evaluation, 9 (11).
http://PAREonline.net/getvn.asp?v=9&n=11
Velicer, W. F. & Fava, J. L., (1998), Effects of variable and subject sampling on factor pattern recovery,
Psychological Methods, 3, 231-251.

Acknowledgement: The authors wish to recognise the contributions made in the production of this note by
Yip, L.P.B., Barima, O., Tuuli, M.M. and Koh, T.Y. of Dept REC, HKU.

Appendix 1: Journals searched and search terms

Journals searched:
Construction Management & Economics (16 articles)
Journal of Construction Engineering & Management (5 articles)
Journal of Professional Issues in Engineering Education & Practice (1 article)
Journal of Management in Engineering (2 articles)
Engineering, Construction, & Architectural Management (6 articles)
International Journal of Service Industry Management (1 article)
Structural Survey (2 articles)
International Journal of Quality & Reliability Management (1 article)
Journal of Property Investment & Finance (1 article)
Engineering, Construction & Architectural Management (1 article)
HKIE Transactions (1 article)
(Total 31 articles)

Keywords: factor analysis, principal components analysis, construction.
Years searched: 2000 to 2005.
