3.1
Thursday, 13 July 2017 3:01 PM
Correspondence Analysis
Correspondence analysis (CA) may be defined as a
special case of Principal Components Analysis (PCA) of
the rows and columns of a table, especially applicable
to a cross-tabulation.
Over the past few decades correspondence analysis has gained an international reputation as a
powerful statistical tool for the graphical analysis of contingency tables. This popularity stems
from its development and application in many European countries, especially France, and its use
has spread to English speaking nations such as the United States and the United Kingdom. Its
growing popularity amongst statistical practitioners, and more recently those disciplines where
the role of statistics is less dominant, demonstrates the importance of the continuing research
and development of the methodology.
The aim of this paper is to highlight the theoretical, practical and computational issues of simple
correspondence analysis and discuss its relationship with recent advances that can be used to
graphically display the association in two-way categorical data.
3.5
Correspondence Analysis
Correspondence Analysis Applied to Psychological
Research
P.M. Yelland
Giovanni Di Franco
1 Liberal
2 Tend Lib
3 Moderate
4 Tend Cons
5 Conservative
This document is loosely based on SPSS 10; Correspondence
Analysis Output, Faculty of Social and Behavioural Sciences,
Leiden University, Leiden, Netherlands.
3.11
Correspondence Analysis
The data summarises individuals' political
affiliation (1, ..., 5) and geographic region (1, ..., 4).
1 Northeast
2 Midwest
3 South
4 West
3.12
Correspondence Analysis
The data (a) summarises individuals' political
affiliation (1, ..., 5) and geographic region (1, ..., 4).
There are 725 individuals, so 725 rows of data.
3.13
Correspondence Analysis
Analyze > Dimension Reduction > Correspondence Analysis
3.14
Correspondence Analysis
Select the row and column variables and define their ranges.
3.19
Correspondence Analysis
Select Plots
3.20
Correspondence Analysis
Syntax
CORRESPONDENCE
TABLE = region4(1 4) BY politics(1 5)
/DIMENSIONS = 2
/MEASURE = CHISQ
/STANDARDIZE = RCMEAN
/NORMALIZATION = SYMMETRICAL
/PRINT = TABLE RPOINTS CPOINTS RPROFILES CPROFILES RCONF CCONF
/PLOT = NDIM(1,MAX) BIPLOT(20) RPOINTS(20) CPOINTS(20) TRROWS(20)
TRCOLUMNS (20) .
3.21
Correspondence Analysis
The Correspondence Table is simply the cross-
tabulation of the row and column variables,
including the row and column marginal totals,
serving as input.
                              Political Outlook
Region          Liberal  Tend Lib  Moderate  Tend Cons  Conservative  Active Margin
Northeast            19        23        58         16            15            131
Midwest              26        31        71         47            35            210
South                18        27        75         46            70            236
West                 30        19        40         26            33            148
Active Margin        93       100       244        135           153            725
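As a rough illustration of how 725 case-level rows become the correspondence table, the sketch below rebuilds one (region, outlook) record per individual from the cell counts and cross-tabulates them again. The individual records are not in the source, so they are reconstructed here from the counts; the names `cases`, `table`, `row_margin` and `col_margin` are illustrative assumptions, not SPSS output.

```python
from collections import Counter

regions = ["Northeast", "Midwest", "South", "West"]
outlooks = ["Liberal", "Tend Lib", "Moderate", "Tend Cons", "Conservative"]
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

# Assumption: rebuild one (region, outlook) record per individual
# from the published cell counts.
cases = [
    (region, outlook)
    for region, row in zip(regions, counts)
    for outlook, cell in zip(outlooks, row)
    for _ in range(cell)
]

# Cross-tabulate the case-level records and add the marginal totals.
table = Counter(cases)
row_margin = {r: sum(table[(r, o)] for o in outlooks) for r in regions}
col_margin = {o: sum(table[(r, o)] for r in regions) for o in outlooks}
n = len(cases)

print(n)
print(row_margin)
print(col_margin)
```

The margins produced this way can be compared directly with the Active Margin row and column of the table.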
3.22
Correspondence Analysis
The Row Profiles are the cell counts divided by their
corresponding row total (e.g. 19/131 = 0.145 for the first
cell). This table also shows the column masses (column
marginals as a proportion of n, e.g. 93/725 = 0.128). These
are intermediate calculations on the way toward
computing distances between points. Note the column
of 1s: each row profile sums to 1.
Row Profiles
                              Political Outlook
Region          Liberal  Tend Lib  Moderate  Tend Cons  Conservative  Active Margin
Northeast          .145      .176      .443       .122          .115          1.000
Midwest            .124      .148      .338       .224          .167          1.000
South              .076      .114      .318       .195          .297          1.000
West               .203      .128      .270       .176          .223          1.000
Mass               .128      .138      .337       .186          .211
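The row profiles and column masses can be reproduced from the counts with a few lines of arithmetic. This is a minimal sketch in plain Python; the variable names are illustrative assumptions rather than anything SPSS exposes.

```python
regions = ["Northeast", "Midwest", "South", "West"]
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Row profile: each cell divided by its row total, so every profile sums to 1.
row_profiles = [
    [cell / total for cell in row] for row, total in zip(counts, row_totals)
]
# Column mass: each column total as a proportion of n.
col_masses = [total / n for total in col_totals]

for region, profile in zip(regions, row_profiles):
    print(f"{region:<10}", " ".join(f"{p:.3f}" for p in profile))
print(f"{'Mass':<10}", " ".join(f"{m:.3f}" for m in col_masses))
```

The Mass row is also the average row profile: weighting each row profile by its row's share of n and summing gives the column masses back.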
3.23
Correspondence Analysis
Column Profiles are the cell counts divided by
the column marginals (e.g. 19/93 = 0.204). This
table also shows the row masses (row marginals as
a proportion of n, e.g. 131/725 = 0.181). These are
intermediate calculations on the way toward
computing distances between points. Note the row
of 1s: each column profile sums to 1.
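To make the "distances between points" concrete, the sketch below computes the column profiles and row masses and then a chi-square distance between two column profiles, in which each squared difference is divided by the corresponding row mass. The helper `chi_square_distance` is a hypothetical name introduced for illustration, and the code is an assumption about the usual definition, not a trace of SPSS internals.

```python
import math

outlooks = ["Liberal", "Tend Lib", "Moderate", "Tend Cons", "Conservative"]
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Column profile: each cell divided by its column total (one list per column).
col_profiles = [
    [counts[i][j] / col_totals[j] for i in range(len(counts))]
    for j in range(len(outlooks))
]
# Row mass: each row total as a proportion of n.
row_masses = [total / n for total in row_totals]


def chi_square_distance(profile_a, profile_b, masses):
    """Chi-square distance between two profiles, weighted by 1/mass."""
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_a, profile_b, masses))
    )


d_lib_cons = chi_square_distance(col_profiles[0], col_profiles[4], row_masses)
d_lib_tendlib = chi_square_distance(col_profiles[0], col_profiles[1], row_masses)
print(f"Liberal vs Conservative: {d_lib_cons:.3f}")
print(f"Liberal vs Tend Lib:     {d_lib_tendlib:.3f}")
```

Dividing by the row mass keeps large regions from dominating the distance simply because they contain more respondents.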
Column Profiles
                              Political Outlook
Region          Liberal  Tend Lib  Moderate  Tend Cons  Conservative   Mass
Northeast          .204      .230      .238       .119          .098   .181
Midwest            .280      .310      .291       .348          .229   .290
South              .194      .270      .307       .341          .458   .326
West               .323      .190      .164       .193          .216   .204
Active Margin     1.000     1.000     1.000      1.000         1.000
3.24
Correspondence Analysis
In the Summary table, we first look at the
chi-square value and see that it is significant,
which supports the conclusion that the two
variables are associated.
[Table: Summary]
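The chi-square test behind this statement can be sketched directly from the correspondence table. The code below is an illustration under the usual independence model (expected count = row total × column total / n); it compares the statistic with the tabulated 5% critical value for 12 degrees of freedom instead of computing an exact p-value, and the constant name `CRITICAL_05_DF12` is an assumption made for this example.

```python
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
df = (len(counts) - 1) * (len(counts[0]) - 1)

CRITICAL_05_DF12 = 21.026   # tabulated 5% critical value, 12 degrees of freedom
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("significant at the 5% level:", chi_square > CRITICAL_05_DF12)
```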
3.25
Correspondence Analysis
SPSS has computed the interpoint distances and
subjected the distance matrix to principal
components analysis, yielding in this case three
dimensions, the maximum for a 4 x 5 table
(min(4, 5) - 1 = 3).
3.26
Correspondence Analysis
When only the interpretable dimensions are retained rather than
the full solution, the reported eigenvalues (labelled Inertia)
sum to slightly less than the total inertia. The total inertia is
the chi-square statistic divided by n, in this case only
0.057 (5.7%). This reflects the fact that the association
between region and political outlook, while significant,
is weak.
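The total inertia can be checked from the correspondence matrix (the table divided by n) without any decomposition. This is a minimal sketch with illustrative variable names; the result should land close to the 0.057 quoted above, and `max_dims` is simply the largest number of dimensions a table of this shape can yield.

```python
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(map(sum, counts))
P = [[cell / n for cell in row] for row in counts]   # correspondence matrix
r = [sum(row) for row in P]                           # row masses
c = [sum(col) for col in zip(*P)]                     # column masses

# Total inertia: sum of (p_ij - r_i c_j)^2 / (r_i c_j), i.e. chi-square / n.
total_inertia = sum(
    (P[i][j] - r[i] * c[j]) ** 2 / (r[i] * c[j])
    for i in range(len(r))
    for j in range(len(c))
)

# A table with I rows and J columns has at most min(I, J) - 1 dimensions.
max_dims = min(len(r), len(c)) - 1

print(f"total inertia = {total_inertia:.3f}, at most {max_dims} dimensions")
```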
3.27
Correspondence Analysis
The eigenvalues (called inertia here) reflect the relative
importance of each dimension, with the first always being
the most important, the second the next most important, and so on.
3.28
Correspondence Analysis
The singular values are simply the square roots of the
eigenvalues. They are interpreted as the maximum
canonical correlation between the categories of the
variables in the analysis for any given dimension.
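The relationship between eigenvalues and singular values can be sketched without any external library. The code below forms the matrix of standardized residuals S, builds the small symmetric matrix S Sᵀ, and finds its eigenvalues with Jacobi rotations. How SPSS itself performs the decomposition is not shown in the source, so this is only one possible route; `jacobi_eigenvalues` is a hypothetical helper written for this example.

```python
import math

counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]
n = sum(map(sum, counts))
r = [sum(row) / n for row in counts]          # row masses
c = [sum(col) / n for col in zip(*counts)]    # column masses

# Standardized residuals: S[i][j] = (p_ij - r_i c_j) / sqrt(r_i c_j).
S = [
    [(counts[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
     for j in range(len(c))]
    for i in range(len(r))
]

# Symmetric matrix S S^T; its eigenvalues are the squared singular values of S.
A = [
    [sum(S[i][k] * S[j][k] for k in range(len(c))) for j in range(len(r))]
    for i in range(len(r))
]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues (descending) of a small symmetric matrix via Jacobi rotations."""
    a = [row[:] for row in matrix]
    size = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(size) for j in range(size) if i != j)
        if off < tol:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that zeroes a[p][q]: tan(2 theta) = 2 a_pq / (a_qq - a_pp).
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                cos_t, sin_t = math.cos(theta), math.sin(theta)
                for k in range(size):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = cos_t * akp - sin_t * akq
                    a[k][q] = sin_t * akp + cos_t * akq
                for k in range(size):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = cos_t * apk - sin_t * aqk
                    a[q][k] = sin_t * apk + cos_t * aqk
    return sorted((a[i][i] for i in range(size)), reverse=True)


dims = min(len(r), len(c)) - 1                 # at most 3 dimensions here
inertias = jacobi_eigenvalues(A)[:dims]        # eigenvalues (inertia per dimension)
singular_values = [math.sqrt(ev) for ev in inertias]

for k, (ev, sv) in enumerate(zip(inertias, singular_values), start=1):
    print(f"dimension {k}: inertia = {ev:.4f}, singular value = {sv:.3f}")
```

The fourth eigenvalue of S Sᵀ is zero up to rounding, which is why only the first three are kept.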
3.29
Correspondence Analysis
Note that the "Proportion of Inertia" columns are the
dimension eigenvalues divided by the total (table)
inertia. That is, they are the share of the total inertia
accounted for by each dimension: thus the first
dimension accounts for 62.7% of the total inertia of
0.057 (5.7%).
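A small worked check, using only the two figures quoted above (a total inertia of 0.057 and a 62.7% share for the first dimension). The variable names are illustrative, and the results are approximate because the quoted inputs are rounded.

```python
import math

total_inertia = 0.057       # total inertia quoted in the text
proportion_dim1 = 0.627     # share of inertia quoted for dimension 1

inertia_dim1 = proportion_dim1 * total_inertia    # eigenvalue of dimension 1
singular_dim1 = math.sqrt(inertia_dim1)           # its singular value
remaining = total_inertia - inertia_dim1          # left for the other dimensions

print(f"inertia of dimension 1:       {inertia_dim1:.4f}")
print(f"singular value of dimension 1: {singular_dim1:.3f}")
print(f"inertia left for the rest:     {remaining:.4f}")
```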
3.31
Correspondence Analysis
Keyword interpretations
Inertia: Variance
3.32
Correspondence Analysis
Contribution of points to dimensions: as factor loadings
are used in conventional factor analysis to ascribe
meaning to dimensions, so "contribution of points to
dimensions" is used to intuit the meaning of
correspondence dimensions.
3.33
Correspondence Analysis
The Overview Row Points table, for each row point in the
correspondence table, displays the mass, scores in
dimension, inertia, contribution of the point to the inertia
of the dimension, and contribution of the dimension to
the inertia of the point.
[Table: Overview Row Points]
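Two of the columns described above, the mass and the inertia of each row point, can be sketched from the counts alone; the scores and contributions need the full decomposition and are not reproduced here. The inertia of a row point is taken as the sum of (p_ij - r_i c_j)^2 / (r_i c_j) over its cells, so the row inertias add up to the total inertia. Variable names are illustrative assumptions.

```python
regions = ["Northeast", "Midwest", "South", "West"]
counts = [
    [19, 23, 58, 16, 15],   # Northeast
    [26, 31, 71, 47, 35],   # Midwest
    [18, 27, 75, 46, 70],   # South
    [30, 19, 40, 26, 33],   # West
]

n = sum(map(sum, counts))
P = [[cell / n for cell in row] for row in counts]   # correspondence matrix
r = [sum(row) for row in P]                           # row masses
c = [sum(col) for col in zip(*P)]                     # column masses

# Inertia carried by each row point.
row_inertia = {
    region: sum(
        (P[i][j] - r[i] * c[j]) ** 2 / (r[i] * c[j]) for j in range(len(c))
    )
    for i, region in enumerate(regions)
}
total_inertia = sum(row_inertia.values())

for i, region in enumerate(regions):
    share = row_inertia[region] / total_inertia
    print(f"{region:<10} mass = {r[i]:.3f}  inertia = {row_inertia[region]:.4f}"
          f"  share of total = {share:.3f}")
```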
3.34
Correspondence Analysis
The Overview Column Points table is similar to the
previous one, except that it is for the column variable
(political outlook rather than region) in the correspondence table.
[Table: Overview Column Points]
3.35
Correspondence Analysis
The Confidence Row Points tables display the standard
deviations of the row scores (the values used as
coordinates to plot the correspondence map) and are used
to assess their precision.
             Standard Deviation in Dimension    Correlation
Region               1            2                 1-2
Northeast         .190         .307                .528
Midwest           .169         .323                .066
South             .122         .206               -.685
West              .339         .148               -.026
3.36
Correspondence Analysis
The Confidence Column Points tables display the standard
deviations of the column scores (the values used as
coordinates to plot the correspondence map) and are used
to assess their precision.
                     Standard Deviation in Dimension    Correlation
Political Outlook            1            2                 1-2
Liberal                   .387         .221               -.694
Tend Lib                  .072         .117                .801
Moderate                  .171         .122                .575
Tend Cons                 .215         .406                .095
Conservative              .127         .302                .304
3.37
Correspondence Analysis
The transformation plots show how the row category values
and the column category values are transformed into
dimension scores, with one plot per dimension.
The x-axis shows the category values and the y-axis the
corresponding dimension scores. Thus the category
"Northeast" in the Overview Row Points table above had a
score in dimension of -0.702, as shown on the plot.
3.46
Correspondence Analysis
Finally the biplot correspondence map is obtained.
3.50
Correspondence Analysis
The data editor appears below. If you wish you may name the
columns. These names will then appear in the final plots.
Transpose the matrix for the row names to be employed.
3.52
Correspondence Analysis
With the prepared commands in an ASCII file
3.54
Correspondence Analysis
With the commands input into the Syntax Editor
3.55
Correspondence Analysis
The solution is, of course, unchanged.
3.56
Correspondence Analysis
Two more illustrative examples.
[Plot: points M1 to M14 on Component 1 (horizontal axis) against Component 2 (vertical axis)]
3.59
Correspondence Analysis
Column Plot
[Plot: word categories (fast, rats, abnormal, respect, generation, age, rise, patients, culture, pressure, oestrogen, study, depressed, discharged, blood, disease, behavior, close) on Component 1 (horizontal axis) against Component 2 (vertical axis)]
3.60
Correspondence Analysis
Finally a set you might recognise!
3.61
Correspondence Analysis
Row Plot
[Plot: individuals, labelled by surname, on Component 1 (horizontal axis) against Component 2 (vertical axis)]
3.62
Correspondence Analysis
Column Plot
[Plot: module codes (PSY3001, PSY3002, PSY3006, PSY3008, PSY3009, PSY3013, PSY3016, PSY3018, PSY3020, PSY3022, PSY3026, PSY3027, PSY3028, PSY3029, PSY3030, PSY3031, PSY3097) plotted against Component 2 (vertical axis)]