Email: joanna.lee@npl.co.uk
Web: http://www.npl.co.uk/nanoanalysis
1. Introduction
   What is multivariate analysis?
   Some matrix algebra
2. Identification
   Principal component analysis (PCA)
   Multivariate curve resolution (MCR)
3. Quantification and prediction
   Partial least squares regression (PLS)
4. Classification
   PCA classification
   Principal Component Discriminant Function Analysis (PC-DFA)
   Partial Least Squares Discriminant Analysis (PLS-DA)
5. Conclusion
Slide 2
Why are we here?
[Figure: number of publications per year of publication (1990-2010) using multivariate methods: PCA, MCR, PLS, DFA and ANNs.]
Slide 3
Data analysis
Identification: what chemicals are on the surface? Where are they located?
Calibration / Quantification: how is the SIMS dataset related to known properties? Can we predict these properties?
Classification
Slide 4
Contents
1. Introduction
   What is multivariate analysis?
   Some matrix algebra
2. Identification
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 5
Chemometrics
Chemometrics relates measurements made on a chemical system to the state of the system via application of mathematical or statistical methods.
SIMS data, analysed with statistical methods (e.g. multivariate analysis), yields statistical results, which are interpreted using knowledge of surface chemistry and instrumental influences.
Manual analysis involves selecting a sub-set of the most interesting features for analysis by eye.
Multivariate analysis involves simultaneous statistical analysis of all the variables.
Slide 8
Advantages and disadvantages
Advantages
   Fast and efficient on modern computers
   Uses all information available
   Improves signal to noise ratio
   Statistically valid, removes potential bias
Disadvantages
   Lots of different methods, procedures, terminologies
   Can be difficult to understand and interpret
Slide 9
Why use multivariate analysis?
[Figure: number of peaks or information bins (10^0 to 10^4) versus counts per peak (10^0 to 10^5). Multivariate methods such as PCA and PLS are useful for datasets with many peaks; ordinary analysis suffices for few peaks, and datasets with many peaks and high counts are "jolly good".]
Slide 10
Contents
1. Introduction
   What is multivariate analysis?
   Some matrix algebra
2. Identification
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 11
Data matrix
Each sample gives a mass spectrum; the intensities are collected into a data matrix X, with one row per sample and one column per variable (mass):

X = [  9  32  10  1  21 ]
    [ 18  20  22  4  12 ]
    [ 24  12  30  6   6 ]

X has 3 rows and 5 columns: a 3 x 5 data matrix (3 samples, 5 mass variables).
Each row (spectrum) is represented by a vector.
Slide 12
Matrix algebra
Matrix addition: A + B = C, (I x K) + (I x K) = (I x K)
A and B must be the same size; each corresponding element is added
(e.g. pure spectra + noise = experimental data):

[ 2  4  1 ]   [ -1  2   0 ]   [ 1  6  1 ]
[ 3  8  6 ] + [  0  1  -2 ] = [ 3  9  4 ]

Matrix multiplication: AB = C, (I x N)(N x K) = (I x K)
The number of columns of A must equal the number of rows of B;
row i of A times column j of B gives element (i, j) of the product AB:

[ 1  4 ]   [ 1  2 ]   [ 1*1 + 4*3   1*2 + 4*2 ]   [ 13  10 ]
[ 2  2 ] x [ 3  2 ] = [ 2*1 + 2*3   2*2 + 2*2 ] = [  8   8 ]
[ 4  2 ]              [ 4*1 + 2*3   4*2 + 2*2 ]   [ 10  12 ]

Slide 13
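The two worked examples above can be reproduced with a few lines of NumPy (a sketch; the matrices are the ones on the slide):

```python
import numpy as np

# Matrix addition: A and B must be the same size; corresponding elements add
# (e.g. pure spectra + noise = experimental data).
A = np.array([[2, 4, 1],
              [3, 8, 6]])
B = np.array([[-1, 2, 0],
              [0, 1, -2]])
print(A + B)    # [[1 6 1]
                #  [3 9 4]]

# Matrix multiplication: the number of columns of A2 must equal the number
# of rows of B2; row i of A2 times column j of B2 gives element (i, j).
A2 = np.array([[1, 4],
               [2, 2],
               [4, 2]])
B2 = np.array([[1, 2],
               [3, 2]])
print(A2 @ B2)  # [[13 10]
                #  [ 8  8]
                #  [10 12]]
```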
Matrix algebra and SIMS
Suppose the surface contains two chemicals, each with its own mass spectrum.
Every sample spectrum is then a linear combination of the two chemical spectra, weighted by the sample composition, i.e. the data matrix is a matrix product:

Data matrix = Sample composition x Chemical spectra

[  9  32  10  1  21 ]   [ 5  1 ]   [ 1  6  1  0  4 ]
[ 18  20  22  4  12 ] = [ 2  4 ] x [ 4  2  5  1  1 ]
[ 24  12  30  6   6 ]   [ 0  6 ]

(samples x masses)    (samples x chemicals)   (chemicals x masses)
Slide 14
Matrix algebra and SIMS
[Slide 15 repeats the decomposition of the data matrix into sample composition and chemical spectra from Slide 14.]
Slide 15
Factor analysis
1. Each spectrum can be represented by a vector.
2. Instead of x, y, z in 3D real space, the axes are mass1, mass2, mass3 etc: "variable space" (also called data space).
3. Assuming the data are a linear combination of chemical spectra, we can write the data matrix as a product of two matrices: one containing the spectra (loadings) and one containing the contributions (scores). This is the basis of factor analysis!
4. There are an infinite number of possible solutions!

Data matrix = Scores x Loadings
[Figure: the 3 x 5 data matrix written as the product of a 3 x 2 scores matrix and a 2 x 5 loadings matrix, with illustrative values.]
Slide 16
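Point 4, the infinite number of solutions, is easy to demonstrate: any factorisation X = TP can be turned into another valid one, X = (TR)(R^-1 P), by an invertible matrix R. A NumPy sketch using the slide's 3 x 5 data matrix (the SVD factorisation and the matrix R here are my own choices, not the values shown on the slide):

```python
import numpy as np

# The 3 x 5 data matrix from the slides (3 samples, 5 mass variables).
X = np.array([[9, 32, 10, 1, 21],
              [18, 20, 22, 4, 12],
              [24, 12, 30, 6, 6]], dtype=float)

# One possible factorisation, via SVD: scores T = U*s, loadings P = Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U * s, Vt
print(np.allclose(T @ P, X))    # True

# Any invertible R yields another, equally valid pair: X = (T R)(R^-1 P).
# Hence the infinite number of possible factor-analysis solutions.
R = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.0]])
T2, P2 = T @ R, np.linalg.solve(R, P)   # solve(R, P) computes R^-1 P
print(np.allclose(T2 @ P2, X))  # True
```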
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 17
Data analysis
Identification: what chemicals are on the surface? Where are they located?
Calibration / Quantification: how is the SIMS dataset related to known properties? Can we predict these properties?
Classification
Slide 18
Terminology
Factor: an axis in the data space of a factor analysis model, representing an underlying dimension that contributes to the data. Also called: principal component, pure component, latent vector.
Loadings: the projection of a factor onto the original variables. Also called: latent loadings, pure loadings.
Factors are directions in the data space chosen such that they reflect interesting properties of the dataset.
Equivalent to a rotation in data space: factors are new axes.
Data are described by their projections onto the factors.
Slide 20
Principal component analysis (PCA)
I = no. of samples, K = no. of mass units, D = dimensionality of data

X = TP
(I x K) = (I x D)(D x K)
Data matrix = Scores matrix x Loadings matrix

The projections of the PCA factors onto the original variables (m1, m2) are the loadings.
The projections of the samples onto the PCA factors are the scores.
The data is fully described by D factors, where D is the dimensionality of the data (number of samples or variables, whichever is smaller).
Slide 21
Principal component analysis (PCA)
PCA factors are chosen such that each successive factor describes the largest amount of variance within the data.
The amount of variance described by each factor is called its eigenvalue.
Slide 22
Principal component analysis (PCA)
I = no. of samples, K = no. of mass units, N = no. of PCA factors

X = TP + E
(I x K) = (I x N)(N x K) + (I x K)
Data matrix = Scores matrix x Loadings matrix + Residuals (noise)

By removing higher factors (small variance due to noise) we can reduce the dimensionality of the data: factor compression.
Often hundreds of variables can be described with just a handful of factors!
Slide 23
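Factor compression can be sketched in a few lines of NumPy (all data here are synthetic, invented for illustration): a 100-variable dataset generated from two underlying factors plus a little noise is reproduced almost perfectly by N = 2 PCA factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 20 samples x 100 "mass" variables, built from 2 factors.
scores_true = rng.normal(size=(20, 2))
loadings_true = rng.normal(size=(2, 100))
X = scores_true @ loadings_true + 0.01 * rng.normal(size=(20, 100))

# PCA via SVD of the mean-centred data: X = T P + E, keeping N factors.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
N = 2
T, P = (U * s)[:, :N], Vt[:N]       # scores (I x N) and loadings (N x K)
E = Xc - T @ P                      # residuals (noise)

# Two factors describe 100 variables: the relative residual is tiny.
print(np.linalg.norm(E) / np.linalg.norm(Xc))
```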
Number of factors
How many factors should we keep?
2. Scree test: the eigenvalue plot levels off in a linearly decreasing manner after 3 factors.
3. Percentage of variance captured by the Nth PCA factor:
   (Nth eigenvalue / sum of all eigenvalues) x 100%
4. Percentage of total variance captured by the first N PCA factors:
   (sum of eigenvalues up to N / sum of all eigenvalues) x 100%
[Figure: eigenvalue plots versus PCA factor for noise-free data and for data with Poisson noise (max 5000 counts).]
Slide 24
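Criteria 3 and 4 can be computed directly from the singular values of the mean-centred data matrix, since each eigenvalue is a squared singular value. A sketch using the 3 x 5 matrix from Slide 12:

```python
import numpy as np

X = np.array([[9, 32, 10, 1, 21],
              [18, 20, 22, 4, 12],
              [24, 12, 30, 6, 6]], dtype=float)
Xc = X - X.mean(axis=0)                 # mean centre each variable

s = np.linalg.svd(Xc, compute_uv=False)
eigenvalues = s**2                      # variance captured by each factor

pct = 100 * eigenvalues / eigenvalues.sum()   # criterion 3: Nth factor
cumulative = np.cumsum(pct)                   # criterion 4: first N factors
print(pct.round(1))
print(cumulative.round(1))
# After mean centring, 3 samples span at most 2 dimensions, so the third
# eigenvalue (and its percentage) is essentially zero.
```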
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 25
PCA walkthrough
Eight polymer samples: PS 2480, PS 3550, PMMA 2170, PMMA 2500, PEG 1470, PEG 4250, PPG 425, PPG 1000.
Data were unit mass binned and mean centered prior to analysis.
Calculation using MATLAB with PLS Toolbox 4.0.
Slide 26
PCA walkthrough
[Slides 27-31: figures only.]
PCA walkthrough
Using PCA we have effectively reduced 300 correlated variables (mass units) to 3 independent variables (factors) by which all the samples can be characterised.
Slide 32
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 33
Data preprocessing
Slide 35
Peak selection and binning
Manual selection
   Peaks of interest only
   Unexpected features lost
Auto peak search
   All peaks of interest included?
   What threshold to use?
Unit mass binning
   Straightforward to use but detailed information lost
0.5 u binning*
   Separates organics from inorganics
Important considerations
   What information are we putting into PCA? What is included? What is omitted?
   Do we need to apply further processing, e.g. dead time correction?
Mean centering
X~_ik = X_ik - mean(X_:k)
Without mean centering, the 1st factor goes from the origin to the centre of gravity of the data; with mean centering, the 1st factor accounts for the highest variance.
Slide 38
Normalisation
X~_ik = X_ik / sum(X_i:)
Preserves the shape of the spectra.
Reduces the effects of topography, sample charging, and changes in primary ion current.
[Figure: spectra before and after normalisation, 0-150 u.]
Slide 39
Variance scaling
X~_ik = X_ik / var(X_:k)
Each variable is scaled by its variance across the samples.
Diagram from P. Geladi and B. Kowalski, Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta, 185 (1986) 1.
Slide 40
Poisson and binomial scaling
M. R. Keenan et al., Surf. Interface Anal., 36 (2004) 203
M. R. Keenan et al., Surf. Interface Anal., 40 (2008) 97
Slide 41
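The preprocessing steps above can be sketched as small NumPy functions (X is samples x mass bins; the function names are my own, and the Poisson scaling is a simplified form of the row/column square-root weighting described by Keenan et al.):

```python
import numpy as np

def mean_centre(X):
    # X~_ik = X_ik - mean(X_:k): subtract each variable's mean.
    return X - X.mean(axis=0)

def normalise(X):
    # X~_ik = X_ik / sum(X_i:): divide each spectrum by its total counts,
    # preserving spectral shape while removing total-intensity variations.
    return X / X.sum(axis=1, keepdims=True)

def poisson_scale(X):
    # Down-weight high-count (high Poisson variance) rows and columns by
    # the square roots of their means (simplified Keenan-style scaling).
    row_mean = X.mean(axis=1, keepdims=True)
    col_mean = X.mean(axis=0, keepdims=True)
    return X / np.sqrt(row_mean) / np.sqrt(col_mean)

X = np.array([[9.0, 32, 10, 1, 21],
              [18, 20, 22, 4, 12],
              [24, 12, 30, 6, 6]])
print(normalise(X).sum(axis=1))      # every spectrum now sums to 1
print(mean_centre(X).mean(axis=0))   # every variable now has zero mean
```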
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 42
PCA example (1)
D.J. Graham et al, Appl. Surf. Sci., 252 (2006) 6860 Slide 43
PCA example (2)
95% confidence limits provide a means for identification / classification.
PCA of images: the raw data cube has I rows, J columns and K mass peaks; it is unfolded into a 2D matrix of pixel spectra before PCA.
Slide 46
PCA image example (1)
Immiscible PC / PVC polymer blend, 42 counts per pixel on average.
[Figure: total ion image, and sorted eigenvalue plots for mean centering, normalisation and Poisson scaling.]
With Poisson scaling, only 2 factors are needed: the dimensionality of the image is reduced by a factor of 20!
PC1 scores and loadings separate the PVC and PC phases (e.g. the O and OH peaks).
The 2nd factor shows detector saturation for the intense 35Cl peak.
PCA image example (2)
Image courtesy of Dr Ian Fletcher, Intertek MSG.
[Figure: total ion image and total spectra.]
Slide 49
PCA image example (2)
Hair fibre with multi-component pretreatment. Image courtesy of Dr Ian Fletcher, Intertek MSG.
[Figure: scores images and loadings spectra for PCA factors 1-5, with characteristic peaks labelled A-E.]
Slide 51
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 52
Multivariate curve resolution (MCR)
Given only the data matrix, can we recover the sample compositions and the chemical spectra?

[  9  32  10  1  21 ]   [ 5  1 ]   [ 1  6  1  0  4 ]
[ 18  20  22  4  12 ] = [ 2  4 ] x [ 4  2  5  1  1 ]
[ 24  12  30  6   6 ]   [ 0  6 ]

(samples x masses)    (samples x chemicals)   (chemicals x masses)

Try multivariate curve resolution (MCR)!
Slide 53
Multivariate curve resolution (MCR)
I = no. of samples, K = no. of mass units, N = no. of factors

X = TP + E
(I x K) = (I x N)(N x K) + (I x K)
Data matrix = (projection of samples onto factors: scores matrix) x (projection of factors onto variables: loadings matrix) + residuals (noise)

MCR is designed for recovery of chemical spectra and contributions from a multi-component mixture, when little or no prior information about the composition is available.
MCR assumes a linear combination of chemical spectra (loadings) and contributions (scores): only an approximation in SIMS.
Slide 54
Multivariate curve resolution (MCR)
MCR uses an iterative least-squares algorithm to extract solutions, while applying suitable constraints.
With a non-negativity constraint, MCR factors resemble SIMS spectra and chemical contributions more directly, as these must be positive.
Slide 55
Outline of MCR
X = TP + E
1. Raw data -> data matrix X.
2. Number of factors: determined e.g. by PCA; noise-filtered data ensures the MCR solution is robust.
3. Initial estimates of T or P:
   Random initialisation
   PCA loadings or scores
   Varimax-rotated PCA loadings or scores
   Pure variable detection algorithm, e.g. SIMPLISMA
4. MCR alternating least squares optimisation, applying constraints (non-negativity, equality), until the convergence criterion is met.
5. Outputs: MCR scores T, MCR loadings P, and the reproduced data matrix X.
Slide 56
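A minimal sketch of the alternating least squares loop in NumPy (the `mcr_als` function is my own; non-negativity is enforced crudely by clipping, whereas real implementations such as the MCR-ALS toolbox use proper constrained least squares). It is applied to the noise-free two-chemical example from Slide 14:

```python
import numpy as np

def mcr_als(X, n_factors, n_iter=500, seed=0):
    """Crude MCR-ALS sketch: alternate least-squares updates of scores T
    and loadings P, clipping negative values to enforce non-negativity."""
    rng = np.random.default_rng(seed)
    T = rng.uniform(size=(X.shape[0], n_factors))   # random initial scores
    for _ in range(n_iter):
        P = np.clip(np.linalg.lstsq(T, X, rcond=None)[0], 0, None)
        T = np.clip(np.linalg.lstsq(P.T, X.T, rcond=None)[0].T, 0, None)
    return T, P

# Two-chemical mixture from Slide 14: X = contributions x spectra.
C = np.array([[5.0, 1], [2, 4], [0, 6]])
S = np.array([[1.0, 6, 1, 0, 4], [4, 2, 5, 1, 1]])
X = C @ S

T, P = mcr_als(X, n_factors=2)
rel_err = np.linalg.norm(X - T @ P) / np.linalg.norm(X)
print(rel_err)   # relative reconstruction error
```

Note that, as the rotational ambiguity discussion below the flowchart implies, the recovered T and P are only defined up to scaling and permutation of the factors.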
Rotational ambiguity
MCR solutions are not unique: different pairs of factors can reproduce the data equally well.
Good initial estimates, suitable data preprocessing and the correct number of factors are essential.
Slide 57
Contents
1. Introduction
2. Identification
   Principal component analysis (PCA)
   PCA walkthrough
   Data preprocessing
   PCA examples
   Multivariate curve resolution (MCR)
   MCR examples
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 58
MCR image example (1)
Simple PVC / PC polymer blend.
MCR calculations using Matlab with the MCR-ALS toolbox, freely available from http://www.mcrals.info/
Slide 59
MCR image example (1)
MCR scores (pure component concentrations): these will be folded to form projection images.
MCR loadings (pure component spectra).
MCR calculations using Matlab with the MCR-ALS toolbox, freely available from http://www.mcrals.info/
Slide 60
MCR image example (1)
Simple PVC / PC polymer blend.
MCR extracts two distinctive factors, corresponding to PVC and PC respectively: straightforward interpretation.
[Figure: loadings and scores on MCR factors 1 (PVC) and 2 (PC), 0-40 u.]
J. L. S. Lee et al., Surf. Interface Anal. 2008, 40, 1-14
Slide 61
MCR image example (2)
Image courtesy of Dr Ian Fletcher, Intertek MSG.
[Figure: total ion image and total spectra.]
Slide 62
MCR image example (2)
Image courtesy of Dr Ian Fletcher, Intertek MSG.
MCR loadings resemble chemical spectra (characteristic peaks A-E) and fragments, and the scores directly reveal the spatial distributions!
[Figure: MCR scores images and loadings spectra for factors 1-5.]
Slide 64
MCR image example (3)
MCR resolves the original images unambiguously!
Slide 65
MCR spectra example
Scores and loadings for 3 of the MCR factors show each factor's contribution to the depth profile.
Improves signal to noise and the correlation of related peaks.

MCR summary
MCR describes the original data using factors, consisting of loadings and scores which resemble chemical spectra and contributions from a multi-component mixture, respectively.
MCR uses an iterative algorithm to extract solutions, while applying suitable constraints, e.g. non-negativity.
Good initial estimates and suitable data preprocessing are essential.
MCR is excellent for identification and localisation of chemicals in complex mixtures and allows for direct interpretation.
Slide 67
Identification summary
Manual analysis: chemical identification difficult (characteristic peaks only); detection of minor components easy only if the substance is known; most suitable for simple datasets with good prior knowledge.
PCA: chemical identification medium (important peaks and correlation); detection of minor components difficult (higher factors capture small variance); most suitable for discrimination of similar chemical phases.
MCR: chemical identification easy (full spectra obtained); detection of minor components possible, depending on the system studied; most suitable for identification of unknown mixtures.
Slide 68
Contents
1. Introduction
2. Identification
3. Quantification and prediction
   Partial least squares regression (PLS)
   Calibration, validation and prediction
   PLS examples
4. Classification
5. Conclusion
Slide 69
Data analysis
Identification: what chemicals are on the surface? Where are they located?
Calibration / Quantification: how is the SIMS dataset related to known properties? Can we predict these properties?
Classification
Slide 70
Regression analysis
y = b*x + e
where y is the response variable, b the regression coefficient, x the predictor variable and e the error.
[Figure: response variable y versus predictor variable x, with fitted regression line.]
Slide 71
Multivariate regression
Known properties of the samples:

            Molecular weight   Solution concentration   Reaction time
Sample 1          5                      1                    3
Sample 2          2                      4                    7
Sample 3          1                      6                    4

Can we predict the properties of similar materials from their SIMS spectra?

y = f(x) + e
y = b1*x1 + b2*x2 + b3*x3 + ... + bm*xm + e

where y is the response variable (i.e. a measured property), bm a regression coefficient, and xm a predictor variable (i.e. the intensity at mass m).
Slide 72
Multivariate regression
I = no. of samples, K = no. of mass units, M = no. of response variables

Y = XB + E
(I x M) = (I x K)(K x M) + (I x M)
X: data matrix; Y: response matrix; B: regression matrix

1. We can calculate B to gain an understanding of the covariance relationship between X and Y, e.g. relating SIMS spectra with sample preparation parameters.
2. B can be applied to future samples in order to predict Y using only measurements of X, e.g. quantifying the surface composition or coverage of samples using only their SIMS spectra.
Slide 73
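Point 2 can be sketched with ordinary least squares in NumPy (the three response rows come from the slide's property table; the 3 x 5 "spectra" are invented for illustration). With more variables (5) than samples (3) the fit is exact, which is precisely why the validation discussed on the later slides matters:

```python
import numpy as np

# Y = X B + E by ordinary least squares. Y is the slide's property table;
# the spectra X are random invented counts.
rng = np.random.default_rng(2)
X = rng.poisson(20, size=(3, 5)).astype(float)   # 3 samples x 5 mass bins
Y = np.array([[5.0, 1, 3],   # molecular weight, concentration, reaction time
              [2, 4, 7],
              [1, 6, 4]])

B = np.linalg.lstsq(X, Y, rcond=None)[0]   # (K x M) regression matrix
print(np.allclose(X @ B, Y))               # True: more variables than samples
```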
Partial least squares regression (PLS)
Y = XB + E
(I x M) = (I x K)(K x M) + (I x M)

The regression vectors B are a linear combination of the PLS loadings that best predict Y from X.
It is important to determine the number of factors to include in PLS!
Slide 74
Calibration, validation, prediction
Y = XB + E
(I x M) = (I x K)(K x M) + (I x M)

Calibration: fit a PLS model to a calibration data set with known X and Y; use cross-validation to determine the number of factors.
Validation: apply the model to an independent validation data set with known X and Y, and calculate the error between the predicted and known Y.
Slide 75
Number of factors: cross validation
[Figure: response variable Y versus predictor variable X for calibration and validation data.]
Slide 76
Number of factors: cross validation
RMSEC (Root Mean Square Error of Calibration) goes down with an increasing number of factors.
To decide the optimal number of factors, use the minimum of RMSECV (Root Mean Square Error of Cross Validation).
[Figure: RMSEC and RMSECV versus number of PLS factors; RMSECV has a minimum at the optimal number.]

"Leave one out" cross validation is the most popular:
1. Calculate the PLS model excluding sample i, for N PLS factors.
2. Use the model to predict sample i and calculate the error.
3. Repeat for all samples.
4. Calculate the root mean square error of cross validation (RMSECV).
5. Repeat for different numbers of PLS factors.
Slide 77
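The leave-one-out loop above can be sketched directly in NumPy (all data here are synthetic; for brevity the model fitted inside the loop is principal component regression rather than PLS, as a stand-in: the cross-validation logic is identical, only the factor definition differs):

```python
import numpy as np

def fit_factor_regression(X, Y, n):
    # Regression through the first n factors of X (PCR, a stand-in for PLS).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n].T @ ((U[:, :n].T @ Y) / s[:n, None])

def rmsecv(X, Y, n):
    # Leave one out: fit without sample i, predict sample i, collect errors.
    errors = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        B = fit_factor_regression(X[keep], Y[keep], n)
        errors.append(Y[i] - X[i] @ B)
    return np.sqrt(np.mean(np.square(errors)))

# Synthetic calibration set: 15 samples, 30 variables, 2 underlying factors.
rng = np.random.default_rng(3)
T = rng.normal(size=(15, 2))
X = T @ rng.normal(size=(2, 30)) + 0.05 * rng.normal(size=(15, 30))
Y = T @ np.array([[1.0], [-2.0]]) + 0.05 * rng.normal(size=(15, 1))

scores = [rmsecv(X, Y, n) for n in range(1, 6)]
print(1 + int(np.argmin(scores)))   # number of factors with minimum RMSECV
```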
Validation and prediction
Validation data should be statistically independent from the calibration data, e.g. data taken on a different batch of samples, on a different day.
Calculate RMSEP (Root Mean Square Error of Prediction).
An independent validation set is essential if we want to use the model to predict new samples!
Calibration -> Validation -> Prediction
Slide 78
Contents
1. Introduction
2. Identification
3. Quantification and prediction
   Partial least squares regression (PLS)
   Calibration, validation and prediction
   PLS examples
4. Classification
5. Conclusion
Slide 79
PLS example (1)
The first PLS factors capture … of the variance in X (the data) and 98.8% of the variance in Y (the thicknesses).
[Figure: PLS loadings and scores for factors 1 and 2.]
Slide 81
PLS example (1)
The PLS regression vector shows the Irganox characteristic peaks most correlated with thickness (e.g. 231, 277, 1176 u).
Irganox dewets on the surface, so the initial thickness is proportional to the surface coverage!
[Figure: regression vector versus mass, and predicted thickness versus thickness measured by XPS (nm).]
F. M. Green et al., Anal. Chem. 2009, 81, 7579
Slide 82
PLS example (2)
[Figure: data matrix and regression matrix.]

PLS summary
PLS is a multivariate linear regression technique.
PLS finds factors that best describe the structure of covariance between X and Y.
The data preprocessing method needs to be selected with care.
PLS is excellent for calibration and quantification, and for studying the relationship between SIMS data and other measured properties.
Properly validated PLS models can be used for prediction of these properties using SIMS spectra.
Slide 84
Contents
1. Introduction
2. Identification
3. Quantification and prediction
4. Classification
   PCA classification
   Principal Component Discriminant Function Analysis (PC-DFA)
   Partial Least Squares Discriminant Analysis (PLS-DA)
5. Conclusion
Slide 85
Data analysis
Identification: what chemicals are on the surface? Where are they located?
Calibration / Quantification: how is the SIMS dataset related to known properties? Can we predict these properties?
Classification
Slide 86
PCA classification
Slide 88
Example 1: PC-DFA
Fisher's ratio = (mean1 - mean2)^2 / (var1 + var2)
i.e. the squared separation of the class means divided by the sum of the within-class variances.
Used to distinguish strains of bacteria.
J. S. Fletcher et al., Appl. Surf. Sci. 252 (2006) 6869
Slide 89
Example 2: PLS-DA
PCA approach: overlay of PC 1, 2 and 3 scores; PC 10 scores and loadings.
Plant tissue image with mesophyll, epidermal and trichome regions.
PC 10 (0.58% of the variance) describes the differences between epidermal cells and other areas, but this is not efficient!
[Figure: scores images and decluttered PC 10 loadings plot.]
Image courtesy of Dr Kat Smart and Prof Chris Grovenor at the University of Oxford
Slide 90
Example 2: PLS-DA
PLS-DA prediction: the regression vector is the combination of peaks which best predicts the differences between the classes.
[Figure: PLS-DA prediction image and decluttered regression vector plot.]
Image courtesy of Dr Kat Smart and Prof Chris Grovenor at the University of Oxford
Slide 91
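The idea behind PLS-DA can be sketched in a few lines of NumPy: encode class membership as an indicator (dummy) Y matrix, regress it on the spectra, and assign each sample to the class with the largest predicted value. Here plain least squares stands in for the PLS regression step, and the two-class "spectra" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
n_per, k = 10, 20

# Two synthetic classes: spectra scattered around two class-mean spectra.
means = rng.normal(size=(2, k))
X = np.vstack([m + 0.3 * rng.normal(size=(n_per, k)) for m in means])
labels = np.repeat([0, 1], n_per)
Y = np.eye(2)[labels]                      # indicator (dummy) matrix

B = np.linalg.lstsq(X, Y, rcond=None)[0]   # regression vectors, one per class
predicted = np.argmax(X @ B, axis=1)       # class with the largest prediction
print((predicted == labels).mean())        # training accuracy
```

As with PLS regression, a model like this must be validated on independent data before its predictions can be trusted.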
Classification summary
Contents
1. Introduction
2. Identification
3. Quantification and prediction
4. Classification
5. Conclusion
Slide 93
Data analysis
Identification (PCA, MCR): what chemicals are on the surface? Where are they located?
Calibration / Quantification (PLS): how is the SIMS dataset related to known properties? Can we predict these properties?
Classification (PC-DFA, PLS-DA)
Slide 94
Conclusion
Further reading:
Surface and Interface Analysis, Multivariate Analysis special issues (Volume 41, Issues 2 & 8, Feb/Aug 2009)
Surface Analysis: The Principal Techniques, 2nd edition, Chapter 10, "The application of multivariate data analysis techniques in surface analysis"
Slide 95
Bibliography
General
J. L. S. Lee et al., The application of multivariate data analysis techniques in surface analysis, in Surface Analysis: The Principal Techniques, 2nd edition (eds J. C. Vickerman, I. S. Gilmore), Wiley.
P. Geladi et al., Multivariate Image Analysis, John Wiley and Sons (1996)
J. L. S. Lee et al., Quantification and methodology issues in multivariate analysis of ToF-SIMS data for mixed organic systems, Surf. Interface Anal. 40 (2008) 1
D. J. Graham, NESAC/BIO ToF-SIMS MVA web resource, http://nb.engr.washington.edu/nb-sims-resource/
PCA
D. J. Graham et al., Information from complexity: challenges of ToF-SIMS data interpretation, Appl. Surf. Sci. 252 (2006) 6860
M. R. Keenan et al., Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images, Surf. Interface Anal. 36 (2004) 203
M. R. Keenan et al., Mitigating dead-time effects during multivariate analysis of ToF-SIMS spectral images, Surf. Interface Anal. 40 (2008) 97
MCR
N. B. Gallagher et al., Curve resolution for multivariate images with applications to TOF-SIMS and Raman, Chemom. Intell. Lab. Syst. 73 (2004) 105
J. A. Ohlhausen et al., Multivariate statistical analysis of time-of-flight secondary ion mass spectrometry using AXSIA, Appl. Surf. Sci. 231-232 (2004) 230
R. Tauler, A. de Juan, MCR-ALS Graphic User Friendly Interface, http://www.ub.edu/mcr/
PLS
P. Geladi et al., Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta 185 (1986) 1
A. M. C. Davies et al., Back to basics: observing PLS, Spectroscopy Europe 17 (2005) 28
Slide 96