
1

Error structure of spectroscopic data (NIR, FTIR, etc.)
- and how to deal with them
Harald Martens and Achim Kohler
Centre for Biospectroscopy and Data Modelling, Nofima Food, Ås, Norway
CIGENE Center for Integrative Genetics, University of Life Sciences, Ås
Department of Mathematical Sciences and Technology (IMT), Norwegian University of Life Sciences, Ås, Norway
2
(Diagram: DNA, mRNA, proteome, metabolome, biological structure and other phenotypes, along with environment and human activity. Measurement techniques shown: sequencing, SNP, AFLP, realtime PCR, micro-array, 1D- and 2D-electrophoresis, MALDI-TOF, LC-MS, GC, LC (-MS), NIR, FT-IR, Raman, fluorescence, serotyping. Example phenotypes: disease incidence, virulence, drug sensitivity, biofilm formation, sensory science, economy.)
Data analysis: integrating different types of bio-data
Look for common variation patterns
Make quantitative predictions and forecasts
Identify outliers
My own field:
Measurements and modelling in systems biology
3
(Same diagram as on slide 2.)
Now the real fun starts: feedback!
High-dimensional, dynamic, non-linear ODEs
Spatial PDEs
Possible, since we are now getting relevant and reliable high-throughput, high-dimensional instrumentation
4
Biospectroscopy
Wavelength ranges:
UV-Vis (<750 nm)
Near infrared (NIR): 750-2500 nm
Fourier transform infrared (FTIR): >2500 nm
Raman scattering
Fluorescence (mainly <750 nm)
Modes of measurement:
Raman, fluorescence: the light reaching the detector is measured
Measured signal is 0% at analyte level 0
Analyte level ∝ measurement
Noisy
UV-Vis, NIR, FTIR: transmittance and/or reflectance is measured
Measured signal is 100% at analyte level 0
Analyte level ∝ log(1/measurement)
Precise
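A minimal sketch of the transmittance case, assuming ideal Beer-Lambert behaviour (no stray light, no scattering): the conversion A = log10(1/T) turns an exponentially decaying transmittance into an absorbance that is linear in analyte concentration. The function name and the value 0.5 for the product of absorptivity and path length are hypothetical.

```python
import numpy as np

def absorbance(transmittance):
    """Convert transmittance T (a fraction in (0, 1]) to absorbance A = log10(1/T)."""
    t = np.asarray(transmittance, dtype=float)
    if np.any(t <= 0.0) or np.any(t > 1.0):
        raise ValueError("transmittance must lie in (0, 1]")
    return np.log10(1.0 / t)

# Ideal Beer-Lambert toy data: T = 10**(-eps_l * c), eps_l = 0.5 (hypothetical)
eps_l = 0.5
conc = np.array([0.0, 0.5, 1.0, 2.0])
T = 10.0 ** (-eps_l * conc)   # 1.0 (i.e. 100 %) at analyte level 0, then non-linear
A = absorbance(T)             # approximately eps_l * conc, i.e. linear in concentration
```

The raw transmittance is not proportional to concentration, which is why the log(1/measurement) transform comes before any linear modelling.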
5
Biospectroscopy
Errors in measurements:
White noise: random measurement errors
(usually heteroscedastic: larger values have larger errors)
Coloured noise: systematic errors
Several undesired but unavoidable interferants
From the measurement: sample thickness, temperature effects
From the samples: light scattering (simple, complicated), constituent interactions
Several analytes with overlapping spectra
Model-based pre-processing: identify and correct for systematic errors.
Turn systematic errors into valuable sources of information.
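A toy simulation can make the distinction between white and coloured noise concrete. Everything in it is hypothetical: a single Gaussian band as the "true" spectrum, a multiplicative factor b and an additive offset a as coloured noise, and white noise whose standard deviation grows with the signal level (heteroscedasticity).

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.linspace(1000.0, 2500.0, 301)                # hypothetical wavelength axis, nm
true_spectrum = np.exp(-0.5 * ((wl - 1700.0) / 120.0) ** 2)

# Coloured noise (systematic errors): multiplicative factor b, additive offset a
b, a = 1.3, 0.05
systematic = b * true_spectrum + a

# White noise, heteroscedastic: standard deviation grows with the signal level
sigma = 0.002 + 0.01 * systematic
z = systematic + rng.normal(0.0, sigma)

# The random part is larger where the signal is larger
resid = z - systematic
high = systematic > 0.8 * systematic.max()
low = systematic < a + 0.1
ratio = resid[high].std() / resid[low].std()
```

The systematic part (b, a) shifts and scales the whole spectrum in a structured way and can be modelled; the random part can only be reduced by averaging or filtering.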
6
(Figure: FTIR absorption spectra vs. wavenumber [cm⁻¹], 1000-3500, annotated with water variations in tissues, Mie scattering, dispersive artefact, wavenumber-dependent effects, baseline shift and multiplicative effect.)
Examples of undesired phenomena in FTIR
7
(Figure: measured spectra, pre-processing model, chemical absorption, physical contribution.)
Principle of model-based pre-processing:
Mie scattering of individual liver cancer cells in synchrotron FTIR
8
Example: light microscopy of muscle, one wavelength in the visible range
9
Hyperspectral FTIR microscopy of the same sample:
Traditional chemical image at the best wavenumber (1240 cm⁻¹): the UNIVARIATE TRADITION!
Like playing complex music on a grand piano with one finger at a time
10
Hyperspectral FTIR microscopy of the same sample:
Chemical image at the same wavenumber after pre-processing
Like playing SIMPLE music on a grand piano with one finger at a time
11
Hyperspectral FTIR microscopy of the same sample:
Chemical image from the pre-processing parameters, based on all wavelengths
Like playing complex music on a grand piano with all fingers and toes (+ nose)
12
Analysing/visualising estimated parameters (scatter effects)
Estimated parameters can be used for making physical images:
b, proportional to the effective optical path length, is estimated for each pixel spectrum
Kohler A, Bertrand D, Martens H, Hannesson K, Kirschner K, and Ofstad R (2007) Multivariate image
analysis of a set of FTIR microspectroscopy images of aged bovine tissue combining image and design
information. Analytical and Bioanalytical Chemistry 389, 1143-1153.
13
Pre-processing
Model-based pre-processing: parameterize the problems
Combine knowledge-driven and data-driven modelling
Use linear data models (fast, simple, robust), but use both additive and multiplicative operators
Complicated non-linear mathematical models replaced by bilinear, compressed summaries of model behaviour
14
Abstract (short version): an overview of what to do to handle various error types
Error structure of spectroscopic data (NIR, FTIR, etc.)
(www.specmod.org)
Measurements are usually done in order to quantify an analyte, i.e. a chemical or physical property of the samples analyzed. But measured data
usually reflect several sources of variation, not only the desired chemical or physical property. This is clearly true for spectroscopic
measurements. By proper design of the measurements and proper modelling of the data, it is possible to separate the measured signals into
the various sources of variation.
Of course, different measurement types and different sample types offer different error structures. But most systematic errors have a
surprisingly systematic nature, from a mathematical point of view, and can therefore be discovered and corrected for by the same procedures.
This lecture addresses some psychological, technical, mathematical and statistical issues in how to reduce undesired variations in measured
spectra.
Keywords:
Remember to address the undesired phenomena (the interferants) as well, not only the desired analytes! Usually, this requires multi-channel
profiling.
Increase the number and type of measurement channels in the profile, so that even the interferants can be quantified and subtracted. This may
increase the cost of the measurements a little, but saves you a lot of problems later.
If possible, measure each sample under a set of different conditions; this increases the information value of a given instrument type.
Study the raw spectra graphically, in 1-way, 2-way and 3-way plots, to detect unexpected phenomena. Check them also by PCA etc. to look for
more hidden structures.
Model-based pre-processing should then be applied, building on a combination of approximate causal understanding and empirical
covariations. The purpose is to identify and correct for various signal contribution types: random noise, wavelength shifts, multiplicative
amplifications (e.g. instrument amplification, sample thickness, effective optical path length), additive contributions (analyte and interferant
concentrations), log-additive contributions (stray light) and response non-linearities of various kinds. Interference effects should be removed,
but stored for later studies, not just filtered away and lost.
Multivariate calibration models should finally be developed, in order to enhance the selectivity and provide graphical insight into the main
structures in the pre-processed data.
Analyte predictions and multivariate scores are then obtained by passing new spectra through the calibration models, and can in turn be
related to other types of information, e.g. from genetics and genomics.
Automatic outlier warnings of various kinds should be used in order to detect anomalies.
Selectivity enhancement by multivariate analysis has been well known for 25 years in Chemometrics. But the field is still developing, as the
lecture will illustrate, e.g. based on high-throughput instrument types (FTIR robotics) and various new uses of the Extended Multiplicative Signal
Correction (EMSC).
15
Notation for model-based pre-processing:
ref = a reference spectrum
z = an input sample spectrum
(Example: $z \neq z_{True}$! But $z_{True} = ref$)
m = mean of z and ref (and possibly some other spectra)
Error model: 1) $m \approx z_{True}$; 2) $z = f(m) + \text{random noise}$
f() is estimated from the input spectra z and m
Error correction: $z_{Corr} = \hat{z}_{True} = f^{-1}(z)$
16
(Figure panels: spectra z and ref vs. wavelength; mean and difference; absorbance of z vs. absorbance of ref; corrected z and ref. Panel groups: input spectra, visualization tools, corrected spectra.)
Simple error types; assume $z_{True} = ref$
$z = ref + a$, corrected by $z_{corr} = z - a$
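For the simplest case on this slide, z = ref + a with a constant offset a, the least-squares estimate of a is just the mean difference between z and ref. A minimal sketch, assuming z_true = ref exactly as the slide does; the function name and the data are illustrative.

```python
import numpy as np

def correct_offset(z, ref):
    """Estimate a in z = ref + a by least squares; return (z - a, a)."""
    z = np.asarray(z, dtype=float)
    ref = np.asarray(ref, dtype=float)
    a = float(np.mean(z - ref))      # LS estimate of a constant offset
    return z - a, a

ref = np.array([0.10, 0.40, 0.90, 0.40, 0.10])   # toy reference spectrum
z = ref + 0.25                                   # sample with an additive offset
z_corr, a_hat = correct_offset(z, ref)
```

In real data z_true is not exactly ref, so in MSC/EMSC the offset is estimated jointly with the other parameters by regression rather than by a plain mean difference.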
17
Simple error types
(Figure panels as on the previous slide: spectra z and ref, mean and difference, z vs. ref, corrected z and ref.)
Additive: $z = ref + a$, corrected by $z_{corr} = z - a$
Multiplicative: $z = b \cdot ref$, corrected by $z_{corr} = z / b$
Additive and multiplicative: $z = b \cdot ref + a$, corrected by $z_{corr} = (z - a) / b$
Method: Multiplicative Signal Correction (MSC)
or Standard Normal Variate (SNV)
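A compact numpy sketch of both methods, assuming spectra are stored as rows of a matrix X. MSC regresses each spectrum on the mean spectrum m (z ≈ a + b·m) and returns (z - a)/b; SNV instead centres each spectrum on its own mean and scales by its own standard deviation, so it needs no reference. The function names are illustrative, not a particular toolbox's API.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative Signal Correction of the rows of X.

    Fits z = a + b*m by least squares for each spectrum z, with m the mean
    spectrum (or a supplied reference), and returns (z - a) / b plus a and b.
    """
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    design = np.column_stack([np.ones_like(m), m])       # columns: offset, reference
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)  # shape (2, n_spectra)
    a, b = coef
    X_corr = (X - a[:, None]) / b[:, None]
    return X_corr, a, b

def snv(X):
    """Standard Normal Variate: centre and scale each spectrum individually."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

# Synthetic check: scaled and shifted copies of one spectrum
base = np.sin(np.linspace(0.0, 3.0, 50)) + 1.0
X = np.vstack([0.8 * base + 0.1, 1.0 * base, 1.2 * base - 0.1])
X_msc, a, b = msc(X)
X_snv = snv(X)
```

Here the mean spectrum equals `base`, so MSC recovers a = (0.1, 0, -0.1) and b = (0.8, 1.0, 1.2) exactly; SNV makes the rows identical as well, but on a standardized scale rather than the reference scale.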
20
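Multiplicative Signal Correction and its extension (EMSC)
Model: $z = b\,z_{True} + a + \varepsilon$
MSC:
Assumption: $z_{True} = m$, i.e. $z = b\,m + a + \varepsilon$
Regression: $b$, $a$
$z_{corr} = (z - a)/b$
EMSC:
Assumption: $z_{True} = m + c\,K_{analytes} + d\,G_{interferants}$
i.e. $z = b\,(m + c\,K_{analytes} + d\,G_{interferants}) + a + \varepsilon$
i.e. $z = b\,m + \kappa\,K_{analytes} + \gamma\,G_{interferants} + a + \varepsilon$, with $\kappa = b\,c$ and $\gamma = b\,d$
Regression: $b$, $\kappa$, $\gamma$, $a$
$z_{corr} = (z - a - \gamma\,G_{interferants})/b$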
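Because the EMSC model is linear in its parameters, it can be fitted by ordinary least squares. The sketch below assumes a polynomial baseline in the wavelength axis (one common choice of wavelength-dependent interferant), plus optional analyte spectra K and interferant spectra G supplied by the user. The function name, polynomial order and axis rescaling are assumptions for illustration, not the patented or published implementation.

```python
import numpy as np

def emsc(z, m, wl, K=None, G=None, poly_order=2):
    """Basic EMSC of one spectrum z against a reference spectrum m.

    Model: z = b*m + kappa @ K + gamma @ G + baseline(wl) + e, where the
    baseline is a polynomial in a rescaled wavelength axis (its constant
    term plays the role of a). K and G are optional 2-D arrays with one
    analyte or interferant spectrum per row.
    Returns z_corr = (z - baseline - gamma @ G) / b and the fitted parameters.
    """
    z, m, wl = (np.asarray(v, dtype=float) for v in (z, m, wl))
    x = (2.0 * wl - wl.min() - wl.max()) / (wl.max() - wl.min())  # map to [-1, 1]
    baseline_cols = [x ** p for p in range(poly_order + 1)]      # 1, x, x^2, ...
    K = np.empty((0, z.size)) if K is None else np.atleast_2d(K)
    G = np.empty((0, z.size)) if G is None else np.atleast_2d(G)
    design = np.column_stack(baseline_cols + [m] + list(K) + list(G))
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    n_base, n_k = poly_order + 1, K.shape[0]
    base_coef, b = coef[:n_base], coef[n_base]
    kappa, gamma = coef[n_base + 1:n_base + 1 + n_k], coef[n_base + 1 + n_k:]
    baseline = np.column_stack(baseline_cols) @ base_coef
    z_corr = (z - baseline - gamma @ G) / b    # analyte term is kept, as in the model
    return z_corr, {"b": b, "baseline": base_coef, "kappa": kappa, "gamma": gamma}

# Synthetic check with hypothetical band shapes
wl = np.linspace(1000.0, 1800.0, 200)

def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

m = band(1300.0, 60.0) + 0.5 * band(1550.0, 40.0)   # reference spectrum
k, g = band(1450.0, 30.0), band(1650.0, 50.0)         # analyte and interferant
x = (2.0 * wl - wl.min() - wl.max()) / (wl.max() - wl.min())
z = 1.4 * (m + 0.3 * k + 0.2 * g) + 0.1 + 0.05 * x + 0.02 * x ** 2
z_corr, params = emsc(z, m, wl, K=k, G=g)
```

With noise-free synthetic data the fit is exact: b = 1.4, and the corrected spectrum is m + 0.3·k, so the analyte contribution survives while the baseline and the interferant are removed.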
22
H. Martens is a co-owner of the EMSC patent, but academic use is of course free.
Algorithms for EMSC are available in a Matlab toolbox, etc., and in The Unscrambler, for free research use.
23
Example: model FTIR effects of varying sample temperature in aqueous samples
Input spectra: water at different temperatures
Simple EMSC: $G_{interferants}$ = wavelength-dependent baseline
EMSC with a model of water ($K_{analytes}$) and its temperature effects ($G_{interferants}$)
(Figure annotation: outside instrument range)
26
(Figure: mixtures of protein and starch powders, 850-1050 nm, absorbance log(1/T). Panels: input and EMSC output spectra, each as raw and mean-centred response vs. channel number.)
Example of EMSC: pre-processing of NIR spectra of powder mixtures
27
(Figure: calibration model for gluten from the raw spectra. Panels: PC1/PC2 scores with samples labelled 000L to 100H, X-explained variance 42 % and 58 % and Y-explained variance 74 % and 21 % for PC1 and PC2; regression coefficients vs. X-variables (gluten, 5 PCs); explained Y-variance vs. number of PCs (0-8); measured vs. predicted gluten (5 PCs).)
No pre-processing of log(1/T) spectra:
standard model output from a multivariate calibration program (The Unscrambler)
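The calibration step itself can be sketched as PLS1 regression with the NIPALS algorithm on mean-centred data. This is a generic textbook version written for illustration, not the algorithm inside The Unscrambler; the data at the bottom are synthetic.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression by NIPALS on mean-centred data.

    Returns coefficients beta (p,) and an intercept so that
    y_hat = intercept + X @ beta.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # loading weights (unit length)
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qa = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qa * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    return beta, y_mean - x_mean @ beta

# Synthetic, noise-free check: with as many components as variables,
# PLS1 reproduces ordinary least squares
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
true_beta = np.linspace(-1.0, 1.0, 10)
y = X @ true_beta + 0.5
beta, intercept = pls1(X, y, n_components=10)
y_hat = intercept + X @ beta
```

In practice the number of components is chosen by cross-validation and is usually much smaller than the number of wavelengths; the full-rank case above is only a correctness check.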
28
(Figure: input and EMSC output spectra, raw and mean-centred response vs. channel number; 850-1050 nm, absorbance log(1/T).)
Mixtures of protein and starch powders, BEFORE PRE-PROCESSING
Mixtures of protein and starch powders, AFTER EMSC PRE-PROCESSING, $G_{interferants}$ found by Simplex optimization of prediction ability
29
More nasty error types
(Figure panels: z = ref with non-linear stray light; mean and difference; z vs. ref; corrected z and ref. Axes: wavelength, absorbance.)
Response curvature, e.g. stray light or detector saturation: $z = f(z_{true})$, corrected by $z_{corr} = f^{-1}(z)$
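One common way to parameterize the stray-light curvature f(), assumed here rather than taken from the lecture, is that a fraction s of unabsorbed light reaches the detector: T_meas = (T_true + s)/(1 + s). The correction f⁻¹ is then explicit, and s can be estimated against a reference by non-linear least squares; a simple grid search stands in for that here. All function names are hypothetical.

```python
import numpy as np

def add_stray_light(A_true, s):
    """Apparent absorbance when a stray-light fraction s reaches the detector."""
    T = 10.0 ** (-np.asarray(A_true, dtype=float))
    return -np.log10((T + s) / (1.0 + s))

def remove_stray_light(A_meas, s):
    """Inverse of add_stray_light, i.e. z_corr = f^-1(z) for a known s."""
    T_meas = 10.0 ** (-np.asarray(A_meas, dtype=float))
    return -np.log10(T_meas * (1.0 + s) - s)

def estimate_stray_light(A_meas, A_ref, grid=None):
    """Grid search for the s whose curve f_s(A_ref) best fits A_meas (least squares)."""
    grid = np.linspace(0.0, 0.1, 1001) if grid is None else np.asarray(grid, dtype=float)
    errors = [np.sum((add_stray_light(A_ref, s) - A_meas) ** 2) for s in grid]
    return float(grid[int(np.argmin(errors))])

A_ref = np.linspace(0.0, 2.0, 41)          # toy true absorbances
z = add_stray_light(A_ref, 0.02)           # response bends over at high absorbance
s_hat = estimate_stray_light(z, A_ref)
z_corr = remove_stray_light(z, s_hat)
```

With 2 % stray light, a true absorbance of 2.0 is read as about 1.53, which is the kind of curvature the z vs. ref plot reveals.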
Sideways shift (from instrument or sample): $z_{corr} = f^{-1}(z)$
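A small sideways shift δ can be linearized with a first-order Taylor expansion, z(λ) = ref(λ + δ) ≈ ref(λ) + δ·dref/dλ, so the derivative of the reference can be added as an extra column in an EMSC-type regression. The sketch below estimates δ this way and then undoes the shift by linear interpolation; it assumes the shift is small relative to the band widths, and the function names and toy band are hypothetical.

```python
import numpy as np

def estimate_shift(z, ref, wl):
    """Estimate delta in z(wl) ~ ref(wl + delta) from z ~ a + b*ref + c*dref/dwl.

    First-order Taylor expansion: ref(wl + delta) ~ ref + delta * dref/dwl,
    so delta ~ c / b. Valid only for shifts small relative to band widths.
    """
    dref = np.gradient(ref, wl)
    design = np.column_stack([np.ones_like(ref), ref, dref])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)   # coef = (a, b, c)
    return coef[2] / coef[1]

def unshift(z, wl, delta):
    """Resample z at wl - delta so that the corrected spectrum lines up with ref."""
    return np.interp(wl - delta, wl, z)

wl = np.linspace(1000.0, 1200.0, 401)     # toy axis, 0.5 units per channel

def band(x):
    return np.exp(-0.5 * ((x - 1100.0) / 15.0) ** 2)

ref = band(wl)
z = band(wl + 0.8)                        # spectrum shifted sideways by 0.8 units
delta_hat = estimate_shift(z, ref, wl)
z_corr = unshift(z, wl, delta_hat)
```

For larger shifts the linearization breaks down, and a non-linear search over δ (or several derivative terms) would be needed.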
Random noise, heteroscedastic: $z_{corr} = \mathrm{filt}(z)$
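The slide leaves filt() open. One common choice, assumed here, is a Savitzky-Golay smoother, which fits a low-order polynomial in a moving window; the numpy-only sketch below derives the weights from that least-squares fit. Smoothing trades noise for resolution, and it does not by itself account for heteroscedasticity.

```python
import numpy as np

def savgol_weights(half_window, polyorder):
    """Savitzky-Golay smoothing weights: LS polynomial fit, value at the window centre."""
    k = np.arange(-half_window, half_window + 1)
    V = np.vander(k, polyorder + 1, increasing=True)   # columns 1, k, k^2, ...
    return np.linalg.pinv(V)[0]                        # row that yields the constant term

def savgol_smooth(z, half_window=5, polyorder=2):
    """Smooth z; the ends are padded by repeating the first and last values."""
    w = savgol_weights(half_window, polyorder)
    padded = np.pad(np.asarray(z, dtype=float), half_window, mode="edge")
    return np.convolve(padded, w[::-1], mode="valid")  # correlation with w

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)
clean = np.exp(-0.5 * ((x - 0.5) / 0.08) ** 2)
noisy = clean + rng.normal(0.0, 0.03, x.size)
smooth = savgol_smooth(noisy)
```

For an 11-point window and a quadratic, the weights are the classic (-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36)/429.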
Method: Non-linear parameter estimation or
Extended Multiplicative Signal Correction (EMSC)
32
(Figure: raw spectra and MSC/EMSC (basic) corrected FTIR spectra vs. wavenumber [cm⁻¹]; raw spectra vs. mean and corrected spectra vs. mean.)
Estimating baseline and multiplicative effect, and pre-processing
33
(Figure: FTIR absorbance spectra vs. wavenumber [cm⁻¹] and spectra plotted against their mean, for raw spectra, EMSC (basic) and EMSC with replicate correction (EMSC rep.).)
Examples for EMSC replicate correction (Ed Stark)
36
(Figure: six panels of FTIR absorbance spectra vs. wavenumber [cm⁻¹], comparing raw spectra, EMSC (basic) and EMSC rep.)
Examples for EMSC replicate correction
Kohler A, Böcker U, Warringer J, Blomberg A, Omholt SW, Stark E, Martens H (2008) Reducing inter-replicate
variation in FTIR spectroscopy by extended multiplicative signal correction (EMSC). Applied Spectroscopy.
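The replicate-correction idea can be sketched as: subtract each replicate group's mean, take the leading principal components of the pooled within-group deviations, and use their loadings as extra interferant spectra (G_interferants) in the EMSC model. This is a simplified illustration of the general idea, not the algorithm of the cited paper: here the deviations are taken from the raw spectra, the baseline is a constant only, the replicates are assumed to differ only by one artefact spectrum, and all names are hypothetical.

```python
import numpy as np

def replicate_interferants(X, groups, n_components=1):
    """Leading PCA loadings of within-replicate-group deviations (rows of X = spectra)."""
    X, groups = np.asarray(X, dtype=float), np.asarray(groups)
    dev = np.vstack([X[groups == g] - X[groups == g].mean(axis=0)
                     for g in np.unique(groups)])
    _, _, Vt = np.linalg.svd(dev, full_matrices=False)
    return Vt[:n_components]                          # one interferant spectrum per row

def emsc_with_interferants(X, m, G):
    """Fit z = a + b*m + g @ G for every spectrum z; return (z - a - g @ G) / b."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones_like(m), m, *G])
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    a, b, g = coef[0], coef[1], coef[2:]
    return (X - a[:, None] - (G.T @ g).T) / b[:, None]

# Synthetic data: two samples, four replicates each; replicates differ only
# by a hypothetical artefact spectrum with random amplitude
rng = np.random.default_rng(3)
wl = np.linspace(0.0, 1.0, 100)

def peak(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

samples = [0.5 + peak(0.3, 0.05), 0.5 + peak(0.7, 0.05)]
artefact = np.sin(3.0 * np.pi * wl)
X = np.array([s + amp * artefact for s in samples for amp in rng.uniform(-0.2, 0.2, 4)])
groups = np.repeat([0, 1], 4)
G = replicate_interferants(X, groups, n_components=1)
X_corr = emsc_with_interferants(X, X.mean(axis=0), G)
```

Because the interferant is estimated only from variation within replicate groups, the chemical difference between the two samples is not used to build G and remains in the corrected spectra.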
37
How to obtain more advanced pre-processing models
1. By estimating unwanted variation from the data itself
2. By estimating unwanted variation from mathematical models of known scatter effects, instrumental information, etc.
But how can complicated mathematical models be combined with simple, linear pre-processing models?
Solution, e.g. for Mie light scattering (lens effects) of individual cells in synchrotron FTIR microscopy
38
Estimating Mie scattering
(Figure: Mie scattering theory, EMSC subspace model, corrected spectra, estimated Mie scattering.)
Kohler A, Sulé-Suso J, Sockalingum GD, Tobin M, Bahrami F, Yang Y, Pijanka J, Dumas P, Cotte M, Martens H
(2008) Estimating and correcting Mie scattering in synchrotron based microscopic FTIR spectra by extended
multiplicative signal correction (EMSC). Applied Spectroscopy, 62, 259-266.
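The "theory to EMSC subspace" step can be illustrated by simulating a family of scattering curves and compressing them with an SVD, so that a non-linear model is replaced by a few bilinear basis spectra. The sketch uses the van de Hulst approximation for the extinction efficiency of a sphere, Q_ext(ρ) = 2 - (4/ρ) sin ρ + (4/ρ²)(1 - cos ρ), with ρ = 4π ν̃ a (n - 1). The parameter ranges (radius a, relative refractive index n), the 99 % threshold and all names are assumptions; this is a simplified illustration, not the published algorithm.

```python
import numpy as np

def q_ext(wavenumber, radius_um, n_rel):
    """van de Hulst approximation of the extinction efficiency of a sphere.

    wavenumber in cm^-1, radius in micrometres, n_rel = relative refractive index.
    """
    rho = 4.0 * np.pi * wavenumber * (radius_um * 1e-4) * (n_rel - 1.0)
    return 2.0 - (4.0 / rho) * np.sin(rho) + (4.0 / rho ** 2) * (1.0 - np.cos(rho))

nu = np.linspace(1000.0, 3800.0, 500)                 # wavenumber axis, cm^-1
radii = np.linspace(2.0, 8.0, 13)                     # assumed radii, micrometres
n_values = np.linspace(1.1, 1.5, 9)                   # assumed relative refractive indices
library = np.array([q_ext(nu, r, n) for r in radii for n in n_values])

# Compress the non-linear family into a few orthonormal basis spectra
mean_curve = library.mean(axis=0)
_, s, Vt = np.linalg.svd(library - mean_curve, full_matrices=False)
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
n_basis = int(np.searchsorted(explained, 0.99)) + 1   # smallest basis reaching 99 %
basis = Vt[:n_basis]                                  # candidate interferant spectra

# A curve that is not in the library, projected onto mean + basis
new_curve = q_ext(nu, 5.3, 1.27)
centred = new_curve - mean_curve
resid = centred - basis.T @ (basis @ centred)
rel_error = np.linalg.norm(resid) / np.linalg.norm(new_curve)
```

In this approximation a and n enter only through the product a(n - 1), which is one reason a small basis can cover the whole simulated family. The mean curve and basis rows could then be supplied as G_interferants to an EMSC fit.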
39
(Figure: measured spectra, pre-processing model, chemical absorption, physical contribution.)
Using the Mie scattering model for new samples
41
Milk FTIR spectra
(Figure: three panels of milk FTIR spectra vs. wavenumber, approx. 4000-1100 cm⁻¹: dried samples, lab instrument; wet samples minus water, routine instrument; useful spectrum of wet samples minus water, routine instrument.)
42
Milk FTIR spectra and functional genomics for optimized milk and meat quality
Large-scale FTIR bioscreening project in Norway
(Diagram: routine milk analysis, 6 million milk spectra/year; calibration milk samples with reference measurements of fatty acids (GC-MS); calibration models for fatty acids (FA), combinations and other components; predicted fatty acids etc.; feeding experiments; background knowledge; 20K SNPs; QTLs etc.?; heritability, feeding effects etc. Inset: spectra vs. wavenumber.)
43
Estimated effect on human total cholesterol level
(assuming 20% of energy intake from milk fat)
(Figure: PLSR calibration. Panels: prediction error RMSEP_CV vs. PLSR model rank (0-25); prediction ability R²_CV vs. PLSR model rank; PPLS exponents vs. PC number; regression coefficients vs. wavenumber; analyte predictions (fit and cross-validation, CV) vs. analyte concentration at rank A = 24, with R²_CV = 0.82.)
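For reference, the two validation statistics plotted on this slide are conventionally defined as follows, where $\hat{y}_{i,CV}$ is the prediction of sample $i$ from a model calibrated without that sample (or its cross-validation segment), $\bar{y}$ is the mean of the reference values and $n$ is the number of calibration samples. The exact cross-validation segmentation used here is not stated on the slide.

```latex
\mathrm{RMSEP}_{CV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,CV}\right)^2},
\qquad
R^2_{CV} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,CV}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

The model rank is typically chosen near the minimum of RMSEP_CV, which is equivalent to the maximum of R²_CV for a fixed set of samples.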
44
(Same diagram as on slide 2.)
Now the real fun starts: feedback!
Models: dynamic, non-linear ODEs
Spatial PDEs
Different feedback control (Jacobian matrix) in different parts of state space
10000-dimensional input data
Eigenvalues vs. singular values of the Jacobian matrix
Identify outliers
45
(Figure: input spectra, FTIR light absorbance vs. wavenumber, 1000-1600 cm⁻¹.)
Monitoring dynamic processes by biospectroscopy
A fermentation process in the dairy industry monitored by FTIR (ATR) for 26 hours
46
(Figure: scores on PC 1 (89.6 % variance), PC 2 (8.7 % variance) and PC 3 (0.9 % variance), with time points t = 0, 6, 19, 21.5 and 26 hrs marked and points labelled k1 to k5.)
First three principal component scores
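Score plots like this one come from a principal component analysis of the mean-centred spectra. A minimal SVD-based sketch, assuming one spectrum per time point in the rows of X; the toy process (one band disappearing while another appears over 26 hours) is synthetic and only mimics the setting.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """PCA by SVD of the mean-centred data matrix (rows = time points)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = 100.0 * s ** 2 / np.sum(s ** 2)      # % of total variance
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical process: one band disappears while another appears
rng = np.random.default_rng(4)
t = np.linspace(0.0, 26.0, 53)                       # hours
nu = np.linspace(1000.0, 1600.0, 301)                # wavenumber, cm^-1

def band(center, width):
    return np.exp(-0.5 * ((nu - center) / width) ** 2)

amount = 1.0 / (1.0 + np.exp(-(t - 15.0) / 2.0))     # hypothetical conversion curve
X = np.outer(1.0 - amount, band(1100.0, 20.0)) + np.outer(amount, band(1400.0, 25.0))
X = X + rng.normal(0.0, 0.002, X.shape)
scores, loadings, explained = pca_scores(X)
```

A single conversion gives essentially one component; real processes with several overlapping stages spread the variance over more components, as in the 89.6 / 8.7 / 0.9 % split shown on the slide.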
47
Semi-soft modelling of the process
(Figure: state fingerprints s1, s2 - s1, s3 - s2, s4 - s3 and s5 - s4 vs. wavenumber (1000-1600 cm⁻¹), and state amounts c1 to c5 vs. time (0-25 hrs).)
48
Non-linear dynamic model identification
My other activity in CIGENE:
Cell differentiation model: computer simulation, sensory analysis of
mathematical solutions
The Physiome Project: human heart
Individual heart muscle cell, 36 state variables, 72 parameters
Sets of adjacent, interacting cells
Assessing large non-linear dynamic models too complex for theory:
Nominal-level (Leiden-school!) PLSR of rates vs. states
Study local Jacobians and their eigenvalues vs. singular values
Represent/replace a mathematical form by its behavioural repertoire, obtained by exhaustive simulation (factorial designs to a chosen
resolution) and stored in a compressed database.
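The idea of studying local Jacobians can be made concrete with a finite-difference Jacobian of a rate function dx/dt = F(x) at a chosen state. The two-variable toggle-switch-like model below is a hypothetical stand-in, not the heart-cell or cell-differentiation models mentioned on this slide. Eigenvalues describe local stability and oscillation tendencies, while singular values describe how strongly the rates respond to a state perturbation in any direction.

```python
import numpy as np

def rates(x, alpha=4.0, hill=2.0):
    """Hypothetical toggle-switch rates dx/dt = F(x) for two mutually repressing genes."""
    x1, x2 = x
    return np.array([alpha / (1.0 + x2 ** hill) - x1,
                     alpha / (1.0 + x1 ** hill) - x2])

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian J[i, j] = dF_i / dx_j at state x."""
    x = np.asarray(x, dtype=float)
    J = np.empty((x.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return J

state = np.array([0.5, 2.0])                            # an arbitrary point in state space
J = numerical_jacobian(rates, state)
eigenvalues = np.linalg.eigvals(J)                      # local stability / oscillation
singular_values = np.linalg.svd(J, compute_uv=False)    # local gain in any direction
```

At this state the exact Jacobian is [[-1, -0.64], [-2.56, -1]], with eigenvalues -1 ± 1.28, so the sketch can be checked by hand; repeating it over a grid of states maps how the feedback structure changes across state space.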
49
Conclusions
Many error-types are in fact sources of valuable information.
Model-based pre-processing: identify, quantify and separate out
systematic error-types.
Model-based pre-processing in biospectroscopy requires an understanding of the different errors that create the unwanted
variation.
As usual:
It is better to be approximately right than precisely wrong.
It is better to be aggressive/humble than passive/arrogant.
50
Acknowledgements
People who contributed:
Centre for Integrative Genetics (CIGENE), Norw. U. Life Sci.:
Stig Omholt, Erik Plahte, Arne Gjuvsland, Sigbjørn Lien,
Hanne Gro Olsen, Åshild Randby
NOFIMA/Matforsk:
Achim Kohler, Ulrike Bdtker, Nils Kristian Afseth, Martin Høy
TINE: Kjetil Jørgensen
GENO: Morten Svendsen