An integrated concept for post-acquisition spectrum analysis was developed for in-line (real-time) and off-line applications that preserves absolute spectral quantification; after the initial parameter setup, only minimal user intervention is required. This spectral evaluation suite is composed of a sequence of tasks specifically addressing cosmic ray removal, background subtraction, and peak analysis and fitting, together with the treatment of two-dimensional charge-coupled device array data. Any of the individual steps may be used on its own, or excluded from the chain if so desired. For the background treatment, the canonical rolling-circle filter (RCF) algorithm was adopted, but it was coupled with a Savitzky–Golay filtering step on the locus-array generated from a single RCF pass. This novel procedure, which has only two parameters, vastly improves on the RCF's tendency to overestimate the baseline level in spectra with broad peak features. The peak analysis routine developed here is a fitting algorithm with only two parameters (amplitude and position) that relies on numerical line shape profiles rather than on analytical functions. The overall analysis chain was programmed in National Instruments LabVIEW; this software allows for easy incorporation of this spectrum analysis suite into any LabVIEW-managed instrument control environment, data-acquisition environment, or both. The strengths of the individual tasks and of the integrated program sequence are demonstrated for the analysis of a wide range of (although not necessarily limited to) Raman spectra of varying complexity and exhibiting nonanalytical line profiles. In comparison to other analysis algorithms and functions, our new approach for background subtraction, peak analysis, and fitting returned vastly improved quantitative results, even for hidden details in the spectra, in particular, for nonanalytical line profiles. All software is available for download.

Index Headings: Automated baseline subtraction; Cosmic ray removal; Peak analysis; Quantitative spectroscopic analysis; Raman spectroscopy; Rolling-circle filter.

Received 20 June 2012; accepted 18 March 2013.
* Author to whom correspondence should be sent. E-mail: h.h.telle@swansea.ac.uk.
DOI: 10.1366/12-06766
Volume 67, Number 8, 2013 © 2013 Society for Applied Spectroscopy, APPLIED SPECTROSCOPY

INTRODUCTION

Analytical (laser) spectroscopy techniques have become commonplace in a wide range of applications in, e.g., analytical chemistry, chemical processing, biomedical research, and nanotechnology. Absolute and relative quantifiable results, as well as chemometric classification based on the recorded spectra, are of vital importance in all of these fields. Generally, spectral responses are treated numerically during post-acquisition analysis, namely, to (i) remove background, (ii) eliminate spurious cosmic ray events, (iii) fit spectral line profiles, or a combination of these. If not applied with care, each of these measures may affect the quantitative information contained in the spectra and may also influence detection limits. Note that spectra of complex specimens are often evaluated using, e.g., principal component analysis (PCA)/chemometrics instead of peak fitting; the aim then is classification rather than precise quantification.

Most modern data analysis software packages, either commercial or user-written, incorporate some or all of the aforementioned post-acquisition functionality. Normally, however, the full analysis chain, i.e., acquisition → spectra correction → data analysis, is only semiautomatic (i.e., some or all parts require human input, and one often encounters strong model dependence in the chosen correction routines). Furthermore, such packages are often black-box implementations, and the sophistication of individual procedures can vary significantly, depending on which user community is targeted. This variation becomes problematic in situations (i) where near real-time (feedback) response is required to maintain stable operational conditions, and (ii) when experiments run over extended periods of time (weeks, months, or even years), so that perpetual user attention is often out of the question.

Specific in-line experiments that may be seen as exemplary for being affected by all the above-mentioned effects and caveats are encountered in the International Thermonuclear Experimental Reactor (ITER) project and in the Karlsruhe Tritium Neutrino (KATRIN) experiment. For example, the KATRIN experiment¹ is set up to measure the neutrino mass by analyzing the electron-energy spectra associated with the β-decay of tritium, and it is bound to run nearly continuously over a period of three to five years. Thus automation of all monitoring and control mechanisms during the gathering of spectral information is paramount. Besides the recording of the β-decay spectrum itself, knowledge of the temporal stability and chemical composition of the tritium gas feed from the inner tritium loop into the source tube of KATRIN is of utmost importance.² Thus the amount and the purity of the injected tritium gas have to be monitored continuously, which can be realized by using a laser Raman system.³ The requirements that this Raman system places on the data treatment and analysis procedures are rather demanding:

(i) The operation and analysis procedure needs to be automated, unstaffed, able to run for 60 days nonstop, and provide real-time feedback of the gas composition (specifically T₂, DT, and HT) to the KATRIN run-control.
(ii) Acquisition and analysis time need to be minimized to enable fast feedback of the monitored gas composition.
(iii) The extracted Raman line intensities have to be free from systematic shifts to provide reliable and quantifiable results, and the precision of the analysis output has to be of the order 0.1% or better.
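The elimination of spurious cosmic ray events is one of the generic post-acquisition treatments named above. It can be illustrated with a short Python sketch for the situation that KATRIN-type monitoring creates: many consecutive spectra of a nominally unchanged sample. This is a common median-comparison approach and not necessarily the procedure used in this work. The function name `remove_cosmic_rays`, the threshold factor `k`, and the noise model are all assumptions made for illustration. The noise model treats intensities as photoelectron counts with shot noise, plus a hypothetical read-noise term `read_noise`.

```python
import numpy as np


def remove_cosmic_rays(stack, k=6.0, read_noise=5.0):
    """Replace cosmic-ray spikes in a stack of consecutive 1D spectra (hypothetical helper).

    stack: array of shape (n_spectra, n_pixels) recorded under nominally identical conditions.
    Returns (cleaned, mask), where mask marks the pixels that were replaced.
    """
    stack = np.asarray(stack, dtype=float)
    if stack.ndim != 2 or stack.shape[0] < 3:
        raise ValueError("expected an (n_spectra, n_pixels) array with n_spectra >= 3")
    # Pixel-wise median over the stack: a spike in a single spectrum barely moves it.
    reference = np.median(stack, axis=0)
    # Assumption: counts are photoelectrons, so noise ~ sqrt(signal) plus a fixed read noise.
    sigma = np.sqrt(np.clip(reference, 0.0, None) + read_noise ** 2)
    # Cosmic rays only add charge, so only positive excursions above k * sigma are flagged.
    mask = (stack - reference) > k * sigma
    # Flagged pixels are replaced by the pixel-wise median of the stack.
    cleaned = np.where(mask, reference, stack)
    return cleaned, mask
```

A spike is judged against the same pixel in the other spectra rather than against its spectral neighbors. As a result, genuine narrow Raman lines are not clipped. The sketch does, however, presume that the gas composition does not change appreciably within the stack.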
These requirements lead to some specific problems in the evaluation of recorded (Raman) spectra. For example, the baseline in the spectra may change nonlinearly with time, and transient changes in the gas composition can occur. Both changes affect the sensitivity of KATRIN or, for that matter, of any other experiment experiencing such variations. The former change is mainly encountered in long-term operations, caused by, e.g., temperature effects, the generation of color centers in the Raman cell window, (trace) generation of chemical reaction products,⁴ or a combination; the latter change is due to system-specific injection of fresh gases and retraction of waste.

Any post-acquisition analysis still has to work fast and reliably under the aforementioned circumstances, and it should incorporate subtasks to deal with background subtraction, cosmic ray removal, and quantitative spectral line evaluation. This analysis should happen with as little user intervention as possible and should be fast enough to provide the desired near real-time response. The latter should not be too problematic for KATRIN Raman monitoring, where response times of approximately 60–100 s are specified. However, in the context of the ITER project, the requirements are much more challenging, since real-time process control with response times of 1 s or less is required.

Furthermore, it should be possible to link the evaluation procedure seamlessly to any data acquisition process, ideally running in parallel to an acquisition to provide near real-time analysis, while remaining equally applicable to off-line evaluation of spectra. It would also be advantageous if such an integrated routine were suitable for other experimental situations that generate spectra in need of similar data treatment.

For example, our group recently carried out depolarization measurements for the Q-branch Raman lines of the hydrogen isotopologues and other molecules.⁵ These measurements may require very long acquisition times, of the order of 1000 s rather than the few seconds of the aforementioned dynamic KATRIN and ITER response demands, to record spectra with huge intensity differences at the necessary high signal-to-noise ratio. Such long signal integration times lead to a large number of cosmic ray events captured by the charge-coupled device (CCD) detector during an acquisition, and the cosmic ray removal task needs to be able to deal with this. To provide accurate depolarization ratios for individual rotational Raman lines (even those with weak intensity or overlapping with each other), the precision of the peak evaluation would need to be even higher than for the monitoring and control tasks in KATRIN, ITER, or other experiments.

The integrated spectrum analysis procedure described here proceeds from cosmic ray removal via astigmatism correction and background subtraction to peak intensity extraction. The interplay of the individual spectra treatment steps and the success of the overall concept are presented for some selected examples. It should be noted that parts of the analysis procedure described in this publication have been successfully applied to KATRIN-related measurements²,⁶ and other Raman experiments.⁷,⁸ Once the operating parameters are set, it constitutes a fully automated procedure for post-acquisition data treatment of (Raman) spectra and analysis of spectral line peaks. The necessity for model input has been reduced to the lowest level possible while maintaining full control over the methods.

It is worth noting that by no means are we the first to attempt to assemble a suite for the evaluation of Raman spectra. Since Raman spectroscopy has become a widely accepted analytical method, nearly all manufacturers of related instrumentation provide software packages that incorporate data acquisition, pre-evaluation data treatment, and final spectrum evaluation. In addition, quite a few research groups have tailored certain aspects to the specific needs of their work (e.g., Reisner et al.⁹ and Vicarra Rossel¹⁰). Three aspects seem to be common to all approaches. (i) Usually, the data evaluation is geared toward chemometrics, i.e., a form of pattern recognition against library data, and exact quantification is often of lesser interest. (ii) Because the chemometric aspect is key in most of the published works, detailed information on the pre-evaluation procedures is normally sparse (e.g., it is not always clear how well a particular, selected procedure would suit quantification, as required in our research). (iii) Data acquisition and evaluation are normally sequential.

DATA ANALYSIS METHODS

The overall analysis procedure described here is composed of a sequence of individual steps, each associated with its own LabVIEW subroutine (subVI); in principle, these subroutines can also be used on their own. The schematic flow chart for this concept is shown in Fig. 1; the individual routines are described in the sections below, in the sequence in which they are executed in the overall program chain.

For on-line applications, all steps are fully incorporated into a program flow and require only minimal user intervention during the initial set-up; for off-line applications, the sequence shown in Fig. 1 is overlaid with a graphical front end. By setting option switches in the program flow, individual steps in the sequence may be skipped, should they not be required for a particular spectrum analysis. It should be noted that LabVIEW stands out when seamless integration of instrument control, data acquisition, and signal analysis is desired. A similar LabVIEW approach, albeit for the analysis of biochemical samples and chemometric evaluation, has been described by Vicarra Rossel.¹⁰ Of course, all individual tasks may be programmed differently. Therefore, the underlying generic algorithms for each are summarized in a supplemental material document, in which we also provide download options for the documented programs.

Cosmic Ray Removal. In any spectrum recorded by photon detectors, cosmic ray events are encountered on a frequent but random basis. For CCD array detectors, they manifest themselves as (mostly) single-pixel responses in which the affected pixel carries a far greater intensity than its neighboring pixels. For accurate analysis of such affected spectra, the cosmic ray events need to be removed. Numerous techniques and algorithms exist that can be implemented for cosmic ray removal from one-dimensional (1D) or two-dimensional (2D) spectral recordings (e.g., Home,¹¹ Kelson,¹² Li and Dai,¹³ and Mozharov et al.¹⁴). Since the work described here deals only with 1D spectrum traces, we do not elaborate further on 2D methodology (for a brief summary of the latter, see Section 1 of the supplemental material).

When sets of spectra are recorded over time, as in our case (large sets of spectra are recorded during KATRIN runs and during off-line control measurements), the least complicated and very efficient method for identifying and eliminating
the filter rolls into the peak and hence would falsify the actual baseline level. Applying the SCARF routine with the same width parameter improves this but cannot fully compensate. In fact, it may even complicate matters: due to the sharp edges of the RCF (20) signal at the position of the peak, the SCARF routine introduces a negative-going second-derivative shape.

The implication is that the underlying RCF algorithm has to start with a sensible value that accounts for the width at the base of any peak of interest for quantitative analysis. It is also clear that in that case even RCF on its own results in deviations of the order of the noise median. However, this only holds true for nearly flat background levels. Finally, a nearly perfect background removal and baseline correction is achieved when the SCARF routine is applied twice, with a staggered number of side points included. Note that in this repetitive application the second SCARF pass acts on a modified data set, namely, the original (raw) spectral data from which the background estimate of the first SCARF run has been subtracted. Note also that, in general, the parameters for the first SCARF run are set to remove an overall offset and slowly varying slope features. The second run (on the modified data set) aims at dealing with rapidly varying background features (see the demonstration example in the supplemental material). For the related SG parameter, this generally means that s_run#1 > s_run#2.

It is also worth noting that repetitive application of RCF with appropriately selected parameters may improve the overall background subtraction. However, RCF always acts directly on the data, unlike SCARF, which incorporates coupled residual smoothing before the background estimate is subtracted. As a result, RCF always has a noticeable effect on the peak data, which may affect exact quantification.

The same procedure as mentioned above was applied to a real N₂ Raman spectrum that was, in addition, superimposed with a nonlinear background contribution (here, light from a 605 nm light-emitting diode). In Fig. 4, the results from a selected range of filter functions on this spectrum are shown. The original spectral data traces are overlaid with the derived background functions, namely, the circle loci for the normal RCF routine, RCF(r), with two different radii, r = 20 and r = 60; our SCARF routine SCARF(r,s) with (r = 60, s = 60); and the repetitive SCARF routine with varied Savitzky–Golay parameters (r = 60, s = 240) followed by (r = 60, s = 120). Clearly, in the standard RCF routine, the filter has rolled into the spectral lines for r = 20 pixels; but even for r = 60 pixels, which is much wider than the narrow S₁- and O₁-branch Raman lines, hints of roll-in are evident. This results in loss of quantifiable spectral information because incorrect amounts of background intensity are subtracted. This may not be critical for the large-intensity feature around 2330 cm⁻¹ (the capped Q₁-branch of ¹⁴N₂ in Fig. 4), where the lost amount accounts for roughly 10⁻², or less for the wider circle radius. However, the low-intensity feature near 2285 cm⁻¹ (Q₁-branch of ¹⁴N¹⁵N) suffers a loss of as much as 20–30%. Therefore, for the latter, one would have extracted incorrect values from the spectra, and reliable quantification is most likely lost.

As in the synthetic spectrum case discussed above, applying the SCARF routine has nearly eliminated the problem of roll-in. However, with the larger parameters r = 60 and s = 60, the background trace now does not follow the background curvature correctly; consequently, the actual background is not yet completely removed. Full background compensation is finally achieved by applying SCARF a second time on the adjusted spectrum from the first run (see the lowest trace in Fig. 4). The actual quantitative data for the abundance of N₂ isotopologues, based on the data in Fig. 4, are included in Table II.

FIG. 4. Background removal test for a Raman spectrum of N₂, overlaid with shaped background light; traces are offset by 1500 units, consecutively from top to bottom. The traces are annotated with the respective filter actions used, RCF(r) and SCARF(r,s).

| Nominal peak amplitude A_peak (counts) | 100 | 1000 | 10 000 | 100 000 |
|---|---|---|---|---|
| Peak width at base (pixel) | 26 | 48 | 62 | 77 |
| (a) RCF (20) | 67 ± 12 | 829 ± 17 | 8590 ± 30 | 86 413 ± 35 |
| (b) SCARF (20,20) | 76 ± 13 | 899 ± 27 | 9350 ± 67 | 94 218 ± 98 |
| (c) SCARF (20,80) (20,40) | 82 ± 13 | 916 ± 26 | 9574 ± 58 | 96 430 ± 87 |
| (d) RCF (60) | 83 ± 10 | 980 ± 15 | 9963 ± 22 | 99 940 ± 25 |
| (e) SCARF (60,60) | 89 ± 10 | 984 ± 14 | 9985 ± 20 | 99 968 ± 22 |
| (f) 2×SCARF (60,240) (60,120) | 98 ± 8 | 993 ± 12 | 9991 ± 15 | 99 990 ± 15 |

Finally, for quantitative analysis of spectral lines via peak-fit routines, a flat background with the noise oscillating around zero is required. However, applying RCF/SCARF as is for background removal exhibits a minor deficiency. Since the circle always rolls below all data points, any noise baseline is slightly offset to a minute value above zero, namely, about the half-width of the peak-to-peak noise fluctuations. This noise-median value, used to shift the SCARF-treated spectrum to a nominal zero level, can be obtained semiautomatically. For this, one simply calculates the median of a (reasonably) flat, noise-only region of the spectrum. Note that the flattest background, most suitable for the determination of the noise median, is achieved by tuning the radius of the circle in the SCARF routine to rather small values. Apart from this (one-off or occasional) determination of the noise median, however, the SCARF radius and the side-point ratio need to be set to their task optimum.

Analysis of Peak Intensities. The last step in the post-acquisition data treatment is the extraction of peak intensities (or areas). Two types of techniques have been considered for the determination of intensities (or areas).

The first technique uses integration by simply summing the intensity of all pixels within a peak profile, arriving at a peak-area value. However, this method is unsuitable if one wishes to separate overlapping lines. Fitting combinations of Voigt profiles²¹,²² or other line shape functions can, in general, treat convoluted lines. This approach has some notable drawbacks. First, the spectral line emission traverses an imaging system that normally contains line-shape-influencing components, like small-core optical fibers, the spectrometer entrance slit, the grating, and the CCD detector pixel structure. As a consequence, the original line shape normally deviates from the pure Voigt profile functions that are frequently used in conjunction with quantum-dominated spectral lines. Second, the number of free parameters in the minimization problem rises by four for each peak (position, intensity, FWHM, and Gauss–Lorentz fraction).

The method we have used throughout this work is a routine we named ShapeFit. The method can be applied without restriction if the line shape of all spectral lines of interest is the same. This is the case if (i) the line width is limited by the slit width of the spectrometer, the optical fiber, or both; or (ii) the spectral line width (i.e., natural line width plus broadening) of all lines is the same. Note that if neither of these conditions is met, the procedure becomes more complex, since multiple peak shapes and their relative intensity weighting are involved; this generalized case is not treated here, although we have already programmed a suitable routine.

In the first step of the routine, a line is selected from the spectrum that has sufficient intensity and stands isolated (no convolution with other lines); for example, this could be a line from a spectral calibration lamp. The shape of this peak is stored pixel by pixel in an auxiliary data array.

In the second step, all lines within a spectrum are fitted using the previously determined digital peak-shape function, multiplied by the appropriate amplitude factor. The process incorporates the Levenberg–Marquardt algorithm,²³ in which the amplitude and the center position of each peak are treated as fitting parameters. Subpixel translation of the peak positions is enabled by interpolation of the numerical peak shape. The baseline of the spectrum can be either fixed or added as a fitting parameter. Since the peak shape is stored numerically, very complicated and odd shapes can be used, whereas the number of free parameters per peak is still only two (position and amplitude). The principal details of the routine algorithm are summarized in the supplemental material section.

Particular examples for the application of ShapeFit to measured Raman spectra are discussed in the Results section. There, it will be seen that fitting could be applied successfully despite strong convolution of some peaks and the occasional use of intentionally bad line shapes.
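The two-step ShapeFit idea can be sketched in Python. The first step stores an isolated line shape pixel by pixel. The second step fits only the amplitude and position of every peak with Levenberg–Marquardt. This is not the authors' LabVIEW routine, and several choices below are assumptions:

- SciPy's `least_squares(method="lm")` (MINPACK) stands in for their Levenberg–Marquardt implementation.
- Linear interpolation via `numpy.interp` provides the sub-pixel shift.
- The template's reference point is taken to be its maximum.
- The names `shape_fit`, `shapefit_model`, and `shifted_template` are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def shifted_template(template, center, position, x):
    """Numerical line shape evaluated at pixels x with its reference point moved to `position`.

    Assumption: linear interpolation provides the sub-pixel shift; outside the stored
    array the profile is taken as zero.
    """
    grid = np.arange(template.size, dtype=float)
    return np.interp(x - position + center, grid, template, left=0.0, right=0.0)


def shapefit_model(params, template, center, x, n_peaks, fit_baseline):
    """Sum of amplitude-scaled, shifted copies of one stored shape (+ optional constant)."""
    y = np.zeros_like(x, dtype=float)
    amplitudes = params[0:2 * n_peaks:2]
    positions = params[1:2 * n_peaks:2]
    for amp, pos in zip(amplitudes, positions):
        y += amp * shifted_template(template, center, pos, x)
    if fit_baseline:
        y += params[2 * n_peaks]
    return y


def shape_fit(spectrum, template, guesses, fit_baseline=False):
    """Fit only amplitude and position of each peak (hypothetical ShapeFit-style helper).

    guesses: list of (amplitude, position) start values, e.g. from a simple peak search.
    Returns (amplitudes, positions, baseline); baseline is 0.0 when it is held fixed.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    template = np.asarray(template, dtype=float)
    x = np.arange(spectrum.size, dtype=float)
    center = float(np.argmax(template))  # assumption: reference point = template maximum
    n_peaks = len(guesses)
    p0 = [float(v) for pair in guesses for v in pair]
    if fit_baseline:
        p0.append(0.0)

    def residuals(p):
        return shapefit_model(p, template, center, x, n_peaks, fit_baseline) - spectrum

    # SciPy's MINPACK-based Levenberg-Marquardt stands in for the LabVIEW routine.
    p = least_squares(residuals, p0, method="lm").x
    baseline = float(p[2 * n_peaks]) if fit_baseline else 0.0
    return p[0:2 * n_peaks:2], p[1:2 * n_peaks:2], baseline
```

However odd the stored shape is, the number of free parameters stays at two per peak, plus an optional baseline. The price is that the template must be recorded under the same instrumental conditions as the spectrum being fitted.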
TABLE II. Isotopologue abundance for O₂ and N₂, extracted from Raman spectra, after application of RCF or SCARF in conjunction with ShapeFit.

| Isotopologue ratio | RCF (20) (×10³) | RCF (60) (×10³) | SCARF (60,240) (60,120) (×10³) | Theoretical (a) (×10³) |
|---|---|---|---|---|
| ¹⁶O¹⁷O/¹⁶O₂ (b) | 0.51 ± 0.08 | 0.63 ± 0.09 | 0.75 ± 0.06 | 0.76 ± 0.03 |
| ¹⁶O¹⁸O/¹⁶O₂ | 3.54 ± 0.20 | 3.81 ± 0.18 | 4.02 ± 0.15 | 4.02 ± 0.14 |
| ¹⁴N¹⁵N/¹⁴N₂ | 5.70 ± 0.22 | 6.93 ± 0.21 | 7.27 ± 0.16 | 7.35 ± 0.20 |

(a) Stochastic distribution data for isotopologues for atmospheric molecules;²⁷ errors estimated from isotope abundance data.²⁶
(b) Data for ¹⁶O¹⁷O corrected for hot-band contribution from ¹⁶O₂.
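The RCF and SCARF background filters compared in Table II can be sketched in Python. This is our own reading of the description, not the authors' LabVIEW code, and it rests on several assumptions:

- The rolling circle is treated as a grey-scale morphological opening with a circular-arc structuring element.
- The radius r is in pixels, and one intensity count is counted as one pixel unit (no axis rescaling).
- s is the number of Savitzky–Golay side points, so the window is 2s + 1.
- A second-order polynomial is used for the Savitzky–Golay filter.

The function names are hypothetical. The noise-median shift from a flat, peak-free region is included as the final step. The sketch needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import savgol_filter


def rcf_locus(y, r):
    """Rolling-circle filter: upper boundary of a circle of radius r rolled beneath the data.

    Written as a grey-scale opening with a circular-arc structuring element.
    Assumption: r is in pixels and one intensity count equals one pixel unit.
    """
    y = np.asarray(y, dtype=float)
    k = np.arange(-r, r + 1)
    arc = np.sqrt(r ** 2 - k ** 2)  # height of the circle's upper arc above its center
    # Erosion: highest center at each pixel that keeps the circle below every data point.
    win = sliding_window_view(np.pad(y, r, constant_values=np.inf), 2 * r + 1)
    center = np.min(win - arc, axis=1)
    # Dilation: the locus is the upper envelope of all circles at those centers.
    win = sliding_window_view(np.pad(center, r, constant_values=-np.inf), 2 * r + 1)
    return np.max(win + arc, axis=1)


def scarf_background(y, r, s, polyorder=2):
    """One SCARF-style pass: RCF locus smoothed by Savitzky-Golay with s side points.

    Assumptions: window = 2*s + 1 and a second-order polynomial.
    """
    return savgol_filter(rcf_locus(y, r), window_length=2 * s + 1, polyorder=polyorder)


def scarf_correct(y, passes, noise_region=None):
    """Run SCARF passes in sequence, each on the data left by the previous subtraction.

    passes: list of (r, s), e.g. [(60, 240), (60, 120)] with s_run1 > s_run2.
    noise_region: optional slice of a flat, peak-free region whose median is shifted to zero.
    """
    corrected = np.asarray(y, dtype=float)
    for r, s in passes:
        corrected = corrected - scarf_background(corrected, r, s)
    if noise_region is not None:
        corrected = corrected - np.median(corrected[noise_region])
    return corrected
```

For example, `scarf_correct(spectrum, [(60, 240), (60, 120)], noise_region=slice(a, b))` mirrors the two-pass setting (r = 60, s = 240) followed by (r = 60, s = 120) discussed for Fig. 4. Here `a` and `b` bound a hypothetical noise-only region. The spectrum must be longer than the largest Savitzky–Golay window, 2s + 1.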
subtraction, as required for the extraction of the minor O₂ and N₂ isotopologues.

In contrast, our ShapeFit routine, based on the numerical representation of the profile, is able to fit each of the peaks and extract the exact convoluted intensities, despite the very odd line shape. The residuals for the line profile fits are of the order 10⁻³ to 10⁻⁴, close to the shot-noise limit in the measurements. The apparently lower residuals arise because the overall signal amplitudes are four times lower than for the optimum-imaging case depicted in Fig. 6a; note that the overall integrated line intensities are nearly equal in both cases.

The example shown in Fig. 6 demonstrates the improvement that our ShapeFit approach can offer over standard, analytical-function fitting routines. The relative line amplitudes obtained for the two line shape cases and the different fitting approaches are collected in Table III. The experimental values are compared to theoretically derived amplitudes, based on the Boltzmann distribution function with T = 299 ± 1 K, reflecting the enclosure temperature of T_exp = 298 ± 1 K in the experimental setup.

All data were normalized to the Q₁(J = 2) line, the first (nearly) fully resolved line. Clearly, the ShapeFit results reproduce the theoretical values best, deviating by less than 1% for most lines. Larger deviations are observed only for two cases: the two overlapping lines (deconvolution algorithms always introduce some bias errors), and the very weak lines J = 7 and J = 8, for which background noise starts to become a significant contributing uncertainty.

It should be noted that in all likelihood the deviations would be even smaller if the spectral sensitivity variations of the detection system were taken into account. Within the spectral region displayed in Fig. 6, the specific 532 nm Raman edge filter (RazorEdge, Semrock) used to suppress the laser excitation line exhibits small oscillations in its transmission function (of the order 0.2–0.3%).

CONCLUSIONS

As described in this work, we developed an integrated concept for post-acquisition spectrum analysis that encompasses the all-important data treatment procedures necessary to improve the quality of relevant, quantitative results. These procedures include, amongst other routines, cosmic ray removal, background subtraction, and peak analysis and fitting.

The individual spectra-treatment algorithms were selected to affect the quantitative integrity of any spectrum only minimally, or not at all. This spectral evaluation suite is composed of a sequence of the aforementioned tasks and further steps. However, individual steps may easily be used on their own if desired, or skipped if not required. Having all steps fully incorporated into a single program flow, requiring only minimal user intervention after the initial parameter setup, is paramount in on-line applications. It is also desirable for the majority of (routine) off-line applications. In our approach, we adapted and expanded some of the most elegant ideas reported in the literature. The resulting solutions require low coding effort and as little intervention as possible once the procedure has been trained for specific, repetitive spectrum types.

Within the overall program flow, the two subtasks central to this work were background treatment (together with cosmic ray removal) and peak analysis and fitting. Generally, these are the most important subtasks when it comes to quantification of spectrum content information.

For the former, we adopted the canonical RCF algorithm but coupled it with a Savitzky–Golay filter (SGF) step on the locus-array generated from a single RCF pass. This vastly improved on the RCF's tendency to overestimate the baseline level in spectra with broad peak features. This procedure, which has only two parameters and was dubbed SCARF, was surprisingly robust. It resulted in superb background suppression, even if the parameters were not set to their full optimum. Of course, the parameters have to be sensible: even the most intelligent routine normally cannot cope with nonsensical requests. Also, in the majority of complex background cases, dual application of SCARF with appropriate parameter settings is advisable.

For the latter task of peak analysis, we developed a routine we named ShapeFit. As outlined above, this is a fitting algorithm with only two parameters (amplitude and position) that relies on numerical line shape profiles rather than analytical functions, such as, e.g., a Gauss profile function. Of course, the profile function has to be supplied by the user, and it is normally based on an isolated spectral line feature. The great