
Journal of Analytical and Applied Pyrolysis 133 (2018) 198–215


Forensic comparison of pyrograms using score-based likelihood ratios


Agnieszka Martyna a,⁎, Grzegorz Zadora a,b, Daniel Ramos c

a University of Silesia in Katowice, Institute of Chemistry, Department of Analytical Chemistry, 9 Szkolna, Katowice 40-006, Poland
b Institute of Forensic Research in Krakow, 9 Westerplatte, Krakow 31-033, Poland
c Audias: Audio, Data Intelligence and Speech, Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/Francisco Tomas y Valiente 11, 28049 Madrid, Spain

A R T I C L E  I N F O

Keywords: Comparison/classification; Likelihood ratio; Data dimensionality reduction; Pyrolysis gas chromatography; Forensic science

A B S T R A C T

The comparative analysis of chromatographic profiles of materials is of interest in many scientific fields, including forensic science. Plastic microtraces collected during hit-and-run accidents and examined with pyrolysis gas chromatography mass spectrometry (Py-GC–MS) may serve as an example. The aim of comparing the recovered and control samples is to help reconstruct the event by commenting on whether or not they share a common source. The objective is to report the evidential value of the data in the context of two competing hypotheses: H1 – both samples share a common origin (e.g. car) and H2 – they do not. The likelihood ratio (LR) approach addresses this idea and is an acknowledged method within the forensic community. However, conventional feature-based LR models (using e.g. signal intensities of the chromatographically separated compounds) suffer from the curse of multidimensionality. Their considerable complexity can be reduced in score-based LR models, in which the evidence, expressed by a score computed as a distance between the characteristics of the recovered and control samples, is evaluated using LR. A score based solely on a distance or similarity measure, without taking typicality into account, may not clearly reflect the differences between similar samples in a highly multidimensional space. Here we show that boosting the between-samples variance (B) whilst minimising the within-samples variance (W) helps distinguish between samples and improves the performance of score-based LR models. Instead of computing the distances in the feature space, the authors use the spaces defined by ANOVA simultaneous component analysis, regularised MANOVA and ANOVA target projection, which find directions with magnified differences between B and W. The concept was successfully illustrated for 22 plastic containers and automotive samples examined using Py-GC–MS. The research shows that this so-called hybrid approach, combining chemometric tools with the score-based LR framework, yields a performing solution to the comparison problem for Py-GC–MS chromatograms.

1. Introduction

Chromatography plays an important role in forensic evidence analysis for detecting organic compounds. It is used either to identify unknown substances, e.g. drugs, or to record chromatographic profiles (usually from pyrolysis gas chromatography) of microtraces of plastics, automotive paints, tires, fire debris, explosives and fibres. In the latter case the chromatograms are usually recorded for two samples, namely recovered and control. The task is then to compare them and assess whether they may be two pieces of the same object. This issue is known as the comparison task.

Road transport holds an important place in a developing society. It is also of interest in the forensic field, where experts frequently face the problem of making inferences in car accident cases. Among the many questions arising in hit-and-run car accidents, experts may be asked to find the connections between the scene of the car accident and the car driven by the suspected perpetrator of the accident. The task is resolved by comparing the physicochemical characteristics, e.g. chromatograms, of the material recovered from the scene of the car accident, collected in the form of microtraces of glass, automotive paints or plastics used for the production of car body elements (e.g. bumpers), and the control material from the suspected car. The question is then raised whether the recovered material (e.g. found on the victim's clothes) and the control material may have come from the same source (i.e. the suspected car) or not [1,2]. Such considerations refer only to the common or separate source of the data collected for case assessment. This so-called source level is the first step within the hierarchy of propositions, whose backbone embodies the source, activity and offence levels [3–5]. In the comparison problem of plastics collected in hit-and-run car accident investigations, such source-generic hypotheses can be expressed as:


⁎ Corresponding author.
E-mail addresses: agnieszka.martyna@us.edu.pl (A. Martyna), gzadora@ies.krakow.pl (G. Zadora), daniel.ramos@uam.es (D. Ramos).

https://doi.org/10.1016/j.jaap.2018.03.024
Received 12 January 2018; Received in revised form 29 March 2018; Accepted 29 March 2018
Available online 31 March 2018
0165-2370/ © 2018 Elsevier B.V. All rights reserved.
• H1: the compared recovered and control plastic fragments come from the same vehicle body.
• H2: the compared recovered and control plastic fragments come from two different vehicles.

In forensic experts' practice the chromatographic profiles are typically compared visually to detect similarities and discrepancies between the location and shape of the leading peaks. Such a naked-eye comparison can only be credible for visually distinguishable profiles. For highly similar chromatograms this approach lacks objectivity and precludes expressing the degree of similarity in a quantitative manner. To objectify the methodology, the analytical results must be interpreted and reported according to the recommendations of the interpretation schemes acknowledged in the forensic sciences [6], i.e. using the likelihood ratio (LR) approach [1,2,7]. This methodology provides a way of expressing the evidential value of the compared profiles in a reliable manner in view of two contrasting hypotheses (H1 and H2). Generally, the LR is computed as the ratio of the probabilities of the data characterising the evidence E, given the propositions H1 and H2:

$LR = \frac{\Pr(E|H_1)}{\Pr(E|H_2)}. \quad (1)$

H1 is supported by LR values larger than 1 and the support strengthens with increasing LR. Conversely, H2 is more likely when LR is below 1 and the support for this hypothesis reinforces as the LR values approach 0. Both hypotheses are equally likely when LR = 1. The LR models reliably express the evidential value by accounting for:

• similarities and discrepancies between the physicochemical data of the compared samples;
• typicality (rarity) of the data. Observing similarity between rare features carries greater evidential value than a similar resemblance between common features;
• the sources of uncertainty, including the within- and between-samples variability computed from the relevant background population. If we are comparing two plastics of a particular kind, the background population is the set of chromatograms of plastics of that kind, recorded using the same methods as for the evidence material. If all replicate measurements for an object (e.g. a car bumper) form a sample, then the between-samples variance is the variation of the averages of the samples. The variation of the replicates within each sample, averaged over all samples, represents the within-samples variance;
• statistical dependencies between the measured variables/features.

Considering the above aspects and viewing the data in the context of two contrasting, but equivalent, hypotheses makes the LR approach more suitable for forensic data interpretation than the typically used statistical tests (e.g. t-test) or chemometric methods. It also follows the rules of probability and integrates in a Bayesian decision framework in a natural way, allowing straightforward decision-making (Eq. (2)) [8,9]. The Bayesian theory can be seen as an illustration of the course of a trial. The prior assumptions (Pr(H1) and Pr(H2)) about the hypotheses (H1 and H2), stated before the evidence analysis, are modified by the LR values computed after collecting more information in the course of the evidence examination. Prior assumptions updated by LR values are expressed in the form of the ratio of the posterior probabilities Pr(H1|E) and Pr(H2|E). These probabilities are the basis for a further decision by the fact finder.

$\frac{\Pr(H_1)}{\Pr(H_2)} \cdot \frac{\Pr(E|H_1)}{\Pr(E|H_2)} = \frac{\Pr(H_1)}{\Pr(H_2)} \cdot LR = \frac{\Pr(H_1|E)}{\Pr(H_2|E)}. \quad (2)$
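To make the update in Eq. (2) concrete, here is a minimal sketch in R (the software used in this work, Section 2.3) with purely hypothetical numbers:

```r
# Hypothetical illustration of Eq. (2): equal priors and an LR of 100
prior_odds <- 0.5 / 0.5            # Pr(H1)/Pr(H2)
LR <- 100                          # evidence strongly supports H1
posterior_odds <- prior_odds * LR  # Pr(H1|E)/Pr(H2|E)
posterior_odds                     # 100: H1 is now 100 times more probable than H2
```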
A large body of literature exists [1,7,10–15] in which solutions to the comparison problem of physicochemical data using the LR approach were developed. A great majority of them use the measured features of the samples to construct feature-based LR models, which require that:

(i) the number of samples substantially exceeds the number of variables describing them, to make the matrix algebra feasible. Hence highly multivariate data such as chromatograms, delivering thousands of variables understood as signal intensities measured in time as the elution process continues, need some dimensionality reduction to make the LR models computationally tractable;
(ii) the (co)variance within each sample is constant and (multi)variate normal;
(iii) the average variance of the replicate measurements within each sample is much lower than the variance of the sample averages. This condition ensures that the samples are easily distinguished, i.e. replicate measurements for each sample are recognised as same-source, while those from different samples are indicated as coming from different sources.

The obvious solution to (i), which is examining more samples than the number of variables, involves considerable time and money and is often infeasible. A practical way through the problem is to compress the data dimensionality. Apart from principal component analysis, the most efficient data dimensionality reduction is achieved by moving from the feature representation to the (dis)similarity (or score) representation [16–19]. In the feature representation both compared samples are characterised by a set of parameters (features/variables), whose resemblance, variability, rarity and dependencies are then studied using the feature-based LR. Contrary to that, in the score representation individual multivariate observations are replaced by a pairwise measure of their mutual (dis)similarity, and possibly typicality, using scores [19,20]. The so-called score-based LR approach is then used for studying whether pairwise scores between observations support the hypothesis that they originate from the same source (H1) or from different sources (H2). The concept of score-based LR models simplifies the typical approach to solving the comparison problem in the feature space found in [1,7,10–15]. If the typicality is skipped, the (dis)similarity is simply defined by the distances between observations, which are computed in the same way for common or rare features. As has been noted in recent literature [20,21], this may lead to a severe loss of information and degradation of the discrimination abilities of the score-based LR models. However, for forensic likelihood ratios, calibration is a critical measure of performance to be considered beyond discriminating power [1,22–24], and good calibration can be achieved by using distance-only models. Consequently, distance-only models sometimes outperform feature-based models, despite the loss of discriminating power. Nevertheless, it is recommended to include typicality information in any score.

An additional advantage of the score-based LR models is that requirement (ii) is of no importance for computing the distances. However, the concept of score-based LR models is reasonable only when the features are much closer to each other among observations from the same source than between different sources. This is equivalent to requirement (iii); thus the condition of keeping the between-samples variance greater than the within-samples variance for the features still holds for score-based LR models.

The comparison task that needs LR models to be engaged focuses on data that are visually hardly distinguishable. As a consequence of the huge chemical similarity of the studied polymers, which contain mostly the same constituents after pyrolytic degradation, the analysed chromatograms differ only in small time ranges and a substantial part of each chromatogram is identical throughout the database. For this reason the variance of the majority of the variables is comparable within and between samples. Finding the directions along which the data within each sample are similar and differ between samples is easily accomplished using chemometric techniques.

In this publication the authors wish to demonstrate the applicability of ANOVA simultaneous component analysis (ASCA) [25–27], regularised MANOVA (rMANOVA) [27,28] and ANOVA target projection partial least squares (ANOVA-TP) [25,27] for finding the spaces where the between-samples variability is boosted and the incidental within-samples variability is minimised.

The combination of chemometric techniques and the likelihood ratio for commenting on the evidential value of physicochemical data is known as the hybrid approach. The concept has recently been introduced in forensic science [11,29–31] and seems promising due to the growing complexity of data structures delivered by rapidly developing advanced analytical equipment.

The shortage of examples in the literature of developing hybrid score-based likelihood ratio models for highly multivariate data, with a focus on variance studies to make samples better distinguishable, was a motivation for this research. Thus the aim was to demonstrate the hybrid approach combining chemometric methods for maximising the difference between the between- and within-samples variance, the score representation and the LR approach. The proposed models may readily be viewed as a method for evaluating chromatograms for forensic purposes.

The proposed workflow, illustratively presented in the forensic context, may be utilised in any field of chemistry where the issue of comparing physicochemical features is raised. Authentication of various products or materials may serve as an example. The presented scheme may prove indispensable for scientists with specialist knowledge in some field who are asked by court representatives to express an opinion on casework.

The paper is organised as follows. Section 2 includes the description of the analysed samples, the steps of chromatogram pretreatment and the methodology applied for dealing with various sources of variance, as well as the procedure for designing score-based LR models. This section atypically contains partial results, which are indispensable for taking decisions on the subsequent steps of LR model construction. The article is crowned with results showing the performance of hybrid score-based LR models for solving the comparison problem for Py-GC–MS chromatograms.

2. Materials and methods

2.1. Samples and equipment

Polypropylene is often found as a material for the production of automotive components. As a group of polymers representative of those utilised for manufacturing automotive elements, polypropylene objects (sometimes containing traces of other compounds such as polyethylene) were studied within the comparison problem. The potential sources of polypropylene at the scene of a car accident are (i) plastic vehicle parts including caps, rails, bumpers, headlamps and mirror cases and (ii) polypropylene containers, which are packages for household products commonly used in our daily lives (e.g. cosmetics). Thus the database was composed of automotive and container polypropylene samples examined by pyrolysis gas chromatography mass spectrometry (Py-GC–MS). Py-GC–MS was merely an alternative method for polymer analysis as, in contrast to the routinely used Fourier transform infrared spectrometry (FTIR), the method entails destroying the sample. This trait rather discredits the method as suitable for forensic purposes, since the evidence material cannot be totally consumed and should remain available for other analyses. However, as it is capable of delivering detailed information about the structure and composition of polymer samples, Py-GC–MS was applied as a complementary technique to the leading FTIR. The results of the comparison problem for FTIR spectra recorded for a nearly identical set of samples can be found in [11,12]. Undertaking the comparison problem for polypropylene as the main component was quite challenging. This is because the poverty of the chemical structure of polypropylene is reflected in just a few typical pyrolysis products, giving very simple chromatographic profiles. Consequently, the comparison problem was paradoxically far more complex than for richer profiles.

In real forensic casework, due to the small size of the collected samples, an expert is usually not able to sample the evidence material over a wide area to make sure the data are representative. Commonly the only option for the recovered samples is to record multiple measurements for small fragments known as microtraces. To make the experimental design resemble real scenarios, the following sampling scheme was proposed. There were 22 polymer objects subjected to the analysis, originating from 22 different vehicles or plastic containers (hence from 22 different sources). Three microtrace-size polymer fragments were cut off with a scalpel from each of four outermost parts of each object (usually corners, a few centimetres apart), called spots, and subjected to Py-GC–MS. The linear dimension of the samples for Py-GC–MS was typical of microtraces, i.e. not greater than 1 mm. Moreover, this sample size prevented overloading of the column. The samples were placed in a quartz tube wrapped in a platinum coil and positioned on the rod of the pyrolyser (PyroProbe 2000 with PyroProbe 1500 interface, CDS Analytix Ltd., United Kingdom) for thermal decomposition. Pyrolysis was performed at 750 °C without derivatisation. The produced pyrolysate was introduced onto the gas chromatography column and detected by the mass spectrometer. The AutoSystem XL gas chromatograph (Perkin Elmer, USA) was equipped with a DB-35MS (Agilent Technologies) semi-polar capillary column (30 m × 0.25 mm × 0.25 μm) composed of diphenyl polysiloxane (35%) and dimethyl polysiloxane (65%). The carrier gas was helium at a pressure of 73 kPa and a flow rate of 30 ml/min. The gas chromatography program was: 40 °C held for 2.45 min, ramped at 10.5 °C min−1 to 320 °C, and 320 °C maintained for 5 min. A TurboMass GOLD mass spectrometer (Perkin Elmer, USA) was applied as the detector. It was equipped with an electron ionisation ion source (electron energy of 70 eV) and a quadrupole analyser. The ions were scanned in the range 35–300 m/z. The pyrolyser was steered by CDS 2000 Plus software (CDS Analytix Ltd., UK) and the gas chromatograph coupled with the mass spectrometer was controlled by the TurboMass 5.2 program (Perkin Elmer, USA).

The pyrograms recorded for the total of twelve microtrace-size polymer fragments per object constituted a set of replicates, further referred to as a sample. The most characteristic peaks noticed for the examined polypropylene samples are marked by arrows in one of the recorded pyrograms displayed in Fig. 1.

2.2. Computational methods

2.2.1. Signals preprocessing
2.2.1.1. Chromatograms reconstruction. Each pyrogram constituted a sequence of points being the signal intensities of the chromatographically separated compounds measured in time. Due to instrumental settings beyond the user's control, the time series was not equally spaced (sampled) and differed between chromatograms. The pyrograms demonstrated changeable ending points, which were around 34 min, and a varying step between subsequent points, which was either 0.003 or 0.004 min. Even though the inconsistency in the time series is not detectable if the chromatograms are visualised as a picture, it becomes an obstacle for any computational method for preprocessing or comparing chromatograms. The majority of these methods require that the profiles are represented by the same set of variables (e.g. signal intensities measured in time) that are equally spaced (sampled). The problem needed sorting out by reconstructing the profiles so that they portray the signal intensities recorded for an equally spaced time series that is consistent between profiles.

Each chromatogram was then sampled every 0.0035 min, which is the mean of both observed initial steps (0.003 and 0.004 min), to capture as many originally measured data points as possible. The reconstruction spanned the widest possible range, between 0.004 and 34.003 min.

Fig. 1. Pyrogram of polypropylene recorded in the first 12 min of the elution carried out under the experimental conditions given in Section 2.1 (A: propene, B: 2-methyl-1-propene, C: 2-methylbutane, D: 2-methyl-1-pentene, E: 2,4-dimethyl-1-heptene, F: 2,4,6-trimethyl-1-nonene).

Spline functions [32] with cubic polynomials were used for interpolating (fitting) the chromatograms and finding the signal intensities for the new set of 9715 equally spaced points. Spline interpolation is an evidently flexible tool for fitting curves that resemble the shape of chromatographic profiles.
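A minimal R sketch of this resampling step; `rt` and `intensity` are hypothetical vectors holding the unevenly spaced retention times and signal intensities of one recorded pyrogram:

```r
# Resample one pyrogram onto a common, equally spaced time grid
grid <- seq(0.004, 34.003, by = 0.0035)    # 9715 equally spaced time points
fit <- spline(rt, intensity, xout = grid)  # cubic spline interpolation (base R)
reconstructed <- fit$y                     # intensities on the common grid
```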
Each of the pyrograms was then extended at the beginning by adding 200 points with the mean centred at the first reconstructed data point and a standard deviation six orders of magnitude lower. This operation is beneficial for conducting any signal treatment, including e.g. warping and baseline correction, in an optimal way, as it prevents the generation of artefacts.

2.2.1.2. Warping. Chromatograms are examples of analytical profiles showing misalignment, as the same features (here: peaks) appear in different locations (here: time) in different chromatograms, as illustrated in Fig. 2. Misalignment is caused by fluctuations of the instrumental parameters in time, which entail elongation or shortening of the retention time of various constituents. It generates artificial variance components, and thus the alignment must be completed as a preprocessing step before any chemometric technique aiming at feature comparison is applied [33]. Since the shift appears individual for each constituent, the alignment does not mean only shifting the signal to minimise the distance between two profiles, but involves signal contraction and expansion as well.

In this study correlation optimised warping (COW [34]) was applied. It operates on specified parts of the signals, called segments, and aligns the query signal to the reference one by piecewise linear stretching or compression. The correlation coefficient is used to assess the similarity between two chromatograms.

Despite the foregoing discussion, it is still not clear what is the best way of choosing the reference for warping all the chromatograms. In [35] the authors study various target selection options. According to that publication, the mean profile, which appears to be the most direct and obvious reference candidate, is usually not representative of all the profiles. Moreover, the mean profile may be severely deformed by e.g. flattening, oversmoothing, merging of some peaks or the appearance of peak artefacts if the peak shifts are comparatively larger than their widths.

Fig. 2. Heat map of the most characteristic part of the recorded polypropylene pyrograms (500th to 2000th variable along the elution progress) prior to warping (top) and after warping (bottom). The bluish colours correspond to larger signal intensities. The original retention time is unsettled after warping and is therefore replaced by the variable numbers showing the elution progress. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
Fig. 3. The scheme pointing out the heteroscedasticity of the noise and the effect of denoising using the logarithmic transformation.

As suggested in [35], this research uses as the reference the chromatogram with the highest correlation with all the others in the database. COW requires optimising the segment size (the number of points it contains) and the slack parameter, which is the number of points by which the query may be compressed or stretched in each segment. The segment size was 300 points, while the slack parameter was 200.
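The reference choice can be sketched in a few lines of R; `X` is a hypothetical matrix of preprocessed pyrograms (rows = chromatograms, columns = time points):

```r
# Pick as warping reference the pyrogram most correlated with all others
r <- cor(t(X))                 # pairwise correlations between chromatograms
ref <- which.max(rowMeans(r))  # index of the most representative profile
# X[ref, ] then serves as the COW reference (segment size 300, slack 200)
```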
The effect of warping is particularly evident when the map of the recorded pyrograms shown in the top of Fig. 2 is compared with the image of the aligned profiles (bottom of Fig. 2). The initial apparent curvature of the vertical lines, pointing out the misalignment, has disappeared. The original retention time is unsettled after warping and therefore in Fig. 2 it is replaced by the variable numbers showing the elution progress.

2.2.1.3. Denoising and baseline correction. Fig. 3 portrays the heteroscedasticity of the noise: the dispersion of the signal intensities at each time point grows proportionally with the measured signal intensity. To remove this effect the logarithm was taken of each chromatogram, which made the variance comparable regardless of the signal magnitude.

The baseline was corrected using asymmetrically reweighted penalised least squares smoothing [36]. The process iteratively estimates the baseline by updating the weights depending on whether the signal is below or above the fitted baseline.
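A didactic R sketch of this baseline estimator (arPLS [36]), under the assumption that the published weight-update rule is used; dense matrices keep it short, so it is only practical for short signals (sparse matrices, e.g. from the Matrix package, would be needed for full-length pyrograms):

```r
arpls <- function(y, lambda = 1e5, tol = 1e-6, maxit = 50) {
  N <- length(y)
  D <- diff(diag(N), differences = 2)  # second-order difference operator
  H <- lambda * crossprod(D)           # smoothness penalty
  w <- rep(1, N)
  for (i in seq_len(maxit)) {
    z <- solve(diag(w) + H, w * y)     # weighted penalised least squares fit
    d <- y - z
    dn <- d[d < 0]                     # residuals below the fitted baseline
    m <- mean(dn); s <- sd(dn)
    w_new <- 1 / (1 + exp(2 * (d - (2 * s - m)) / s))  # asymmetric reweighting
    if (sum((w - w_new)^2) / sum(w^2) < tol^2) break
    w <- w_new
  }
  z                                    # estimated baseline; y - z is the corrected signal
}
```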
2.2.1.4. Normalisation. For the sake of making all the chromatographic profiles comparable despite the varying sample sizes, probabilistic quotient normalisation (PQN [37]) was applied. PQN is initialised by integrating each signal to unit area below the signal curve, making the signals of comparable magnitude. Each signal is then divided by the reference, which is usually the median signal. This produces sets of quotients between each signal and the reference. The median quotient for each signal is used as the normalisation factor for that signal. PQN works under the assumption that the differences in the intensity of the majority of peaks result from the dilution of the samples rather than from alterations of the concentrations of single constituents.
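A minimal R sketch of PQN under these assumptions; `X` again denotes the hypothetical matrix of (already log-transformed and baseline-corrected) pyrograms:

```r
# Probabilistic quotient normalisation [37]
X <- X / rowSums(X)                    # integrate each signal to unit area
ref <- apply(X, 2, median)             # reference: the median signal
q <- sweep(X, 2, ref, "/")             # quotients of each signal to the reference
f <- apply(q, 1, median, na.rm = TRUE) # median quotient = normalisation factor
X_pqn <- X / f                         # divide each signal by its factor
```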
2.2.2. Studies of various variance aspects – feature selection, ASCA, rMANOVA and ANOVA-TP

Since each examined sample has the same main component (polypropylene), the chromatograms differ only in some peaks and a substantial part of all chromatograms is identical. For this reason the variance of the majority of the variables is comparable within and between samples. If the goal of the analysis is to differentiate the samples within the comparison problem, the replicates of each sample must be similar and the sample averages should be kept distant. In other words, the between-samples variance must be significantly larger than the within-samples variance, which should be minimised. Otherwise the distributions of the variables describing the samples overlap, making it difficult to distinguish between them. Thus care must be taken to find a space of variables in which the between-samples variability is maximised and the incidental within-samples variability is reduced. Principal component analysis (PCA) is usually the first natural choice for constructing a space that maximises the variance. However, it shows some shortcomings when various variance aspects (e.g. within- and between-samples variability) must be considered separately. This is because PCA aims at searching for the directions that maximise the overall variance of the data, neglecting the various variance components. The clue, however, is to find a way that will only maximise the between-samples variability and decrease the within-samples variability. This is not feasible using PCA in a conventional way. Instead it is advisable to apply methods that study the data after their decomposition into components associated with various sources of variation. ANOVA simultaneous component analysis (ASCA) and regularised MANOVA (rMANOVA) are the most popular examples thereof. Another approach suggests using ANOVA target projection partial least squares (ANOVA-TP) for finding the directions that best separate the data of different samples.

2.2.2.1. Feature selection. ASCA, rMANOVA and ANOVA-TP find new directions that maximise the between-samples variance and minimise the within-samples variance much more efficiently when these directions are searched for in the subspace of original variables which significantly differ between samples. The uninformative variable elimination partial least squares algorithm (UVE-PLS) was applied for selecting the variables (signal intensities measured at 9915 time points) that successfully differentiate the samples. To do so, the PLS response matrix encodes the studied classes referring to the samples' replicates, using the bipolar or binary mode.

Fig. 4. The results of the UVE-PLS algorithm. In the upper plot the stability of the noise regression coefficients is marked in grey and separated by a dashed vertical line from the stability of the regression coefficients typical of the measured signal. Red horizontal lines border the range of the noise regression coefficients' stability. Green points beyond the horizontal lines indicate the variables remaining in the final set. These variables are shaded in the bottom chromatogram. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

After the preprocessing steps (Section 2.2.1) the pyrogram database is a matrix of size 264 chromatograms (m = 22 samples, each measured 12 times) × 9915 variables. Each signal was extended at the end by adding 500 points acting as noise, sampled from a normal distribution with mean 0 and variance 10−10, which did not influence the optimal complexity found for the PLS model including only the 9915 original variables. The extended data set was subjected to PLS with the discrete response matrix coding each sample's measurements, using the model with optimal complexity. Leave-one-out cross validation was used, which yielded a number of regression coefficient matrices. The stability of each regression coefficient, defined as the ratio of its mean to its standard deviation, was investigated for each of the 9915 original variables (black and green points in Fig. 4) and the 500 new variables (grey points in Fig. 4). Each original variable whose stability fell in the range typical of the noise was believed to be unrelated to the differentiation of the samples and was thus discarded. Finally there remained 3780 variables which most effectively separated the samples.
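The stability criterion can be sketched as follows, assuming the pls package and, for brevity, a single response vector `y` (the full study used a discrete response matrix coding all samples):

```r
library(pls)  # provides plsr() and coef()
loo_stability <- function(X, y, ncomp) {
  # leave-one-out loop: one column of regression coefficients per left-out row
  B <- sapply(seq_len(nrow(X)), function(i) {
    fit <- plsr(y[-i] ~ X[-i, , drop = FALSE], ncomp = ncomp)
    drop(coef(fit, ncomp = ncomp))
  })
  rowMeans(B) / apply(B, 1, sd)  # stability = mean / standard deviation
}
# Variables whose stability falls inside the range spanned by the 500 appended
# noise variables are discarded (cf. Fig. 4).
```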
2.2.2.2. ANOVA simultaneous component analysis. ASCA is a method that easily joins ANOVA and PCA [25–27]. Following the principles of ANOVA, the original centred data matrix X may be decomposed as X = A + R, where A holds the sample averages and R the variation within each sample. PCA is then performed on the A matrix (A = TP^T + E), producing loadings P that maximise the variation of the sample averages. T is the matrix of scores on the loadings and E represents the residuals from the model. In contrast to simple PCA carried out on the X matrix (A + R), the ASCA loadings P neglect the variation of the replicates. Consequently, they contain only the information on the contribution of the variables to the differences between the sample averages, which facilitates discrimination between them. Ignoring the contribution of the within-samples variance to the overall variance prevents it from having a superfluous and excessive influence on the generated space. To get an insight into the natural variation of the replicates, A + R is projected into the generated PCA space. In this way the variance related to the differences between samples is captured and the random within-samples variability is ignored. Despite this undeniable simplicity of ASCA, the biggest objection to its application is that it assumes constant and equal variance for all studied samples and the independence of the variables.
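A minimal R sketch of this decomposition, assuming a data matrix `X` (rows = replicate chromatograms) and a factor `sample_id` identifying the samples; the number of retained directions (three here, cf. Section 2.2.2.5) is a placeholder choice at this stage:

```r
# ASCA: PCA on the matrix of sample averages, then project A + R
Xc <- scale(X, scale = FALSE)                      # centre the data (X = A + R)
A  <- apply(Xc, 2, function(v) ave(v, sample_id))  # per-sample averages matrix A
pca <- prcomp(A, center = FALSE)                   # loadings P from A only
scores <- Xc %*% pca$rotation[, 1:3]               # replicates projected on P
```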

2.2.2.3. Regularised MANOVA. MANOVA is a natural extension of ANOVA to multivariate sets. It accounts for the covariance structure of the data, which is the piece of information ANOVA ignores. In analogy to ANOVA, MANOVA computes the matrices of between-groups variability (B) and within-groups variability (W), which in this particular study are the between-samples and within-samples variability matrices. It is worth noting that MANOVA and linear discriminant analysis (LDA) are quite similar to some extent. They both compute the eigenvectors of W^{-1}B and thus find the directions with the best group separation. However, their final aims differ: while LDA is used as a classifier, the purpose of MANOVA is to test the statistical significance of the group differences. The drawback of both relates to the fact that they fail for highly multidimensional data, when the number of variables exceeds the number of samples. This is due to the inability to compute the inverse of variance matrices that do not have full rank, or to their instability when the number of variables is comparable to the number of samples. To overcome the limitations of MANOVA and LDA (not applicable to highly multidimensional data) and of ASCA (assumption of the independence of variables with constant and equal within-samples variances), regularised MANOVA was proposed as a weighted average of ASCA and MANOVA. rMANOVA [27,28] is applicable to highly multidimensional data and accounts for the covariance between the variables. Its objective is to find the eigenvectors of the matrix ((1 − δ)W + δT)^{-1}B. These are the directions along which the between-samples variance is the highest and the within-samples variance the lowest. T is the target matrix, which is either T = (1/p) tr(W) I when the variances of the p variables are equal for each sample, or T = diag(W) when the variances for each sample are unique. δ depends on the chosen target and expresses the variance of the W matrix components according to the Ledoit–Wolf theorem [27,28]. Investigating the expression ((1 − δ)W + δT)^{-1}B leads to the conclusion that when δ = 1 and T = I rMANOVA becomes ASCA, and when δ = 0 it turns into MANOVA. It also becomes clear that ASCA finds the eigenvectors of the B matrix only, while rMANOVA takes a step further and includes the within-samples variance. Contrary to ASCA, which assumes constant and equal within-samples variance and implies that the variables are uncorrelated, rMANOVA makes no such assumption and accounts for the shape of the replicate data around the sample averages.
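The eigenproblem itself is compact enough to sketch in R; `B` and `W` are the between- and within-samples covariance matrices and `delta` a given shrinkage intensity (its Ledoit–Wolf estimation is omitted here):

```r
rmanova_directions <- function(B, W, delta, target = c("trace", "diag")) {
  target <- match.arg(target)
  p <- ncol(W)
  Tm <- if (target == "trace") diag(sum(diag(W)) / p, p) else diag(diag(W))
  M <- solve((1 - delta) * W + delta * Tm) %*% B
  Re(eigen(M)$vectors)  # real part: M is not symmetric in general
}
# delta = 1 with Tm = I reduces to ASCA; delta = 0 recovers plain MANOVA
```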

2.2.2.4. ANOVA target projection. ANOVA-TP [25,27] is based on partial least squares (PLS) regression, whose fundamental goal is to find the linear relationship between the data (X) and the response (Y) in the form Y = XB + E, where B is a matrix of regression coefficients and E are the residuals. Instead of doing that directly, PLS predicts Y from X via modelling the matrices by their principal component scores, which can be seen as an adequate summary of the X and Y matrices. The PLS components are generated to highlight the relation between X and Y by maximising the covariance between them. This procedure ensures capturing the variance of X that is related and meaningful to the response Y. In ANOVA-TP the decomposition takes the form Y = X_{TP}B_{TP} + E_{TP}, where Y archives the studied groups (here: samples) using binary or bipolar coding. The target projection of the ith data point is given as t_{i,TP} = Ŷ_i/‖B_i‖. ANOVA-TP seems to provide an efficient way of boosting the between-samples variance and reducing the within-samples variance.

Fig. 5 illustrates the differences in how ASCA, rMANOVA and ANOVA-TP address the various variance aspects.

Fig. 5. The illustration of the differences between the variance aspects ASCA, rMANOVA and ANOVA-TP address, for two groups of samples marked by dots and triangles. The lines illustrate the directions generated by each studied methodology (ASCA, rMANOVA and ANOVA-TP).

2.2.2.5. ASCA, rMANOVA and ANOVA-TP complexity. The performance of the ASCA, rMANOVA and ANOVA-TP models is characterised by the root mean square error [38], defined as the root of the mean squared differences between the values predicted by a model and the original values. If the RMS is computed for the training set it illustrates how the model fits the data. The RMS for the test set (RMSEP) shows the ability of the model to predict the test set data. The RMS computed in cross validation is denoted RMSCV. The minimum of the root mean square error of cross validation (RMSCV) defines the optimal number of new directions in the ASCA, rMANOVA and ANOVA-TP spaces, known as the optimal model complexity [38]. The optimal complexity of the ASCA (f1 in Fig. 6) and rMANOVA (f2 in Fig. 6) models was found by a leave-one-sample-out cross validation procedure. In the ANOVA-TP approach each sample appearing in the test set must also be represented in the training set, to assign an appropriate coding in the response matrix. Thus for ANOVA-TP a leave-one-replicate-per-sample-out cross validation procedure was applied. The root mean square error is illustrated as a function of the model complexity in Fig. 6. The optimal complexity of ASCA was three, of rMANOVA two and of ANOVA-TP four.

All the chromatograms forming the test sets were transformed into the new ASCA, rMANOVA and ANOVA-TP spaces with the optimal complexity (Fig. 6).

2.2.3. Reduction of data dimensionality from feature to distance representation

Finally, the projections of each test-set chromatogram on the new directions generated by ASCA, rMANOVA or ANOVA-TP were converted into the score representation. The distances were used as scores and hence the typicality of the features was not included. In the presented research the suitability of five distance metrics, including the Manhattan, Euclidean, squared Euclidean, correlation-based and Chebyshev distances, was examined. The correlation-based distance between two vectors of chromatogram projections, A and B (one vector representing each chromatogram), is calculated as 1 − r_{AB}, where r_{AB} is the Pearson correlation coefficient between A and B. For this reason the values it takes fall between 0 and 2, whereas the remaining considered distance metrics produce non-negative values. As the optimal complexity of the model involving rMANOVA was two, each chromatogram was projected on only two eigenvectors (directions) and hence represented by only two numbers. This is not enough to consider any correlations between the vectors of projections representing the chromatograms, and for this reason the correlation metric was not examined for computing scores in the models engaging rMANOVA.
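The five candidate scores are easily expressed as plain R functions; `a` and `b` are the projection vectors of two compared chromatograms:

```r
score <- function(a, b, metric) {
  switch(metric,
         manhattan    = sum(abs(a - b)),
         euclidean    = sqrt(sum((a - b)^2)),
         sq_euclidean = sum((a - b)^2),
         correlation  = 1 - cor(a, b),  # falls within [0, 2]
         chebyshev    = max(abs(a - b)))
}
```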

The proposed methodology reduced the multidimensional space of 9915 variables in a few steps: (i) to 3780 variables after the UVE-PLS algorithm, (ii) to 3, 2 and 4 variables after ASCA, rMANOVA and ANOVA-TP, respectively, and (iii) finally to a single score reflecting the similarity of the chromatograms, but ignoring the typicality. The scores were then evaluated using the LR approach in the light of two competing hypotheses stating the same source or different sources of the samples the score describes. Fig. 7 depicts the scheme of the applied procedure.

2.2.4. Score-based likelihood ratio

The LR was computed for the scores between pairs of sample replicates coming either from the same or from different sources. Two competing approaches were used. First, LR values were calculated using a conventional score-based model with fitted distributions, following Eq. (3). As an alternative, the LR values were derived from the outcome of a logistic regression [39] by applying the Bayes equation (Eq. (2)) linking the posterior and prior probabilities. For controlling the performance of both LR models, the appropriate validation scheme of the applied methodology is described below.

2.2.4.1. Validation scheme. The ASCA/rMANOVA/ANOVA-TP spaces, the score representation and the LR step of the model are generated individually for comparing the replicates of each two samples, namely recovered and control. The recovered and control samples are simulated from the available database (details are given below). An LR is then obtained for each comparison to estimate the levels of misleading evidence in support of H1 or H2, i.e. the false positive and false negative rates.

ASCA, rMANOVA and ANOVA-TP methods require training sets for constructing the model that is then applied to the test sets. Dividing the data into test and training sets is usually performed using techniques such as the Kennard–Stone or duplex algorithms, to keep both sets representative. In forensic scenarios, splitting the database into training and test sets is imposed by the way the performance of the likelihood ratio models is assessed, i.e. by estimating the levels of misleading evidence in support of H1 and H2.

When estimating the rates of false positives, pairwise comparisons of the replicates of each two samples from the database (thus from different sources), one acting as the recovered and the second as the control sample, were carried out. Hence there were two compared samples and m − 2 = 22 − 2 = 20 samples that remained in the database. For developing the model, in each of these comparisons all m − 2 samples from the database and the recovered and control samples under investigation were used to construct the ASCA, rMANOVA and ANOVA-TP spaces. The recovered and control samples must be included when building the ASCA, rMANOVA and ANOVA-TP models to account for the variability that stands behind the differences between these samples and any other in the database. Each of the m samples in the database (including the recovered and control samples under investigation) was divided into training and test sets for ASCA, rMANOVA and ANOVA-TP.

Fig. 6. The root mean square error illustrating the models' fit (RMS), complexity (RMSCV – cross validation RMS) and performance (RMSEP – RMS of prediction) in the ASCA (top), rMANOVA (middle) and ANOVA-TP (bottom) models. Green circles point out the optimal complexity of the models. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

To do so, the four measurement spots of each sample were split into two parts, each with six measurements from two spots. The ASCA, rMANOVA and ANOVA-TP models were developed using the training set accounting for the training parts of all m samples. The projections of the test set data onto the ASCA, rMANOVA and ANOVA-TP spaces with optimal complexity (Section 2.2.2.5) were used for computing scores in the (dis)similarity representation. Each LR value was then calculated for a pairwise distance between the replicates of two different samples, one pretending to be the recovered and the second the control sample. The same-source and different-source distributions were modelled using the scores calculated for the remaining m − 2 samples. The same procedure was repeated for all available pairs of recovered and control samples.

Each sample had six replicates, and thus there were

$N_2 = 6 \cdot 6 \cdot \binom{m}{2} = 6 \cdot 6 \cdot \frac{m!}{(m-2)!\,2!} = 6 \cdot 6 \cdot \frac{22 \cdot 21}{2} = 8316$

LR values yielded for true-H2 comparisons when investigating all available pairs of samples from the database. In each comparison the same-source (SS) and different-source (DS) classes for fitting the distributions were derived in the cross-validation scheme after excluding the recovered and control samples. There are in total (m − 2) · 6 = (22 − 2) · 6 = 120 replicates, giving

$\binom{(m-2)\cdot 6}{2} = \frac{((m-2)\cdot 6)\,((m-2)\cdot 6 - 1)}{2} = \frac{120 \cdot 119}{2} = 7140$

scores. The number of SS distances computed within each of the m − 2 = 20 database samples containing 6 replicates is $\binom{6}{2} = \frac{6!}{2!\,4!} = 15$, which gives in total (m − 2) · 15 = (22 − 2) · 15 = 300 SS scores. The remaining $\binom{(m-2)\cdot 6}{2} - (m-2)\cdot 15 = 7140 - 300 = 6840$ are the DS scores. Experiments under a true H2 are expected to deliver LR values below unity; values above unity therefore produce a misleading outcome.
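These counts are easy to verify in base R:

```r
# Sanity check of the true-H2 experiment counts
m <- 22
N2 <- 6 * 6 * choose(m, 2)          # 8316 LR values
n_scores <- choose((m - 2) * 6, 2)  # 7140 pairwise scores
n_ss <- (m - 2) * choose(6, 2)      # 300 same-source scores
n_ds <- n_scores - n_ss             # 6840 different-source scores
```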
When estimating the rates of false negatives, the replicates of two samples (recovered and control) that truly originate from the same source must be compared. Since the samples in the database come from different sources and there is no pair of samples with a common origin, the recovered and control samples must be simulated from the available replicates within each single sample. In a single comparison a single sample was thus divided into two equal parts, one simulating the recovered and the second the control sample. Hence there were two compared samples, but simulated from a single database sample, and m − 1 = 22 − 1 = 21 samples that remained in the database.

As when estimating the levels of false positives, the ASCA, rMANOVA and ANOVA-TP models must be trained using the database samples and the sample that is divided into the recovered and control samples. Both samples should be regarded individually, as if they came from different sources. This is a consequence of the presumption of innocence principle, claiming that each two compared samples are believed to come from different sources unless proved otherwise. Thus the sample currently under investigation was first divided into two parts (pretending to be the recovered and control material), each consisting of two measurement spots. One spot from each part (3 measurements) was regarded as the training set and the remaining one as the test set. From the remaining m − 1 samples in the database, the measurement spots for the training and test sets were selected randomly so as to keep the same number of observations as for the recovered and control samples. The ASCA, rMANOVA and ANOVA-TP models were developed using the training set with m + 1 samples (one sample is divided into two, which adds 1 to the number of samples). The projections of the test set data onto the ASCA, rMANOVA and ANOVA-TP spaces with optimal complexity (Section 2.2.2.5) were used for computing scores in the (dis)similarity representation. Each LR value was then calculated for a pairwise distance between two replicates within the sample split into two parts, one acting as the recovered and the second as the control sample. The same-source and different-source distributions were modelled using the scores calculated for the remaining m − 1 samples. The same procedure was repeated for all available pairs of recovered and control samples.

Each sample had three replicates, and thus there were N1 = 3 · 3 · m = 3 · 3 · 22 = 198 LR values yielded for true-H1 comparisons when investigating all m = 22 samples from the database. In each comparison the SS and DS classes for fitting the distributions were derived in the cross-validation scheme after excluding the sample that was split into the recovered and control samples. There are in total (m − 1) · 3 = (22 − 1) · 3 = 63 replicates, giving

$\binom{(m-1)\cdot 3}{2} = \frac{((m-1)\cdot 3)\,((m-1)\cdot 3 - 1)}{2} = \frac{63 \cdot 62}{2} = 1953$

scores. The number of SS distances computed within each of the m − 1 = 21 database samples containing 3 replicates is $\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = 3$, which gives in total (m − 1) · 3 = (22 − 1) · 3 = 63 SS scores. The remaining $\binom{(m-1)\cdot 3}{2} - (m-1)\cdot 3 = 1953 - 63 = 1890$ are the DS scores.

Fig. 7. The scheme illustrating the experimental design. Notation: m – number of samples; N0 – number of original chromatograms (subscripts test and train refer to the test and training sets, respectively); ni – number of replicate chromatograms recorded for each of the m = 22 samples, measured ni = 12 times; p(·) – number of variables considered at each step indicated by the subscript; f1, f2 – optimal complexity of the ASCA and rMANOVA models. Black numbers of dimensions refer to the experiments for estimating false positive rates, whereas grey numbers in brackets refer to the experiments for false negative rates.

Experiments under a true H1 are expected to deliver LR values exceeding unity; values below unity therefore produce a misleading outcome.

The sampling scheme is shown in Fig. 8. The pairs of test and training sets were generated randomly s = 10 times for each of the comparisons, for averaging the results.

2.2.4.2. Conventional score-based LR models. In the conventional LR models the score is interpreted in the context of the distribution of scores when both samples come from the same source (under H1) and the distribution of scores when the samples come from different sources (under H2). The scores established under H1 and H2 are collected within two classes (g = 1, 2): the scores found between pairs of samples coming from the same source, SS (under H1), and those found when they come from different sources, DS (under H2) (Fig. 7, Section 2.2.3).

In this model each ith (i = 1, …, mg) score (pairwise similarity measure) is described by a vector of p variables, $\mathbf{x}_{gi} = (x_{gi1}, \ldots, x_{gip})^T$, where p refers to the number of scores describing the proximity of each two samples. In this research p = 1; however, the mathematical expressions generalise the model also to cases in which p > 1.

Fig. 8. The scheme illustrating the creation of test and training sets in the experiments conducted for estimating the levels of false positive and false negative rates.

The general mean vector for the gth class, $\boldsymbol{\mu}_g$, is estimated for each class from $\bar{\mathbf{x}}_g = \frac{1}{m_g}\sum_{i=1}^{m_g} \mathbf{x}_{gi}$. The vectors $\mathbf{x}_{gi}$ are either assumed normally distributed or their distribution is modelled in a non-parametric way, e.g. using kernel density estimation (KDE). In the former case the $\mathbf{x}_{gi}$ concentrate around the general mean $\boldsymbol{\mu}_g$ for the gth class, estimated as $\bar{\mathbf{x}}_g$, and their dispersion is given by the variance-covariance matrix $\mathbf{C}_g$, $(\mathbf{x}_{gi}|\boldsymbol{\mu}_g, \mathbf{C}_g) \sim N(\boldsymbol{\mu}_g, \mathbf{C}_g)$. KDE places a normal kernel with bandwidth parameter $h_g = \left(\frac{4}{m_g(2p+1)}\right)^{\frac{1}{p+4}}$ centred at each considered data point $\mathbf{x}_{gi}$, then sums and averages them so that the final area under the curve integrates to unity.

For a vector of p scores $\mathbf{y} = (y_1, \ldots, y_p)^T$, the LR is computed according to Eq. (3) when the data variability is assumed normal, or Eq. (4) when it is modelled by KDE. The formulas are simplified versions of more general expressions (e.g. two-level models) for solving the conventional classification problem within the LR framework [1,29,30,40].

$LR = \frac{f(\mathbf{y}|\boldsymbol{\mu}_1, \mathbf{C}_1, H_1)}{f(\mathbf{y}|\boldsymbol{\mu}_2, \mathbf{C}_2, H_2)} = \frac{(2\pi)^{-p/2}\,|\mathbf{C}_1|^{-1/2} \exp\left\{-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_1)^T \mathbf{C}_1^{-1}(\mathbf{y}-\boldsymbol{\mu}_1)\right\}}{(2\pi)^{-p/2}\,|\mathbf{C}_2|^{-1/2} \exp\left\{-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_2)^T \mathbf{C}_2^{-1}(\mathbf{y}-\boldsymbol{\mu}_2)\right\}} \quad (3)$

$LR = \frac{f(\mathbf{y}|\mathbf{x}_{11}, \ldots, \mathbf{x}_{1m_1}, \mathbf{C}_1, h_1, H_1)}{f(\mathbf{y}|\mathbf{x}_{21}, \ldots, \mathbf{x}_{2m_2}, \mathbf{C}_2, h_2, H_2)} = \frac{(2\pi)^{-p/2}\,|h_1^2\mathbf{C}_1|^{-1/2}\,\frac{1}{m_1}\sum_{i=1}^{m_1} \exp\left\{-\frac{1}{2}(\mathbf{y}-\mathbf{x}_{1i})^T (h_1^2\mathbf{C}_1)^{-1}(\mathbf{y}-\mathbf{x}_{1i})\right\}}{(2\pi)^{-p/2}\,|h_2^2\mathbf{C}_2|^{-1/2}\,\frac{1}{m_2}\sum_{i=1}^{m_2} \exp\left\{-\frac{1}{2}(\mathbf{y}-\mathbf{x}_{2i})^T (h_2^2\mathbf{C}_2)^{-1}(\mathbf{y}-\mathbf{x}_{2i})\right\}} \quad (4)$

Due to the fact that the scores were not always normally distributed, KDE was applied for modelling the appropriate distributions. In order to avoid biased results, the DS class was limited to contain an equal number of distances as the SS class, i.e. 63 for the experiments under H1 and 300 in the experiments under H2.
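For p = 1 this KDE-based ratio can be sketched with base R density estimates (note that density() uses its own default bandwidth rule, which would have to be set to the hg above to match the model exactly); `ss` and `ds` are hypothetical vectors of training scores and `y` a questioned score:

```r
lr_kde <- function(y, ss, ds) {
  f1 <- density(ss)                      # KDE of the same-source scores
  f2 <- density(ds)                      # KDE of the different-source scores
  num <- approx(f1$x, f1$y, xout = y)$y  # numerator density at the score
  den <- approx(f2$x, f2$y, xout = y)$y  # denominator density at the score
  num / den
}
```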
2.2.4.3. Score-based LR models from the logistic regression. Logistic regression [39] is a linear classifier for predicting a categorical variable Y based on a set of predictors x. For a single predictor, the logit (or logistic) transformation of the conditional probability Pr(Y|x), known also as the log odds of the outcome, is modelled with a linear function of x according to the equation:

$\log \frac{\Pr(Y|x)}{1 - \Pr(Y|x)} = b_0 + b_1 x. \quad (5)$

The optimal model parameters (b0 and b1) are computed using the maximum likelihood methodology.

In this research the logistic regression model was built for the categorical variable Y coding the scores from the SS and DS classes. The model was further used for predicting the posterior probabilities that a questioned score between replicates comes from the SS or DS class. The LR values can then be easily retrieved from the posterior probabilities using the following Bayes equation (Eq. (2)) linking the posterior and prior probabilities:

$LR = \frac{\Pr(E|H_1)}{\Pr(E|H_2)} = \frac{\Pr(H_1|E)}{\Pr(H_2|E)} \Big/ \frac{\Pr(H_1)}{\Pr(H_2)}. \quad (6)$

To compute the LR from the posterior probabilities, the prior probabilities are essential. They can be either equal (Pr(H1) = Pr(H2) = 0.5) or taken as the proportions of scores in the SS and DS sets. In this study the SS and DS classes contained the same number of scores, so Pr(H1) = Pr(H2) = 0.5.
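A minimal R sketch of this route, with hypothetical vectors `ss` and `ds` of training scores and `y_new` a questioned score; with equal class sizes the prior odds in Eq. (6) equal 1:

```r
d <- data.frame(score = c(ss, ds),
                y = c(rep(1, length(ss)), rep(0, length(ds))))
fit <- glm(y ~ score, family = binomial, data = d)    # Eq. (5) fitted by ML
post <- predict(fit, data.frame(score = y_new), type = "response")
LR <- (post / (1 - post)) / (0.5 / 0.5)               # Eq. (6): posterior/prior odds
```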

…gative responses. False positive rates appear when LR > 1 for comparisons between two samples coming from different sources, i.e. for the experiments under H2. Conversely, false negative rates arise when LR < 1 for comparisons between samples sharing a common origin, i.e. for the experiments under H1. Despite the simplicity and workability of reporting the false rates when assessing model performance, this approach undervalues the LR approach, which is capable not only of indicating the more probable hypothesis but also of measuring the strength of the support towards each of the hypotheses. The latter feature is completely ignored when the models' effectiveness is reported using only false rates. Under such an approach two values, e.g. LR = 10 and LR = 100, are considered equally in favour of H1, even though LR = 100 supports H1 tenfold more strongly than LR = 10. Generalising, a complete presentation of the LR models' effectiveness should include the amount of support towards each of the hypotheses. The LR value in favour of the correct hypothesis should be as high as possible, i.e. LR ≫ 1 when H1 is correct and LR ≪ 1 when H2 is correct. LR values concentrating around 1 should be obtained for misleading responses, so that the support for the incorrect hypothesis remains weak. Thanks to this greatest advantage of the LR methodology, i.e. its capability of pointing out the strength of support towards each of the hypotheses, performance metrics such as the empirical cross entropy can be exploited for thorough commenting on the models' performance.

Empirical cross entropy [1,22,41–43] is a performance metric which relies on assigning an appropriate penalty to each of the yielded LR values. The penalty magnifies with increasing support for the incorrect hypothesis, according to the logarithmic strictly proper scoring rules shown in Fig. 9a. The penalty for an LR value obtained when Hi (i = 1, 2) is true is given as −log2 Pr(Hi|E).

The overall penalty for the model is the sum of the mean penalties computed for the N1 LR values obtained under H1 and the N2 LR values obtained under H2 (Section 2.2.4), with each mean penalty weighted by the relevant prior probability Pr(H1) or Pr(H2) [1,22,41–43]. Taking into account Eq. (2), it can be expressed as:

$$\mathrm{ECE} = \frac{\Pr(H_1)}{N_1} \sum_{i=1}^{N_1} \log_2\!\left(1 + \frac{\Pr(H_2)}{\mathrm{LR}_i\,\Pr(H_1)}\right) + \frac{\Pr(H_2)}{N_2} \sum_{j=1}^{N_2} \log_2\!\left(1 + \frac{\mathrm{LR}_j\,\Pr(H_1)}{\Pr(H_2)}\right). \tag{7}$$
The performance of the LR models should be available for interpretation for any assumed prior probabilities, as assigning any particular priors is not the forensic expert's domain. For this reason the final ECE results are portrayed in the form of a diagram illustrating the ECE curves, i.e. the ECE values for all possible quotients Pr(H1)/Pr(H2), under the obvious constraint that Pr(H1) + Pr(H2) = 1. The logarithm of this quotient is commonly referred to as the log-odds, or the logarithm of the prior odds in favour of H1, and is denoted log10 Odds(H1). An example of a typical ECE curve computed for an experimental set of LR values is portrayed as the solid (red) curve (commonly named experimental) in Fig. 9b. It is accompanied by two other curves. The dotted (black) curve (commonly named null or neutral) is the fixed reference curve showing the performance of a model that is neutral and supports neither of the hypotheses (LR = 1). The dashed (blue) curve (commonly named calibrated) represents the ECE values obtained after transforming the LR values with the Pool Adjacent Violators (PAV) algorithm [1,44,45]. The algorithm produces a monotonic function based on the posterior probabilities Pr(H1|E) and Pr(H2|E) that returns new LR values according to Eq. (2). The new set of LR values retains identical discriminating power, expressed by the levels of false positive and false negative answers, and exhibits the best achievable performance (i.e. the LR values strongly support the correct hypotheses and give only weak support to the incorrect ones). Therefore, the observed differences between the calibrated curve and the ECE curve for the experimental LR set indicate whether there exists any possibility of improving the model's performance. Summarising, the neutral (black) curve is the floor of performance for an LR approach to be useful, and the calibrated (blue) curve shows the ceiling of performance for the LR approach.
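The PAV transformation itself is available in base R as isoreg(). The sketch below is one possible way to derive the calibrated LRs from it; the equal-prior conversion via the label counts and the clamping of extreme posteriors are our assumptions:

# Sketch: PAV calibration of a set of validation LRs; not the authors' code.
pav_calibrate <- function(lr_h1, lr_h2) {
  llr   <- log10(c(lr_h1, lr_h2))
  truth <- c(rep(1, length(lr_h1)), rep(0, length(lr_h2)))
  ord   <- order(llr)
  # isotonic fit of the 0/1 labels on the sorted log-LRs gives
  # monotonic, calibrated posterior probabilities Pr(H1|E)
  post <- isoreg(llr[ord], truth[ord])$yf
  eps  <- 1e-6                                  # avoid odds of 0 or Inf
  post <- pmin(pmax(post, eps), 1 - eps)
  prior_odds <- length(lr_h1) / length(lr_h2)   # implied by the label counts
  (post / (1 - post)) / prior_odds              # back to LRs via Eq. (2)
}
pav_calibrate(lr_h1 = c(12, 3, 0.8, 40), lr_h2 = c(0.1, 0.5, 2))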
The relative position of all three variants of the ECE curves provides an illustration of the LR model's performance. In the case when the experimental curve lies between the neutral and calibrated curves, one can observe the reduction of information loss due to the model with regard to the neutral curve assigning no value to the evidence (LR = 1). The lower the experimental curve is located, the better the performance of the LR model, as it explains more information. If there is still too much uncertainty within the model about the correct hypothesis, the ECE curve for the experimental LR values will grow, and more information will be needed in order to identify the true hypothesis. The calibrated curve is always the lowest one, and its distance from the experimental curve shows the goodness of the LR model calibration [1,43]. Calibration can be improved by altering the model; however, the plot does not indicate what modifications will help achieve this goal. The lower the distance between these curves, the better calibrated the LR model [1].

In order to choose a single number summarising ECE, a natural choice is the point on the x-axis at which the prior probabilities are equal, i.e. log10 Odds(H1) = 0. At this point ECE represents the performance of the LR method alone, when there is no prior information about the hypotheses, and it is denoted Cllr or Cllr,min for the experimental and calibrated curves respectively (Fig. 9b). The difference between Cllr and Cllr,min is the calibration loss, Cllr,cal. The values Cllr and Cllr,cal are summarising scalar measures of performance that are very popular in LR-based forensic interpretation because they present some attractive properties [46].

2.3. Software

All the mathematical background of the research, including the chromatographic profiles pretreatment, the chemometrics and the LR calculations, was carried out using the R software (version 3.3.1) [47] and Matlab (version 9.2.0.556344 (R2017a)) [48]. Most of the scripts were prepared by the authors themselves and are available upon request.

3. Results

3.1. Descriptive statistics

Fig. 10 illustrates the distributions of the same source and different source distances for a randomly selected case (involving the Manhattan distance in the ASCA space) and their overlap in the log scale. The common area below both curves, pointing out the overlap of the distributions, is shaded in green. It is expected that the smaller the overlap between the SS and DS distributions, the better the discriminating power of the LR model, i.e., its ability to distinguish between cases where H1 and H2 are respectively true. This will also be seen as an improvement in Cllr,min (i.e., Cllr,min will be lower) as the overlap decreases.


Fig. 10. (a) Distributions for the same source and different source distances for a random test set, (b) boxplots presenting the overlap between the distributions for the same source (SS) and different source (DS) classes in all test sets.

Thus the LR models' performance degrades with increasing overlap between the distributions. The levels of false negatives rise when the green area filled with dotted lines in Fig. 10a gets larger, and the false positive rates increase when the green area with dashed lines grows. The figure points out that the different source distributions are asymmetric due to their positive skewness. The same source distributions are usually unimodal and appear more symmetric, resembling a Gaussian distribution. The distributions for the DS class are in most cases bimodal, with one maximum within the distribution for the SS class and the second beyond it. This observation suggests that the replicates of different samples may appear very similar, misleadingly indicating their common origins.

The distributions for the SS and DS classes demonstrate the least overlap for the models using rMANOVA, which widely outperforms ASCA and ANOVA-TP. The overlapping areas usually cover less than 10% of the distributions' areas in rMANOVA, while the overlap reaches up to 60% for the remaining techniques. The correlation-based and squared Euclidean distances demonstrate the least overlap between the same and different source distributions, oscillating around 30%.
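The quoted overlap percentages can be estimated by integrating the minimum of the two score densities on a common grid. A sketch with assumed toy data; the grid settings and the use of base R's density() are our choices:

# Sketch: common area below the SS and DS densities (cf. Fig. 10a).
overlap_area <- function(ss, ds) {
  lo <- min(ss, ds); hi <- max(ss, ds)
  d_ss <- density(ss, from = lo, to = hi, n = 512)
  d_ds <- density(ds, from = lo, to = hi, n = 512)
  f <- pmin(d_ss$y, d_ds$y)                        # lower envelope of both curves
  sum((f[-1] + f[-length(f)]) / 2 * diff(d_ss$x))  # trapezoidal rule
}
set.seed(1)
overlap_area(rnorm(63, 0.5, 0.2), rnorm(300, 2.0, 0.7))  # fraction in [0, 1]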


Fig. 11. Violin plots illustrating the levels of (a) false positive and (b) false negative model responses (LR stands for conventional LR and LogReg stands for logistic regression). 10% of false negative answers refer to 20 values of LR < 1 and 10% of false positive answers correspond with 83 values of LR > 1.

These observations may be a premise for predicting better performance of the LR models using rMANOVA and the correlation-based or squared Euclidean distance metrics, owing to their most preferably separated SS and DS distributions. The model engaging the squared Euclidean distance for computing the distances between the chromatogram projections on the rMANOVA eigenvectors seems to provide the optimal solution for obtaining the best behaving LR models (Fig. 10b).
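For reference, the five scores compared throughout this section can be computed between two projection vectors (e.g. sample means projected on the rMANOVA eigenvectors) as below. This is a sketch with assumed toy coordinates; one common definition of the correlation-based dissimilarity (1 − r) is assumed, and the paper's exact variant may differ:

# Sketch: the five distance scores for a pair of projection vectors a and b.
pair_scores <- function(a, b) {
  c(manhattan    = sum(abs(a - b)),
    euclidean    = sqrt(sum((a - b)^2)),
    sq_euclidean = sum((a - b)^2),
    correlation  = 1 - cor(a, b),    # assumed correlation-based dissimilarity
    chebyshev    = max(abs(a - b)))
}
pair_scores(c(0.2, 1.1, -0.4), c(0.3, 0.9, -0.1))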


Fig. 12. Box-empirical cross entropy plots for the best (left hand side) and worst (right hand side) LR models: (a) LogReg model issuing correlation distance in ASCA space, (b) conventional LR model issuing squared Euclidean distance in ASCA space, (c) LogReg model issuing Chebyshev distance in rMANOVA space, (d) conventional LR model issuing squared Euclidean distance in rMANOVA space, (e) LogReg model issuing correlation distance in ANOVA-TP space and (f) conventional LR model issuing squared Euclidean distance in ANOVA-TP space.

3.2. Score-based likelihood ratio models

Fig. 11 comprehensively portrays the overall performance of the conventional LR models (LR) and the logistic regression (LogReg) LR models under investigation, where the scores are computed using various distance metrics in the spaces defined by ASCA, rMANOVA and ANOVA-TP. Each violin plot, being a hybrid of a boxplot and a kernel density plot curved along the boxplot sides, shows the false positive or false negative outcomes yielded from all s = 10 test sets.

As expected after inspection of the overlap of the SS and DS distributions, the lowest levels of false positive and false negative rates are yielded by the models issuing rMANOVA. The performance of all rMANOVA-LR/LogReg models is comparable and none of them appears to be apparently better than the rest. The false rates oscillate around 15% and outperform the remaining LR/LogReg models, based on ASCA and ANOVA-TP, by ca. 15%.
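Violin plots of this kind can be reproduced, for instance, with the ggplot2 package; the sketch below uses assumed toy rates and is not the authors' plotting script:

# Sketch: violin plot of false-rate results per model (toy data assumed).
library(ggplot2)
set.seed(1)
rates <- data.frame(
  model = rep(c("ASCA", "rMANOVA", "ANOVA-TP"), each = 10),
  fp    = c(rnorm(10, 33, 2), rnorm(10, 15, 1.5), rnorm(10, 29, 2))
)
ggplot(rates, aes(x = model, y = fp)) +
  geom_violin() +                 # kernel density curved along the box sides
  geom_boxplot(width = 0.1) +
  labs(y = "false positive rate (%)")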


Table 1
Mean false positive and false negative error rates (%) yielded in all studied models (LR stands for conventional LR and LogReg stands for logistic regression). 10% of false negative answers refer to 20 values of LR < 1 and 10% of false positive answers correspond with 83 values of LR > 1.

                 Manhattan       Euclidean       Squared Euclidean   Correlation-based   Chebyshev
                 LR      LogReg  LR      LogReg  LR      LogReg      LR      LogReg      LR      LogReg
False positive
  ASCA           33.27   31.57   35.80   32.47   40.50   39.25       32.24   33.37       36.65   33.68
  rMANOVA        15.21   13.21   15.14   13.05   18.38   14.22       –       –           15.37   13.19
  ANOVA-TP       28.49   26.98   29.17   27.23   34.26   33.53       30.78   28.96       31.18   29.45
False negative
  ASCA           25.35   24.65   22.07   23.43   16.67   16.11       15.51   15.96       19.55   21.16
  rMANOVA        6.16    13.64   2.68    11.87   3.23    10.51       –       –           2.48    7.27
  ANOVA-TP       23.69   23.43   23.54   23.89   18.03   17.12       17.33   16.37       23.23   23.28
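The error rates collected in Table 1 follow directly from the counts of misleading LR values in the validation sets; a minimal sketch with assumed LR sets:

# Sketch: false negative/positive rates (%) from validation LR values.
false_rates <- function(lr_h1, lr_h2) {
  c(false_negative = 100 * mean(lr_h1 < 1),   # H1 true, but LR < 1
    false_positive = 100 * mean(lr_h2 > 1))   # H2 true, but LR > 1
}
false_rates(lr_h1 = c(12, 3, 0.8, 40), lr_h2 = c(0.1, 0.5, 2))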

False positive rates are in general a few percentage points lower for the ANOVA-TP-LR/LogReg than for the ASCA-LR/LogReg models. This tendency is not observed for the false negative rates, which are rather stable for both models and reach ca. 25%. The worst models are constructed using the ASCA methodology and the squared Euclidean distance, with false positive rates exceeding even 40%. On the other hand, these models yield very low levels of false negative responses (ca. 16%). A tendency towards slightly lower rates of false negative answers for LR than for LogReg models is observable, in particular for the rMANOVA-LR/LogReg models. Surprisingly, this trend is not continued for the false positive rates, for which it is reversed. Nevertheless, the noted discrepancies are not particularly meaningful. In spite of the smallest overlap area for the squared Euclidean and correlation distances, portrayed in Fig. 10b, the prediction that these are the most suitable distance metrics is confirmed only in the levels of false negative rates, reaching ca. 20%. The false positive rates do not reflect such an anticipation, which may be the consequence of the overlap area responsible for magnifying the false positive rates (the dashed blue shaded area in Fig. 10a) being larger than the one related to the false negative rates (the dotted blue shaded area in Fig. 10a). These findings agree with the previous research [11], in which the correlation distance metric was also noted as one of the most suitable.

The dispersion of the results within each model is not meaningful and usually does not exceed a few percent for the false positive rates, increasing to ca. 15% for the false negatives. Nevertheless, the results can be regarded as stable regardless of the chosen distance metric and the technique of variance examination.

The general findings expose that the false positive rates are rather comparable to, or a bit larger than, the levels of false negative answers. This is explained by the similarity of the analysed samples. Inconsiderable distances between their characteristics affect the LR/LogReg models' capability to distinguish between different source items, leading to the increase in the levels of false positive answers. This seems to be the consequence of the DS distributions usually being bimodal, with one maximum falling in the range of distances typical for the SS class (Fig. 10a). Such distances then misleadingly support the hypothesis about a common source.

Fig. 12 presents the box-empirical cross entropy plots in a modified way in comparison to the traditional ECE curves introduced in Section 2.2.5. Here the experimental and calibrated curves (Section 2.2.5) are replaced by sets of boxplots positioned at the considered prior odds. Each boxplot accounts for the ECE values calculated for a given prior odds using the available likelihood ratio values. The plots in Fig. 12 portray the best and worst models observed for the models issuing the ASCA, rMANOVA and ANOVA-TP techniques.

The diagrams do not seem to convince that the LR/LogReg models evidence the most desirable performance when the squared Euclidean distance metric is issued. The progress is not evident, as in many cases the experimental line exceeds the neutral line for the whole or only a partial range of the prior odds, particularly for a positive logarithm of the prior odds, i.e. log10 Odds(H1) > 0. This may be a consequence of even single likelihood ratio values strongly supporting the incorrect hypothesis, mainly H1. Such a value may generate so high a penalty that it outweighs the sum of the rewards assigned for the remaining correct responses. This still makes the interpretation of the ECE plots challenging, as the results stay in contrast to the previous remarks, which suggested that this distance metric should outperform the others. However, the best models are usually based on the correlation distance, which is not surprising. Then the reduction of information loss (or gain of information) reaches even ca. 60% for the rMANOVA-LogReg model with respect to its reference, which is quite acceptable. This means that the majority of the information concerning the uncertainty about the correct hypotheses is explained after the evidence analysis using the rMANOVA-LogReg models. From all considered models, rMANOVA-LogReg demonstrates the most satisfying performance (Table 1, Fig. 11).

Supplementary information on the ECE outcomes is provided in Fig. 13, which pictures the ECE values for log10 Odds(H1) = 0 (known as Cllr) for the experimental sets in regard to the false positive and false negative rates yielded for each considered model. The colours correspond with the models whose performance is shown in Fig. 11, and the marker shapes refer to the ASCA-LR/LogReg (hexagons), rMANOVA-LR/LogReg (squares) and ANOVA-TP-LR/LogReg (circles) results. The size of the markers gives an idea of the magnitude of the Cllr values; therefore, the smaller the marker, the better. There are s = 10 points of the same colour and shape, corresponding with the 10 validation sets.

Fig. 13 clearly indicates the relative performance of all considered models, accounting for the levels of false positive and false negative rates and the ECE outcomes simultaneously. The diagrams show that the models issuing rMANOVA are indisputably better than the rest. Not only do they deliver the lowest rates of false positive and false negative answers, but they also yield incomparably more satisfying ECE outcomes. This is reflected in the relatively small size of the markers, with only a few exceptions. The least acceptable performance is yielded by the ASCA-LR/LogReg models, with the highest rates of false positive answers and practically no reduction of information loss (i.e., ECE is close to the neutral curve, or even higher), in particular for the conventional LR models. Thus applying these models for evidence evaluation may lead to more misleading conclusions than when the samples are not examined at all and the likelihood ratio values are assumed neutral (LR = 1).

Observable differences between the experimental (Cllr) and calibrated (Cllr,min) box-ECE curves (known as Cllr,cal) indicate the opportunity to improve the performance of the models by e.g. extending the database, which would represent the relevant population in an extensive manner.
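Cllr itself is Eq. (7) evaluated at Pr(H1) = Pr(H2) = 0.5, which simplifies to the expression below; Cllr,min is obtained in the same way from the PAV-calibrated LRs, and Cllr,cal = Cllr − Cllr,min. A sketch with assumed LR sets:

# Sketch: Cllr as ECE at log10 Odds(H1) = 0, i.e. equal priors in Eq. (7).
cllr <- function(lr_h1, lr_h2) {
  0.5 * mean(log2(1 + 1 / lr_h1)) + 0.5 * mean(log2(1 + lr_h2))
}
cllr(lr_h1 = c(12, 3, 0.8, 40), lr_h2 = c(0.1, 0.5, 2))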


Fig. 13. The magnitude of Cllr (ECE values for log10 Odds(H1) = 0) in regard to the levels of false positive and false negative rates in (a) conventional LR models and (b) LogReg models (logistic regression models). The Cllr value for the largest green square in the bottom diagram is ca. 1.6, given for comparison purposes; therefore, the smaller the markers, the better.

Nevertheless, the interpretation of the results for legal processing must be carried out carefully, accounting for all possible performance metrics, such as the false positive and false negative rates and the ECE plots. As shown, despite the scarce database the results were still gratifying. This appears as an advantage of the proposed methodology, which is suitable for small datasets.

4. Conclusions

The crucial issue of the presented research was to formalise the existing practice of visual differentiation of chromatograms/pyrograms for forensic purposes. The studies clearly demonstrate that the likelihood ratio framework is capable of objectively addressing the comparison problem for samples described by highly multivariate data, such as chromatograms recorded in the course of Py-GC–MS. The hybrid approach combining chemometric tools and the score-based likelihood ratio framework for reporting the evidential value yields more objective conclusions by expressing the degree of resemblance between the profiles quantitatively.

Moving from the conventional feature representation to the score representation effectively reduces the data dimensionality in a few stages, from thousands of features to a single score.


It was found adequate for solving the comparison problem due to the fact that the similarities are bigger between samples sharing the same origin than between those from different sources. Nevertheless, the distance metrics must be selected in such a way as to reflect the data structure as accurately as possible. A disadvantage of this methodology is the loss of information, including the neglect of the original rarity of the features characterising the samples. However, when the scores are computed in the ASCA, rMANOVA and ANOVA-TP subspaces, the score considers not only the similarity of the features in this subspace, but also some population variation that adds information to the final score. Thus, the loss of information in a score that solely considers the distance between features is diminished. Nevertheless, scores accounting for typicality apart from similarity should be considered in future work.

The proposed score-based LR models admittedly benefited from using ASCA, rMANOVA and ANOVA-TP for exposing the superiority of the between-samples variability over the within-samples variability, which is a crucial issue for developing efficient LR models. rMANOVA proved the most suitable for solving the stated comparison problem due to directly addressing both examined sources of variance. ANOVA-TP was also deemed to deliver acceptable results. However, its performance degrades with an increasing number of samples, which must then be encoded in too many separate response variables. Contrary to ASCA, which assumes constant and equal within-samples variance and implies that the variables are uncorrelated, rMANOVA does not make such an assumption and accounts for the shape of the replicates' data around the samples' averages. This feature enables a comprehensive and detailed inspection of the variance aspects and effectively addresses them for finding a successful solution to the comparison problem for the examined chromatograms. From all considered models, rMANOVA-LogReg should be recommended for evidence evaluation, as it demonstrates the best performance in all senses.

An assessment of the likelihood ratio models' performance leads to an optimistic observation of moderate levels of false positive and false negative responses. Nevertheless, the box-empirical cross entropy plots do not always indicate a noticeable reduction of information loss. This shows that the results of ECE for assessing model performance must be interpreted carefully in the case of small databases. A more in-depth analysis of this issue will be conducted in the future. However, the rMANOVA technique followed by a logistic regression score-based LR model showed great stability and superior performance, and it is the option recommended as a first choice.

With the increase in the diversity of microtraces and the decrease in the time and resources available for data collection, researchers are often presented with limited databases (comprising fewer samples than the number of sample characteristics). The presented results show that the proposed methodology yields reliable conclusions in the comparison problem even for such small databases. What is more, the problem of the growing dimensionality of the analytical data delivered by developing key analytical techniques seems to be ongoing. Thus it will become typical that the number of samples decreases while the number of variables augments. The proposed approach particularly addresses the problem of assessing the evidential value for datasets showing such a structure.

Development of a validation methodology suitable for forensic practice was an important part of this research. The results presented here prove that the proposed methods can be applied directly in forensic expert practice for interpreting the pyrograms recorded for evidence materials. Further research will focus on incorporating the mass spectra available from Py-GC–MS analyses to evaluate the evidential value of the analysed samples.

Acknowledgements

The authors wish to thank Dr. Rafal Borusiewicz (Institute of Forensic Research in Krakow) for conducting part of the Py-GC–MS analyses. The experiments were undertaken within the National Science Centre in Poland (Preludium 6 no. 2013/11/N/ST4/01547) and the Institute of Forensic Research projects VI K/2013-15 and IV K/2015-17.

References

[1] G. Zadora, A. Martyna, D. Ramos, C. Aitken, Statistical Analysis in Forensic Science: Evidential Value of Multivariate Physicochemical Data, Wiley, Chichester, UK, 2014.
[2] C. Aitken, G. Zadora, D. Lucy, A two-level model for evidence evaluation, J. Forensic Sci. 52 (3) (2007) 412–419.
[3] R. Cook, I. Evett, G. Jackson, P. Jones, J. Lambert, A hierarchy of propositions: deciding which level to address in casework, Sci. Justice 38 (4) (1998) 231–239.
[4] C. Aitken, P. Roberts, G. Jackson, Fundamentals of Probability and Statistical Evidence in Criminal Proceedings, Royal Statistical Society, 2012.
[5] D. Ommen, C. Saunders, C. Neumann, A Note on the Specific Source Identification Problem in Forensic Science in the Presence of Uncertainty About the Background Population, (2015) arXiv:1503.08234.
[6] ENFSI, ENFSI Guideline for Evaluative Reporting in Forensic Science. Strengthening the Evaluation of Forensic Results Across Europe, European Network of Forensic Science Institutes, 2015.
[7] C. Aitken, F. Taroni, Statistics and the Evaluation of Evidence for Forensic Scientists, Wiley, Chichester, UK, 2004.
[8] F. Taroni, S. Bozza, A. Biedermann, P. Garbolino, C. Aitken, Data Analysis in Forensic Science: A Bayesian Decision Perspective, Wiley, Chichester, UK, 2010.
[9] W. Bolstad, Bayesian Statistics, Wiley, 2007.
[10] C. Aitken, D. Lucy, Evaluation of trace evidence in the form of multivariate data, Appl. Stat. 53 (2004) 109–122.
[11] A. Martyna, G. Zadora, T. Neocleous, A. Michalska, N. Dean, Hybrid approach combining chemometrics and likelihood ratio framework for reporting the evidential value of spectra, Anal. Chim. Acta 931 (2016) 34–46.
[12] A. Martyna, A. Michalska, G. Zadora, Interpretation of FTIR spectra of polymers and Raman spectra of car paints by means of likelihood ratio approach supported by wavelet transform for reducing data dimensionality, Anal. Bioanal. Chem. 407 (2015) 3357–3376.
[13] A. Martyna, K.-E. Sjåstad, G. Zadora, D. Ramos, Analysis of lead isotopic ratios of glass objects with the aim of comparing them for forensic purposes, Talanta 105 (2013) 158–166.
[14] A. Martyna, D. Lucy, G. Zadora, B. Trzcinska, D. Ramos, A. Parczewski, The evidential value of microspectrophotometry measurements made for pen inks, Anal. Methods 5 (2013) 6788–6795.
[15] A. Michalska, A. Martyna, J. Zieba-Palus, G. Zadora, Application of a likelihood ratio approach in solving a comparison problem of Raman spectra recorded for blue automotive paints, J. Raman Spectrosc. 46 (2015) 772–783.
[16] P. Zerzucha, B. Walczak, Concept of (dis)similarity in data analysis, Trends Anal. Chem. 38 (2012) 116–128.
[17] D. Porro, R. Duin, I. Talavera, N. Hernandez, Alternative representations of spectral data for classification, Proceedings ASCI, 15th Annual Conference of the Advanced School for Computing and Imaging (2009).
[18] D. Porro-Munoz, I. Talavera, R. Duin, N. Hernandez, M. Orozco-Alzate, Dissimilarity representation on functional spectral data for classification, J. Chemom. 25 (2011) 476–486.
[19] A. van Es, W. Wiarda, M. Hordijk, I. Alberink, P. Vergeer, Implementation and assessment of a likelihood ratio approach for the evaluation of LA-ICP-MS evidence in forensic glass analysis, Sci. Justice 57 (2017) 181–192.
[20] G. Morrison, E. Enzinger, Score based procedures for the calculation of forensic likelihood ratios – scores should take account of both similarity and typicality, Sci. Justice 58 (2018) 47–58.
[21] Y. Tang, S. Srihari, Likelihood ratio estimation in forensic identification using similarity and rarity, Pattern Recognit. 47 (2014) 945–958.
[22] D. Ramos, J. Gonzalez-Rodriguez, G. Zadora, C. Aitken, Information-theoretical assessment of the performance of likelihood ratio computation methods, J. Forensic Sci. 58 (2013) 1503–1518.
[23] D. Ramos, J. Gonzalez-Rodriguez, Reliable support: measuring calibration of likelihood ratios, Forensic Sci. Int. 230 (2013) 156–169.
[24] D. Meuwly, D. Ramos, R. Haraksim, A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation, Forensic Sci. Int. Data Brief 10 (2017) 75–92.
[25] F. Marini, D. de Beer, E. Joubert, B. Walczak, Analysis of variance of designed chromatographic data sets: the analysis of variance-target projection approach, J. Chromatogr. A 1405 (2015) 94–102.
[26] G. Zwanenburg, H. Hoefsloot, J. Westerhuis, J. Jansen, A. Smilde, ANOVA-principal component analysis and ANOVA-simultaneous component analysis: a comparison, J. Chemom. 25 (2011) 561–567.
[27] F. Marini, D. de Beer, N. Walters, A. de Villiers, E. Joubert, B. Walczak, Multivariate analysis of variance of designed chromatographic data. A case study involving fermentation of rooibos tea, J. Chromatogr. A 1489 (2017) 115–125.
[28] J. Engel, L. Blanchet, B. Bloemen, L. Heuvel, U. Engelke, R. Wevers, L. Buydens, Regularized MANOVA (rMANOVA) in untargeted metabolomics, Anal. Chim. Acta 899 (2015) 1–12.
[29] A. Martyna, G. Zadora, I. Stanimirova, D. Ramos, Wine authenticity verification as a forensic problem: an application of likelihood ratio test to label verification, Food Chem. 154 (2014) 287–295.
[30] P. Wlasiuk, A. Martyna, G. Zadora, A likelihood ratio model for the determination of the geographical origin of olive oil, Anal. Chim. Acta 853 (2015) 187–199.


[31] A. Bolck, H. Ni, M. Lopatka, Evaluating score- and feature-based likelihood ratio models for multivariate continuous data: applied to forensic MDMA comparison, Law Probab. Risk 14 (2015) 243–266.
[32] M. Hazewinkel, Y. Subbotin (Eds.), Encyclopedia of Mathematics, Springer, 2001.
[33] T. Bloemberg, J. Gerretzen, A. Lunshof, R. Wehrens, L. Buydens, Warping methods for spectroscopic and chromatographic signal alignment: a tutorial, Anal. Chim. Acta 781 (2013) 14–32.
[34] N. Nielsen, J. Carstensen, J. Smedsgaard, Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping, J. Chromatogr. A 805 (1998) 17–35.
[35] M. Daszykowski, B. Walczak, Target selection for alignment of chromatographic signals obtained using monochannel detectors, J. Chromatogr. A 1176 (2007) 1–11.
[36] S. Baek, A. Park, Y. Ahn, J. Choo, Baseline correction using asymmetrically reweighted penalized least squares smoothing, Analyst 140 (2015) 250–257.
[37] F. Dieterle, A. Ross, G. Schlotterbeck, H. Senn, Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics, Anal. Chem. 78 (2006) 4281–4290.
[38] K. Varmuza, P. Filzmoser, Introduction to Multivariate Statistical Analysis in Chemometrics, Wiley, 2009.
[39] R. Tauler, B. Walczak, S. Brown, Comprehensive Chemometrics, Elsevier, 2009.
[40] G. Zadora, Classification of glass fragments based on elemental composition and refractive index, J. Forensic Sci. 54 (2009) 49–59.
[41] N. Brümmer, J. du Preez, Application independent evaluation of speaker detection, Comput. Speech Lang. 20 (2006) 230–275.
[42] G. Zadora, D. Ramos, Evaluation of glass samples for forensic purposes – an application of likelihood ratios and an information-theoretical approach, Chemom. Intell. Lab. Syst. 102 (2010) 63–83.
[43] D. Ramos, G. Zadora, Information-theoretical feature selection using data obtained by scanning electron microscopy coupled with an energy dispersive X-ray spectrometer for the classification of glass traces, Anal. Chim. Acta 705 (2011) 207–217.
[44] M. Ayer, H. Brunk, G. Ewing, W. Reid, E. Silverman, An empirical distribution function for sampling with incomplete information, Ann. Math. Stat. 26 (1955) 641–647.
[45] M. Best, N. Chakravarti, Active set algorithms for isotonic regression; a unifying framework, Math. Program. 47 (1990) 425–439.
[46] D. Ramos, J. Gonzalez-Rodriguez, G. Zadora, C. Aitken, Information-theoretical assessment of the performance of likelihood ratio computation methods, J. Forensic Sci. 58 (2013) 1503–1518.
[47] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0. http://www.R-project.org/.
[48] MATLAB, Version 9.2.0.556344 (R2017a), The MathWorks Inc., Natick, MA, 2017.
