Validation of Microbiological Methods For Food: DR Sharon Brunelle

13
VALIDATION OF MICROBIOLOGICAL METHODS FOR FOOD
Dr Sharon Brunelle*
Microbiological methods, either qualitative (quantal; presence/absence) or quantitative, are
composed of multiple steps, including sample preparation, sample analysis, data interpretation
and confirmation (Fig. 13.1). The optimal integration of these steps defines the method
and is critical to providing reliable results. Microbiological methods now range from fully
manual to partially automated to fully automated.
Quantal methods for the detection of pathogens in food are generally aimed at detecting 1
colony-forming unit (cfu) per test portion (typically 25 g of food matrix). In order to achieve
this level of detection, the organisms in the test portion of food must be enriched to a level
detectable by the analytical assay. The detectable level for polymerase chain reaction (PCR)
methods is in the range of 10^3–10^4 cfu/ml and for immunoassays is 10^5–10^6 cfu/ml. Both raw
foods and processed foods set challenges at a detection level of 1 cfu/test portion. Raw foods
carry a higher bacterial load than processed foods, so the challenge is to detect 1 cfu of target
organism among 10^4–10^6 cfu of background flora and, as a result, the enrichment media used
often contain selective agents to suppress non-target organisms. For processed foods, the
challenge lies in the state of the organism. After processing, any surviving microbial cells
are likely to be injured, so the method must allow for recovery of injured cells. In this case,
a pre-enrichment broth without selective agents can be employed to allow the organism to
repair itself and reproduce. Transfer to a selective broth or addition of selective agents can
then be used to suppress non-target organisms once repair and recovery have taken place.
Examples of qualitative methods include use of chromogenic agars, enzyme-linked immunosorbent
assays (ELISAs), lateral flow immunoassays, enzyme-linked gene probe assays and
PCR techniques.

[FIGURE 13.1 Steps comprising qualitative and quantitative microbiological methods.
Qualitative detection: sample preparation, sample enrichment, analysis, isolation and confirmation.
Quantitative estimation: sample preparation, analysis, data processing.]

*Brunelle Biotech Consulting, Technical Consultant to AOAC International and AOAC Research Institute,
Woodinville WA, USA.
Statistical Aspects of the Microbiological Examination of Foods
Copyright © 2008 by Academic Press. All rights of reproduction in any form reserved.

Quantitative methods are aimed at estimating the level of pathogens, coliforms and E.
coli, yeast and mould, or total aerobic bacterial load. A test portion, typically 50 g of food
matrix, is diluted into a broth or buffer at a ratio of 1:9. The diluted homogenised test
portion is analysed directly without enrichment. Quantitative methods include direct meth-
ods such as enumeration by plate count, or indirect methods, such as the most probable
number (MPN), impedance measurement and real time PCR (RT-PCR). MPN procedures
require replicate tubes at multiple dilutions with a qualitative readout of each tube, fol-
lowed by the calculation of the ‘most likely’ estimate of the contamination level based on
the number of tubes positive at each dilution level. Impedance methods measure the time
required, under controlled growth conditions, to cross a threshold impedance value. The
time required to cross the threshold value is related to the contamination level of the sample
using a standard curve. RT-PCR measures the number of cycles required to cross a thresh-
old value (Ct value). The Ct value relates to the contamination level of the sample in that
the higher the contamination level, the fewer PCR cycles required to reach the threshold.
Ct values are converted to cfu/test portion using a standard curve usually built into the
RT-PCR software.
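
The conversion of Ct values through a standard curve can be sketched as follows. This is a minimal illustration only, assuming a straight-line relationship between Ct and log10 cfu; the calibration values and the function names `fit_standard_curve` and `ct_to_log10_cfu` are hypothetical and are not taken from any instrument software.

```python
def fit_standard_curve(log10_cfu, ct):
    """Least-squares line Ct = slope * log10(cfu) + intercept."""
    n = len(ct)
    mean_x = sum(log10_cfu) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_cfu)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_cfu, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_log10_cfu(ct_value, slope, intercept):
    """Invert the standard curve to estimate log10 cfu from a Ct value."""
    return (ct_value - intercept) / slope


# Hypothetical calibration standards (illustrative numbers, not real data).
standards_log10_cfu = [2.0, 3.0, 4.0, 5.0, 6.0]
standards_ct = [33.0, 29.7, 26.4, 23.1, 19.8]

slope, intercept = fit_standard_curve(standards_log10_cfu, standards_ct)
estimate = ct_to_log10_cfu(28.05, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, log10 cfu={estimate:.2f}")
```

The negative slope reflects the relationship described above: the higher the contamination level, the fewer cycles are needed to reach the threshold.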
THE STAGES OF METHOD DEVELOPMENT
Method development begins with a concept of the intended use of the method. This defines
the type of test, the target analyte, the applicable foods or matrices and the user of the
method. The type of test may be qualitative detection or quantitative estimation of contami-
nation. The target analyte, meaning the microorganism or toxin that the method detects,
quantifies or identifies, can be a genus, a species or a serovar. Matrices are generally foods
or environmental surfaces. Foods are typically divided into categories, such as meat, poultry,
seafood, fruits and vegetables, etc. and these categories are subdivided into raw, heat proc-
essed, frozen, fermented, etc. as appropriate. Examples of environmental surfaces are those
that can be found in a food manufacturing facility, such as stainless steel, rubber, sealed con-
crete, ceramic or glass, plastic, wood, food-grade painted surfaces, air filter material and
cast iron. In microbiological food examination, the end user is typically a trained laboratory
technician. However, there are instances where this may not be the case, such as detection
of biological threat agents where the end user could be a trained ‘first responder’ in the field.
As an example of an intended use statement, the method developer might state, ‘this method
is intended for the detection of Salmonella species in raw poultry in a microbiological
laboratory’.
Method development can begin once the concept of the method is defined by the intended
use. First, the ‘assay critical’ reagents are identified. If the method is an immunoassay, anti-
bodies are screened against pure culture target and non-target organisms to identify highly
selective candidates. For molecular-based assays, DNA sequences are identified and targeted
with primers and probes. These primers and probes are screened against DNA derived from
target and non-target organisms. Performance parameters, such as sensitivity, specificity, Ct
values and the like are measured under various assay conditions to optimize performance
and to determine the best candidate critical reagents. Assays are then constructed around
the ‘best candidate’ critical reagents, based on the initial pure-culture screening. The assays
are challenged with pure cultures and a few selected foods representative of the breadth of
the intended matrices. During this process, sample preparation methodology is also varied
to ensure that the target analyte is presented to the critical reagent(s) in an optimal manner
that facilitates detection. For immunoassays, this means ensuring that the antigenic target is
accessible to the antibody.
In the final stages of method development, the optimized assay configuration is tested
against matrix samples inoculated at various levels of target analyte. Final optimization of
assay conditions (buffers, critical reagent concentrations, reaction times, reaction tempera-
tures, etc.) occurs at this stage. Statistically designed experiments should be carried out to
verify optimal assay conditions, to examine the ruggedness of the assay and to establish
‘guard bands’ on the critical assay parameters. Guard bands are defined in this context as
the variations of assay conditions that are tolerated by the method without significantly
affecting the method results.
Once method development is complete, the method is transferred to the manufactur-
ing facility and process development occurs. The goal of process development is to devise a
manufacturing scheme that produces assay components consistent with the final assay design
in a reproducible manner. Some re-optimization of the assay design may be required upon
scaled-up production in order to achieve the assay performance observed in the final method
development stage. Additionally, new ‘guard bands’ may need to be established on full-scale
manufactured assay components. The manufactured assay components and method instruc-
tions are now ready for validation.
WHAT IS VALIDATION?
Validation is the establishment of method performance in a single laboratory or multiple
laboratories under controlled conditions. A method can be validated to demonstrate that it
performs as claimed; that it performs at least as well as a validated standard method; or that
it performs to a set of established criteria. In general, current practices in food microbiology
(Anon, 2003; Feldsine et al., 2002) validate by comparison to a standard, typically an official
or regulatory method, such as an AOAC International Official Method of Analysis℠
(AOAC, 2007a), an FDA Bacteriological Analytical Manual (BAM) method (FDA, 2007),
a USDA Microbiological Laboratory Guidebook (MLG) method (USDA, 2007) or an
International Organization for Standardization (ISO) method (Anon, 2007). Some organizations, such
as ISO and AOAC, are re-visiting validation standard practices and considering a paradigm
shift to establishing method performance independent of comparison to a standard (FDA,
2007b). This could result in accepting evidence to demonstrate that a method performs as
claimed or, alternatively, accepting comparative method performance against established
acceptance criteria. The former requires the end user to determine whether the method per-
formance meets their needs for a particular use, that is, whether the method is ‘fit for pur-
pose’. For example, a method might detect Salmonella reliably down to a level of 10 cfu/25 g.
While this does not meet regulatory standards of 1 cfu/25 g, there might be non-regulatory
uses for such a method. The latter establishes whether the method meets the requirements
for a particular intended use, such as regulatory testing. In the range of 1 cfu/test portion,
it is also feasible to compare method performance to the theoretical Poisson distribution of
target organisms in the food matrix.
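
The Poisson comparison mentioned above can be illustrated with a short sketch. It assumes that target cells are randomly (Poisson) distributed among test portions and that an ideal method detects every test portion containing at least one cell; the function names are hypothetical.

```python
import math


def prob_positive(mean_cfu_per_portion):
    """Probability that a test portion contains at least one target cell
    when the number of cells per portion is Poisson distributed."""
    return 1.0 - math.exp(-mean_cfu_per_portion)


def level_for_fraction_positive(fraction):
    """Mean cfu per test portion expected to give the stated fraction
    of positive test portions under the same Poisson assumption."""
    return -math.log(1.0 - fraction)


for level in (0.2, 0.5, 1.0, 2.0):
    print(f"{level:.1f} cfu/test portion -> P(positive) = {prob_positive(level):.3f}")

print(f"50% positive expected near {level_for_fraction_positive(0.5):.3f} cfu/test portion")
```

Under these assumptions, even a method that detects a single cell cannot be expected to give 100% positive results at a mean level of 1 cfu per test portion, because some test portions will contain no target cells.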
Validation of a microbiological method for food generally includes a test of inclusivity and
exclusivity to establish the analytical selectivity of the method for the analyte. In other words,
is the method inclusive in the scope of the target analyte and is it exclusive of cross reactivity
to closely related non-target analytes? Such pure culture studies establish the analytical scope
of the method.
The method is then challenged with a range of artificially- or naturally-contaminated food
matrices to establish: (1) that the food matrices do not interfere with either the growth of the
organism or its detection, and (2) that background flora found naturally in foods do not sup-
press the enrichment or detection of the target organism. It may be discovered, for example,
that a particular method well suited to detection of analyte in processed foods where back-
ground flora is generally very low, does not perform well with unprocessed or raw foods that
tend to have much higher levels of competing background flora. It is for these reasons that
a method should be considered validated only for those foods or food types that have been
tested successfully in the validation study.
While validation establishes the method performance in one or multiple laboratories,
verification establishes proper implementation of a method in the end user’s laboratory.
Using a specified verification protocol, the laboratory performs the method and ensures that
results are comparable to the method performance established in the validation study.
Selectivity Testing
For inclusivity testing of qualitative methods, 50 or more strains are chosen for analysis based
on the target scope of the method. For example, if the target group is a family, then representa-
tive strains from the various genera within that family are chosen. If the target is a species, then
a range of strains within that species should be chosen to represent the antigenic or genetic
diversity. In the case of a genus-level test for Salmonella, 100 strains should be chosen due to
the much greater size and variation of the genus compared to other pathogens.
Exclusivity testing requires at least 30 organisms closely related to the target group. The
test results (positive or negative) of each organism are reported, and % inclusivity and %
exclusivity are expressed as the ratio of the number of organisms correctly classified (target
strains detected for inclusivity, non-target strains not detected for exclusivity) to the
number of organisms tested. For instance, a method for which the inclusive strains resulted
in 49 positive responses out of 50 strains tested has an inclusivity ratio of 49/50.
Selectivity testing is not applicable to quantitative methods that measure general categories
of organisms, such as total aerobic counts or yeast and mould counts. For quantitative
methods that target a genus or species, however, 30 strains of the target microorganism are
tested. Exclusivity requires at least 20 closely related non-target organisms. As for qualitative
methods, the % inclusivity and % exclusivity values are determined.
Method Comparison: Qualitative Methods
Qualitative method validation studies can be divided into two types – paired sample design
and unpaired sample design (Fig. 13.2). A paired sample design is one in which a single test
portion is enriched and the enriched test portion is analysed by both the alternative method
(the new method being validated) and the reference method (the established official or regu-
latory method). This design results from a common enrichment scheme for the two methods.
When the enrichment conditions differ between the two methods, an unpaired sample design
is used. In the unpaired design, distinct test portions are enriched and analysed by the two
methods, or the enrichment schemes diverge after a common pre-enrichment. It must be noted
that, regardless of presumptive results, the alternative method test portion enrichment cul-
tures must be subjected to the confirmatory procedures outlined in the reference method in
order to establish the true status (positive or negative) of the test portions. The distinction
between paired and unpaired study designs is critical to the resulting statistical analyses.
Comparison of two qualitative methods is best accomplished near the limit of detection
of one or both of the methods. It is only at low levels of contamination (approximately
0.2–2 cfu/test portion) that differences between the methods can be observed. Thus, the goal
of the method comparison study is to achieve an artificial or natural contamination level that
yields fractional positive results across all replicates, preferably near 50% positive for one
of the methods. The data can then be statistically analysed for a significant difference
between the methods. If fractional positive results are not obtained for at least one of the
methods, then the study is repeated at a higher or lower contamination level as needed to
achieve fractional positive results. To avoid having to repeat a study, many analysts will inoc-
ulate at more than one level to increase the chances of obtaining fractional positive results.
The inoculation of food matrices is carried out using a single strain of target organism. Food
isolates are preferred and a different strain is used for each food type tested. The inoculating
strain is cultured in an appropriate enrichment broth and the concentration is determined by
plate count. Enough food matrix is inoculated at one level to carry out testing of replicate
test portions by the alternative and reference methods as well as the test portions required
for MPN analysis (Blodgett, 2003; Goulden, 1959). Typically, these amounts are approximately
900 g for paired samples and approximately 1400 g for unpaired samples.

[FIGURE 13.2 Examples of paired and unpaired study designs: (a) Paired samples; (b) Unpaired
samples with different secondary enrichment; and (c) Unpaired samples with different primary
enrichment.]

Statistical Analysis of Paired Sample Designs: Single Laboratory and

Collaborative Laboratory Validation
Data are collected and analysed by food matrix and contamination level. The food matrix
is inoculated in bulk, homogenized, and allowed to stabilize for an appropriate period of
time. On the day of analysis, replicate 25-g test portions are randomly removed from the
bulk and homogenized in the enrichment medium at a 10-fold dilution of the test portion
(25-g food matrix with 225 ml enrichment medium).
On the same day that replicate test portion enrichments are begun, the bulk contami-
nated matrix is also examined by MPN (Blodgett, 2003) to determine the contamination
level at the initiation of the study. Typically, a three-tube MPN at three or four levels is
carried out. For example, the technician might analyse three 100 g portions, three 10 g
portions, three 1 g portions and three 0.1 g portions. Following analysis by the reference
method, the number of positives at each level is compared to an MPN table to yield the
probable level of contamination of the bulk matrix.
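
As an alternative to reading an MPN table, the estimate can be computed directly. The sketch below is a minimal maximum-likelihood calculation of the kind on which MPN tables are generally based, assuming Poisson-distributed organisms and that any portion containing at least one organism tests positive; the function name `mpn_estimate` and the example results are hypothetical.

```python
import math


def mpn_estimate(amounts_g, tubes, positives):
    """Maximum-likelihood MPN (organisms per gram).

    amounts_g: grams of matrix per portion at each level
    tubes:     number of portions examined at each level
    positives: number of positive portions at each level
    """
    total = sum(m * n for m, n in zip(amounts_g, tubes))
    if sum(positives) == 0:
        return 0.0          # no positives: estimate is zero
    if all(g == n for g, n in zip(positives, tubes)):
        return math.inf     # all positive: estimate is unbounded

    def score(lam):
        # Zero of this function is the maximum-likelihood estimate.
        # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
        return sum(m * g / -math.expm1(-lam * m)
                   for m, g in zip(amounts_g, positives) if g > 0) - total

    lo, hi = 1e-12, 1e12    # wide bracket, assumed adequate for illustration
    for _ in range(200):    # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical outcome for the four-level design described above.
amounts = [100.0, 10.0, 1.0, 0.1]
print(mpn_estimate(amounts, [3, 3, 3, 3], [3, 1, 0, 0]), "cfu/g")
```

Multiplying the result by the test portion size (for example 25 g) expresses the estimate per test portion.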
Single laboratory studies involve the analysis of twenty replicate test portions at a single
contamination level by both methods. Collaborative studies typically include 12–15 laborato-
ries each analysing six replicate test portions at each contamination level. At least 10 valid data
sets for each food type are required. Collaborative data are first reviewed for completeness and
any laboratory yielding data that appear aberrant is questioned to determine whether there
is cause to eliminate the data set. Sample integrity upon receipt, equipment malfunctions and
protocol deviations would be reasons for eliminating aberrant data from further analysis. If no
assignable cause can be determined, the data are considered valid and representative of inter-
laboratory variation. Such variation could be indicative of poorly written method instructions,
poor training or even variation in the prevalence of contamination across the test portions,
rather than the inherent variability of the method itself, and therefore aberrant data should
always be investigated thoroughly to determine whether improvements should be made.
Once the initial data review is complete, multi-laboratory collaborative data should be
checked for inter-laboratory homogeneity using a Pearson Chi Square test (LaBudde, 2006).
The Chi-Square test can be carried out by constructing a 2 × L contingency table (Table
13.1), with one row indicating the number of positive results for the method across laboratories,
and the other row indicating the negative results.
The $\chi^2$ value is calculated by the equation:

$$\chi^2 = \sum_{i=1}^{L} \left\{ \frac{(a_i - E_{1i})^2}{E_{1i}} + \frac{(b_i - E_{2i})^2}{E_{2i}} \right\},$$

where the expected values $E_{1i}$ and $E_{2i}$ are given by

$$E_{1i} = \frac{A(a_i + b_i)}{A + B}, \qquad E_{2i} = \frac{B(a_i + b_i)}{A + B},$$

and $a_i$, $b_i$, $A$ and $B$ are as defined in Table 13.1.
The P-value of the calculated $\chi^2$ is computed for $L - 1$ degrees of freedom. If the
value of P is less than $\alpha = 0.05$ (equivalently, if the calculated $\chi^2$ exceeds the
tabulated value of $\chi^2$ at $\alpha = 0.05$), the results depend on the laboratory, that is,
the data are not homogeneous across laboratories, and consequently the equations for
between-method comparisons are not valid. If the data are homogeneous across laboratories,
then the collaborative data can be compiled by food matrix and contamination level for
further statistical analysis.
TABLE 13.1
Contingency Table for Inter-laboratory Homogeneity
Result      Lab #1   Lab #2   ...   Lab #L   Total
Positive    a1       a2       ...   aL       A
Negative    b1       b2       ...   bL       B
a1 through aL are the number of test portions yielding positive results and b1
through bL are the number of test portions yielding negative results by the candidate
method, for laboratories 1 through L, respectively. A and B are the total
number of test portions yielding positive or negative results, respectively, by the
candidate method.
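
The homogeneity check on a table laid out as in Table 13.1 can be sketched as follows. This is an illustration under stated assumptions: the function name and the laboratory counts are hypothetical, and the critical values are standard tabulated chi-square values at the 0.05 level for the degrees of freedom likely in a 10 to 15 laboratory study.

```python
# Standard tabulated chi-square critical values at alpha = 0.05.
CHI2_CRITICAL_05 = {9: 16.919, 10: 18.307, 11: 19.675,
                    12: 21.026, 13: 22.362, 14: 23.685}


def homogeneity_chi_square(positives, negatives):
    """Pearson chi-square for a 2 x L table of positives/negatives per lab.

    Returns (chi-square value, degrees of freedom = L - 1).
    """
    A = sum(positives)
    B = sum(negatives)
    if A == 0 or B == 0:
        raise ValueError("all results identical: expected counts are zero")
    chi2 = 0.0
    for a_i, b_i in zip(positives, negatives):
        e1 = A * (a_i + b_i) / (A + B)   # expected positives in lab i
        e2 = B * (a_i + b_i) / (A + B)   # expected negatives in lab i
        chi2 += (a_i - e1) ** 2 / e1 + (b_i - e2) ** 2 / e2
    return chi2, len(positives) - 1


# Hypothetical study: 10 laboratories, six test portions each.
pos = [3, 4, 2, 3, 5, 3, 2, 4, 3, 3]
neg = [6 - p for p in pos]
chi2, df = homogeneity_chi_square(pos, neg)
verdict = "not homogeneous" if chi2 > CHI2_CRITICAL_05[df] else "no heterogeneity detected"
print(f"chi-square = {chi2:.2f} with {df} df: {verdict}")
```

With six test portions per laboratory the expected counts are small, so the chi-square approximation should be treated with some caution.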
Performance Indicators
The performance indicators for qualitative methods include sensitivity, specificity, false negative
rate and false positive rate. Sensitivity is defined as the proportion of true positive test
portions that test positive by the alternative method at a single contamination level. Likewise,
specificity is defined as the proportion of true negative test portions that test negative by the
alternative method. The true status of each test portion is typically defined by the reference
method cultural results. The false negative rate is the proportion of true positive test portions
that test negative by the alternative method and is equal to 1 minus sensitivity. The false positive
rate is likewise the proportion of true negative test portions that yield positive results by
the alternative method and is equal to 1 minus specificity. Because of the simple relationships
between sensitivity and false negative rate, and specificity and false positive rate, it is not
necessary to report both sets of performance indicators. Typically the false positive and false
negative rates are more meaningful to the end user and are the preferred indicators to report.
Calculation of the performance indicators is easily accomplished by tabulating the single
laboratory or compiled collaborative laboratory data as shown in Table 13.2. The perform-
ance indicators are based on the values of a, b, c and d as defined in Table 13.2. They are
calculated as follows:
Sensitivity = a/(a + b)
False negative rate = b/(a + b) = 1 – sensitivity
Specificity = d/(c + d)
False positive rate = c/(c + d) = 1 – specificity
Sensitivity, or relative sensitivity, and false negative rate vary with the level of contamina-
tion of the matrix and, therefore, these performance indicators are reported in conjunction
with the contamination level.
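
The four indicators can be computed from the cell counts of Table 13.2 in a few lines. The sketch below uses hypothetical counts and a hypothetical function name; it follows the definitions given above.

```python
def paired_performance(a, b, c, d):
    """Performance indicators from the paired-design counts of Table 13.2.

    a: positive by both methods        b: reference positive, alternative negative
    c: reference negative, alternative positive        d: negative by both methods
    """
    return {
        "sensitivity": a / (a + b),
        "false_negative_rate": b / (a + b),
        "specificity": d / (c + d),
        "false_positive_rate": c / (c + d),
    }


# Hypothetical single-laboratory result at one contamination level.
indicators = paired_performance(a=9, b=1, c=0, d=10)
for name, value in indicators.items():
    print(f"{name}: {value:.2f}")
```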
Test for Significant Difference

McNemar’s Chi Square ($\chi^2$) test is used to determine whether the two methods are significantly
different (Siegel, 1956). This is not to say that the methods are equivalent if no significant
difference is found, but rather that a significant difference was not detected. Tests for statistical
equivalence require more statistical power and thus, higher numbers of replicate test portions.

TABLE 13.2
Data Tabulation of Paired Sample Method Comparison Study
                            Alternative method positive   Alternative method negative
Reference method positive   a                             b
Reference method negative   c                             d
a = number of positive replicate tests by both the candidate and reference methods; b = number
of negative replicate tests by the candidate method that are positive by the reference method;
c = number of positive replicate tests by the candidate method that are negative by the reference
method; d = number of negative replicate tests by both the candidate and reference methods.

Using the data from Table 13.2, $\chi^2$ is calculated using the McNemar formula:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

The experimental $\chi^2$ value is compared to the tabulated $\chi^2$ value with $\nu = 1$ (degree of
freedom) at $\alpha = 0.05$ ($\chi^2 = 3.84$) (Pearson and Hartley, 1974). Experimental values greater
than the tabulated value indicate a significant difference between the two methods.
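
A minimal sketch of the McNemar calculation follows, reading the formula as the squared difference of the discordant counts divided by their sum, without a continuity correction. The function name and the counts are hypothetical, and returning zero when there are no discordant results is a choice made for this illustration.

```python
CHI2_CRITICAL_1DF_05 = 3.84   # tabulated value for 1 degree of freedom, alpha = 0.05


def mcnemar_chi_square(b, c):
    """McNemar chi-square from the discordant cells b and c of Table 13.2."""
    if b + c == 0:
        return 0.0   # no discordant results: nothing to compare (illustrative choice)
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant counts from a paired study.
for b, c in [(3, 1), (8, 1)]:
    chi2 = mcnemar_chi_square(b, c)
    significant = chi2 > CHI2_CRITICAL_1DF_05
    print(f"b={b}, c={c}: chi-square = {chi2:.2f}, significant difference: {significant}")
```

Only the discordant cells b and c enter the statistic; test portions on which both methods agree carry no information about a difference between the methods.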
Statistical Analysis of Unpaired Sample Design: Single Laboratory and

Collaborative Laboratory Validation
It has become common for commercial test kit manufacturers also to develop proprie-
tary enrichment media, precluding the use of the paired sample design. As for the paired
sample design, the bulk matrix is inoculated, homogenized and stabilized and test portions
are taken from the bulk. Current practice is to randomly remove a 2x test portion (50 g),
homogenize, and split this into two 25 g test portions, one to be analysed by the reference
method and one to be analysed by the alternative method. All test portions analysed by the
alternative method, regardless of presumptive result, are subjected to the reference method
confirmation procedure to establish the true status of the test portions. Because there is
no protocol for unpaired samples, statistical analyses of data from the test portions are
performed in the same way as for paired samples. But if we assume Poisson distribution
of target cells, at the low inoculation levels required for fractional positive results, then
we cannot assume that the two test portions are equivalent with respect to the presence
or absence of target organism. Hence, pairing of the test results is not justified but current
validation guidelines (Anon, 2003; Feldsine et al., 2002) do not adequately address this
situation. Alternative statistical methodologies for unpaired sample study designs are pre-
sented below.
TABLE 13.3
Data Tabulation of Unpaired Sample Method Comparison Study
Method               Result                 Confirmed positive   Confirmed negative
Alternative method   Presumptive positive   A                    B
Alternative method   Presumptive negative   C                    D
Reference method                            E                    F
A = number of presumptive positive replicate results that were confirmed positive; B = number of
presumptive positive replicate results that were confirmed negative; C = number of presumptive
negative replicate results that were confirmed positive; D = number of presumptive negative
replicate results that were confirmed negative; E = number of replicates that gave positive results by
the reference method; F = number of replicates that gave negative results by the reference method.
Performance Indicators
A data table (Table 13.3) is constructed from either single laboratory data or collabora-
tive data (after removal of any invalid data sets and testing for lab-to-lab homogeneity as
discussed above) compiled by food matrix and contamination level. The following perform-
ance parameters can be defined:
Relative sensitivity – the proportion of presumptive positive results that confirmed positive
for the alternative method relative to the proportion of positive results for the reference
method = A/E
False positive rate – the proportion of confirmed negative test portions for the alternative
method that yielded presumptive positive results = B/(B + D)
False negative rate – the proportion of confirmed positive test portions for the alternative
method that yielded presumptive negative results = C/(A + C)
Note that presumptive results are not reported for the cultural reference methods. Note
also the assumption that false negative results cannot be obtained by the reference method,
notwithstanding the microbial distribution issues at low inoculum levels, since they cannot
be detected in unpaired samples. Occasionally false negatives from the reference method
are seen in paired sample studies, for instance where a PCR method is positive and the ref-
erence method is negative. Upon further examination (alternative confirmation procedures
or sheer persistence) one may eventually find a target colony.
Test for Significant Difference
For comparison of methods in an unpaired sample design, the Mantel-Haenszel $\chi^2$ test
(Siegel, 1956) is used. The test statistic is:

$$\chi^2 = \frac{(n - 1)\,\bigl(AF - (B + C + D)E\bigr)^2}{(A + E)(B + C + D + F)(E + F)(A + B + C + D)},$$

where $n = A + B + C + D + E + F$.
This test statistic is compared to a tabulated $\chi^2$ value, with $\nu = 1$ df at the designated
probability level, usually $\alpha = 0.05$, for which the tabulated $\chi^2$ value is 3.84.
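
The unpaired comparison can be sketched in code as follows. The function name is hypothetical; the counts are those printed in Table 13.4, used here purely to exercise the arithmetic, and the statistic is computed on the 2 × 2 table of confirmed positives and all other results for each method.

```python
def mantel_haenszel_chi_square(A, B, C, D, E, F):
    """Chi-square for an unpaired design using the counts of Table 13.3.

    Alternative method: A confirmed positives, B + C + D other results.
    Reference method:   E positives, F negatives.
    """
    n = A + B + C + D + E + F
    other = B + C + D
    numerator = (n - 1) * (A * F - other * E) ** 2
    denominator = (A + E) * (other + F) * (E + F) * (A + other)
    return numerator / denominator


# Counts as printed in Table 13.4.
chi2 = mantel_haenszel_chi_square(A=40, B=1, C=1, D=30, E=7, F=65)
print(f"chi-square = {chi2:.2f}; exceeds 3.84: {chi2 > 3.84}")
```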
Method Comparison for Quantitative Methods: Single Laboratory and

Collaborative Laboratory Studies
Using either naturally contaminated or artificially contaminated food, three lots of the food
matrix covering at least 3 log units of contamination are analysed by both the alternative
and reference methods. If artificially contaminated food is tested, then one lot of uninocu-
lated food matrix must also be included. The lowest contamination level should be close to,
but not at, the limit of detection of the method. In the single laboratory study, five replicate
test portions at each level are examined by each method. Collaborative studies require a
minimum of eight valid data sets for each food type, so it is recommended that 10–14 labo-
ratories participate in the study. Each laboratory must analyse two test portions per con-
tamination level per food matrix.
Examination of Data for Outliers
The data are first subjected to visual inspection for obvious aberrant data. If aberrant data
are observed, the laboratory is contacted to determine whether there is cause for removal of
the data set as mentioned previously. The data can also be examined for statistical outliers
using the Grubbs, Cochran or Dixon tests (see Chapter 11), but removal of statistical out-
liers is now being discouraged in favour of investigating outlier data for assignable cause.
Additionally, robust methods of statistical analysis are becoming more widely accepted.
Graphical Representation of Data
Quantitative data are first normalized by logarithmic transformation, and then the data for
each food matrix are graphed with the alternative method results on the y-axis and the ref-
erence method results on the x-axis. Linear regression is performed to determine the slope
and linear correlation coefficient of the line. This is most easily done using a program such
as Microsoft Excel®. Figure 13.3 shows an example of pooled raw meat data from a total
aerobic count method (Kodaka, 2004) graphed against the reference standard plate count
method (AOAC, 2007b).
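
The same regression can be carried out outside a spreadsheet. The sketch below is one possible implementation in plain Python, assuming all counts are greater than zero so that the logarithm is defined; the function name and the example counts are hypothetical.

```python
import math


def log10_regression(reference_cfu, alternative_cfu):
    """Regress log10 alternative counts (y) on log10 reference counts (x).

    Returns (slope, intercept, correlation coefficient r).
    """
    x = [math.log10(v) for v in reference_cfu]
    y = [math.log10(v) for v in alternative_cfu]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical paired counts (cfu/ml) from the two methods.
reference = [1.2e2, 3.5e3, 4.1e4, 6.0e5, 2.2e6]
alternative = [1.0e2, 3.9e3, 3.8e4, 6.4e5, 2.0e6]
slope, intercept, r = log10_regression(reference, alternative)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```

A slope close to 1 and an intercept close to 0 indicate that the alternative method tracks the reference method across the range of contamination tested.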
Performance Parameters
For quantitative methods, the performance parameters are repeatability, reproducibility
and relative standard deviation (see Chapter 11). This allows comparison of the method
variability at different concentrations. For each contamination level of each food in a single
laboratory, the mean of the log-transformed values is calculated and the standard deviation,
$s_r$, determined. From collaborative study data, the mean of the log-transformed data across
all labs and the standard deviation, $s_R$, are calculated for each contamination level of each
food (see Table 13.5). In each case, the standard deviation values are divided by the mean
log count of colonies to arrive at the relative standard deviations.

[FIGURE 13.3 Graph of pooled raw meat data from the validation of the Nissui Compact Dry
TC (reproduced from Kodaka, 2004, by permission of AOAC International). Panel title: Methods
correlation at 35°C (n = 60). x-axis: log cfu/ml pour plate; y-axis: log cfu/ml Compact Dry;
both axes run from 0.0 to 10.0. Fitted line: slope 0.9914, intercept of magnitude 0.0061,
R² = 0.9955.]
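
The calculation of the mean, standard deviation and relative standard deviation of log-transformed counts can be sketched as below. The function name and replicate counts are hypothetical, and the sample standard deviation (n − 1 denominator) from the Python standard library is assumed.

```python
import math
import statistics


def log_mean_sd_rsd(counts_cfu):
    """Mean, standard deviation and relative standard deviation of log10 counts."""
    logs = [math.log10(c) for c in counts_cfu]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)      # sample standard deviation
    return mean_log, sd_log, sd_log / mean_log


# Hypothetical five replicate counts (cfu/g) at one contamination level.
replicates = [2.1e4, 2.6e4, 1.9e4, 2.4e4, 2.2e4]
mean_log, sd_log, rsd = log_mean_sd_rsd(replicates)
print(f"mean log10 = {mean_log:.3f}, sd = {sd_log:.3f}, RSD = {100 * rsd:.2f}%")
```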
Comparison of Means
The mean log10 values for the alternative and reference methods are compared for each lot
or contamination level of each matrix. For paired samples, the paired t test is most appro-
priate. For unpaired samples, the independent t test can be used for comparison of means
from two methods and a one-way analysis of variance (ANOVA) can be used if two or
more method means are to be compared.
The paired t test (Goulden, 1959) is performed by calculating the mean difference
between paired samples and thence the standard error of the mean difference. Dividing the
mean difference by the standard error of the mean difference yields the t statistic, which
is t-distributed with $n - 1$ degrees of freedom. Thus,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 / n}}$$

where $\bar{x}_1$ is the mean of the first method, $\bar{x}_2$ is the mean of the second
method, $s^2$ is the combined variance (that is, the variance of the paired differences)
and $n$ is the number of pairs. The calculated t value can be compared to a table of
critical values of the t distribution (Pearson and Hartley, 1958) to determine whether the
difference is significant. With $\alpha = 0.05$, if the calculated value is less than the
tabled value then the difference is not significant at the 5% level. Alternatively, the
t test function of Microsoft Excel® can be used to obtain a P value. If P < 0.05 the
difference is generally accepted to be significant.
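The paired t test described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure of any cited guideline: the paired results are hypothetical, the function name `paired_t` is invented for the example, and the critical value is the standard two-sided tabulated value for 5 degrees of freedom.

```python
import math
import statistics

def paired_t(method_1, method_2):
    """Return (t, degrees of freedom) for paired log10 counts."""
    if len(method_1) != len(method_2):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(method_1, method_2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sqrt(s^2 / n)
    se_mean_diff = math.sqrt(statistics.variance(diffs) / n)
    return mean_diff / se_mean_diff, n - 1

# Hypothetical paired log10 cfu/g results (same test portions, two methods).
alternative = [4.65, 4.40, 4.83, 4.93, 4.66, 4.85]
reference = [4.70, 4.42, 4.80, 4.97, 4.70, 4.88]

t_value, df = paired_t(alternative, reference)
T_CRIT = 2.571  # two-sided critical t for alpha = 0.05 and 5 df (standard tables)
significant = abs(t_value) > T_CRIT
print(f"t = {t_value:.3f} with {df} df; significant at 5%: {significant}")
```

Comparing the absolute value of t with the tabulated value mirrors the table lookup in the text; a P value would require a t distribution function, which the Python standard library does not provide.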
TABLE 13.4
Validation Results of a Collaborative PCR Study with Unpaired Samples

Method | Result | Confirmed positive | Confirmed negative
Alternative method | Presumptive positive | 40 | 1
Alternative method | Presumptive negative | 1 | 30
Reference method | | 7 | 65

The independent t test (Goulden, 1959) uses the equation:

$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\left(\frac{n_1 + n_2}{n_1 n_2}\right)\left(\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right)}}$$

and, as for the paired t test, the resultant t value is compared to the critical value for
t with $(n_1 + n_2 - 2)$ degrees of freedom. Again, the t test function of Microsoft Excel®
can be used to yield a P value for the independent t test.
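The independent t test equation can be sketched in Python as follows. The data are hypothetical, the function name `independent_t` is invented for this example, and the critical value is the standard two-sided tabulated value for 9 degrees of freedom.

```python
import math
import statistics

def independent_t(sample_1, sample_2):
    """Return (t, degrees of freedom) using the pooled-variance formula."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # s1^2, n - 1 denominator
    var2 = statistics.variance(sample_2)  # s2^2, n - 1 denominator
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt((n1 + n2) / (n1 * n2) * pooled_var)
    t = abs(statistics.mean(sample_1) - statistics.mean(sample_2)) / std_error
    return t, n1 + n2 - 2

# Hypothetical unpaired log10 cfu/g results from two methods.
alternative = [4.65, 4.40, 4.83, 4.93, 4.66]
reference = [4.90, 4.20, 4.83, 4.97, 4.79, 5.21]

t_value, df = independent_t(alternative, reference)
T_CRIT = 2.262  # two-sided critical t for alpha = 0.05 and 9 df (standard tables)
print(f"t = {t_value:.3f} with {df} df; significant at 5%: {t_value > T_CRIT}")
```

Because the formula uses the absolute difference of the means, the order in which the two methods are supplied does not affect the result.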
To perform the ANOVA (Wernimont, 1985), first organize the data as in Table 13.5. In
this case, three methods are being compared in a single lab study, but a similar table can
be constructed for collaborative data and for comparison of two methods. Begin by
calculating the sum of squares for each method. This is done using the equation

$$SS = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

where $x_i$ is an individual replicate value and $n$ is the number of replicates tested by
that method. The sum of squares within all methods is obtained by summing the sums of
squares for the individual methods, that is,

$$SS_{wm} = \sum_{i=A}^{n} SS_i = (SS_A + SS_B + \ldots + SS_n)$$

where $SS_A$, $SS_B$ to $SS_n$ are the individual sums of squares.
Next, the mean values from each method are used to calculate the between-method sum
of squares, $SS_{bm}$, using the equation

$$SS_{bm} = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2$$

where $\bar{x}_i$ is the mean value for the ith method, $\bar{\bar{x}}$ is the overall mean
value and $n_i$ is the number of replicates for the ith method.
The sums of squares, $SS_{wm}$ and $SS_{bm}$, are transformed into mean squares, $MS_{wm}$
and $MS_{bm}$, by dividing by the appropriate degrees of freedom. For the within-method
calculation, the degrees of freedom equal the sum of the degrees of freedom for each
method: $\nu_{wm} = \sum (n_i - 1)$. For the between-method calculation, the degrees of
freedom, $\nu_{bm}$, are the number of methods minus one. We now arrive at
$MS_{wm} = SS_{wm}/\nu_{wm}$ and $MS_{bm} = SS_{bm}/\nu_{bm}$.
The purpose is to determine whether the observed variability between the means of the
methods can be attributed to the random variability between the replicates. To test this,
we calculate the ratio $F = MS_{bm}/MS_{wm}$ and compare the resultant F value to the
critical value of the F distribution for $\alpha = 0.05$ (Siegel, 1956) with $\nu_{bm}$ and
$\nu_{wm}$ degrees of freedom, respectively. If the observed value of F is equal to or
greater than the critical F table value, then the difference between method means is
significant.
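The one-way ANOVA steps above can be sketched as a small Python function. This is an illustrative sketch only: the function name `one_way_anova`, the dictionary keys and the data (three methods, five replicates each, in the layout of Table 13.5) are hypothetical, and the critical value is the standard tabulated F for 2 and 12 degrees of freedom.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups (one list of log10 counts per method)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_wm = 0.0  # within-method sum of squares
    ss_bm = 0.0  # between-method sum of squares
    for group in groups:
        n_i = len(group)
        mean_i = sum(group) / n_i
        ss_wm += sum((x - mean_i) ** 2 for x in group)
        ss_bm += n_i * (mean_i - grand_mean) ** 2

    df_bm = len(groups) - 1                          # number of methods minus one
    df_wm = sum(len(group) - 1 for group in groups)  # sum of (n_i - 1)
    ms_bm = ss_bm / df_bm
    ms_wm = ss_wm / df_wm
    return {"SSwm": ss_wm, "SSbm": ss_bm, "MSwm": ms_wm, "MSbm": ms_bm,
            "df_bm": df_bm, "df_wm": df_wm, "F": ms_bm / ms_wm}

# Hypothetical log10 counts: three methods, five replicates each.
method_1 = [4.65, 4.72, 4.40, 4.64, 4.83]
method_2 = [4.75, 4.79, 4.42, 4.65, 4.85]
method_3 = [4.90, 4.98, 4.20, 4.34, 4.83]

result = one_way_anova([method_1, method_2, method_3])
F_CRIT = 3.89  # tabulated F for alpha = 0.05 with 2 and 12 df (standard tables)
print(f"F = {result['F']:.3f}; significant at 5%: {result['F'] >= F_CRIT}")
```

The within-method sum of squares is computed here from deviations about each method mean, which is algebraically the same as the shortcut form given in the text.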
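TABLE 13.5
Data Tabulation for Quantitative Collaborative Study – log10 Transformed Counts for One Contamination Level of One Food

Replicate | Method 1 | Method 2 | Method 3 | Total
1 | $x_1$ | $y_1$ | $z_1$ |
2 | $x_2$ | $y_2$ | $z_2$ |
3 | $x_3$ | $y_3$ | $z_3$ |
4 | $x_4$ | $y_4$ | $z_4$ |
5 | $x_5$ | $y_5$ | $z_5$ |
Mean | $\bar{x} = \sum x_i / n_1$ | $\bar{y} = \sum y_i / n_2$ | $\bar{z} = \sum z_i / n_3$ | $\bar{\bar{x}} = (\sum x_i + \sum y_i + \sum z_i)/(n_1 + n_2 + n_3)$
$s_r$ | $\sqrt{\sum (x_i - \bar{x})^2/(n_1 - 1)}$ | $\sqrt{\sum (y_i - \bar{y})^2/(n_2 - 1)}$ | $\sqrt{\sum (z_i - \bar{z})^2/(n_3 - 1)}$ |
$RSD_r$ | $s_{r1}/\bar{x}$ | $s_{r2}/\bar{y}$ | $s_{r3}/\bar{z}$ |
$SS_{wm}$ | $SS_{1a} = \sum x_i^2 - (\sum x_i)^2/n_1$ | $SS_{2a} = \sum y_i^2 - (\sum y_i)^2/n_2$ | $SS_{3a} = \sum z_i^2 - (\sum z_i)^2/n_3$ | $SS_{wm} = SS_{1a} + SS_{2a} + SS_{3a}$
$SS_{bm}$ | $SS_{1b} = n_1(\bar{x} - \bar{\bar{x}})^2$ | $SS_{2b} = n_2(\bar{y} - \bar{\bar{x}})^2$ | $SS_{3b} = n_3(\bar{z} - \bar{\bar{x}})^2$ | $SS_{bm} = SS_{1b} + SS_{2b} + SS_{3b}$
$MS_{bm}$ | | | | $SS_{bm}/\nu_{bm}$
$MS_{wm}$ | | | | $SS_{wm}/\nu_{wm}$
F ratio | | | | $MS_{bm}/MS_{wm}$

$SS_{wm}$ = within-method sum of squares; $SS_{bm}$ = between-method sum of squares; $MS_{bm}$ = between-method
mean square; $MS_{wm}$ = within-method mean square; $\nu_{bm}$ = degrees of freedom between methods (number of
methods minus one); $\nu_{wm}$ = degrees of freedom within methods ($(n_1 - 1) + (n_2 - 1) + (n_3 - 1)$).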
FUTURE DIRECTIONS
As new pathogens emerge and biothreat agents are targeted for detection, we enter an
arena in which reference cultural methods may not be established. This presents a new
paradigm for validation of microbiological methods. The AOAC Presidential Task Force
for Best Practices for Microbiological Method Validation (BPMM) has recommended that
qualitative methods be validated to determine performance parameters that are
independent of comparison to a reference method and, further, that the most important
parameter to be determined is the 50% limit of detection or LOD50 (FDA, 2006). The LOD50
is the point on the dose-response curve that results in positive responses for 50% of
replicates (Fig. 13.4). As can be seen in Fig. 13.4, the LOD50 is only part of the story.
The three curves shown all intersect at the same LOD50, but have very different slopes.
Determining the LOD90, for example, in addition to the LOD50 would better define the shape
of the dose-response curve and provide a more accurate description of the method
performance at low doses. The details of the study designs and statistical methodology to
determine these parameters are still being debated. Suffice it to say that in the future,
method validation will likely move away from significance testing in method comparison and
move toward providing an independent assessment of method performance. This independent
assessment can also be applied to reference methods in those cases where an appropriate
reference method exists.
[Figure 13.4: three dose-response curves; x-axis: log cfu/25 g test sample; y-axis: % positive
response (0 to 100).]
FIGURE 13.4 Limit of detection curves.
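To illustrate why the LOD50 alone does not fix the shape of the dose-response curve, the sketch below evaluates the LOD90 for three curves that share one LOD50 but differ in slope. The logistic form in log dose, the LOD50 value and the slopes are all assumptions made for this illustration; the source does not prescribe a model, and the function names are hypothetical.

```python
import math

def positive_fraction(log_dose, log_lod50, slope):
    """Assumed logistic dose-response curve in log10 dose (illustrative model)."""
    return 1.0 / (1.0 + math.exp(-slope * (log_dose - log_lod50)))

def log_lod(p, log_lod50, slope):
    """log10 dose at which a fraction p of replicates is positive (inverse curve)."""
    return log_lod50 + math.log(p / (1.0 - p)) / slope

LOG_LOD50 = -0.5  # hypothetical log10 cfu per 25 g test portion

for slope in (4.0, 8.0, 16.0):  # hypothetical slopes
    log_lod90 = log_lod(0.90, LOG_LOD50, slope)
    print(f"slope {slope:4.1f}: log LOD50 = {LOG_LOD50:.2f}, log LOD90 = {log_lod90:.2f}")
```

Under this assumed model, the shallower the curve, the further the LOD90 lies above the shared LOD50, which is the point the three curves in the figure are making.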
EXAMPLE 13.1 USE OF THE MANTEL-HAENSZEL $\chi^2$ TEST TO ASSESS
THE RELATIVE SENSITIVITY OF TWO METHODS
DURING QUALITATIVE METHOD VALIDATION (DATA
OF FELDSINE ET AL., 2005)
A collaborative validation of a PCR method for E. coli O157:H7 examined independent
portions of inoculated food matrices by PCR and by a reference cultural method. Each of
15 collaborating laboratories examined 6 replicate test portions of ground beef at each of
three levels of inoculum: uninoculated, 0.08 cfu/25 g (low) and 0.29 cfu/25 g (high). The
reported valid data from 12 laboratories for the low level inoculation are shown in
Table 13.4.
Visual examination of the data suggests that the alternative PCR method is much
more sensitive than the reference method, but we need to assess the difference
statistically. Since this validation used an unpaired sample design, we use the
Mantel-Haenszel $\chi^2$ test, with $(2 - 1)(2 - 1) = 1$ degree of freedom, to assess the
significance of the difference between the results of the two methods. The equation is:

$$\chi^2 = \frac{(n - 1)(AF - (B + C + D)E)^2}{(A + E)(B + C + D + F)(E + F)(A + B + C + D)}$$

Then:

$$\chi^2 = \frac{(72 - 1)\{(40 \times 65) - (1 + 1 + 30)7\}^2}{(40 + 7)(1 + 1 + 30 + 65)(7 + 65)(40 + 1 + 1 + 30)}$$

$$= \frac{71\{2600 - 224\}^2}{47 \times 97 \times 72 \times 72} = \frac{71 \times 2376^2}{23{,}633{,}856} = \frac{400{,}821{,}696}{23{,}633{,}856} = 16.96$$
For 1 df, the observed $\chi^2$ value of 16.96 is much higher than the critical $\chi^2$
value of 3.84 at $\alpha = 0.05$; a value this large would be expected to occur by chance
with a probability of P < 0.001. This confirms statistically what was seen by observation:
the PCR method is significantly more sensitive than the reference cultural method.
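The arithmetic of this example can be checked with a short Python sketch. The function name `mantel_haenszel_chi2` is invented for the illustration; the cell values come from Table 13.4, and n = 72 is taken from the worked calculation in the example rather than derived independently.

```python
def mantel_haenszel_chi2(A, B, C, D, E, F, n):
    """Chi-square statistic for the unpaired design, as written in the example.

    A..D are the alternative-method cells, E and F the reference-method
    confirmed positives and negatives; n is used exactly as in the text.
    """
    numerator = (n - 1) * (A * F - (B + C + D) * E) ** 2
    denominator = (A + E) * (B + C + D + F) * (E + F) * (A + B + C + D)
    return numerator / denominator

# Cell values from Table 13.4; n = 72 as used in the worked calculation.
chi2 = mantel_haenszel_chi2(A=40, B=1, C=1, D=30, E=7, F=65, n=72)

CHI2_CRIT = 3.84  # critical chi-square for 1 df at alpha = 0.05, as quoted in the text
print(f"chi-square = {chi2:.2f}; significant at 5%: {chi2 > CHI2_CRIT}")
```

Keeping the cells as named arguments makes it harder to transpose the alternative and reference counts when the function is reused for another matrix or inoculation level.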
EXAMPLE 13.2 STATISTICAL VALIDATION OF A QUANTITATIVE
METHOD (DATA OF KINNEBERG AND LINDBERG
(2002))
A collaborative study to validate a rapid (Petrifilm™) method for total coliforms included
comparison of the rapid method, after 14 h and 24 h incubation, to the standard method
for the analysis of vanilla ice cream using violet red bile agar (VRBA; APHA, 1985).
Contaminated ice cream, prepared with low, medium and high levels of inoculum, was
examined. The log-transformed colony count data for the high level inoculation are
presented in Table 13.6.
The mean ($\bar{x}$), reproducibility standard deviation ($s_R$) and relative standard
deviation ($RSD_R$) are determined for each of the three methods, as well as the overall
mean ($\bar{\bar{x}}$); the results are shown in Table 13.6.
TABLE 13.6
Data Analysis for Validation of Petrifilm™ Coliform Method on Vanilla Ice Cream
At the High Inoculation Level, After Removal of Outliers

Laboratory | Petrifilm 14 h, A | Petrifilm 14 h, B | Petrifilm 24 h, A | Petrifilm 24 h, B | VRBA, A | VRBA, B
1 | 4.653 | 4.716 | 4.748 | 4.785 | 4.903 | 4.978
2 | 4.398 | 4.643 | 4.415 | 4.653 | 4.204 | 4.342
3 | 4.833 | 4.833 | 4.845 | 4.929 | 4.826 | 4.875
4 | 4.934 | 4.778 | 4.934 | 4.778 | 4.968 | 4.778
5 | 4.663 | 4.602 | 4.681 | 4.613 | 4.785 | 4.672
6 | 4.851 | 5.127 | 4.869 | 5.130 | 5.207 | 5.152
7 | 4.778 | 4.763 | 4.778 | 4.763 | 4.954 | 4.919
8 | 4.398 | 4.342 | 4.623 | 4.591 | 4.669 | 4.748
9 | 4.763 | 4.531 | 5.029 | 4.863 | 4.940 | 4.908
10 | 4.845 | 4.820 | 4.857 | 4.820 | 4.505 | 4.690
11 | 4.690 | 4.806 | 4.708 | 4.813 | 4.771 | 4.949

Statistic | Petrifilm 14 h | Petrifilm 24 h | VRBA | Total
Mean | 4.717 | 4.783 | 4.808 | 4.769
sR | 0.185 | 0.156 | 0.234 |
RSDR | 0.039 | 0.033 | 0.049 |
SSwm | 0.712 | 0.512 | 1.159 | 2.383
SSbm | 0.0595 | 0.00431 | 0.0335 | 0.0973
MSbm | | | | 0.0487
MSwm | | | | 0.0378
F ratio | | | | 1.287
F(0.05, 2, 63) | | | | 3.150

Source: Data from Kinneberg and Lindberg (2002). SSwm = within-method sum of
squares; SSbm = between-method sum of squares; MSbm = between-method mean square;
MSwm = within-method mean square.
Do the colony counts obtained using the two variants of the Petrifilm™ method and the
reference method differ statistically?
First we determine the sum of squares for each method:

$$SS = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

For the 14 h Petrifilm™ method: $SS = 490.148 - (103.767)^2/22 = 490.148 - 489.436 = 0.712$.
For the 24 h Petrifilm™ method, SS = 0.512 and for the reference method, SS = 1.159.
Adding the method SS values together yields the sum of squares within methods:
$SS_{wm} = 0.712 + 0.512 + 1.159 = 2.383$.
Next, we determine the sum of squares between the methods ($SS_{bm}$) by summing, over the
methods, the squared difference between the method mean ($\bar{x}_i$) and the overall mean
($\bar{\bar{x}}$), multiplied by the number of test results ($n_i$), that is,
$SS_{bm} = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2$. For these data,

$$SS_{bm} = 22(4.717 - 4.769)^2 + 22(4.783 - 4.769)^2 + 22(4.808 - 4.769)^2 = 0.0595 + 0.00431 + 0.0335 = 0.0973$$

Then $SS_{wm}$ and $SS_{bm}$ are divided by their respective degrees of freedom to yield the
mean squares within and between the methods:
$MS_{wm} = 2.383/63 = 0.0378$ and
$MS_{bm} = 0.0973/2 = 0.0487$.
Finally, we determine the F ratio of the mean squares between and within methods:
$F = MS_{bm}/MS_{wm} = 0.0487/0.0378 = 1.287$, for $\alpha = 0.05$, with $\nu_1 = 2$ and
$\nu_2 = 63$ degrees of freedom. This value is less than the tabulated value of 3.150;
hence, no significant differences are detected between the mean values of the three
methods.
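The calculation in this example can be retraced from the tabulated log counts with the following Python sketch, which uses the shortcut formula for the sum of squares. The variable and function names are chosen for this illustration. Because the script works with unrounded means, its intermediate values may differ in the second or third decimal place from the rounded figures quoted in the example, without changing the conclusion.

```python
# log10 counts from Table 13.6 (laboratories 1 to 11, replicates A and B).
methods = {
    "Petrifilm 14 h": [4.653, 4.716, 4.398, 4.643, 4.833, 4.833, 4.934, 4.778,
                       4.663, 4.602, 4.851, 5.127, 4.778, 4.763, 4.398, 4.342,
                       4.763, 4.531, 4.845, 4.820, 4.690, 4.806],
    "Petrifilm 24 h": [4.748, 4.785, 4.415, 4.653, 4.845, 4.929, 4.934, 4.778,
                       4.681, 4.613, 4.869, 5.130, 4.778, 4.763, 4.623, 4.591,
                       5.029, 4.863, 4.857, 4.820, 4.708, 4.813],
    "VRBA": [4.903, 4.978, 4.204, 4.342, 4.826, 4.875, 4.968, 4.778,
             4.785, 4.672, 5.207, 5.152, 4.954, 4.919, 4.669, 4.748,
             4.940, 4.908, 4.505, 4.690, 4.771, 4.949],
}

def sum_of_squares(values):
    """Shortcut form used in the example: sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in values) - sum(values) ** 2 / len(values)

ss_within = {name: sum_of_squares(v) for name, v in methods.items()}
ss_wm = sum(ss_within.values())

n_total = sum(len(v) for v in methods.values())
grand_mean = sum(sum(v) for v in methods.values()) / n_total
ss_bm = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in methods.values())

df_bm = len(methods) - 1                             # 2
df_wm = sum(len(v) - 1 for v in methods.values())    # 63
f_ratio = (ss_bm / df_bm) / (ss_wm / df_wm)

F_CRIT = 3.150  # tabulated F(0.05, 2, 63) quoted in the example
print(f"SSwm = {ss_wm:.3f}, SSbm = {ss_bm:.4f}, F = {f_ratio:.3f}, "
      f"significant at 5%: {f_ratio >= F_CRIT}")
```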
References
Anon (2003) Microbiology of food and animal feeding stuffs – Protocol for the validation of
alternative methods, ISO 16140:2003. Geneva: International Organisation for Standardisation.
Anon (2007) International Organization for Standardization (ISO) online,
http://www.iso.org/iso/en/prods-services/ISOstore/store.html.
AOAC (2007a) Official Methods of Analysis online (2007) http://eoma.aoac.org. AOAC International,
Gaithersburg MD.
AOAC (2007b) Official Methods of Analysis online (2007) Method 966.23 http://eoma.aoac.org.
AOAC International, Gaithersburg MD.
APHA (1985) Standard Methods for the Examination of Dairy Products. American Public Health
Association, Washington DC.
Blodgett, R (2003) Most Probable Number Determination from Serial Dilutions, Bacteriological
Analytical Manual, Appendix 2, http://www.cfsan.fda.gov/~ebam/bam-a2.html.
FDA (2007) Bacteriological Analytical Manual online, http://www.cfsan.fda.gov/~ebam/bam-toc.html.
FDA (2006) Final Report and Executive Summaries from the AOAC International Presidential Task
Force on Best Practices in Microbiological Methodology, http://www.cfsan.fda.gov/~comm/bpm-
mtoc.html.
Feldsine, P, Abeyta, C, and Andrews, WH (2002) AOAC International Methods Committee
Guidelines for validation of qualitative and quantitative food microbiological official methods of
analysis. J. Assoc Offic. Anal Chem. Int., 85, 1187–1200.
Feldsine, PT, Green, ST, Lienau, AH, Stephens, J, Jucker, MT, and Kerr, DE (2005) Evaluation of the
assurance GDS™ for E. coli O157:H7 method and assurance GDS for shigatoxin genes method in
selected foods: Collaborative study. J. Assoc Offic. Anal Chem. Int., 88, 1334–1348.
Goulden, CH (1959) Methods of Statistical Analysis, 2nd edition. Wiley, New York, USA.
Kinneberg, KM and Lindberg, KG (2002) Dry rehydratable film method for rapid enumeration of
coliforms in foods (3M™ Petrifilm™ Rapid Coliform Count Plate): Collaborative study. J. Assoc
Offic. Anal Chem. Int., 85, 56–71.
Kodaka, H (2004) Nissui pharmaceutical and Neogen kits granted PTM status. Inside Lab. Manage.,
8(4), 19–22.
LaBudde, R (2006) Statistical analysis of interlaboratory validation studies. IV. Example
analysis of matched binary data. Technical Report 233. Virginia Beach, VA 23464 USA, Least
Cost Formulations, Ltd.
Pearson, ES and Hartley, HO (1958) Biometrika Tables for Statisticians, 6th edition, Vol. 1.
Cambridge University Press, Cambridge, UK.
Siegel, S (1956) Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Co, New
York NY, USA.
USDA (2007) Microbiology Laboratory Guidebook online,
http://www.fsis.usda.gov/Science/Microbiological_Lab_Guidebook/index.asp.
Wernimont, GT (1985) Use of statistics to develop and evaluate analytical methods. Spendley, W
(ed.) AOAC Int. Gaithersburg, MD, USA.