
Journal of Microbiological Methods 30 (1997) 63–69


Statistical analysis of the time-course of Biolog substrate utilization


Christine A. Hackett (a,*), Bryan S. Griffiths (b)

(a) Biomathematics and Statistics Scotland, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, UK
(b) Soil Plant Dynamics Unit, Cellular and Environmental Physiology Department, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, UK

Abstract

Growth of a microbial community on a particular Biolog substrate can be assessed crudely as presence or absence of colour development. More refined is an assessment based on the maximum colour reached after a particular incubation period. The rate of colour development leading up to that maximum can also give useful information about the utilization of the substrate. The statistical analysis and comparison of such time-course profiles is complicated by the multivariate nature of the data. We describe two methods for examining such profiles:

1. a comparison of similarities between profiles from different substrates, using cluster analysis and principal coordinate analysis;
2. an analysis of summary statistics of the profiles, using principal component analysis to identify substrates whose profiles distinguish different treatments.

We illustrate these methods with data from a complex microbial community. Cluster analysis and principal coordinate analysis are found to be useful for identifying groups of substrates with similar patterns of reactions. In particular we can identify substrates where no colour development is produced by the microbial communities. The area under the colour development profile summarises the profile in a single statistic, and this may be used in a principal component analysis to identify differences between sample treatments, and the substrates responsible for these. © 1997 Elsevier Science B.V.

Keywords: Biolog; Kinetic profile; Statistical analysis

1. Introduction

A number of authors (e.g. Ref. [1]) have commented that the time-course of colour development in Biolog wells could be monitored to provide a colour development profile of a microbial community on each individual substrate. Haack et al. [2] have published time-course profiles, and have observed that the profiles could be characterized by curve-fitting and the resulting parameters used for statistical analysis, providing greater analytical power for ecological studies. Such an analysis represents the highest level of resolution and information available from substrate utilization assays. Repeated measurements have also been taken to select the most appropriate time point for a single time-point analysis, although the potential importance of curve-fitting approaches was recognised [3]. Recently Guckert et al. [4] have summarised colour development profiles by the area under each curve, and have used this summary to examine the pattern of substrate use.

If the main interest of the analysis is to compare the colour profiles of several bacterial communities for each carbon substrate individually, then a large body of repeated measures methodology is open to the statistician. Accounts of these methods are given by Crowder and Hand [5], and Diggle et al. [6]. However, such an analysis does not provide a concise summary of the patterns of carbon source utilisation or examine relationships between substrates. Multivariate statistical techniques are required to examine data from all the carbon sources simultaneously.

We undertook this study to determine whether three multivariate statistical approaches, cluster analysis, principal coordinate analysis and principal component analysis, were suitable for the analysis of time-course Biolog data. The important aspect of this study is the statistical approach; the experimental details given are therefore only sufficient to show that samples representative of complex microbial communities were utilized.

*Corresponding author. Tel.: +44 1382 562731; fax: +44 1382 562426; e-mail: chacke@scri.sari.ac.uk

0167-7012/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-7012(97)00045-6

2. Materials and methods

2.1. Sample preparation

To test the statistical techniques, Biolog GN plates were inoculated with three different, complex, bacterial communities and incubated at 15°C for eight days. A 50 g mass of fresh field soil, from a permanent pasture, was sieved through a 4 mm diameter mesh and thoroughly shaken in 500 ml 1/4 strength Ringer's solution (2.25 g NaCl, 0.105 g KCl, 0.12 g CaCl2·2H2O, 0.05 g NaHCO3 l^-1). A 5 ml volume of this soil suspension was used to inoculate 250 ml of three different media [(A) soil extract, (B) soil extract + 1/1000 strength tryptone soya broth (TSB), and (C) soil extract + 1/10 TSB] in a 500 ml Erlenmeyer flask. The flasks were shaken at 15°C for four days and then used to inoculate two more flasks of the same media, which were incubated under the same conditions. Subsamples from each flask were then diluted with 1/4 strength Ringer's solution to give the same concentration of cells [determined as optical density (OD600)], and duplicate Biolog GN plates inoculated with 150 μl of cell suspension. This gave two plates from each flask, and two flasks of each of three media. The Biolog plates were incubated at 15°C and the OD595 of each well read daily for eight days using a microplate reader. The substrates are referred to by number, so that substrate 1 = well A1, substrate 13 = well B1, substrate 96 = well H12 and so on.

2.2. Statistical analysis

The data consist of a set of time profiles showing the colour development for each carbon substrate, one profile from each Biolog plate. Inoculum density has been shown to have a strong influence on the rate of colour development [1]. To compensate for this, the average well colour development (AWCD) can be calculated for each plate and reading time as the mean colour of the 95 carbon substrates. Garland [3] normalised his data by dividing each colour score by the AWCD for that plate before using the scores for multivariate analyses. We compared analyses of the raw and AWCD normalised data.

2.2.1. Similarity between wells

Cluster analysis [7] is an exploratory statistical technique which aims to identify natural groupings among individuals, and to present these groupings in the form of a hierarchical tree, or dendrogram, which shows the similarities between individuals. Here cluster analysis may be used to group substrates with similar patterns of colour development, i.e., substrates which are being used to a similar extent by all the bacterial communities. In this case we regard the 95 substrates as 95 individuals, and the 12 plates and 8 time points as 96 variables. A measure of similarity S_ij between two substrates i and j is defined using the city-block metric [8] as

$$ S_{ij} = 1 - \frac{1}{KT} \sum_{k,t} \frac{\left| C_{ikt} - C_{jkt} \right|}{r_{kt}} $$

where C_ikt is the colour reading of substrate i for plate k at time t, r_kt is the range for plate k at time t, and KT = 96 is the number of plate-by-time variables (K = 12 plates, T = 8 time points). The measure of similarity is unaffected by scaling by the AWCD. The substrates were then assembled into a dendrogram, using average linkage


cluster analysis to form successively less similar groups of substrates. The similarities between the substrates may also be displayed using a principal coordinate plot [7]. This is a plot, usually in two or three dimensions, in which the substrates are positioned so that their distances from each other mimic as closely as possible the similarities of the city-block measure. If the first two principal coordinates account for a large proportion of the total information about similarities, then a two-dimensional scatterplot reproduces well the relationships between the substrates. Substrates with similar profiles are then located close together on the plot, while substrates with dissimilar profiles will be well separated.
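The similarity measure, the average linkage clustering and the principal coordinate plot described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Genstat procedure used in the paper: the data are synthetic, the function names `city_block_similarity` and `principal_coordinates` are hypothetical, the principal coordinates are obtained from the double-centred similarity matrix, and the proportion explained is taken relative to the sum of the positive eigenvalues, which may differ from Genstat's definition.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def city_block_similarity(colour):
    """Range-scaled city-block similarity between substrates.

    colour has shape (substrates, plates, times).
    """
    n = colour.shape[0]
    x = colour.reshape(n, -1)              # one column per plate-by-time variable
    r = x.max(axis=0) - x.min(axis=0)      # range r_kt across the substrates
    r = np.where(r > 0, r, 1.0)            # assumption: constant columns contribute 0
    diff = np.abs(x[:, None, :] - x[None, :, :]) / r
    return 1.0 - diff.mean(axis=2)


def principal_coordinates(sim, n_axes=2):
    """Principal coordinates from the double-centred similarity matrix."""
    n = sim.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(centre @ sim @ centre)
    order = np.argsort(vals)[::-1][:n_axes]          # largest eigenvalues first
    top = np.clip(vals[order], 0.0, None)
    scores = vecs[:, order] * np.sqrt(top)
    explained = top / vals[vals > 0].sum()           # assumption: positive part only
    return scores, explained


# Synthetic stand-in data: 95 substrates, 12 plates, 8 daily readings.
rng = np.random.default_rng(0)
colour = rng.uniform(0.05, 2.5, size=(95, 12, 8))

sim = city_block_similarity(colour)

# Dividing by the AWCD rescales each plate-by-time column and its range
# by the same factor, so the similarity matrix should not change.
awcd = colour.mean(axis=0)                           # shape (plates, times)
sim_scaled = city_block_similarity(colour / awcd)
print(np.allclose(sim, sim_scaled))

# Average linkage on dissimilarity 1 - S, cut where similarity falls to 80%.
tree = linkage(squareform(1.0 - sim, checks=False), method="average")
groups = fcluster(tree, t=0.20, criterion="distance")
scores, explained = principal_coordinates(sim)
```

With real plate readings, `colour` would be replaced by the measured array, and `scores` could be plotted as a scatterplot of substrates.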

2.2.2. Summary statistics of the profile

Haack et al. [2] proposed that colour development profiles may be characterised by fitting an equation to the data and using the parameter estimates of each curve as a basis for analysis. Published data [2,3] follow a sigmoidal curve with time, suggesting a logistic model. Alternatively, curve fitting can be avoided by an analysis of summary statistics such as the maximum value attained, the time at which the maximum is reached, the time at which the colour development is half of its maximum value or the area under the colour development profile. The area under the curve is a particularly useful summary statistic, as differences between treatments in the maximum colour development, or in the rates of colour development, will both result in different areas under the time-course profile. The area under the curve A_ik for substrate i and plate k may be calculated by joining the colour levels at successive time points t(1), t(2), ..., t(n) by straight lines, and summing the areas corresponding to each segment between successive time points by the trapezium rule:

$$ A_{ik} = \frac{1}{2} \sum_{j=1}^{n-1} \bigl( t(j+1) - t(j) \bigr) \bigl( C_{ikt(j+1)} + C_{ikt(j)} \bigr) $$

where C_ikt(j) is the colour development of substrate i for plate k at time t(j). Guckert et al. [4] also describe the calculation of the area under the curve by this method, but their formula for the trapezoidal area contains a typing error, although their Excel formula is correct.

Each of the four summary statistics described above was calculated for each substrate and plate, and these were then used in four principal component analyses [7]. In principal component analysis, the summary statistic for each plate is regarded as a 95-component observation, and the aim is to identify a small number of linear combinations of the 95 substrates which account for as high a proportion of the plate-to-plate variation as possible. Each linear combination consists of a (possibly scaled) substrate score, multiplied by a weight, or loading, which measures its contribution to that combination. If the first two principal components account for a high proportion of the variation then a scatterplot will show the major sources of plate-to-plate variation, and an examination of the loadings will show which substrates contribute most to the components. All statistical analyses were carried out using the statistical package Genstat 5.3 [9].

3. Results

3.1. Similarity between wells

Where colour developed, the profiles showed a sigmoidal relationship with time of reading. However, the shapes of the colour profiles varied considerably from one carbon substrate to another. Fig. 1 shows six different profiles. Fig. 2 illustrates the dendrogram of similarities between the substrates. At a similarity level of 80% all the substrates had merged to form six clusters. The profiles in Fig. 1 were selected as the most typical substrate of each cluster. 'Most typical' is defined here as the substrate having the largest average similarity with the other members of that cluster. The typical profiles range from very little colour development on each plate (e.g. substrate 55) to rapid colour development for each plate (substrate 16). Of particular interest are substrates such as 86 and 13, where the final level of the colour response, or the rate of colour development, varies with the treatment applied to that plate.

When the similarity matrix was analysed using a principal coordinate analysis, the first two principal coordinates accounted for 44% and 10% of the total similarity information. These are shown in Fig. 3. The first principal coordinate represents the degree of colour development, with a low score for substrate 55 and a high score for substrate 16. The second principal coordinate is related to the rate of colour development. High scores on this axis (e.g. substrate 13) correspond to a gradual increase in colour in the well, while low scores correspond to a rapid colour development (e.g. substrate 42). This plot enables a rapid identification, for example, of wells with no colour development (37, 40, 55, 56, 61, 63, 92 and 94). This indicates that the microbial community contains no organisms able to utilise these substrates.

Fig. 1. Profiles of the time-course of colour development on six different substrates, selected as typical members of the six clusters of substrates with similar time-course profiles in Fig. 2. The solid lines show the four plates with treatment A (soil extract), the dotted lines show treatment B (soil extract + 1/1000 TSB) and the dashed lines show treatment C (soil extract + 1/10 TSB).

Fig. 2. Dendrogram from average linkage cluster analysis showing the similarities between the 95 carbon substrates.

3.2. Summary statistics of the profile

A logistic curve was appropriate for many of our colour profiles, but there were also many profiles where such a model did not fit the data. These latter profiles were of two types. For those where the colour developed very rapidly, changing from no colour at the start of the experiment to the maximum value of 2.5 at the first observation time, the slope parameter and the point of inflexion could not be estimated accurately. For those where the colour developed very slowly, and was still increasing after eight days, the upper asymptote could not be estimated.
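As an illustration of the curve-fitting approach discussed above, the following sketch fits a three-parameter logistic curve to a single profile with `scipy.optimize.curve_fit`. It is an assumption-laden example and not the fitting procedure used in the paper: the parameterisation (upper asymptote, slope parameter, point of inflexion), the starting values, the helper name `fit_logistic` and the synthetic profile are all hypothetical. Profiles of the two problem types described above would be expected either to fail to converge or to return very large standard errors for the affected parameters.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, upper, rate, midpoint):
    """Logistic curve: upper asymptote, slope parameter, point of inflexion."""
    return upper / (1.0 + np.exp(-rate * (t - midpoint)))


def fit_logistic(days, od):
    """Return (estimates, standard errors), or (None, None) if the fit fails."""
    # Crude starting values: observed maximum, unit slope, and the first
    # reading time at which half of the observed maximum is reached.
    half_day = days[np.argmax(od >= od.max() / 2.0)]
    start = (od.max(), 1.0, half_day)
    try:
        params, cov = curve_fit(logistic, days, od, p0=start, maxfev=10000)
    except RuntimeError:                   # optimiser did not converge
        return None, None
    return params, np.sqrt(np.diag(cov))


# Synthetic, well-behaved profile read daily for eight days (hypothetical values).
days = np.arange(1, 9, dtype=float)
true_params = (2.0, 1.5, 4.0)
od = logistic(days, *true_params)

estimates, std_errors = fit_logistic(days, od)
print(estimates)
```

Returning `None` on failure makes it easy to count, across all substrates and plates, how many profiles cannot be described by the same curve.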


Fig. 3. Plot of the first two principal coordinate (PCO) scores for each carbon substrate, derived from the city-block similarity matrix. The first principal coordinate represents the degree of colour development, ranging from low to high, and the second represents the rate of colour development, ranging from rapid to gradual.

Fig. 4. Plot of the first two principal component (PCA) scores for each plate, derived from the area under the colour development profile. Each plate is labelled by its treatment (A, B or C).

Analysis of summary statistics of each profile proved more useful than curve fitting. Principal component analyses of the four summary statistics of the colour development profile (maximum value attained, time at which the maximum is reached, time at which the colour development is half of its maximum value and area under the profile) all gave two principal components which accounted for a high proportion of the variation. For each summary statistic the first principal component separated treatment C (soil extract + 1/10 TSB) from treatments A (soil extract) and B (soil extract + 1/1000 TSB), and the second principal component separated treatments A and B. In general, the first three of these summary statistics produced different patterns of principal component loadings among the substrates. However, substrates with high loadings for either the maximum value attained or the time to reach half the maximum colour development also had high loadings for the area under the profile.

Fig. 4 shows a plot of the scores for the first two principal components of the area under the profile, which account for 78% and 15% of the variation between plates. The loadings of each substrate in the first two principal components are shown in Fig. 5. Here we identify, for example, that wells 53, 77, 86, 88, 44 and 76 have large negative loadings for the first principal component, and therefore have large areas under their profiles for treatment C and smaller areas for treatments A and B. Wells 27 and 26 have the largest positive loadings for the second principal component, and therefore have large areas for treatment B and smaller areas for treatment A. This analysis was repeated after adjustment for the AWCD but the results were very similar.
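The area under the profile and the principal component analysis of the resulting plates-by-substrates matrix can be sketched as follows. This is a minimal illustration rather than the Genstat analysis reported here: the data are synthetic, the function names are hypothetical, and the principal components are computed from column-centred, unscaled areas, which is an assumption, since scaling of the substrate scores is an analyst's choice.

```python
import numpy as np


def area_under_profile(times, colour):
    """Trapezium-rule area under each profile; time is the last axis of colour."""
    widths = np.diff(times)                          # t(j+1) - t(j)
    heights = colour[..., 1:] + colour[..., :-1]     # C(j+1) + C(j)
    return 0.5 * np.sum(widths * heights, axis=-1)


def principal_components(data, n_axes=2):
    """PCA of a (plates, substrates) matrix via SVD of the column-centred data."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_axes] * s[:n_axes]              # one row per plate
    loadings = vt[:n_axes].T                         # one row per substrate
    explained = s[:n_axes] ** 2 / np.sum(s ** 2)     # share of total variance
    return scores, loadings, explained


# Synthetic stand-in data: 12 plates, 95 substrates, 8 daily readings,
# sorted along the time axis so that each profile is non-decreasing.
rng = np.random.default_rng(1)
days = np.arange(1, 9, dtype=float)
colour = np.sort(rng.uniform(0.0, 2.5, size=(12, 95, 8)), axis=-1)

area = area_under_profile(days, colour)              # shape (12, 95)
scores, loadings, explained = principal_components(area)
```

Plotting `scores` with one point per plate corresponds to a plot such as Fig. 4, and plotting `loadings` with one point per substrate corresponds to a plot such as Fig. 5.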

Fig. 5. Plot of the first two principal component loadings for each substrate, derived from the area under the colour development profile.

4. Discussion

Our analysis of the time-course of substrate utilisation showed that curve-fitting and analysis of the parameter estimates of each curve can be helpful given the logistic pattern of the profiles. However, there were distinct practical difficulties with this approach. If the colour development were measured more frequently at its early stages, and the readings were continued until all the wells had reached a steady state, then it should be practical to fit the same curve to every substrate. However, we have demonstrated that summary statistics of the profile may be used to overcome this problem. Different maximum colour development, or different rates of colour development, will both result in different areas under the time-course profile, which therefore appears to be the most useful summary statistic for detecting differences among the treatments.

A principal component analysis identifies combinations of the substrates which account for as much variation among the plates as possible. This approach was also found to be useful by Guckert et al. [4]. In this case the first two principal components distinguished our three treatments. However, this will not always be the case. A canonical variate analysis of the area under the curve would calculate linear combinations of substrates which provide the optimal separation of specific treatments.

When a substrate has been found to have different profiles for different treatments, repeated measures methodology may prove useful to answer questions such as the time of divergence of the profiles. One such method is ante-dependence modelling [10,11]. Gabriel [10] defined a set of repeated observations to have ante-dependence structure of order r if the observations at each time point, given the previous r observations, are independent of all further preceding observations. Genstat 5.3 [9] can calculate the order of ante-dependence of a set of data, and then use this to test for differences between treatments, allowing for the time-dependence, as described by Kenward [11].

In this experiment normalisation of the profile data by division by the AWCD had little effect on the results of the principal component analysis. However, this is unlikely to be true in general, and Garland and Mills [1] concluded that transformed data should be used. We prefer to investigate the effect of this transformation on each data set. Due to the form of the city-block similarity coefficient, cluster analysis and principal coordinate analysis are unaffected by the AWCD transformation.

We conclude, therefore, that cluster analysis and principal coordinate analysis are useful methods for grouping the substrates according to the similarity of their reactions. In particular, identification of substrates where no colour is produced is important, as this shows that the microbial community contains no member capable of utilising them [2]. The area under the colour development profile is a useful statistic to summarise the time-course. A principal component analysis of the area under the profile examines variation between the samples. Observation of a full time-course profile gives far more information than measurements at one or two time-points, and justifies the extra effort in analysis.

Acknowledgements

We thank Mrs. S. Caul for technical assistance and Mr. J.W. McNicol for useful comments on the manuscript. This work was funded by the Scottish Office Agriculture, Environment and Fisheries Department.

References

[1] J.L. Garland, A.L. Mills, Classification and characterization of heterotrophic communities on the basis of patterns of community-level sole-carbon-source utilization, Appl. Environ. Microbiol. 57 (1991) 2351–2359.
[2] S.K. Haack, H. Garchow, M.J. Klug, L.J. Forney, Analysis of the factors affecting the accuracy, reproducibility, and interpretation of microbial community carbon source utilization patterns, Appl. Environ. Microbiol. 61 (1996) 1458–1468.
[3] J.L. Garland, Analytical approaches to the characterization of samples of microbial communities using patterns of potential C source utilization, Soil Biol. Biochem. 28 (1996) 213–221.
[4] J.B. Guckert, G.J. Carr, T.D. Johnson, B.G. Hamm, D.H. Davidson, Y. Kumagai, Community analysis by Biolog: curve integration for statistical analysis of activated sludge microbial habitats, J. Microbiol. Methods 27 (1996) 183–197.
[5] M.J. Crowder, D.J. Hand, Analysis of Repeated Measures, Chapman and Hall, London, 1990.
[6] P.J. Diggle, K.-Y. Liang, S.L. Zeger, Analysis of Longitudinal Data, Clarendon, Oxford, 1994.
[7] P.G.N. Digby, R.A. Kempton, Multivariate Analysis of Ecological Communities, Chapman and Hall, London, 1987.
[8] A.J. Cain, G.A. Harrison, An analysis of the taxonomist's judgement of affinity, Proc. Zool. Soc. London 131 (1958) 85–98.
[9] R.W. Payne, P.W. Lane, P.G.N. Digby, S.A. Harding, P.K. Leech, G.W. Morgan, A.D. Todd, R. Thompson, G. Tunnicliffe Wilson, S.J. Welham, R.P. White, Genstat 5 Release 3 Reference Manual, Oxford University Press, Oxford, 1993.
[10] K.R. Gabriel, The model of ante-dependence for data of biological growth, Bulletin Institute 33rd Session (Paris), 1961, pp. 253–264.
[11] M.G. Kenward, A method of comparing profiles of repeated measurements, Appl. Stat. 36 (1987) 296–308.
