Bootstrap Confidence Interval Estimation For Shannon Diversity Index

I. INTRODUCTION
Background of the Study
A diversity index is a parameter intended to measure the diversity, or variability, of a
collection of individuals with respect to a set of categorical characteristics, in the
same way that the statistical variance measures variability for quantitative variables.
Among the several indices available, the Shannon index (also called the Shannon-Wiener
index) stands out because it provides more information about community composition. The
index was originally proposed as a measure of the information content of a code. It was
later widely used to determine the diversity of a sample of individuals from an ecological
community, treating species as symbols and their relative population sizes as
probabilities.
Moreover, it has been one of the most commonly used indices because it applies to
measuring the diversity of a collection drawn from an infinitely large population from
which a random sample can be taken. This is the most common scenario in which diversity
must be estimated. As an estimator of a community's diversity, it has been shown to be
biased, especially when not all of the species are included in the sample (Pielou 1969).
Several studies have adopted the index to measure other types of diversity, including
variation in genetic, morphologic, and phenotypic characteristics. Because these
characteristics are analogous to the components of a biological collection, the properties
and conditions of the Shannon index still hold.
Likewise, the Shannon index used for these diversities is biased and more often
underestimates the true population diversity, especially as the number of unsampled
species or subjects increases. In biological diversity, the number of species in a collection
is sometimes not independently known. For phenotypic diversity, some characteristics
may not be observed in the sample. Thus, the statistical properties of the index remain in
question, since the sample may fail to capture vital properties of the parameter. This
limits estimation of the index by means of confidence intervals and hypothesis testing
under the sampling imposed by field conditions. As Peet (1975) elaborated, the use of a
diversity index without a significance test can be quite misleading.
What is required to solve these inference problems is the sampling distribution of the
estimate of the Shannon index. Several methods have been developed to obtain improved
estimates that take its distribution and other statistical properties into account.
Zahl (1977) applied a resampling method, the jackknife, to the estimation of the
diversity index and showed the advantages of the method when random sampling of the
species or subjects is not satisfied. This method provided good estimates of bias and
variance.
Similar in its basic idea to the jackknife, the bootstrap method produces an
improved estimate for the Shannon index. Bootstrapping is a technique that allows one to
estimate sampling variability by resampling from the empirical probability distribution
defined by a single sample. The scatter of the resampled estimates enables one to estimate
the standard error and construct a confidence interval for the result.
In bootstrap sampling, data can be generated from the observations without
knowing the distribution of the population. The bootstrap also provides a way to
substitute computation for mathematical analysis when calculating the asymptotic
distribution of an estimator or statistic is difficult. For that reason, the bootstrap could
be a good choice for estimating the Shannon diversity index and its properties.
However, the choice of which bootstrap confidence interval to use is highly
dependent on the practical research situation. Among the several bootstrap confidence
interval methods, the normal approximation method, the standard percentile method, and
the bias-corrected method seem to be the ones that best take advantage of the benefits of
bootstrapping. They are automatic and relatively simple to program, easy to understand
conceptually, and applicable to perhaps any statistic computed from a simple random
sample. For these reasons, these methods are the leading candidates for application in
different fields of science.
Statement of the Problem
Over the past decade, substantial attention has been paid to the development of
techniques that use bootstrapped sampling distributions to build confidence intervals
around various population parameters. This study presents a bootstrap simulation for
estimating the Shannon index using the three most general and practical methods for the
biological sciences, namely the normal approximation method, the standard percentile
method, and the bias-corrected method. It also explores the mean coverage of the
confidence intervals for the index when the number of species or subjects in the sample is
itself subject to random variation.
Objective of the Study
In general, the study aims to apply bootstrap confidence interval methods to the
Shannon index. Specifically, this study was conducted to:
1. examine the behavior of the Shannon index under bootstrapping when the size of
the original sample, the number of bootstrap resamples, and the number of classes vary;
2. determine the sampling distribution of the Shannon index;
3. construct confidence intervals for the Shannon index using bootstrap methods,
namely the normal approximation method, the standard percentile method, and the
bias-corrected method;
4. assess the intervals based on statistical properties such as mean coverage,
accuracy, precision, and reliability; and
5. compare the intervals obtained using the different interval construction methods.
Significance of the Study
The Shannon index, like other indices used in different fields, summarizes the
information content of a collection. It is often of interest to compare collections and to
know which collection exhibits a stable and productive community structure. In the
environmental sciences, quantifying diversity through this index is a tool for monitoring
the diversity of an ecological space through time, reflecting the extinction or domination
of a species or classification. Producing an index that best represents the diversity of the
collection could help maintain a healthy community. Since it is important to account for
uncertainty when diversity indices are calculated for policy making, this study may be
beneficial to users of the Shannon index. In addition, identifying the bootstrap method
that gives the best interval coverage will improve the analysis of the index. Furthermore,
interpretation of this diversity measurement will improve if the statistical properties of
its estimates are studied. Balanced characterization and biological-conservation programs
would then be more efficient.

II. REVIEW OF LITERATURE
The application of diversity indices in biological diversity has been increasingly
adopted by ecologists since the late 1950s. However, given the different types of
collection, Pielou (1966) recommended the use of the Shannon index for a large collection
from which a random sample can be drawn and in which the number of species is known.
Since the basis is only a portion of the whole collection, one cannot determine the true
population diversity but can only estimate the average diversity from a sample. The
estimated value of the Shannon index comes from the incomplete knowledge yielded by a
sample and thus has sampling error. Pielou (1969) reiterated that estimates of the Shannon
index should always be accompanied by estimates of their standard errors.

The Shannon index of diversity is obtained as
$$H = -\sum_{i=1}^{s} p_i \ln p_i,$$
where $p_i$ is the proportion of the $i$th species of a population with $s$ species. The
value of $H$ is estimated using field data as
$$\hat{H} = -\sum_{i=1}^{s} p_i \ln p_i,$$
where $p_i = n_i/N$ is the proportion of the $i$th species in the sample. However, this
method yields a biased estimator with expected mean
$$E(\hat{H}) = -\sum_{i=1}^{s} p_i \ln p_i - \frac{s-1}{2N} + \cdots$$
and variance
$$\widehat{\mathrm{var}}(\hat{H}) = \frac{\sum_{i=1}^{s} p_i (\ln p_i)^2 - \left(\sum_{i=1}^{s} p_i \ln p_i\right)^2}{N} + \frac{s-1}{2N^2} + \cdots$$
(Hutcheson 1970). The bias is negligible only when all of the species are
included in the sample (Peet 1974).
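As a rough illustration, the plug-in estimate and the leading bias and variance terms quoted above can be computed from a vector of class counts. This is a minimal Python sketch, not the procedure used in the study: the function names and the example counts are hypothetical, and the number of observed classes is assumed to stand in for $s$, which in practice may be unknown.

```python
import math


def shannon_index(counts):
    """Plug-in estimate H_hat = -sum(p_i * ln p_i); empty classes contribute 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def hutcheson_terms(counts):
    """Leading bias and variance terms, with observed classes standing in for s."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    s_obs = len(props)  # assumption: every class of the population was observed
    bias = -(s_obs - 1) / (2 * total)
    sum_p_ln_sq = sum(p * math.log(p) ** 2 for p in props)
    sum_p_ln = sum(p * math.log(p) for p in props)
    variance = (sum_p_ln_sq - sum_p_ln ** 2) / total + (s_obs - 1) / (2 * total ** 2)
    return bias, variance


counts = [50, 30, 20]  # hypothetical class counts, for illustration only
h_hat = shannon_index(counts)
approx_bias, approx_variance = hutcheson_terms(counts)
```

Because unobserved classes cannot enter `s_obs`, the bias term computed this way tends to be too small in magnitude exactly when the sample misses rare classes, which is the situation the text identifies as problematic.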
Several methods have been employed to obtain improved estimates of the
Shannon index. Pielou repeatedly computed the index of a sample, adding new quadrats
one at a time in random order, until the index showed no significant change. Using
Pielou's method, Heyer and Berven (1973) repeated the procedure for different random
orderings of the quadrats. This method provided a smaller standard error.
However, these methods failed to provide significance tests or confidence intervals for
sampling in which the number of species in the sample is subject to variation.
Zahl (1977) applied the so-called jackknife method to the estimation of the
Shannon index. The method was employed by systematically dropping quadrats one
at a time and assessing the variation in the resulting index. It automatically took into
account the restrictions of field sampling and produced approximately normally
distributed estimates, making significance tests and confidence intervals possible.
A simulation study by Pinol (1997) focused on the impact of the jackknifing
procedure as a tool for correcting the bias of the usual estimation procedure for species
diversity. She compared the relative reduction in bias of the jackknife estimates against
the values computed using the sample-based procedure for three diversity indices,
including the Shannon index. The jackknife method performed better, resulting in a
substantial reduction in bias. However, the resulting estimates showed an increase in
variance.
Efron (1981) compared the jackknife and bootstrap methods in evaluating
nonparametric estimates of the standard error of an estimator. He showed that the
bootstrap performs notably better than the jackknife in estimating several quantities, such
as the standard error of the correlation coefficient from a bivariate normal model. He also
showed that the bootstrap can draw on all of the at most $2^n - 1$ nonempty subsets of a
data set of size $n$, while the jackknife utilizes only $n$ of them. Hall (1989)
generalized that bootstrap methods are simulation methods for assessing the sampling
properties of statistical estimates.
Several studies have extended the index to evaluate other types of diversity. Studies
of genetic diversity used the Shannon index to distinguish the genetic variation of an
organism. Studies of the morphologic and agronomic variety of plants used the index to
characterize the resistance of a variety to certain diseases.
Jain et al. (1975) adopted the Shannon index to examine the geographical
patterns of phenotypic diversity in the world collection of durum wheats (Triticum
turgidum). In the study, durum wheats were classified for different observable
characteristics, each with a different number of classes. The Shannon index,
$H = -\sum_{i=1}^{n} p_i \ln p_i$, was employed, where $n$ is the number of phenotypic
classes for a character and $p_i$ is the proportion of the total number of entries in the
$i$th class. Moreover, Jain et al. used the additivity property of the Shannon index, which
is useful in hierarchical analyses of diversity in a large variety of data.
Meanwhile, a study by Riley in 1998 was undertaken to estimate the phenotypic
diversity in the Ethiopian noug germplasm collections across a wide range of characters
in different agro-ecological areas of the country. As described by Jain et al., the
phenotypic frequencies of the characters were analyzed with the Shannon diversity index
(H') in order to estimate the diversity of each character within each province. The result
supported the assertion of Yang et al. (1991) that the value of the index increases with
increasing polymorphism and reaches its maximum value when all phenotypic classes
have equal frequencies. The variance of H' had not been characterized; however,
assuming that the eight characters used in the study represent a random sample of all
possible characters of the noug plant, an empirical variance was computed from the eight
estimates of Shannon's diversity index. It was concluded that the utility of a germplasm
collection to research programmes designed to locate genes depends on adequate
sampling procedures.
A study by Aldemita (2006) was done to determine the genetic diversity of
Korean rice germplasm with different levels of resistance to blast in the Philippines.
Bootstrap analysis was used to determine the variation in the phenotypic characteristics
of the leaves of each Korean germplasm. The bootstrap values were expressed as
percentages, which can be regarded as statistical tests (confidence limits) of the validity
of the various groups. She further noted that the higher the percentage, the greater the
confidence that a particular group is true.
In the literature on bootstrap confidence intervals, Efron and Tibshirani (1993)
compared the three confidence interval methods theoretically and highlighted the
advantages and disadvantages of each. The normal approximation requires a
parametric assumption about the distribution of the estimator but demands the least
computation. The percentile method, which is invariant to transformation, allows the
distribution of the estimator to be asymmetrical; however, it has low accuracy when the
sample is small. The bias-corrected method retains the advantages of the percentile
method while requiring only a limited parametric assumption.

III. CONCEPTUAL FRAMEWORK
Bootstrap Procedure
Efron and Tibshirani (1993) simplified the steps of the generic bootstrapping
procedure, which was followed here to bootstrap the Shannon diversity index. Suppose a
random sample $x_1, x_2, \ldots, x_n$ is drawn from an unspecified probability
distribution $F$, so that the $x_i$ are independently distributed as $F$, and a parameter
of interest $\theta$ is to be estimated. The basic steps in the procedure are as follows:
1. Construct an empirical probability distribution, $\hat{F}(x)$, from the
observed values $x_i$, assigning probability $1/n$ to each data point. This is the
empirical distribution function (EDF) of $x$, which is the nonparametric maximum
likelihood estimate (MLE) of the population distribution function, $F(x)$.
2. From the EDF, $\hat{F}(x)$, draw a random sample of size $n$ with
replacement. This is a "resample," $x_b^*$.
3. Calculate the statistic of interest, $\hat{\theta}$, from this resample, yielding
$\hat{\theta}_b^*$.
4. Repeat steps 2 and 3 $B$ times, where $B$ is a large number.
The practical magnitude of $B$ depends on the tests to be run on the data.
Typically, $B$ should be 50-200 to estimate the standard error of $\hat{\theta}$, and at
least 1000 to estimate confidence intervals around $\hat{\theta}$.
5. Construct a probability distribution from the $B$ values $\hat{\theta}_b^*$ by placing
probability $1/B$ at each point, $\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*$.
This distribution is the bootstrapped estimate of the sampling distribution of
$\hat{\theta}$, $\hat{F}^*(\hat{\theta}^*)$.
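The five steps above can be sketched for the Shannon index as follows. This is a minimal Python illustration under stated assumptions, not the SAS or STATA code used in the study: the data are taken to be one categorical label per accession, and the names `shannon_index`, `bootstrap_shannon`, `n_resamples`, and `seed` are hypothetical.

```python
import math
import random


def shannon_index(labels):
    """Plug-in Shannon index of a list of categorical labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def bootstrap_shannon(labels, n_resamples=1000, seed=0):
    """Return B bootstrap estimates of the Shannon index."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    estimates = []
    for _ in range(n_resamples):              # step 4: repeat B times
        resample = rng.choices(labels, k=n)   # steps 1-2: draw n values from the EDF
        estimates.append(shannon_index(resample))  # step 3: recompute the statistic
    return estimates  # step 5: each estimate carries probability 1/B
```

Drawing with `rng.choices` gives every observation probability 1/n on each draw, which is equivalent to sampling from the empirical distribution function. A resample can miss a rare state entirely, so the number of states observed may differ from one resample to the next.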
Confidence Interval Estimation
Three different methods will be used and compared in constructing confidence
intervals for the Shannon diversity index, namely the normal approximation method, the
standard percentile method, and the bias-corrected method.
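The normal approximation method is analogous to the parametric approach to
constructing confidence intervals. The method assumes that the statistic follows a normal
distribution even though no analytic standard error formula for it exists; the bootstrapped
sampling distribution serves as a surrogate for estimating the standard error. This
estimation is a straightforward application of the notion that the $\hat{\theta}_b^*$ are
normal variates whose standard deviation is estimated by
$$\hat{\sigma}_{\hat{\theta}^*} = \sqrt{\frac{\sum_{b=1}^{B} \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2}{B-1}},$$
where $\bar{\theta}^* = \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}_b^*$, and $B$ is the number
of bootstrap samples.
Similar to the traditional parametric case, the points on the z distribution
associated with $\alpha/2$ and $1-\alpha/2$ are identified. Using the bootstrapped
standard error, $\hat{\sigma}_{\hat{\theta}^*}$, there is thus a $(1-\alpha)$ probability
that $\theta$ lies in
$$\left(\hat{\theta} + z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}^*},\; \hat{\theta} + z_{1-\alpha/2}\,\hat{\sigma}_{\hat{\theta}^*}\right).$$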
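A normal approximation interval of this kind can be sketched in a few lines of Python. The sketch assumes the standard error is taken as the sample standard deviation of the bootstrap estimates (denominator B - 1) and that the interval is centered on the original sample estimate; the function and argument names are hypothetical.

```python
from statistics import NormalDist, stdev


def normal_approx_ci(theta_hat, boot_estimates, alpha=0.05):
    """Interval theta_hat + z * se, with se estimated from the bootstrap estimates."""
    se = stdev(boot_estimates)                  # uses the B - 1 denominator
    z_low = NormalDist().inv_cdf(alpha / 2)     # negative for alpha < 1
    z_high = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat + z_low * se, theta_hat + z_high * se
```

By construction this interval is symmetric about the point estimate, which is why it is the cheapest of the three methods but also the one that leans hardest on the normality assumption.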
The percentile method, on the other hand, takes literally the notion that
$\hat{F}^*(\hat{\theta}^*)$ approximates $F(\hat{\theta})$. That is, the bootstrapped
estimate of the sampling distribution of $\hat{\theta}^*$ approximates the sampling
distribution $F(\hat{\theta})$. The basic approach to finding the lower and upper limits of
an $\alpha$-level confidence interval is to determine the $\alpha/2$ and $1-\alpha/2$
percentiles of the bootstrap distribution. The $\hat{\theta}_b^*$ values are sorted so that,
for $\alpha = 0.05$, the values of $\hat{\theta}^*$ at the 2.5th and 97.5th percentiles of
$\hat{F}^*(\hat{\theta}^*)$ can easily be determined. Thus, given that $B = 1000$, the
25th lowest value of $\hat{\theta}^*$ is the lower limit and the 25th highest value is the
upper limit of the interval.
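The order-statistic rule just described (the 25th lowest and 25th highest of 1000 values) can be written as a short Python sketch. The rounding rule used when B times alpha/2 is not a whole number is an assumption of this sketch, as are the function and argument names.

```python
def percentile_ci(boot_estimates, alpha=0.05):
    """Take the k-th lowest and k-th highest bootstrap estimates, k = B * alpha / 2."""
    ordered = sorted(boot_estimates)
    n_boot = len(ordered)
    # Assumption: round to the nearest whole rank, and use at least rank 1.
    k = max(1, round(n_boot * alpha / 2))
    return ordered[k - 1], ordered[n_boot - k]
```

With B = 1000 and alpha = 0.05, `k` is 25, so the function returns the 25th lowest and the 25th highest estimate, matching the rule stated in the text.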
The third method, the bias-corrected method, adjusts the bootstrapped distribution
when it is not centered at the point estimate. This allows asymmetric intervals to be
found. The confidence interval can be modified and written as
$$\left(\hat{F}^{*-1}\!\left[\Phi(2z_0 + z_{\alpha/2})\right],\; \hat{F}^{*-1}\!\left[\Phi(2z_0 + z_{1-\alpha/2})\right]\right),$$
where $z_{\alpha/2}$, $z_{1-\alpha/2}$, and $z_0$ are the standard normal values with
cumulative probability $\alpha/2$, $1-\alpha/2$, and $p^*$, respectively, $\Phi$ is the
standard normal distribution function, and $p^*$ is the proportion of the bootstrap
estimates that are smaller than the original sample estimate. If the distribution is already
centered, that is, if $p^*$ is 0.50, then $z_0 = 0$ and the interval reduces to the
percentile confidence interval.
Equivalently, the endpoints can be obtained from the cumulative probabilities of the
bounds in the bootstrap distribution, $\alpha_U = \Phi(2z_0 + z_{1-\alpha/2})$ for the
upper limit and $\alpha_L = \Phi(2z_0 + z_{\alpha/2})$ for the lower limit.
Thus the bias-corrected limits are the $100\alpha_L$% and $100\alpha_U$% percentiles
of the bootstrap estimates.
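A bias-corrected interval can be sketched in Python as below. The sketch assumes the common convention in which p* is the proportion of bootstrap estimates falling below the original sample estimate, so that the adjusted percentile levels are Phi(2 z0 + z) for the two standard normal points; a convention based on the proportion above the estimate would flip the sign of z0. The function name, the rank rounding, and the guard that keeps p* away from 0 and 1 are all choices of this sketch.

```python
from statistics import NormalDist


def bias_corrected_ci(theta_hat, boot_estimates, alpha=0.05):
    """Bias-corrected percentile interval from a list of bootstrap estimates."""
    std_normal = NormalDist()
    ordered = sorted(boot_estimates)
    n_boot = len(ordered)
    # p*: share of bootstrap estimates below the original estimate (assumed convention).
    p_star = sum(1 for value in ordered if value < theta_hat) / n_boot
    # Guard (assumption): keep p* strictly inside (0, 1) so inv_cdf stays finite.
    p_star = min(max(p_star, 0.5 / n_boot), 1 - 0.5 / n_boot)
    z0 = std_normal.inv_cdf(p_star)
    lower_level = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha / 2))
    upper_level = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha / 2))
    # Convert the adjusted levels to ranks counted from the bottom and from the top.
    k_lower = min(max(round(lower_level * n_boot), 1), n_boot)
    k_upper = min(max(round((1 - upper_level) * n_boot), 1), n_boot)
    return ordered[k_lower - 1], ordered[n_boot - k_upper]
```

When exactly half of the bootstrap estimates lie below the original estimate, `z0` is zero and the function returns the same ranks as the plain percentile rule; otherwise both limits shift in the direction that offsets the median bias.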

IV. METHODOLOGY
Partial data on the morphologic and agronomic characterization of rice accessions at
the IRRI germplasm bank were used in the study. Each phenotypic character, denoted by
$Y_i$, of each rice accession was scored or measured in accordance with the procedure
described in Descriptors for Rice provided by IBPGR-IRRI (1980). For agronomic
characteristics that involve quantitative data, the accessions were categorized into 10
classes defined by the intervals generated by $\mu \pm k\sigma$, where $k = 1, 2, 3, 4,$
and $5$, $\mu$ is the mean, and $\sigma$ is the standard deviation of the values collected
for the variable.
The behavior of the bootstrap was examined when (A) the size of the original
sample and (B) the number of bootstrap resamples vary. Samples consisting of 3000,
1000, 500, 100, and 50 rice accessions were drawn randomly from the rice collection.
These samples are referred to as the original samples. From each original sample, (1) the
Shannon index ($\hat{H}$) and (2) the number of states ($R$) of each descriptor $Y_i$
were computed, using $\hat{H} = -\sum_{i=1}^{n} p_i \ln p_i$, where $n$ is the number of
phenotypic states for a descriptor and $p_i$ is the proportion of the total number of
entries in the $i$th state, and $R = \exp(\hat{H})$. Simple random sampling with
replacement from the original sample was used to draw the bootstrap resamples.
For each original sample, 200, 500, 1000, 1500, 2000, 5000, and 10000 bootstrap
resamples were generated to produce bootstrap estimates and to examine the performance
of the index as the number of resamples changes. For each bootstrap resample, (1) the
Shannon index and (2) the number of states observed were estimated and compared with
the measures obtained from the original sample. Likewise, (3) the normal approximation
confidence interval, (4) the standard percentile confidence interval, (5) the bias-corrected
confidence interval, (6) the bias, and (7) the standard error of the Shannon index were
estimated. Empirical coverage at a nominal confidence level of 0.95 was used to assess
each confidence interval.
To estimate the magnitude of the bias, the percent bias was calculated as the
difference between the true and estimated Shannon index divided by the true value, for
each of the conditions.
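The percent bias can be written as a one-line function. The sign convention below (estimate minus true value, so that underestimates come out negative, as in the biases reported later) is an assumption of this sketch, and the function name is hypothetical. The example uses the population index of Blade Color (1.009778) and the index of the original sample of 500 (0.90794) reported in the results.

```python
def percent_bias(estimate, true_value):
    """Bias as a percentage of the true value; negative means underestimation."""
    return 100.0 * (estimate - true_value) / true_value


# Sample of 500 versus the population index of Blade Color: about -10.1 percent.
bias_500 = percent_bias(0.90794, 1.009778)
```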
The estimates from each set of resamples were examined to determine their
distribution. The Kolmogorov-Smirnov goodness-of-fit test was used to test the
normality of the distribution.
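The Kolmogorov-Smirnov statistic against a normal distribution can be sketched with the Python standard library alone. This is only an illustration of the idea, not the SAS or STATA procedure used in the study: the normal distribution is fitted with the mean and standard deviation of the estimates themselves, and the critical value is the usual large-sample 5% approximation for a fully specified distribution, which is conservative when the parameters are estimated from the same data. All names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d_max = 0.0
    for rank, value in enumerate(ordered, start=1):
        cdf = fitted.cdf(value)
        # Compare the fitted CDF with the empirical CDF just after and just before value.
        d_max = max(d_max, rank / n - cdf, cdf - (rank - 1) / n)
    return d_max


def ks_critical_value_5pct(n):
    """Large-sample 5% critical value for a fully specified distribution."""
    return 1.358 / math.sqrt(n)
```

A set of bootstrap estimates would be judged consistent with normality under this sketch when `ks_statistic_normal(estimates)` falls below `ks_critical_value_5pct(len(estimates))`.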
Since it is of interest to know the behavior of the Shannon diversity index under
different conditions, the number of classes or states of a descriptor was varied.
Given the specified number of states, the Shannon index was then determined.
The performance of the Shannon index under each condition was then assessed.
Moreover, to evaluate the properties of the confidence intervals obtained, 100
bootstrap confidence intervals were constructed for each method. The accuracy,
efficiency, and sufficiency of the intervals were then assessed.
Accuracy of the method was measured by confidence interval coverage, the
percentage of bootstrapped intervals that contain the population Shannon index. The
efficiency of the method was measured by the Average Range (AR) of the set of
estimated confidence intervals. Coverage sufficiency was measured by the Realized
Coverage Rate (RCR), defined as the percentage of a set of estimated intervals that
actually cover the population index at a prescribed level of confidence. The RCR of the
methods was compared to determine the most efficient interval construction technique.
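The two summary measures can be sketched as follows, assuming each interval is stored as a (lower, upper) pair and that an interval covers the population index when the index lies between its limits, endpoints included. The function name and the example intervals are hypothetical.

```python
def coverage_summary(intervals, population_index):
    """Return (RCR in percent, Average Range) for a set of confidence intervals."""
    n_intervals = len(intervals)
    covered = sum(1 for lower, upper in intervals if lower <= population_index <= upper)
    realized_coverage_rate = 100.0 * covered / n_intervals
    average_range = sum(upper - lower for lower, upper in intervals) / n_intervals
    return realized_coverage_rate, average_range


# Hypothetical intervals, for illustration only.
example_intervals = [(0.90, 1.10), (1.02, 1.20), (0.80, 1.00)]
rcr, avg_range = coverage_summary(example_intervals, 1.01)
```

An RCR close to the nominal 95% together with a small Average Range would indicate a method that is both well calibrated and efficient; a narrow interval that rarely covers the population index is of little use.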
Statistical Analysis Software (SAS) and STATA software were used for the
analysis.
V. RESULTS AND DISCUSSION
The partial data on rice accessions comprised 36 rice descriptors: 31 pre-coded and
classified descriptors and five descriptors recorded as actual values. Table 1 shows the
computed Shannon diversity index of each descriptor of the rice collection.
Accessions containing descriptors with no recorded observation were dropped from the
analysis.
Table 1. Shannon diversity indices of the rice descriptors in the population of the rice
collection and the number of states observed.

                          No. of    Shannon                            No. of    Shannon
                          States    Index                              States    Index
Descriptor                Observed             Descriptor              Observed
Apiculus Color            8         1.401522   Main Heading*           7         1.412793
Auricle Color             3         0.349402   Internode Color         4         0.903629
Awn Color                 7         0.720629   Ligule Color            4         0.292467
Awn Presence              5         0.726706   Ligule Length*          8         1.403048
Blade Color               7         1.009778   Leaf Length             5         1.028233
Blade Pubescence          3         0.261031   Lemma Palea Color       11        1.418971
Blade Leaf Sheath Color   4         0.584005   Leaf Senescence         9         1.389686
Collar Color              4         0.953196   Leaf Width              3         0.362462
Culm Angle                9         1.265548   Panicle Exsertion       9         1.169451
Culm Diameter             2         0.652631   Panicle Length          5         0.619180
Culm Length               7         1.723441   Panicle Threshability   9         1.509179
Culm Number               3         0.631618   Panicle Type            9         1.367554
Culm Strength             9         1.583626   Seed Coat Color         7         0.633206
Endosperm Type            2         0.228995   Seedling Height         3         0.753285
Flag Leaf Angle           4         1.180939   Sterile Lemma Color     4         0.490297
Grain Length*             10        1.464131   Sterile Lemma Length    5         0.653199
Grain Width*              9         1.428575   Spikelet Sterility      5         0.857232
100-Grain Weight*         10        1.442009   Stigma Color            5         0.632181
*Quantitative variables
The descriptor with the highest Shannon index was Culm Length, with an index of
1.723441 and seven states observed, followed by Culm Strength, with an index of
1.583626 and nine states. On the other hand, the descriptor with the lowest Shannon
index was Endosperm Type, with an index of only 0.228995 and two states.
All five quantitative variables had an index larger than one. Grain Length had the
highest Shannon diversity index among them, 1.464131, with all 10 states observed. This
was followed by 100-Grain Weight, with an index of 1.442009 and also all 10 states
observed. Grain Width, Main Heading, and Ligule Length had indices higher than 1, with
nine, seven, and eight states detected, respectively.
Lemma Palea Color had the highest number of states detected, with 11 states
observed and a Shannon diversity index of 1.418971. On the other hand, Endosperm Type
and Culm Diameter had only two states, with indices of 0.228995 and 0.652631,
respectively.
Analysis of Shannon Index using Bootstrap Method when the Size of the Original
Sample and the Number of Bootstrap Resamples Vary
For this analysis, only the diversity of the descriptor Blade Color was used to
examine the behavior of the Shannon index when the original samples are of size 3,000,
1,000, 500, 100, and 50 and when the number of resamples is 200, 500, 1000, 1500, 2000,
5000, and 10000.
Out of 9,105 rice accessions with a recorded Blade Color, 5,602 entries were Pale
Green, a proportion of 0.61527 among all colors, as shown in Figure 1. This was followed
by the Dark Green state with 0.28160. However, there were only 13 rice entries with the
Purple Tip state, which accounted for 0.00143 of the collection. This proportional
distribution of the descriptor Blade Color produced a population index of 1.009778.
Figure 1. Pie graph of the proportional distribution of the Blade Color’s states on
the population of rice collection.
Sample of 3,000
Using the original sample of 3,000 from the population of Blade Colors in the rice
collection, the Shannon diversity index estimate for Blade Color was found to be
1.0091. This estimate had a small bias of -0.0007, which can be attributed to the nearly
identical proportional distribution of the descriptor's states in the sample of 3,000 rice
accessions and in the population, as presented in Figure 1 and Figure 2. The two graphs
exhibit almost the same proportional distribution among the Blade Color states.
As in the population, Pale Green had the highest proportion, 0.62167, in the sample
of 3,000 rice entries. Also, only 3 entries, or 0.00100 of the sample, had the Purple Tip
state.
Table 2. Frequency distribution of the Blade Color in the population and in the sample
of 3,000.

Blade Color's State    Frequency
                       Population    Sample
Pale Green             5602          1865
Green                  361           114
Dark Green             2564          818
Purple Tips            13            3
Purple Margins         423           147
Purple Blotch          40            11
Purple                 102           42
Total                  9105          3000
Shannon Index          1.0098        1.0091
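As a check on the arithmetic, the two indices in Table 2 can be recomputed directly from the tabulated frequencies. The short Python sketch below uses only the counts shown in the table; the function and variable names are hypothetical.

```python
import math


def shannon_index(counts):
    """Plug-in Shannon index; states with zero count contribute nothing."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Frequencies copied from Table 2, in the order listed there.
population = [5602, 361, 2564, 13, 423, 40, 102]
sample_3000 = [1865, 114, 818, 3, 147, 11, 42]

h_population = shannon_index(population)   # about 1.0098
h_sample = shannon_index(sample_3000)      # about 1.0091
bias = h_sample - h_population             # about -0.0007
```

The dominant Pale Green state contributes most of the sample mass but relatively little per-entry information, while rare states such as Purple Tips contribute very little to the total because of their tiny proportions.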
Figure 2. Pie graph of the proportional distribution of the Blade Color’s states
on sample of 3,000.
For every number of resamples, the bootstrap estimate of the index was close to the
original sample estimate of 1.0091, as indicated by the small bias in each case. All of the
bootstrap estimates underestimated the original sample index. The most accurate estimate
was produced by the bootstrap with 1,000 resamples, with an index of 1.00893. On the
other hand, the bootstrap with 200 resamples had the least accurate estimate, with an index
of 1.00718. The standard errors of the estimates were small and stable across the different
numbers of resamples, ranging only from 0.01677 to 0.01789.
The 95% confidence intervals constructed using the three methods did not vary
significantly, and all covered the original sample estimate and the population index. Also,
for every number of bootstrap resamples, the normality test indicated that the distribution
of the bootstrap estimates followed a normal distribution.
Table 3. The bootstrap estimates and statistical properties of the Shannon index of the
Blade Color for the original sample of 3,000 with different numbers of bootstrap resamples.

Bootstrap  Bootstrap            Standard  Normality          95% Confidence Interval
Resamples  Estimate   Bias      Error     Test       Method  Lower Limit  Upper Limit
200        1.00718    -0.0019   0.01689   Normal     NA      0.97582      1.04242
                                                     P       0.97317      1.04099
                                                     BC      0.97855      1.04308
500        1.00817    -0.001    0.01789   Normal     NA      0.97396      1.04428
                                                     P       0.97295      1.04224
                                                     BC      0.97808      1.0458
1000       1.00893    -0.0002   0.01685   Normal     NA      0.97605      1.04219
                                                     P       0.97542      1.04214
                                                     BC      0.97425      1.04137
1500       1.00827    -0.0009   0.01677   Normal     NA      0.97622      1.04203
                                                     P       0.97554      1.04035
                                                     BC      0.97662      1.0413
2000       1.0077     -0.0014   0.01723   Normal     NA      0.97533      1.04291
                                                     P       0.97303      1.04107
                                                     BC      0.97733      1.04474
5000       1.00825    -0.0009   0.0169    Normal     NA      0.97600      1.04225
                                                     P       0.97545      1.04159
                                                     BC      0.97744      1.04365
10000      1.00806    -0.0011   0.01677   Normal     NA      0.97625      1.042
                                                     P       0.97462      1.04091
                                                     BC      0.97721      1.04284
NA = Normal Approximation, P = Percentile, BC = Bias Corrected
Sample of 1,000
Taking a sample of 1,000 from the population of Blade Colors in the rice
collection, the Shannon diversity index estimate for Blade Color was found to be
0.9977. This index estimate was smaller than the index estimated from the sample of 3,000.
Moreover, this estimate had a small bias of -0.01203. Furthermore, the proportional
distribution of the sample among the different Blade Color states was nearly the same in
the sample of 1,000 observations as in the population, as presented in Figure 1 and
Figure 3.
Table 4. Frequency distribution of the Blade Color in the population and in the sample
of 1,000.

Descriptor's State     Frequency
                       Population    Sample
Pale Green             5602          621
Green                  361           32
Dark Green             2564          285
Purple Tips            13            3
Purple Margins         423           40
Purple Blotch          40            7
Purple                 102           12
Total                  9105          1000
The sample of 1,000 rice entries was dominated by the Pale Green state, with 621
entries or a proportion of 0.62100, followed by Dark Green with 0.28500. On the other
hand, two states had fewer than ten observations: there were 3 and 7 rice accessions with
Blade Color of Purple Tips and Purple Blotch, respectively.
Figure 3. Pie graph of the proportional distribution of the Blade Color’s
states on sample of 1,000.
Given the original sample index estimate of 0.99774, all the bootstrap estimates
produced from the sample of 1,000 underestimated the original sample estimate and also
the population index.
However, as presented in Table 5, the bootstrap estimates for all numbers of
bootstrap resamples were accurate, judging by their bias. The bootstrap with 1,000
resamples provided the most accurate estimate, with an index of 0.99523 and a bias of
-0.00251. On the other hand, the least accurate bootstrap estimate was found with 200
resamples, with an index of 0.99378 and a bias of -0.00397.
At the 95% level of confidence, the confidence intervals constructed using the
three methods did not vary significantly, and all covered the original sample estimate and
the population index. Also, for every number of bootstrap resamples, the distribution of
the bootstrap estimates followed a normal distribution.

Table 5. The bootstrap estimates and statistical properties of the Shannon index of the
Blade Color for the original sample of 1,000 with different numbers of bootstrap resamples.

Bootstrap  Bootstrap            Standard  Normality          95% Confidence Interval
Resamples  Estimate   Bias      Error     Test       Method  Lower Limit  Upper Limit
200        0.99378    -0.00397  0.02931   Normal     NA      0.93994      1.05555
                                                     P       0.93442      1.05091
                                                     BC      0.94232      1.05863
500        0.99409    -0.00366  0.02931   Normal     NA      0.94016      1.05533
                                                     P       0.93577      1.04949
                                                     BC      0.94395      1.05243
1000       0.99523    -0.00251  0.03073   Normal     NA      0.93744      1.05805
                                                     P       0.93435      1.05586
                                                     BC      0.93744      1.05965
1500       0.99515    -0.00259  0.0297    Normal     NA      0.93948      1.05601
                                                     P       0.93736      1.0535
                                                     BC      0.94145      1.06104
2000       0.99504    -0.0027   0.02973   Normal     NA      0.93943      1.05606
                                                     P       0.93695      1.05256
                                                     BC      0.94003      1.05683
5000       0.99442    -0.00333  0.02956   Normal     NA      0.93979      1.0557
                                                     P       0.93593      1.0521
                                                     BC      0.94374      1.05933
10000      0.99483    -0.00291  0.02972   Normal     NA      0.93949      1.056
                                                     P       0.93653      1.05255
                                                     BC      0.94231      1.05799
Sample of 500
Taking an original sample of 500 rice accessions from the population of Blade
Colors, the Shannon index estimate for Blade Color was found to be 0.9079. This
estimate had a bias of -0.101839. Moreover, this index estimate was smaller than the
indices estimated from the original samples of 3,000 and 1,000. Furthermore, the
proportional distribution of the rice collection among most of the Blade Color states
was nearly the same in the population and in the sample of 500 rice accessions, as
presented in Figure 1 and Figure 4. However, no observation was collected for the state
of Purple Blotch in the sample of 500.
Table 6. Frequency distribution of the Blade Color in the population and in the sample
of 500.

Descriptor's State     Frequency
                       Population    Sample
Pale Green             5602          324
Green                  361           14
Dark Green             2564          137
Purple Tips            13            1
Purple Margins         423           22
Purple Blotch          40            0
Purple                 102           2
Total                  9105          500
As in the samples of 3,000 and 1,000 entries, the Blade Color state of Pale
Green had the most sampled entries, with 324 entries or a proportion of 0.64800.
There were only one and two entries in the sample classified as Purple Tips and
Purple, respectively. Moreover, no entry was found in the state of Purple Blotch.
Figure 4. Pie graph of the proportional distribution of the Blade Color’s states
on sample of 500.
With an original sample index estimate of 0.90794, the bootstrap estimates for every
number of bootstrap resamples underestimated the original sample estimate. Nevertheless,
all of the bootstrap estimates were accurate, as shown by their small bias. The bootstrap
with 1,000 resamples provided the most accurate estimate, with an index of 0.90398 and a
bias of -0.00396. The least accurate estimates were found with 2,000 and 5,000 resamples,
each with a bias of -0.00582.
Blade Color on the 500 original samples with different bootstrap resamples.

Resample  Estimate  Bias      Std. Error  Normality   Method  Lower Limit  Upper Limit
200       0.90247   -0.0055   0.03908     Normal      NA      0.83087      0.98501
                                                      P       0.82192      0.96784
                                                      BC      0.82561      0.97421
500       0.90357   -0.0044   0.03816     Normal      NA      0.83296      0.98292
                                                      P       0.83003      0.9749
                                                      BC      0.83688      0.97805
1000      0.90398   -0.004    0.04055     Normal      NA      0.82837      0.98751
                                                      P       0.8256       0.98381
                                                      BC      0.83113      0.99359
1500      0.90266   -0.0053   0.03887     Normal      NA      0.8317       0.98418
                                                      P       0.82615      0.98075
                                                      BC      0.83683      0.99094
2000      0.90212   -0.0058   0.03816     Normal      NA      0.8331       0.98278
                                                      P       0.82438      0.97535
                                                      BC      0.83727      0.98772
5000      0.90212   -0.0058   0.03867     Normal      NA      0.83213      0.98375
                                                      P       0.82589      0.97596
                                                      BC      0.8378       0.988
10000     0.90213   -0.0058   0.03858     Normal      NA      0.83231      0.98356
                                                      P       0.82548      0.97671
                                                      BC      0.83648      0.98662
The standard errors of the bootstrap estimates did not vary markedly across the different
numbers of resamples, ranging from 0.03816 to 0.04055. The most precise estimates were
given by the bootstrap with 500 and 2,000 resamples, each with a standard error of 0.03816.
As with the intervals for the original samples of 3,000 and 1,000, the 95% confidence
intervals for the original sample of 500 produced by the three methods did not differ
markedly, and all covered the original sample estimate. The distribution of the estimates
for all numbers of bootstrap resamples followed the normal distribution.
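The bootstrap estimate, bias, and standard error reported above can be sketched as
follows. This is a minimal illustration, not the code used in the study: the function
names, the seed, and the choice of 1,000 resamples are assumptions, and the counts are the
Blade Color frequencies of the original sample of 500. Resampling the 500 accessions with
replacement is done here by drawing state labels with probabilities proportional to the
observed counts, which is equivalent.

```python
import math
import random
import statistics


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i); states with zero count contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_shannon(counts, n_resamples, seed=0):
    """Return one Shannon index per bootstrap resample of the observed sample."""
    rng = random.Random(seed)          # fixed seed (assumption) for repeatability
    states = range(len(counts))
    n = sum(counts)
    replicates = []
    for _ in range(n_resamples):
        # Draw n accessions with replacement from the observed sample.
        draw = rng.choices(states, weights=counts, k=n)
        resampled = [0] * len(counts)
        for s in draw:
            resampled[s] += 1
        replicates.append(shannon_index(resampled))
    return replicates


# Pale Green, Green, Dark Green, Purple Tips, Purple Margins, Purple Blotch, Purple
sample_500 = [324, 14, 137, 1, 22, 0, 2]

original = shannon_index(sample_500)
reps = bootstrap_shannon(sample_500, n_resamples=1000)
boot_estimate = statistics.mean(reps)      # bootstrap estimate of the index
boot_bias = boot_estimate - original       # bias relative to the original sample
boot_se = statistics.stdev(reps)           # bootstrap standard error

print(f"original={original:.5f} bootstrap={boot_estimate:.5f} "
      f"bias={boot_bias:.5f} se={boot_se:.5f}")
```

The exact figures depend on the random seed, so they will not match the table digit for
digit, but the bias should come out negative and the standard error of the same order as
the values reported above.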
Sample of 100
A random sample of 100 rice accessions, shown in Table 8, registered only four of the
seven states. No accession entries were found for Purple Tips, Purple Blotch, and Purple
in the sample. This sample produced a Shannon index of 0.88913, with a bias of -0.1206.
Table 8. Frequency distribution of the Blade Color on the Population and on the sample
of 100.
State                 Population  Sample
Pale Green (60)       5602        59
Green (61)            361         3
Dark Green (63)       2564        35
Purple Tips (80)      13          0
Purple Margins (85)   423         3
Purple Blotch (86)    40          0
Purple (89)           102         0
Total                 9105        100
Figure 5. Pie Graph of the proportional distribution of the Blade Color’s states on sample
of 100.
In Table 9, the bootstrap estimates on the original sample of 100 underestimated the
original sample estimate of the Blade Color Shannon index for every number of bootstrap
resamples. The most accurate estimate was given by the bootstrap with 1,000 resamples,
with a bias of only -0.01317. However, the bootstrap estimates showed no trend as the
number of resamples varied.
The distribution of the bootstrap estimates was found to be non-normal for 2,000 or more
resamples. Moreover, the Percentile intervals for 2,000 and 5,000 resamples did not cover
the population index of 1.0098.

Table 9. Bootstrap estimates of the Shannon index of Blade Color on the original sample
of 100 with different bootstrap resamples.
Bootstrap Bootstrap                       Normality   Bootstrap  95% Confidence Interval
Resample  Estimate  Bias      SE          Test        Method     Lower Limit  Upper Limit
200       0.87139   -0.01771  0.07482     normal      NA         0.74643      1.03184
                                                      P          0.74757      1.01639
                                                      BC         0.7615       1.06629
500       0.87449   -0.01461  0.07212     normal      NA         0.74744      1.03083
                                                      P          0.73066      1.01065
                                                      BC         0.75628      1.03345
1000      0.87593   -0.01317  0.07147     normal      NA         0.74888      1.02939
                                                      P          0.72907      1.01195
                                                      BC         0.7615       1.0287
1500      0.87497   -0.01413  0.07232     normal      NA         0.74728      1.03099
                                                      P          0.73066      1.01474
                                                      BC         0.75628      1.03547
2000      0.87207   -0.01703  0.07018     not normal  NA         0.7515       1.02677
                                                      P          0.72755      1.00192
                                                      BC         0.7607       1.01877
5000      0.87342   -0.01568  0.07256     not normal  NA         0.74688      1.03139
                                                      P          0.72411      1.008
                                                      BC         0.75677      1.03474
10000     0.87256   -0.01654  0.0722      not normal  NA         0.74761      1.03066
                                                      P          0.73066      1.01058
                                                      BC         0.7616       1.03986
Sample of 50
A random sample of 50 rice accessions, shown in Table 10, gave a Shannon index estimate
of 0.87317 with a bias of -0.1366. The sampled accessions fell in only three Blade Color
states, namely Pale Green, Green, and Dark Green.
As with the previous original samples, the bootstrap estimates on the sample of 50
underestimated the original sample estimate of the Blade Color Shannon index for every
number of bootstrap resamples. The most accurate estimate was given by 1,000 bootstrap
resamples, with a bias of only -0.01431. However, the standard errors of the bootstrap
estimates in this sample were higher than those for the larger original samples.
Table 10. Frequency distribution of the Blade Color on the Population and on the sample
of 50.
State                 Population  Sample
Pale Green (60)       5602        30
Green (61)            361         4
Dark Green (63)       2564        16
Purple Tips (80)      13          0
Purple Margins (85)   423         0
Purple Blotch (86)    40          0
Purple (89)           102         0
Total                 9105        50
Figure 6. Pie Graph of the proportional distribution of the Blade Color’s states on
sample of 50.
For the original sample of 50, only the bootstrap distributions with 200, 500, and 1,000
resamples fitted the normal distribution. Resampling 1,500 times or more produced
non-normal distributions at the 0.05 level of significance.
For all numbers of bootstrap resamples, the upper bound of the Percentile interval failed
to reach the population index of 1.0098. The Bias-corrected interval for 1,000 resamples
also had an upper bound below the population index.
Table 11. Bootstrap estimates of the Shannon index of Blade Color on the original sample
of 50 with different bootstrap resamples.
Bootstrap Bootstrap                       Normality   Bootstrap  95% Confidence Interval
Resample  Estimate  Bias      SE          Test        Method     Lower Limit  Upper Limit
200       0.85569   -0.0175   0.07461     normal      NA         0.72604      1.02031
                                                      P          0.70276      0.99841
                                                      BC         0.76099      1.03758
500       0.85518   -0.018    0.08114     normal      NA         0.71375      1.03259
                                                      P          0.68434      0.99787
                                                      BC         0.7171       1.0181
1000      0.85886   -0.0143   0.08036     normal      NA         0.71548      1.03086
                                                      P          0.68434      0.99783
                                                      BC         0.70168      1.00487
1500      0.8539    -0.0193   0.08293     not normal  NA         0.71051      1.03584
                                                      P          0.665        0.99787
                                                      BC         0.70779      1.0181
2000      0.85279   -0.0204   0.08297     not normal  NA         0.71046      1.03588
                                                      P          0.67939      0.99785
                                                      BC         0.71351      1.01986
5000      0.85164   -0.0215   0.08272     not normal  NA         0.711        1.03535
                                                      P          0.67301      0.99524
                                                      BC         0.71351      1.01331
10000     0.85362   -0.0196   0.08292     not normal  NA         0.71064      1.03571
                                                      P          0.67301      0.99785
                                                      BC         0.71351      1.0181
Comparison of Different Number of Original Samples
Table 12 summarizes the Shannon index estimates for the different original sample sizes
and the frequencies observed in the Blade Color states. It shows that the population index
of 1.0098 was estimated from samples of 3,000, 1,000, 500, 100, and 50 as 1.0091, 0.9977,
0.9079, 0.88913, and 0.87317, respectively. Relative to the Shannon index of the Blade
Color population in the rice collection, the estimate decreases as the original sample
size decreases. Gathering fewer samples from the collection is therefore more likely to
underestimate the true population index. This can be attributed to some Blade Color states
having no observations in the sample. In the sample of 500 observations, no sampled rice
accession had a Blade Color of Purple Blotch. In addition, the samples of 100 and 50 left
three and four states unobserved, respectively.
Table 12. Comparison of the distribution of Blade Color’s states and Shannon Index on
the different original samples.
                            Frequency on Sample
State           Population  3000    1000     500       100      50
Pale Green      5602        1865    621      324       59       30
Green           361         114     32       14        3        4
Dark Green      2564        818     285      137       35       16
Purple Tips     13          3       3        1         0        0
Purple Margins  423         147     40       22        3        0
Purple Blotch   40          11      7        0         0        0
Purple          102         42      12       2         0        0
Total           9105        3000    1000     500       100      50
Shannon Index   1.0098      1.0091  0.99774  0.907940  0.88913  0.873173
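The Shannon Index row of Table 12 can be checked directly from the frequencies. The short
sketch below assumes the index is computed with natural logarithms as
H = -sum(p_i ln p_i), with unobserved states contributing nothing; the variable names are
illustrative only. It also reports the bias of each sample estimate relative to the
population index and the number of states actually observed.

```python
import math


def shannon_index(counts):
    """Shannon index with natural logarithms; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# State order: Pale Green, Green, Dark Green, Purple Tips,
# Purple Margins, Purple Blotch, Purple (frequencies copied from Table 12).
population = [5602, 361, 2564, 13, 423, 40, 102]
samples = {
    3000: [1865, 114, 818, 3, 147, 11, 42],
    1000: [621, 32, 285, 3, 40, 7, 12],
    500: [324, 14, 137, 1, 22, 0, 2],
    100: [59, 3, 35, 0, 3, 0, 0],
    50: [30, 4, 16, 0, 0, 0, 0],
}

population_index = shannon_index(population)
results = {}
for n, counts in samples.items():
    index = shannon_index(counts)
    observed_states = sum(1 for c in counts if c > 0)
    # (estimate, bias relative to the population index, states observed)
    results[n] = (index, index - population_index, observed_states)
    print(f"n={n:5d} index={index:.5f} bias={index - population_index:+.5f} "
          f"states={observed_states}")
```

Under this assumption every sample estimate falls below the population index, and the
shortfall grows as states drop out of the sample.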

Analysis of Bias
Figure 7 shows the percent bias of the bootstrap estimates for each combination of
original sample size and number of bootstrap resamples. The estimates from the original
sample of 3,000 had the smallest bias for every number of bootstrap resamples. The graph
also shows a marked difference in percent bias between the large original samples (500 and
above) and the small original samples (50 and 100). Nonetheless, for every original
sample, the most accurate estimates were those generated using 1,000 bootstrap resamples.
Figure 7. Comparison of percent bias for different original samples on different
bootstrap resamples
Differences on the Observed State
The Shannon index of the descriptor was also examined as the number of observed states
or classes varied. Table 13 confirms that the Shannon diversity index decreases as the
number of states or classes decreases. The size of the decrease also grows as the number
of observed states approaches one, and the Shannon index equals zero when only one state
is collected in the sample.
Table 13. The frequency distribution on the Blade Color’s State on the sample of 1,000 in
different number of states observed
                Frequency in the Number of States Observed
State           (7)      (6)      (5)      (4)      (3)      (2)      (1)
Pale Green      621      622      624      628      640      669      1000
Green           32       32       33       36       0        0        0
Dark Green      285      286      288      291      303      331      0
Purple Tips     3        0        0        0        0        0        0
Purple Margins  40       41       42       45       57       0        0
Purple Blotch   7        7        0        0        0        0        0
Purple          12       12       13       0        0        0        0
Total           1000     1000     1000     1000     1000     1000     1000
Shannon Index   0.99774  0.98225  0.95495  0.91060  0.81070  0.63488  0.00000
Differences of Observed State on Different Descriptors
The Shannon indices of different descriptors with different numbers of states were also
analyzed. Table 14 presents seven descriptors whose population index generally increases
with the number of states. No descriptor with six states was found in the rice collection.
Blade Color, with seven states, had a Shannon index of 1.00978. This was less than the
index of Leaf Length, which has only five states but an index of 1.02823. Thus, the rice
collection’s Leaf Length can be said to be more diverse than its Blade Color. The same
pattern held for the samples of 100 and 50.
Table 14. Shannon index of different descriptors with different number of states
                  No. of  Population  Sample Index
Descriptor        state   Index       100      50
Endosperm         2       0.22900     0.22697  0.20456
Culm Number       3       0.63162     0.60008  0.53418
Collar Color      4       0.95320     0.94536  0.83457
Leaf Length       5       1.02823     1.02186  0.91956
Blade Color       7       1.00978     0.88913  0.87317
Apiculus Color    8       1.40152     1.37467  1.00034
Leaf Senescence   9       1.38969     1.28873  1.03265
Analysis for Confidence Interval
The original sample of 100 was used for the analysis of the bootstrap confidence
intervals, since variation in the index was observed only at this sample size and below.
Large original samples, from 500 to 3,000 for instance, produced confidence intervals that
always covered the original estimate. Thus, a relatively small sample was used to further
investigate the performance of the bootstrap confidence intervals.
Normal Approximation Method
The 95% confidence intervals using the normal approximation method for different numbers
of bootstrap resamples on the original sample of 100 observations are presented in Table
15. The interval with 2,000 resamples was the narrowest, while the interval with 5,000
resamples was the widest. Figure 8 illustrates the intervals, which lie close to one
another. All the intervals covered the population index of Blade Color, 1.00978, indicated
by the solid vertical line on the graph.

Table 15. 95 % Confidence Interval using Normal Approximation method.
Bootstrap  Bootstrap  95 % Confidence Interval  Interval
Resamples  Estimate   Lower Limit  Upper Limit  Length
200        0.87139    0.74643      1.02694      0.28050
500        0.87449    0.74744      1.03083      0.28339
1000       0.87593    0.74888      1.02939      0.28050
1500       0.87497    0.74728      1.03099      0.28371
2000       0.87207    0.75150      1.02677      0.27526
5000       0.87342    0.74688      1.03139      0.28450
10000      0.87256    0.74761      1.03066      0.28306
[Figure 8 plot: intervals for each number of Resamples (200 to 10000) against the Shannon
Index (0.70 to 1.00); legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit,
Upper Limit.]
Figure 8. 95 % Confidence Interval using Normal Approximation method
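A normal approximation interval of the kind tabulated in this subsection can be sketched
as below. The sketch centres the interval on the original sample estimate and uses the
bootstrap standard error, i.e. estimate ± z × SE; this form, the helper names, the seed,
and the choice of 1,000 resamples are assumptions made for illustration, since the exact
computation used in the study is not restated here.

```python
import math
import random
import statistics


def shannon_index(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_replicates(counts, n_resamples, seed=0):
    """Shannon index of each resample drawn with replacement from the sample."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    reps = []
    for _ in range(n_resamples):
        resampled = [0] * k
        for s in rng.choices(range(k), weights=counts, k=n):
            resampled[s] += 1
        reps.append(shannon_index(resampled))
    return reps


def normal_interval(original, replicates, level=0.95):
    """Assumed form: original estimate +/- z * bootstrap standard error."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    se = statistics.stdev(replicates)
    return original - z * se, original + z * se


sample_100 = [59, 3, 35, 0, 3, 0, 0]   # Blade Color counts, original sample of 100
original = shannon_index(sample_100)
reps = bootstrap_replicates(sample_100, n_resamples=1000)
lower, upper = normal_interval(original, reps)
print(f"95% normal approximation interval: ({lower:.5f}, {upper:.5f}), "
      f"length {upper - lower:.5f}")
```

By construction such an interval is symmetric, and its length depends only on the
bootstrap standard error, which is why the lengths change little with the number of
resamples.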
Percentile Method
The 95% confidence intervals using the percentile method for different numbers of
bootstrap resamples on the original sample of 100 observations are presented in Table 16.
The percentile interval with 200 resamples was the narrowest, while the interval with
1,500 resamples was the widest.
Table 16. 95 % Confidence Interval using Percentile method
Bootstrap  Bootstrap  95 % Confidence Interval  Interval
Resamples  Estimate   Lower Limit  Upper Limit  Length
200        0.87139    0.74757      1.01639      0.26882
500        0.87449    0.73066      1.01065      0.27999
1000       0.87593    0.72907      1.01195      0.28287
1500       0.87497    0.73066      1.01474      0.28408
2000       0.87207    0.72755      1.00192      0.27437
5000       0.87342    0.72411      1.00800      0.28389
10000      0.87256    0.73066      1.01058      0.27992
[Figure 9 plot: intervals for each number of Resamples (200 to 5000) against the Shannon
Index (0.70 to 1.00); legend: Original Sample Estimate, Lower Limit, Upper Limit.]
Figure 9. 95 % Confidence Interval using Percentile method

As illustrated in Figure 9, unlike those from the normal approximation method, the
confidence intervals for the different numbers of resamples were not symmetric about the
bootstrap estimate. Notably, the upper bounds of all the intervals lie near the population
index of Blade Color.
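The percentile interval simply reads the bounds off the ordered bootstrap estimates. The
sketch below is one possible convention for picking the ranks (several quantile
conventions exist, and the one used in the study is not stated); the helper names, seed,
and number of resamples are likewise assumptions.

```python
import math
import random


def shannon_index(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_replicates(counts, n_resamples, seed=0):
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    reps = []
    for _ in range(n_resamples):
        resampled = [0] * k
        for s in rng.choices(range(k), weights=counts, k=n):
            resampled[s] += 1
        reps.append(shannon_index(resampled))
    return reps


def percentile_interval(replicates, level=0.95):
    """Bounds are the alpha/2 and 1 - alpha/2 order statistics of the replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    alpha = 1 - level
    lower_rank = max(math.floor(alpha / 2 * b), 0)
    upper_rank = min(math.ceil((1 - alpha / 2) * b) - 1, b - 1)
    return ordered[lower_rank], ordered[upper_rank]


sample_100 = [59, 3, 35, 0, 3, 0, 0]   # Blade Color counts, original sample of 100
original = shannon_index(sample_100)
reps = bootstrap_replicates(sample_100, n_resamples=1000)
lower, upper = percentile_interval(reps)
print(f"original={original:.5f}  95% percentile interval: ({lower:.5f}, {upper:.5f})")
```

Because the bounds follow the shape of the bootstrap distribution, the interval need not
be symmetric, and it inherits the downward bias of the bootstrap estimates.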
Bias-Corrected Method
The 95% confidence intervals using the bias-corrected method for different numbers of
bootstrap resamples on the original sample of 100 observations are presented in Table 17.
As with the normal approximation method, the interval with 2,000 resamples was the
narrowest among the numbers of resamples considered, while the interval with 200 resamples
was the widest.
As illustrated in Figure 10, the skewness of the intervals was more pronounced under the
bias-corrected method than under the normal approximation and percentile methods. The
lower bounds for all numbers of resamples were close to a common value, while the upper
bounds varied considerably from one number of resamples to another.
Table 17. 95 % Confidence Interval using Bias Corrected method
Bootstrap  Bootstrap  95 % Confidence Interval  Interval
Resamples  Estimate   Lower Limit  Upper Limit  Length
200        0.87139    0.76150      1.06629      0.30479
500        0.87449    0.75628      1.03345      0.27716
1000       0.87593    0.76150      1.02870      0.26721
1500       0.87497    0.75628      1.03547      0.27919
2000       0.87207    0.76070      1.01877      0.25807
5000       0.87342    0.75677      1.03474      0.27797
10000      0.87256    0.76160      1.03986      0.27826
[Figure 10 plot: intervals for each number of Resamples (200 to 10000) against the
Shannon Index (0.70 to 1.05); legend: Original Sample Estimate, Bootstrap Estimate, Lower
Limit, Upper Limit.]
Figure 10. 95 % Confidence Interval using Bias Corrected method
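A bias-corrected interval can be sketched with the standard bias-corrected percentile
construction: the proportion p of bootstrap estimates below the original estimate gives a
correction z0 = Φ⁻¹(p), and the percentile levels α/2 and 1 − α/2 are shifted to
Φ(2·z0 + z). Whether the study used exactly this form is an assumption, as are the helper
names, the handling of ties, the seed, and the number of resamples.

```python
import math
import random
import statistics


def shannon_index(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_replicates(counts, n_resamples, seed=0):
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    reps = []
    for _ in range(n_resamples):
        resampled = [0] * k
        for s in rng.choices(range(k), weights=counts, k=n):
            resampled[s] += 1
        reps.append(shannon_index(resampled))
    return reps


def bias_corrected_interval(original, replicates, level=0.95):
    """Bias-corrected percentile interval (no acceleration term)."""
    nd = statistics.NormalDist()
    ordered = sorted(replicates)
    b = len(ordered)
    below = sum(1 for r in ordered if r < original)
    # Clamp the proportion away from 0 and 1 so the inverse CDF stays finite.
    p = min(max(below / b, 1 / (b + 1)), b / (b + 1))
    z0 = nd.inv_cdf(p)
    alpha = 1 - level

    def adjusted_quantile(q):
        shifted = nd.cdf(2 * z0 + nd.inv_cdf(q))      # shifted percentile level
        rank = min(max(math.ceil(shifted * b) - 1, 0), b - 1)
        return ordered[rank]

    return adjusted_quantile(alpha / 2), adjusted_quantile(1 - alpha / 2)


sample_100 = [59, 3, 35, 0, 3, 0, 0]   # Blade Color counts, original sample of 100
original = shannon_index(sample_100)
reps = bootstrap_replicates(sample_100, n_resamples=1000)
lower, upper = bias_corrected_interval(original, reps)
print(f"95% bias-corrected interval: ({lower:.5f}, {upper:.5f})")
```

When most bootstrap estimates fall below the original estimate, z0 is positive and both
bounds move upward relative to the plain percentile interval, which matches the direction
of the differences between Table 16 and Table 17.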
Comparison of Intervals on Different Resamples
The three bootstrap confidence interval methods were further compared within each number
of bootstrap resamples, as presented in Figure 10. For every number of resamples, the
lower and upper bounds of the Percentile intervals were the lowest index values among the
three methods, except for the lower bound at 200 resamples. On the other hand, the lower
and upper bounds of the Normal Approximation intervals reached the highest values of the
Shannon diversity index of Blade Color for all numbers of resamples except 2,000.
[Figure plot: intervals by method (NA, P, BC) for each number of resamples (200 to 10000)
against the Shannon Index (0.70 to 1.10); legend: Original Sample Estimate, Bootstrap
Estimate, Lower Limit, Upper Limit.]
Figure 10. Comparison of the Confidence Interval using different methods in different
numbers of resamples
100 Confidence Intervals
Of the 100 confidence intervals constructed using the Normal Approximation method, all
covered the original bootstrap estimates. Figure 11 also shows that the confidence
intervals were symmetric about the bootstrap estimate.
[Figure 11 plot: 100 intervals (Interval 1 to 100) against the Shannon index (0.70 to
1.00); legend: Original sample estimate, Bootstrap estimate, Lower limit, Upper limit.]
Figure 11. One Hundred Bootstrap confidence interval using Normal Approximation
Method
Unlike the other methods, only fifty-five of the 100 confidence intervals using the
Percentile method covered the true population parameter. The other forty-five intervals
had upper bounds below the parameter. Moreover, the intervals were no longer symmetric
about the bootstrap estimates.
[Figure 12 plot: 100 intervals (Interval 1 to 100) against the Shannon index (0.70 to
1.00); legend: Original Sample estimate, Bootstrap estimate, Lower limit, Upper limit.]
Figure 12. One Hundred Bootstrap confidence interval using Percentile Method
Like the previous two methods, the Bias-Corrected method generated 100 confidence
intervals that all covered the original bootstrap estimate. No interval was symmetric
about the bootstrap estimate; the intervals were skewed to one side of it or the other.
[Figure 13 plot: 100 intervals (Case Number 1 to 99) against the index Value (0.70 to
1.00); legend: Original Sample Estimate, Lower Limit, Upper Limit.]
Figure 13. One Hundred Bootstrap confidence intervals using Bias Corrected Method
Confidence Interval Measures
Using the 100 confidence intervals generated by each of the three methods, the properties
of the intervals were compared, as presented in Table 18. As a measure of interval
accuracy, the average range of the 100 intervals indicated that the Bias-corrected method
produced the most accurate interval, with a mean range of 0.26601. It was followed by the
Normal Approximation and Percentile methods, with ranges of 0.28314 and 0.28332,
respectively.
The Expected Coverage Rate and Realized Coverage Rate of the Normal Approximation and
Bias-corrected methods were the same, at 0.95 and 1.00, respectively. The Percentile
method, on the other hand, had a realized coverage rate of 0.55. This means that the
Normal Approximation and Bias-corrected methods had the same number of intervals that
actually covered the original sample estimate, whereas only 55 percent of the Percentile
intervals can be expected to contain the parameter index. Thus, the Normal Approximation
and Bias-corrected methods are more sufficient than the Percentile method for forming
confidence intervals.
Similarly, the Normal Approximation and Bias-corrected methods had the same calibration
rate of 0.99646, which implies that the average range of each method needs a downward
adjustment for the realized coverage rate to equal the expected coverage rate. The
Percentile method, on the other hand, had a calibration rate of 1.03454, which calls for
an upward adjustment. Based on the Calibrated Average Rate, the Bias-corrected method
remains the most efficient after the adjustment for coverage sufficiency.

Table 18. Comparisons of the properties of the three confidence intervals
Interval Properties      Normal Approx.  Percentile  Bias-Corrected
Average Range            0.28314         0.28332     0.26601
Expected Coverage Rate   0.95            0.95        0.95
Realized Coverage Rate   1.00            0.55        1.00
Calibration Rate         0.99646         1.03454     0.99646
Calibrated Average Rate  0.28214         0.29311     0.26507
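The average range and realized coverage rate in Table 18 can be illustrated with a small
simulation. The sketch assumes that each of the 100 intervals comes from an independent
bootstrap run on the same original sample of 100 and that coverage is judged against the
population index of 1.00978; the text does not spell out either point. The 200 resamples
per run, the seed, and all names are also illustrative choices, and only the Normal
Approximation and Percentile intervals are computed.

```python
import math
import random
import statistics


def shannon_index(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_replicates(counts, n_resamples, rng):
    n, k = sum(counts), len(counts)
    reps = []
    for _ in range(n_resamples):
        resampled = [0] * k
        for s in rng.choices(range(k), weights=counts, k=n):
            resampled[s] += 1
        reps.append(shannon_index(resampled))
    return reps


POPULATION_INDEX = 1.00978             # population Shannon index of Blade Color
sample_100 = [59, 3, 35, 0, 3, 0, 0]   # original sample of 100
original = shannon_index(sample_100)
z = statistics.NormalDist().inv_cdf(0.975)
rng = random.Random(2024)              # arbitrary seed (assumption)

intervals = {"normal": [], "percentile": []}
for _ in range(100):                   # 100 independent bootstrap runs
    reps = sorted(bootstrap_replicates(sample_100, 200, rng))
    se = statistics.stdev(reps)
    intervals["normal"].append((original - z * se, original + z * se))
    # Index 5 and 194 are the 2.5% and 97.5% ranks of 200 ordered replicates.
    intervals["percentile"].append((reps[5], reps[194]))

summary = {}
for method, ivs in intervals.items():
    average_range = statistics.mean(u - l for l, u in ivs)
    realized = sum(l <= POPULATION_INDEX <= u for l, u in ivs) / len(ivs)
    summary[method] = (average_range, realized)
    print(f"{method:10s} average range={average_range:.5f} "
          f"realized coverage={realized:.2f}")
```

Under these assumptions the percentile upper bounds fluctuate around the population index,
so a sizeable share of those intervals miss it, while the normal approximation intervals
almost always contain it.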
Distribution of Shannon Index
It is also of interest to know how the distribution of the Shannon index behaves as the
number of bootstrap resamples varies. The original sample of 100 was used for this
analysis, since it is the sample size at which the fit of the index to the normal
distribution begins to fail. Several levels of significance were used to assess how
closely the distribution of the estimates follows the normal distribution.
Figure 14 shows the probability density function of the estimates generated by
bootstrapping with different numbers of resamples. Graphically, the distributions for all
numbers of resamples resemble the curve of the normal distribution.
Using the Kolmogorov-Smirnov statistic in the goodness-of-fit test for normality, the
distributions of the estimates from 200 and 500 bootstrap resamples fitted the normal
distribution at every level of significance from 0.01 to 0.2. For 1,000 bootstrap
resamples, the distribution of the estimates differed significantly from the normal
distribution only at the 0.2 level of significance. For both 2,000 and 5,000 resamples,
the distribution did not fit the normal distribution at significance levels of 0.05 and
above. Thus, as the number of bootstrap resamples increases, the distribution of the
Shannon index departs from the normal distribution.

[Figure 14 panels: histogram of the bootstrap estimates with the fitted normal density,
f(x) against x, for (a) 200, (b) 500, (c) 1000, (d) 1500, (e) 2000, and (f) 5000
resamples.]
Figure 14. Comparison of the distribution of the Bootstrap estimates using different
resamples
Table 19. Goodness-of-Fit test for the normality of the different bootstrap resamples
Goodness of Fit (Test of Normality): Kolmogorov-Smirnov

Sample Size     200
Statistic       0.04181
P-Value         0.86094
α               0.2      0.1      0.05     0.02     0.01
Critical Value  0.07587  0.08648  0.09603  0.10734  0.11519
Reject?         No       No       No       No       No

Sample Size     500
Statistic       0.02512
P-Value         0.90266
α               0.2      0.1      0.05     0.02     0.01
Critical Value  0.04799  0.05469  0.06073  0.06789  0.07285
Reject?         No       No       No       No       No

Sample Size     1000
Statistic       0.03585
P-Value         0.14925
α               0.2      0.1      0.05     0.02     0.01
Critical Value  0.03393  0.03867  0.04294  0.048    0.05151
Reject?         Yes      No       No       No       No

Sample Size     2000
Statistic       0.03297
P-Value         0.02532
α               0.2      0.1      0.05     0.02     0.01
Critical Value  0.02399  0.02735  0.03037  0.03394  0.03643
Reject?         Yes      Yes      Yes      No       No

Sample Size     5000
Statistic       0.02057
P-Value         0.02868
α               0.2      0.1      0.05     0.02     0.01
Critical Value  0.01517  0.0173   0.01921  0.02147  0.02304
Reject?         Yes      Yes      Yes      No       No
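The reject decisions in the goodness-of-fit table follow from comparing each
Kolmogorov-Smirnov statistic with a critical value that shrinks as the number of
resamples grows. The sketch below uses the large-sample approximation
sqrt(-ln(α/2)/2)/sqrt(n) for the critical value; this is an assumption about how the
tabulated values arise, and it agrees with them only to about three decimal places. The
statistics are copied from the table, and the names are illustrative.

```python
import math


def ks_critical_value(n, alpha):
    """Large-sample two-sided Kolmogorov-Smirnov critical value (approximation)."""
    return math.sqrt(-math.log(alpha / 2) / 2) / math.sqrt(n)


# Number of bootstrap resamples -> KS statistic reported in the table
ks_statistics = {200: 0.04181, 500: 0.02512, 1000: 0.03585,
                 2000: 0.03297, 5000: 0.02057}
alphas = [0.2, 0.1, 0.05, 0.02, 0.01]

decisions = {}
for n, d in ks_statistics.items():
    # Reject normality at level alpha when the statistic exceeds the critical value.
    decisions[n] = [d > ks_critical_value(n, a) for a in alphas]
    row = "  ".join("Yes" if reject else "No " for reject in decisions[n])
    print(f"n={n:5d}  D={d:.5f}  reject at {alphas}: {row}")
```

Since the critical value falls roughly as 1/sqrt(n), even a modest departure from
normality becomes detectable once the number of resamples is large, which is consistent
with the rejections appearing only at 1,000 resamples and above. When the mean and
standard deviation of the fitted normal are estimated from the same data, these standard
critical values are known to be conservative.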
SUMMARY AND CONCLUSION
The Shannon diversity index is widely used to determine the diversity of a collection.
However, the estimate of this index is known to be biased, and no simple formula exists
for its statistical properties. Because it requires no such formula, bootstrapping was
used to estimate the Shannon index and to construct confidence intervals around it.
The behavior of the Shannon index under different bootstrap conditions was analyzed in
the study. Across the different original sample sizes used for bootstrapping, relatively
small original samples were found to have a marked effect on the statistical properties
of the Shannon diversity index. From a collection of about 9,000 rice accessions, an
original sample of size 100 was enough to reveal substantial bias and changes in interval
coverage for the diversity index of Blade Color and other descriptors. The number of
bootstrap resamples did not show any particular trend in the properties of the index for
large original samples. However, the results indicated that as the number of resamples
increases, the distribution of the Shannon index is more likely to deviate from the
normal distribution.
Among the three methods used for interval construction, the Bias-corrected method
produced the most accurate, sufficient, and efficient intervals based on the average
range, the expected and realized coverage rates, and the calibration rate. The Percentile
method, on the other hand, had a calibration rate of 1.03454, which calls for an upward
adjustment. The Normal Approximation method provided adequate intervals for the numbers
of resamples whose estimates were found to follow a normal distribution.

LITERATURE CITED
ALDEMITA, Y. 2006. Genetic diversity in Korean germplasm of rice with different levels of
resistance to blast in the Philippines. Undergraduate Special Problem. University of the
Philippines, Los Banos.
ALCASID, C. 2006. Morpho-agronomic diversity analysis of twenty thermo-sensitive genetic
male sterile lines of rice. Undergraduate Special Problem. University of the Philippines,
Los Banos.
ALMAZAN, K. 2007. Agro-morphological diversity among wide crossed derived rice lines at
F8 generation. Undergraduate Thesis Manuscript. University of the Philippines, Los Banos.
EFRON, B. 1981. Nonparametric estimates of standard error: the jackknife, bootstrap, and
other methods. Biometrika 68: 589-599.
HUTCHESON, K. 1970. A test for comparing diversities based on the Shannon formula.
Journal of Theoretical Biology 29: 151-154.
JAIN, S.K., QUALSET, C.O., BHATT, G.M., WU, K.K. 1975. Geographic patterns of phenotypic
diversity in a world collection of durum wheats. Crop Science 15: 700-704.
PEET, R. 1975. Relative diversity indices. Ecology 56: 496-498.
PIELOU, E.C. 1966. The measurement of diversity in different types of biological
collections. Journal of Theoretical Biology 13: 131-141.
_____. 1969. An Introduction to Mathematical Ecology. Wiley, New York.
PINOL, M. 1997. Species diversity measurement for logged-over dipterocarp forest in the
Philippines under different cutting regimes. Graduate Thesis. University of the
Philippines, Los Banos.
RILEY, K.W. 1998. Phenotypic diversity in the Ethiopian noug germplasm. African Crop
Science Journal 8(2): 137-143.
Rice Diversity <http://www.ricediversity.org>
YANG, R.C., JANA, S., CLARKE, J.M. 1991. Phenotypic diversity and associations of some
potentially drought-responsive characters in durum wheat. Crop Science 31: 1484-1491.
ZAHL, S. 1977. Jackknifing an index of diversity. Ecology 58: 907-913.
