
STA 6166, Section 8489, Fall 2007

Final Exam Part I


Due 04 December 2007

RAMI SHAMSHIRI
UFID#: 9021-3353

Ramin Shamshiri, UFID#: 9021-3353


A) Please read the attached paper by Bell et al. (2005, Science, 308, 1884 and supplementary material; Attachments I and II) and answer the following questions:

a. State in words all sets of hypotheses that the authors are interested in testing (note that the authors could be interested in more than one set of hypotheses!)

The main hypotheses that the authors claim (HA) are the following:
i. The slope of the taxa-area relationship for natural bacterial communities (which can be considered microbes) inhabiting small aquatic islands is comparable to the slope of the taxa-area relationship for larger organisms. In other words, the authors claim that the mean diversity of natural bacterial communities inhabiting small aquatic islands is similar to that found for larger organisms.
ii. The slope of the species-area relationship for insular bacterial communities is similar to that found for communities of larger organisms (on discrete islands).

The secondary hypotheses that the authors are interested in are:
i. Analogous processes structure both microbial communities and communities of larger organisms.
ii. Other mechanisms may underlie the difference between the authors' results and those of other microbial studies; for example, the treehole habitat may be more heterogeneous, so that diversity increases more rapidly with area size.

Expanding the answer: First, according to recent studies, the slope of the relationship between microbial richness and area differs from the slope of the relationship between the species richness of other organisms and area. Here, the authors claim to show that the slope of the taxa-area relationship for natural bacterial communities (which can be considered microbes) inhabiting small aquatic islands is comparable to the slope of the taxa-area relationship for larger organisms (which represent the other species). Second, previous studies indicate that the species-area relationship is expected to be steeper on discrete islands, and the authors predict that the slope of the species-area relationship for insular bacterial communities would be similar to that found for communities of larger organisms (on discrete islands). As a result, the authors showed that bacterial genetic diversity in water-filled treeholes increases with increasing island size (volume), which is similar to the linear relationship (on the log scale) between island surface area and bacterial genetic diversity. Their fitted equations are:

Bacterial genetic diversity versus island size (volume): $S = 2.11\,V^{0.26}$
Bacterial genetic diversity versus island surface area (cm^2): $S = 3.3\,A^{0.28}$

They also showed that treehole volume and surface area are correlated.


b. Is the research observational or experimental?


This is an observational study: the data are observed on a sample from the population, and no treatment is assigned to the sampled units. The interest here is in describing the population.

c. What factors or explanatory variables are they interested in studying for their effects on the microbes?
Based on the relationship between diversity and sampling area size stated in the paper, $S = cA^{z}$, we can write:

$\ln S = \ln(cA^{z}) \;\Rightarrow\; \ln S = \ln c + z \ln A$

that is, $y = \ln c + z\,x$, or $y = \beta_0 + \beta_1 x$.

$x = \ln A$ (A: size of area) is the explanatory variable. Area size is measured as:
- Island size (volume)
- Island surface area

Expanding the answer: The paper states that the number of taxa can increase with the size of the area. In addition, the number of taxa in a particular area results from the balance between the colonization of new taxa and the extinction of extant taxa. The size of the area also influences the rates of colonization and extinction, which indirectly influences biodiversity. The islands used in this research are water-filled treeholes. The researchers measured water volume (island size), which is the explanatory variable, and bacterial genetic diversity (the response variable) in 29 treehole islands.
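The log-log linearization can be illustrated with a minimal Python sketch. It is not the authors' analysis: the volumes below are hypothetical, and the richness values are generated without noise from the volume equation quoted in part a, so the fit simply recovers the constants it was given. The function name `fit_power_law` is also an assumption made for this illustration.

```python
import math

def fit_power_law(areas, richness):
    """Fit S = c * A**z by least squares on ln(S) = ln(c) + z * ln(A)."""
    xs = [math.log(a) for a in areas]      # x = ln(A), explanatory variable
    ys = [math.log(s) for s in richness]   # y = ln(S), response variable
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    z = sxy / sxx                # slope of the log-log line
    ln_c = y_bar - z * x_bar     # intercept of the log-log line
    return math.exp(ln_c), z

# Hypothetical volumes; richness generated noise-free from S = 2.11 * V**0.26
volumes = [10.0, 50.0, 100.0, 500.0, 1000.0, 5000.0]
richness = [2.11 * v ** 0.26 for v in volumes]

c_hat, z_hat = fit_power_law(volumes, richness)
print(round(c_hat, 2), round(z_hat, 2))  # 2.11 0.26
```

With real data the points would scatter around the line, and the slope estimate would carry a standard error that is needed for any comparison of slopes.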

d. What response variables are they measuring on the microbes?


In the equation $y = \ln c + z\,x$, or $y = \beta_0 + \beta_1 x$:

$y = \ln S$ (S: number of species) is the response variable, namely bacterial genetic diversity (the number of DGGE bands, S) in water-filled treeholes, which is shown to increase with increasing island size. Another variable that might be considered a response variable, since its behavior is studied in the paper, is z, the slope of the species-area relationship.
$\beta_0 = \ln c$ (c: empirically derived taxon- and location-specific constant) is the intercept.
$\beta_1 = z$ is the slope of the line (the slope of the relationship between number of species and size of area).

e. Describe the population(s) of interest to the researchers (Hint: think in terms of the scope of inference: what group(s) or set(s) do the scientists wish to make inferences about?)
The populations of interest are all possible microbial communities and all possible communities of larger organisms.


f. Describe the sample(s) that were collected, including the method used for selecting the sample.
The samples are water-filled treeholes of varying volume and surface area. The water volume (island size) and the bacterial genetic diversity (taxon richness) were measured in 29 treehole islands by homogenizing the water and sediment contained within each treehole and siphoning the liquid into measuring cylinders. The surface area of each treehole was determined from digital photographs using the ImageJ 1.32 software package. 50 ml of the mixed water and sediment from each treehole was then transferred into vials and kept at 4 °C before processing the following day. (Treehole volume and surface area varied over two orders of magnitude, so they were comparable to studies conducted on larger organisms.) This method of sampling can be considered a two-stage sampling method. The genotype diversity of each treehole bacterial community was determined using denaturing gradient gel electrophoresis (DGGE), a technique commonly used to compare bacterial communities from environmental samples.

More explanation: The islands used by the authors are water-filled treeholes, a common feature of temperate and tropical forests. Rainwater accumulates in bark-lined pans formed by the buttressing at the base of large European beech trees to form small but often permanent bodies of water. Each of these islands houses a micro-ecosystem that derives its nutrients and energy from leaf litter.

g. Restate all sets of hypotheses in statistical terms, i.e. in terms of the population parameters that are believed to be affected by the treatments.

1. H0: …
   HA: …
   or
   H0: …
   HA: …

   The first pair is stated in terms of the mean diversity of natural bacterial communities (which changes with area size) and the mean diversity for communities of larger organisms. The second pair is stated in terms of the slope of the relationship between species richness and area size for natural bacterial communities inhabiting small aquatic islands and the slope of the same relationship for communities of larger organisms.

2. H0: …
   HA: …
   (one statement of this pair uses the relation >)

   This pair is stated in terms of the slope of the species-area relationship for insular bacterial communities and the slope of the relationship found for communities of larger organisms (on discrete islands).


h. Describe the statistical method used to test the hypotheses. Have all the assumptions of the test been met? Explain.
The relationship between bacterial genetic diversity and island size (volume) was first determined by simple linear regression. This regression leads to the equation $S = 2.11\,V^{0.26}$. The slope of this equation (z1 = 0.26) is compared with the slope (z2 = 0.28) of the linear relationship between island surface area (cm^2) and bacterial genetic diversity, $S = 3.30\,A^{0.28}$. A t-test or ANOVA can be used to test for a significant difference between the mean diversity of natural bacterial communities and that of communities of larger organisms. One of the assumptions of the t-test is normality, and the paper does not give enough evidence to conclude that the data are not normal. Perhaps the linear relationship and the high value of R-squared can be used to argue that the data come from a normal source. The plot showing the linear relationship between island size and bacterial diversity is on a logarithmic scale. We already know that in ANOVA, if $\sigma$ is proportional to the mean, the logarithm of the $y_{ij}$ should be used.
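The following sketch shows why the logarithm is the usual choice when the standard deviation is proportional to the mean. It is a first-order (delta-method) approximation added for illustration, not an argument taken from the paper, and the proportionality constant $k$ is a hypothetical symbol introduced here.

```latex
\sigma_Y = k\,\mu_Y
\quad\Longrightarrow\quad
\operatorname{Var}(\ln Y) \;\approx\; \left(\frac{1}{\mu_Y}\right)^{2}\sigma_Y^{2} \;=\; k^{2}
```

Under this approximation the variance of $\ln Y$ no longer depends on the mean, which is what the equal-variance assumption of ANOVA requires.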


B) Suppose you are asked to design an observational study to answer the question: Are undergraduate students who live on campus more likely to take classes during periods 1, 2, or 3 than undergraduate students who commute to campus? You are to design a strategy for sampling 100 students from each population to test the hypothesis. Please answer the following questions:

a. State the hypotheses to be tested in statistical terms.
Answer: If we state our claim in the form p1 > p2, then we test the following hypotheses:
H0: p1 ≤ p2
H1: p1 > p2
p1: proportion of all undergraduate students living on campus who take classes during periods 1, 2, or 3.
p2: proportion of all undergraduate students commuting to campus who take classes during periods 1, 2, or 3.
Note: The point estimators for p1 and p2 are $\hat{p}_1 = y_1/n_1$ and $\hat{p}_2 = y_2/n_2$, where $\hat{p}_1$ is the proportion of the sample of on-campus undergraduate students who have taken classes in periods 1, 2, or 3, and $\hat{p}_2$ is the corresponding proportion in the sample of commuting students.

b. What testing procedure will you use?
Answer: Here we have two populations: all undergraduate students living on campus and all undergraduate students commuting to campus. We want to compare the proportion of students from the first population who have registered for classes during periods 1, 2, or 3 with the proportion of students from the second population who have registered for classes during these periods. If a student is observed to have registered for classes during periods 1, 2, or 3, the outcome is YES; otherwise the outcome is NO. The testing procedure is therefore as follows. We are comparing proportions using two independent samples. A hypothesis test involving a population proportion can be considered a binomial experiment when there are only two outcomes and the probability of a success does not change from trial to trial. The appropriate statistic for inferences on (p1 - p2) is:

$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where $\bar{p}$ is the pooled estimate of the common proportion, $\hat{p}_1 = y_1/n_1$ is the proportion of the sample of on-campus undergraduate students taking classes in periods 1, 2, or 3, and $\hat{p}_2 = y_2/n_2$ (Table 1).


Explanation: Based on independent samples of sizes n1 and n2, we want to make inferences on the difference between p1 and p2, that is, p1 - p2. Assuming sufficiently large sample sizes, the difference $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed with

Mean = $p_1 - p_2$
Variance = $\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}$

so that

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$

has the standard normal distribution. The problem is that the expression for the variance of the difference contains the unknown parameters p1 and p2. Thus, under the null hypothesis of equal proportions, we use an estimate of the common proportion in the variance formula:

$\bar{p} = \dfrac{y_1 + y_2}{n_1 + n_2}$

p1: the probability of success in population 1
p2: the probability of success in population 2
$\hat{p}_1 = y_1/n_1$ is the estimate of p1
$\hat{p}_2 = y_2/n_2$ is the estimate of p2
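The pooled two-sample test for proportions described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `two_proportion_z` and the counts in the example call are hypothetical, and the one-sided p-value corresponds to the alternative p1 > p2.

```python
import math

def two_proportion_z(y1, n1, y2, n2):
    """Pooled z statistic and upper-tail p-value for H1: p1 > p2."""
    p1_hat = y1 / n1                     # sample proportion, population 1
    p2_hat = y2 / n2                     # sample proportion, population 2
    p_bar = (y1 + y2) / (n1 + n2)        # pooled estimate of the common proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # P(Z > z) for a standard normal Z, written with the complementary error function
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 60 of 100 on-campus and 45 of 100 commuting students say Yes
z, p_value = two_proportion_z(60, 100, 45, 100)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

H0 would be rejected at significance level alpha when the one-sided p-value falls below alpha.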

c. What are the assumptions of the test?


The assumptions are those of the binomial distribution (the sampling should meet the conditions of a binomial experiment):
- The sample sizes should be large enough that we can use the central limit theorem, and large enough that we can use the standard normal rather than the t-distribution.
- Observations are independent of each other.
- Each selection of an experimental unit (i.e. each trial) is a random selection from the population of interest.
- The selections are made with replacement, or the total number sampled is less than 5% of the population.
- The probability of observing a success, p, in a single selection does not change between trials; that is, the probability of success is constant for all observations.

d. How will you design the sampling in order to ensure that the assumptions of the test are met? Describe the sampling design you will use. If you are using any lists such as registrar information please describe explicitly what information you are assuming is available for you to use.
This is an observational study in which data are observed on a sample of the population. The sampling strategy for an observational study must consider the following:
- Good representation of the population
- No systematic bias
- Small sampling variability
- Cost constraints (time, money, feasibility)
- Precision of estimates
- Power of tests

For this problem, we can use random sampling, in which every possible sample of n units is equally likely to be observed. We can also use stratified sampling.


Random Sampling: The procedure for sampling students to make inference on the proportion of undergraduate students, living on campus or commuting, who take classes in periods 1, 2, or 3 may be as follows:

First, obtain the registrar list of all undergraduate students, which provides the following information:
- Student living address (on campus or off campus)
- Student class registration information

For example, there might be 3000 undergraduate students in the list, of which 1300 live on campus (Group A) and 1700 live off campus (Group B). We then assign ID numbers from n1_1 to n1_1300 to students in Group A and from n2_1 to n2_1700 to students in Group B, so that every element in the population has a unique ID number. Next, we use a random number generator (table of random digits, computer, calculator) to draw 100 ID numbers from each population (Group A and Group B). As a result, we have 100 randomly generated ID numbers for each group (e.g. n1_150, n1_27, ..., n1_1011 and n2_1503, n2_466, ..., n2_206) and can make a table to record whether each student has taken classes in periods 1, 2, or 3. From this table we find $\hat{p}_1 = y_1/100$ and $\hat{p}_2 = y_2/100$, where y1 and y2 are the numbers of Yes outcomes in Groups A and B.

Group A (on campus)                            Group B (off campus)
Student ID        Has taken classes on         Student ID        Has taken classes on
(random IDs)      periods 1, 2, or 3           (random IDs)      periods 1, 2, or 3
                  (Yes / No)                                     (Yes / No)
n1_150                                         n2_1503
n1_27                                          n2_466
...                                            ...
n1_1011                                        n2_206
Total n1 = 100    y1                           Total n2 = 100    y2
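The ID-drawing step can be sketched with Python's standard library. The group sizes are the illustrative 1300 and 1700 used in the example above; the helper name `draw_ids` and the seeds are assumptions added here so that the draw is reproducible.

```python
import random

def draw_ids(prefix, population_size, sample_size, seed):
    """Draw sample_size distinct IDs of the form prefix_k, with k in 1..population_size."""
    rng = random.Random(seed)  # seeded generator, so the same list can be regenerated
    numbers = rng.sample(range(1, population_size + 1), sample_size)  # without replacement
    return [f"{prefix}_{k}" for k in numbers]

# Illustrative group sizes from the example: 1300 on campus, 1700 off campus
group_a = draw_ids("n1", 1300, 100, seed=1)
group_b = draw_ids("n2", 1700, 100, seed=2)

print(group_a[:3], len(group_a))
print(group_b[:3], len(group_b))
```

Because `random.sample` draws without replacement, no student can appear twice in a group, and every subset of 100 IDs is equally likely.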

e. How does your design ensure that the sample is representative of the population being sampled (representative: sample estimates are unbiased for the population parameters being estimated)?
We have supposed that we have two populations, each with a proportion p of Yes outcomes, and that we take a binomial sample of size n = 100 from each. Because the 100 IDs in each group are drawn at random from the complete registrar list, every student in a group has the same chance of selection, so the sample proportions are unbiased estimates of p1 and p2. As long as the sampling is done according to the requirements for a binomial random variable, the distribution of the sample number of successes is exactly binomial, and its shape can be approximated by a bell curve (normal) if the sample size is relatively large and the population proportion p is neither too small nor too large. We can verify our assumptions with the general rule of thumb: the sample size should be such that np > 10 and n(1 - p) > 10.
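The rule of thumb can be checked for a few candidate values of p with a short Python sketch. The function name `normal_approx_ok` and the values of p are hypothetical; in practice p is unknown and the sample proportion would be used in its place.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both n*p and n*(1-p) must exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# n = 100 as in the proposed design; the proportions are hypothetical
for p in (0.05, 0.30, 0.50, 0.95):
    print(p, normal_approx_ok(100, p))
```

With n = 100 the check fails for the extreme proportions 0.05 and 0.95 and passes for 0.30 and 0.50.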

