Analysis of Count and Proportion Data

Violeta I. Bartolome
Senior Associate Scientist
PBGB-CRIL
v.bartolome@cgiar.org

Assumptions of the analysis of variance:

• Additive effects
• Independence of errors
• Homogeneity of variances
• Normal distribution

[Figure: Variance plotted against Mean]

Count Data

• Response variable is an integer.
• Variance usually increases linearly with the mean.

Proportion data:

• Count of the number of failures of an event as well as the number of successes.
• Variance is an inverted U-shaped function of the mean.

In neither case are the errors normally distributed.

[Figures: Variance plotted against Mean for count data and for proportion data]
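The two variance-mean patterns above can be made concrete with a small sketch. It is written in Python for illustration only (the slides work in R); the helper names and the binomial denominator n = 10 are assumptions, not part of the original analysis.

```python
# Illustrative sketch (hypothetical helpers): the variance functions implied by
# Poisson errors (count data) and binomial errors (proportion data).


def poisson_variance(mean: float) -> float:
    """Poisson errors: the variance equals the mean, so it rises linearly."""
    return mean


def binomial_variance(mean: float, n: int) -> float:
    """Binomial errors for a count of successes out of n trials.

    With p = mean / n the variance is n * p * (1 - p): zero at mean 0 and at
    mean n, largest at mean n / 2, i.e. an inverted U in the mean.
    """
    p = mean / n
    return n * p * (1.0 - p)


if __name__ == "__main__":
    n = 10  # assumed binomial denominator, for illustration only
    for mean in range(0, n + 1, 2):
        print(f"mean={mean:2d}  Poisson var={poisson_variance(mean):5.1f}  "
              f"binomial var={binomial_variance(mean, n):5.2f}")
```

The Poisson column grows in step with the mean, while the binomial column rises and then falls back to zero, which is why a single constant variance (the ANOVA assumption) fits neither kind of data.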

Analysis of Count data

For treatment levels, define the control as the first level when sorted in ascending order; glm uses the first level as the reference.

Overdispersion

Residual deviance / df = 401.45/15 = 26.8. The residual deviance is much greater than the df, an indication of overdispersion.

• There is extra, unexplained variation in the response.
• It may result if the underlying distribution is not Poisson.
• Compensate for the overdispersion by refitting using quasi-Poisson rather than Poisson errors.
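The overdispersion check above compares the residual deviance with its degrees of freedom. The sketch below computes both for a one-way layout, assuming a Poisson glm with log link and a single treatment factor, whose fitted values are the treatment means. The data and helper names are hypothetical, and the code is Python rather than the R used on the slides.

```python
import math
from collections import defaultdict


def group_means(groups, y):
    """Fitted values of a one-way Poisson glm (log link): the group means."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, y):
        totals[g] += v
        counts[g] += 1
    return [totals[g] / counts[g] for g in groups]


def poisson_deviance(y, mu):
    """Residual deviance 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total


def dispersion_ratio(deviance, df_resid):
    """Residual deviance / residual df; values well above 1 suggest overdispersion."""
    return deviance / df_resid


if __name__ == "__main__":
    # Made-up counts for two treatments, for illustration only.
    groups = ["A", "A", "A", "B", "B", "B"]
    y = [2, 9, 0, 14, 3, 25]
    mu = group_means(groups, y)
    dev = poisson_deviance(y, mu)
    df_resid = len(y) - len(set(groups))  # observations minus parameters
    print(f"deviance={dev:.2f}, df={df_resid}, "
          f"ratio={dispersion_ratio(dev, df_resid):.1f}")
    # The figures quoted above: 401.45 on 15 df.
    print(f"quoted ratio={dispersion_ratio(401.45, 15):.1f}")
```

Under Poisson errors the ratio should be close to 1; a value such as 26.8 means the counts vary far more than the Poisson variance (equal to the mean) allows.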

Correct for overdispersion

ANOVA table

401.47/15=26.8
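A hedged sketch of what the quasi-Poisson refit changes, assuming R's usual conventions for quasi families. The dispersion φ is estimated as the Pearson X² divided by the residual df (so it need not equal the deviance ratio), the coefficient estimates are unchanged, standard errors are multiplied by √φ, and F tests in the ANOVA table divide each term's mean deviance by φ. The helpers are hypothetical Python, not an R API.

```python
import math


def pearson_dispersion(y, mu, df_resid):
    """Moment estimate of the dispersion: Pearson X^2 / residual df (mu > 0)."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / df_resid


def quasi_se(poisson_se, phi):
    """Quasi-Poisson standard error: the Poisson SE inflated by sqrt(phi)."""
    return poisson_se * math.sqrt(phi)


def quasi_f(deviance_change, df_effect, phi):
    """F statistic for a term: (change in deviance / its df) / phi."""
    return (deviance_change / df_effect) / phi


if __name__ == "__main__":
    y = [2, 4, 1, 3]
    mu = [3.0, 3.0, 2.0, 2.0]  # fitted group means from a one-way model
    phi = pearson_dispersion(y, mu, df_resid=2)
    print(f"phi = {phi:.3f}")
    print(f"SE 0.10 becomes {quasi_se(0.10, phi):.3f}")
```

Because only the standard errors and test statistics are rescaled, treatment effects that looked significant under Poisson errors may no longer be significant once φ is large.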

Residual Plot

Standardized residuals

• After fitting a model to data, we should investigate how well the model describes the data.
• With normal errors, the raw and standardized residuals are identical.
• Standardized residuals are required to correct for non-normal errors (as in count and proportion data).

• For count data:

$$\text{standardized residual} = \frac{y - \text{fitted value}}{\sqrt{\text{fitted value}}}$$

• For proportion data:

$$\text{standardized residual} = \frac{y - \text{fitted value}}{\sqrt{\text{fitted value} \times \left(1 - \dfrac{\text{fitted value}}{\text{binomial denominator}}\right)}}$$
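The two formulas for standardized residuals translate directly into code. This Python sketch is illustrative; the function names are hypothetical, and for proportion data both y and the fitted value are counts of successes (fitted value = n × fitted proportion).

```python
import math


def std_resid_count(y, fitted):
    """Count data: (y - fitted) / sqrt(fitted)."""
    return [(yi - fi) / math.sqrt(fi) for yi, fi in zip(y, fitted)]


def std_resid_proportion(y, fitted, n):
    """Proportion data: (y - fitted) / sqrt(fitted * (1 - fitted / n)).

    y and fitted are numbers of successes; n is the binomial denominator.
    """
    return [(yi - fi) / math.sqrt(fi * (1.0 - fi / ni))
            for yi, fi, ni in zip(y, fitted, n)]


if __name__ == "__main__":
    print(std_resid_count([7, 1], [4.0, 4.0]))
    print(std_resid_proportion([8, 2], [5.0, 5.0], [10, 10]))
```

Dividing by the square root of the variance function puts residuals from small and large means on a comparable scale, so a plot against the fitted values should show no systematic change in spread if the error family is appropriate.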

Residual plot

Compute standardized residuals

Predicted Means

Note: differences are based on the transformed (log-scale) values. If the interval includes zero, the difference is not significant.
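The note about transformed values can be illustrated for two treatment means from a one-way count model. This sketch assumes a log link, a Wald interval with z = 1.96, and Var(log mean) ≈ φ / (group total), which holds for Poisson errors (φ = 1) and for quasi-Poisson with dispersion φ. The helper and data are hypothetical, and a packaged mean-comparison procedure may use t quantiles or multiplicity adjustments instead.

```python
import math


def compare_counts(total_a, n_a, total_b, n_b, phi=1.0, z=1.96):
    """Compare two group means of a one-way Poisson/quasi-Poisson model.

    The difference is taken on the log (transformed) scale; its Wald SE uses
    Var(log mean) ~ phi / group total. Returns the difference, its interval,
    whether the interval excludes zero, and the back-transformed ratio.
    """
    mean_a, mean_b = total_a / n_a, total_b / n_b
    diff = math.log(mean_b) - math.log(mean_a)
    se = math.sqrt(phi * (1.0 / total_a + 1.0 / total_b))
    lower, upper = diff - z * se, diff + z * se
    significant = not (lower <= 0.0 <= upper)
    return diff, (lower, upper), significant, math.exp(diff)


if __name__ == "__main__":
    # Made-up totals: 20 counts over 5 plots versus 40 counts over 5 plots.
    for phi in (1.0, 4.0):
        diff, ci, sig, ratio = compare_counts(20, 5, 40, 5, phi=phi)
        print(f"phi={phi}: diff={diff:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), "
              f"significant={sig}, ratio={ratio:.2f}")
```

A difference of log means back-transforms to a ratio of means, so "interval includes zero" on the log scale corresponds to "interval includes a ratio of 1" on the original scale. With the larger dispersion the same difference is no longer significant.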

Proportion Data

o Converting to percentage data and using it as the response variable is not good:
  o Errors are not normally distributed
  o Variances are heterogeneous
  o Response is bounded by 0 and 100
  o Size of the sample, n, is lost

• Use a generalized linear model (glm)
• family = binomial
• Uses two vectors, one for success counts and the other for failure counts
• Number of failures + number of successes = binomial denominator, n
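Why percentages are a poor response: two observations with the same percentage can carry very different amounts of information. This Python sketch (hypothetical helpers, made-up numbers) contrasts percentages with the two-column success/failure response that a binomial glm uses; in R this is typically built with `cbind(successes, failures)`.

```python
def response_matrix(successes, n):
    """Two columns per observation: (successes, failures = n - successes)."""
    for s, m in zip(successes, n):
        if not 0 <= s <= m:
            raise ValueError("successes must lie between 0 and n")
    return [(s, m - s) for s, m in zip(successes, n)]


def percentages(successes, n):
    """Percent response: the binomial denominator n is discarded."""
    return [100.0 * s / m for s, m in zip(successes, n)]


def variance_of_proportion(p, n):
    """Binomial variance of an observed proportion: p * (1 - p) / n."""
    return p * (1.0 - p) / n


if __name__ == "__main__":
    successes, n = [1, 50], [2, 100]
    print("percent:", percentages(successes, n))    # same value for both
    print("matrix:", response_matrix(successes, n))  # n is still recoverable
    print("var:", [variance_of_proportion(0.5, m) for m in n])
```

Both observations are 50%, yet the proportion from 100 trials has a variance fifty times smaller than the one from 2 trials; the success/failure columns keep n so the glm can weight each observation accordingly.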

Analysis of proportion

Create response matrix:
• Second column is n - first column

123.96/45 = 2.8: an indication of overdispersion.
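The ratio 123.96/45 is read like the count example, as residual deviance over residual df (an assumption: only the numbers are shown). The sketch assumes a binomial glm with logit link and a single treatment factor, whose fitted proportions are the pooled group proportions; names and data are hypothetical.

```python
import math
from collections import defaultdict


def _ylog(a, b):
    """a * log(a / b), taking 0 * log(0) as 0."""
    return a * math.log(a / b) if a > 0 else 0.0


def binomial_fitted(groups, y, n):
    """Fitted success counts of a one-way logit model: n_i * pooled group proportion."""
    succ, trials = defaultdict(float), defaultdict(float)
    for g, yi, ni in zip(groups, y, n):
        succ[g] += yi
        trials[g] += ni
    return [ni * succ[g] / trials[g] for g, ni in zip(groups, n)]


def binomial_deviance(y, fitted, n):
    """2 * sum(y*log(y/fitted) + (n-y)*log((n-y)/(n-fitted)))."""
    return 2.0 * sum(_ylog(yi, fi) + _ylog(ni - yi, ni - fi)
                     for yi, fi, ni in zip(y, fitted, n))


if __name__ == "__main__":
    groups = ["A", "A", "B", "B"]
    y, n = [2, 6, 9, 3], [10, 10, 12, 12]  # made-up data
    fitted = binomial_fitted(groups, y, n)
    dev = binomial_deviance(y, fitted, n)
    df_resid = len(y) - len(set(groups))
    print(f"deviance={dev:.2f} on {df_resid} df, ratio={dev / df_resid:.2f}")
    print(f"quoted ratio={123.96 / 45:.1f}")
```

As with counts, a ratio near 1 is expected under binomial errors; 2.8 is milder than the count example but still suggests refitting with quasi-binomial errors.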

ANOVA table

Plot standardized residuals

Predicted Means

Mean Comparison
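For proportion data the same logic applies on the logit scale: predicted means are back-transformed with the inverse logit, and differences are judged by whether their interval includes zero. This Python sketch assumes a one-way binomial model, Var(logit p̂) ≈ 1/(n p (1 − p)) with n the group's total trials, an optional dispersion φ, and a Wald interval with z = 1.96; helper names and data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Back-transform from the logit scale to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def compare_proportions(succ_a, n_a, succ_b, n_b, phi=1.0, z=1.96):
    """Wald comparison of two pooled group proportions on the logit scale.

    Var(logit p) ~ 1 / (n * p * (1 - p)), inflated by phi if overdispersed.
    Requires 0 < p < 1 in both groups.
    """
    p_a, p_b = succ_a / n_a, succ_b / n_b
    diff = logit(p_b) - logit(p_a)
    var = 1.0 / (n_a * p_a * (1.0 - p_a)) + 1.0 / (n_b * p_b * (1.0 - p_b))
    se = math.sqrt(phi * var)
    lower, upper = diff - z * se, diff + z * se
    return diff, (lower, upper), not (lower <= 0.0 <= upper)


if __name__ == "__main__":
    eta_a, eta_b = logit(20 / 100), logit(40 / 100)
    print("predicted proportions:", inv_logit(eta_a), inv_logit(eta_b))
    for phi in (1.0, 4.0):
        diff, ci, sig = compare_proportions(20, 100, 40, 100, phi=phi)
        print(f"phi={phi}: diff={diff:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), "
              f"significant={sig}")
```

The comparison is made on the transformed scale, where the Wald approximation is more reasonable, and only the predicted means are back-transformed for reporting.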

Thank you!

