
Comments on Harris DE, Aboueissa AM and Hartley D (2008)

Igor Kovasov
Institute of Biology of the Southern Seas, National Academy of Sciences of Ukraine

This note reports that a wrong statistical model was used in "Myocardial infarction and heart failure hospitalization rates in Maine, USA - variability along the urban-rural continuum" by Harris DE, Aboueissa AM and Hartley D, published in Rural and Remote Health, 8: 980 (online), 2008.

In that study, the response (dependent) variable is the hospitalization rate. It is well known that Poisson regression is the prototype model for rate data (see Chapter 12 of Fleiss, Levin and Paik, 2003 and Chapter 9 of Agresti, 2002). In the case of over-dispersion, the quasi-Poisson or negative binomial model is the appropriate model for rate data.

From a statistical point of view, a rate (such as a hospitalization rate, rare disease rate, accident rate, etc.) is defined as the ratio of the frequency ($X$) to the size of the population or subpopulation ($N$). The frequency $X$ of occurrences of any rare event (here, the number of people hospitalized) is by default a Poisson random variable, not a normal random variable, while the size $N$ is not a random variable. Therefore, the rate is a scalar multiple of a Poisson random variable; that is, the rate is not a normal random variable. From a mathematical point of view, a rate lies in the interval $[0, 1]$, while a normal random variable is defined on $(-\infty, +\infty)$.

A slightly more technical justification of why the Poisson model is correct and the normal linear model is incorrect: let $X$ = number of hospitalized patients, $N$ = population size, and $R = X/N$, the hospitalization rate. Then $\log(R) = \log(X) - \log(N)$. Since $X < N$, we have $-\infty < \log(R) = \log(X) - \log(N) < 0$; consequently, $\exp(-\infty) < R < \exp(0)$, i.e. $0 < R < 1$. This makes sense both statistically and mathematically. In the linear regression model, the expected rate is defined to be $E[R] = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$.
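
The contrast between the two models can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are made up (they are not from the study under discussion), there is a single covariate, and the helper names `fit_linear` and `fit_poisson` are hypothetical. The Poisson model is fitted with a hand-written Newton-Raphson loop only to keep the example self-contained; in practice a standard GLM routine with a log link and a log(N) offset would be used.

```python
import math

# Hypothetical data (NOT from the study): a covariate x, population sizes n,
# and hospitalization counts for five areas.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [10000] * 5
counts = [80, 40, 20, 10, 5]
rates = [c / m for c, m in zip(counts, n)]


def fit_linear(x, y):
    """Ordinary least squares for E[y] = b0 + b1 * x (one covariate)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_poisson(x, counts, n, iterations=50):
    """Newton-Raphson for log E[count] = log(n) + b0 + b1 * x."""
    b0 = math.log(sum(counts) / sum(n))  # start from the overall rate
    b1 = 0.0
    for _ in range(iterations):
        mu = [ni * math.exp(b0 + b1 * xi) for xi, ni in zip(x, n)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum(xi * (c - m) for xi, c, m in zip(x, counts, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * m for xi, m in zip(x, mu))
        h11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


linear_b0, linear_b1 = fit_linear(x, rates)
pois_b0, pois_b1 = fit_poisson(x, counts, n)

# Fitted rates at the observed covariate values under each model.
linear_pred = [linear_b0 + linear_b1 * xi for xi in x]
poisson_pred = [math.exp(pois_b0 + pois_b1 * xi) for xi in x]

print("linear fitted rates: ", linear_pred)
print("Poisson fitted rates:", poisson_pred)
```

With these made-up numbers the straight-line fit already gives a negative fitted rate at the largest covariate value, whereas the log-link fit is positive by construction. Note that the log link only enforces positivity; it does not by itself bound the fitted rate above by 1, which is rarely a concern for rare events.
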
From the linear regression definition of E[R], there is no guarantee that the expected rate always lies in [0, 1] for every pattern of values of the independent variables. Hence the linear regression model is wrong from both a statistical and a practical point of view.

In summary, the normal-based linear regression used in the data analysis is a wrong model. Consequently, the results based on the normal-based linear regression model are statistically incorrect.

Acknowledgement

I thank the editors for their professional comments on this note, which made the original submission clearer.

References

Fleiss, JL, Levin, B and Paik, MC. 2003. Statistical Methods for Rates and Proportions, 3rd Ed. John Wiley.
Agresti, A. 2002. Categorical Data Analysis, 2nd Ed. John Wiley.

I only point out a wrong statistical model that was used in the aforementioned study. As the author, I have no potential conflict of interest with any third party.
