
# A Statistical Model to Explain Potential Causes of Obesity in the U.S.

By

Amber Oldfield
St. John’s University- MBA
New York, NY
Amber.oldfield08@my.stjohns.edu
(570) 407-0224


I. Introduction

The statistical research presented here is intended to help identify potential
causes of obesity throughout the United States. The National Institutes of
Health (NIH) defines obesity as a body mass index (BMI) of 30 or greater. The
study is a cross-sectional analysis of the 50 states of the U.S. and of
potential contributing factors to obesity. All of the data are from year-end
2007. Six factors are evaluated to determine whether they contribute to a
growing obesity rate. These independent variables are per capita income,
unemployment rate, percent of High School graduates (25 years and older),
diabetes rate, population density, and percentage of uninsured individuals.
Evaluating the effect of each of these variables on the obesity rate should
show the degree to which each one affects the obesity rate in the U.S., if it
has any effect at all. This research is relevant and may prove valuable to
doctors, dietitians, trainers, and health care insurers. It may also prove
valuable to those who are currently obese and are trying to determine what
factors are contributing to their condition. The results can help all of these
individuals understand obesity to a greater degree and may change the action
they take in trying to alleviate the condition.

II. Prior Research

Prior research has been conducted on potential causes of obesity. Some of
this research has proven to be more successful than others in determining
what may be contributing to a growing obesity rate. Below is a list of this
prior research detailing the independent variables used, with the
corresponding functional specifications, and the resulting coefficient of
determination (R2).

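(Only the expected signs and the R2 values of the four prior functional
specifications are legible; the variable names are not.)

Prior study 1: expected signs + -; R2 = .363
Prior study 2: expected signs + + -; one variable label ends in "degrees)"; R2 = .694
Prior study 3: expected signs + +; R2 = .431
Prior study 4: expected signs - - -; R2 = .575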

By evaluating this prior research it is possible to build on what has
already been done or to attempt new independent variable combinations
in the hopes of increasing the coefficient of determination (R2).

III. Methodology

As previously mentioned, the research is a cross-sectional analysis
evaluating six independent variables that may contribute to obesity. The
hypothesis is that the relationship between the obesity rate and per capita
income will be negative: as per capita income increases, the obesity rate
will decrease. The relationship of the obesity rate with the unemployment
rate, diabetes rate, and percent uninsured is expected to be positive: as
these independent variables increase, the obesity rate is expected to
increase as well. As with per capita income, the percent of High School
graduates (25 years and older) and population density are expected to have a
negative effect on the obesity rate. The data for this research were obtained
from statemaster.com, the U.S. Department of Commerce, the Bureau of Labor
Statistics, the Centers for Disease Control and Prevention (CDC), and the
U.S. Census Bureau. A more detailed description of these sources can be found
in the appendix of this report. All of the data analysis was performed using
SPSS. The techniques used in this research are graphical presentations
(scatterplots and histograms), descriptive statistics, correlation, and
regression analysis.

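The functional specification for this research, with the expected sign shown
above each independent variable, is as follows:

$$\text{Obesity Rate} = f\big(\overset{-}{\text{Per Cap. Income}},\ \overset{+}{\text{Unemployment \%}},\ \overset{-}{\text{\% Grads HS}},\ \overset{+}{\text{Diabetes Rate}},\ \overset{-}{\text{Population Density}},\ \overset{+}{\text{\% Uninsured}}\big) \qquad \text{(Eqn. 1)}$$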

IV. Results

Figure 1, below, shows a histogram of the dependent variable, the Obesity
Rate. The distribution appears to be approximately normal with a slight skew
to the left.

Figure 1- Histogram of Obesity Rate across the U.S. (x-axis: ObesityRate, 18.0 to 32.0; y-axis: Frequency; Mean = 25.656, Std. Dev. = 2.8188, N = 50)

Table 1, below, confirms what was shown in the histogram: the dependent
variable, Obesity Rate, is skewed to the left, with a skewness equal to
-.194. Also, the kurtosis for population density shows that the data are
leptokurtic, meaning that the distribution has heavier tails than a normal
distribution: a few states have population densities far from the bulk of
the data.

Table 1- Descriptive Statistics (legible row labels: Per Capita Income; Unemployment Rate; % Grads from HS (25 years and older); Diabetes Rate; Population Density; Percent Uninsured; cell values not legible)
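
The skewness and kurtosis figures discussed above can be illustrated with a minimal sketch. The data below are made up for illustration and are not the study's state-level values, and the function name is hypothetical. The sketch uses plain population moments; SPSS applies small-sample adjustments, so its reported values would differ slightly from these formulas.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt

# Hypothetical sample with one low value pulling the tail to the left.
sample = [18.0, 24.0, 25.0, 26.0, 26.0, 27.0, 27.0, 28.0]
skew, excess_kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness = {skew:.3f}, excess kurtosis = {excess_kurt:.3f}")
```

A negative skewness indicates a longer left tail, and a positive excess kurtosis indicates a leptokurtic distribution.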

Table 2, below, shows the correlation of each of the six independent
variables with the dependent variable, the Obesity Rate. All of the
correlations presented agree in sign with Eqn. 1, the functional
specification. Per capita income has a moderately strong negative correlation
with the obesity rate at -.542. The unemployment rate has a moderately weak
positive correlation with the obesity rate at .413. The percent of graduates
from HS (25 years and older) has a moderately strong negative correlation at
-.513. The diabetes rate has the strongest correlation with the obesity rate
of all the independent variables evaluated, at .685. Population density has a
weak negative correlation with the obesity rate at -.321. The percent
uninsured also has a rather weak, but positive, correlation with the obesity
rate at .237. It is important to note that high multi-collinearity exists in
a few places in the correlation matrix, as indicated with an asterisk (*).
Multi-collinearity is high correlation among the independent variables
themselves. It may make the estimated coefficients of the sample regression
line unstable and inflate their standard errors.

Table 2- Correlation Matrix (variables: Obesity Rate; Per Capita Income; Unemployment Rate; % Grads from HS (25 years and older); Diabetes Rate; Population Density; Percent Uninsured; cell values not legible; an asterisk (*) marks pairs with high multi-collinearity)
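
As a rough illustration of how a correlation coefficient and a multi-collinearity check can be computed, the sketch below implements Pearson's r by hand and converts it to a variance inflation factor (VIF). The data, variable names, and function names are hypothetical; they are not taken from the study. The VIF shown is the simplified two-predictor case, where it reduces to 1 / (1 - r^2); with more predictors the r^2 would come from regressing one predictor on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pairwise_vif(r):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values for two independent variables (not the study's data).
income = [30.0, 35.0, 40.0, 45.0, 50.0]
hs_grads = [80.0, 83.0, 85.0, 88.0, 91.0]

r = pearson_r(income, hs_grads)
vif = pairwise_vif(r)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A large VIF signals that the two predictors carry largely the same information, which is the situation the asterisks in the correlation matrix flag.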

Figure 2, below, presents a scatterplot of the obesity rate v. per capita
income. The scatterplot appears to possess a moderately strong, negative,
linear relationship.

Figure 2- Scatterplot of Obesity Rate v. Per Capita Income (y-axis: ObesityRate, 18.0 to 32.0; x-axis: PerCapitaIncome)

Figure 3, below, presents the scatterplot of the obesity rate v. the
unemployment rate. The scatterplot appears to possess a moderately weak,
positive, linear relationship.

Figure 3- Scatterplot of Obesity Rate v. Unemployment Rate (y-axis: ObesityRate, 18.0 to 32.0; x-axis: UnemploymentRate)

Figure 4, below, presents the scatterplot of the obesity rate v. percent of
graduates from High School (25 years and older). The scatterplot appears to
possess a moderately strong, negative, linear relationship.

Figure 4- Scatterplot of Obesity Rate v. High School Grads (25 yrs. and older), r = -.513 (y-axis: ObesityRate, 18.0 to 32.0)

Figure 5, below, presents the scatterplot of the obesity rate v. the
diabetes rate. The scatterplot appears to possess a moderately strong,
positive, linear relationship.

Figure 5- Scatterplot of Obesity Rate v. Diabetes Rate (y-axis: ObesityRate, 18.0 to 32.0; x-axis: DiabetesRate)

Figure 6, below, presents the scatterplot of the obesity rate v. population
density. The scatterplot appears to possess a weak, negative, linear
relationship.

Figure 6- Scatterplot of Obesity Rate v. Population Density (y-axis: ObesityRate, 18.0 to 32.0; x-axis: PopulationDensity, 0.0 to 1200.0)

Figure 7, below, presents the scatterplot of the obesity rate v. percent
uninsured. The scatterplot appears to possess a weak, positive, linear
relationship.

Figure 7- Scatterplot of Obesity Rate v. Percent Uninsured (y-axis: ObesityRate, 18.0 to 32.0; x-axis: PercentageUninsured, 6.0 to 24.0)


Table 3, below, shows the regression analysis for the research. The
independent variables were entered stepwise with the probability to enter
set at .200 and the probability to remove set at .250. After entering stepwise,
the resulting independent variables that remained were the diabetes rate,
population density and the unemployment rate. Therefore the variables that
were removed were per capita income, % grads from HS (25 years and
older), and the percent uninsured. The resulting R Square is moderately
strong at .663.

$$Y = 13.59 + 1.413\,(\text{Diabetes Rate}) - .004\,(\text{Population Density}) + .719\,(\text{Unemployment Rate}) \qquad \text{(Eqn. 2)}$$

(3.17)**   (.003)

r (.626) (-.369) (.281)

SE= 1.688

From the regression results, the R-Sq., which is the coefficient of
determination, is equal to .663. This means that 66.3% of the variation in
the obesity rate can be explained by or attributed to variation in the
diabetes rate, population density and the unemployment rate.

T-Statistics

Each independent variable is tested for significance with the following null
and alternative hypotheses (results are evident in the table above):

$$H_0: B = 0$$

$$H_a: B > 0 \ \text{or} \ B < 0, \ \text{based on the functional specification}$$

The alternative was accepted for each of the independent variables as the
(p-value/2) is equal to approximately .00 for each. These independent
variables are significant at the 1% level of significance.

The interpretation of the equation is as follows:

For each one-percentage-point increase in the diabetes rate, the obesity rate
would increase by 1.413 percentage points, on average, all else being equal.

For each one-unit increase in population density (population per sq. mile),
the obesity rate would decrease by .004 percentage points, on average, all
else being equal.

For each one-percentage-point increase in the unemployment rate, the obesity
rate would increase by .719 percentage points, on average, all else being
equal.
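
The interpretation above can be checked with a short worked example that evaluates Eqn. 2 directly. The coefficients are those reported in Eqn. 2; the input values and the function name are hypothetical and chosen only for illustration.

```python
def predict_obesity_rate(diabetes_rate, population_density, unemployment_rate):
    """Evaluate Eqn. 2 with the coefficients reported in the text."""
    return (13.59
            + 1.413 * diabetes_rate
            - 0.004 * population_density
            + 0.719 * unemployment_rate)

# Hypothetical state: 7% diabetes rate, 100 people per sq. mile,
# 4.5% unemployment rate.
baseline = predict_obesity_rate(7.0, 100.0, 4.5)

# Raise only the diabetes rate by one percentage point.
higher_diabetes = predict_obesity_rate(8.0, 100.0, 4.5)

print(f"baseline prediction: {baseline:.4f}")
print(f"change from +1 point of diabetes rate: {higher_diabetes - baseline:.3f}")
```

Holding the other two inputs fixed, the change in the prediction equals the diabetes coefficient, which is what "all else being equal" means in the interpretation.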

F- Statistic

The regression as a whole appears to be statistically significant at the 1%
level, given that the F-statistic is equal to 30.23 and its significance is
equal to .000, where:

$$H_0: B_{\text{Diabetes Rate}} = B_{\text{Population Density}} = B_{\text{Unemployment Rate}} = 0$$

The alternative, that at least one B is not equal to zero, is accepted given
that the F significance is equal to .000.
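
As a consistency check, the overall F-statistic can be recomputed from the reported R-Sq. using the standard relationship F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below assumes n = 50 states and k = 3 retained independent variables, as described in the text; the function name is hypothetical.

```python
def f_statistic_from_r2(r_squared, n_obs, n_predictors):
    """Overall regression F-statistic implied by R-squared."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

# Values taken from the text: R-Sq. = .663, 50 states, 3 predictors.
f_value = f_statistic_from_r2(0.663, 50, 3)
print(f"F implied by R-Sq. = {f_value:.2f}")
```

The result is about 30.17, close to the reported 30.23. The small gap is consistent with R-Sq. having been rounded to three decimals before this calculation.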

Figure 8, below, presents the histogram of the residuals. The histogram
appears to be approximately normally distributed.

Figure 8- Histogram of Residuals (x-axis: RES_1, -2.50000 to 2.50000; y-axis: Frequency; Mean = 6.3976602E-15, Std. Dev. = 1.63519572, N = 50)
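
The residual mean reported for the histogram (about 6.4E-15) is zero up to floating-point rounding, which is expected for any least-squares fit that includes an intercept. The sketch below illustrates this with a one-predictor regression on made-up data; the study's model has three predictors, and the function names and values here are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical data (not the study's values).
diabetes = [6.0, 7.0, 8.0, 9.0, 10.0]
obesity = [22.0, 24.5, 25.0, 27.5, 29.0]

intercept, slope = fit_simple_ols(diabetes, obesity)
res = residuals(diabetes, obesity, intercept, slope)
mean_residual = sum(res) / len(res)
print(f"mean residual = {mean_residual:.2e}")
```

Because the mean of the residuals is zero by construction, the diagnostic value of the histogram lies in its shape and spread rather than in its mean.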


Figure 9, below, presents the scatterplot of the dependent variable,
Obesity Rate, and the predicted value. The relationship appears to be
positive and linear, with no outliers.

Figure 9- Scatterplot of Obesity Rate v. Predicted Value (y-axis: ObesityRate, 18.0 to 32.0; x-axis: PRE_1)

Figure 10, below, presents the scatterplot of the residuals v. per capita
income. Correlation appears to exist, as there appears to be a linear
relationship with no visible curves.

Figure 10- Scatterplot of Residuals v. Per Capita Income (y-axis: RES_1, -2.50000 to 2.50000; x-axis: PerCapitaIncome)

Figure 11, below, presents the scatterplot of the residuals v. the
unemployment rate. There appears to be a linear relationship with a "cluster"
of points. There also appears to be one possible outlier.

Figure 11- Scatterplot of Residuals v. Unemployment Rate (y-axis: RES_1, -2.50000 to 2.50000; x-axis: UnemploymentRate)

Figure 12, below, presents the scatterplot of the residuals v. % HS
grads (25 years and older). There appears to be a linear relationship with no
visible curves.

Figure 12- Scatterplot of Residuals v. Percent of High School Grads (25 years and older) (y-axis: RES_1, -2.50000 to 2.50000)

Figure 13, below, presents the scatterplot of the residuals v. diabetes
rate. There appears to be a linear relationship with no curves and two
potential outliers.

Figure 13- Scatterplot of Residuals v. Diabetes Rate (y-axis: RES_1, -2.50000 to 2.50000; x-axis: DiabetesRate)

Figure 14, below, presents the scatterplot of the residuals v. population
density. There appears to be a discontinuous, random, linear relationship
with a few potential outliers.

Figure 14- Scatterplot of Residuals v. Population Density (y-axis: RES_1, -2.50000 to 2.50000; x-axis: PopulationDensity, 0.0 to 1200.0)

Figure 15, below, presents the scatterplot of the residuals v. the
percent uninsured. There appears to be a random, linear relationship.

Figure 15- Scatterplot of Residuals v. Percent Uninsured (y-axis: RES_1, -2.50000 to 2.50000; x-axis: PercentageUninsured, 6.0 to 24.0)

V. Conclusions

The research presented was fairly successful, but may need some changes
before being presented to a panel of professionals. The explanatory power
(R2) of .663 is moderately strong, which lends some validity to the model.
The independent variable with the greatest effect on obesity in this research
was the diabetes rate. This may warrant further investigation, as there may
be a question of causality: is it diabetes that increases obesity, or does
obesity increase diabetes? This issue may be of some interest to healthcare
professionals, who may need to do further research to draw any definitive
conclusions. The multicollinearity shown in the correlation matrix may have
made the coefficients presented in Eqn. 2 unstable. Therefore the
interpretation of this sample regression line may not be very accurate. The
research may be improved by investigating other independent variables that
were used neither in this research nor in the prior research outlined in
Section II- Prior Research. This research can be used as a starting point for
healthcare professionals in further investigating the link between diabetes
and obesity. Also, government and public policy advocates may have an
interest in the link between the unemployment rate and the resulting increase
in obesity.