
A Statistical Model to Explain Potential Causes of Obesity in the U.S.

By

Amber Oldfield
St. John’s University- MBA
New York, NY
Amber.oldfield08@my.stjohns.edu
(570) 407-0224

Submitted on May 9, 2009



I. Introduction

The statistical research presented here is intended to help identify potential
causes of obesity throughout the United States. The National Institutes of
Health (NIH) defines obesity as a body mass index (BMI) of 30 or greater. The
study is a cross-sectional analysis evaluating each of the 50 U.S. states and
potential contributing factors to obesity. All of the data are from year-end
2007. Six factors are evaluated to determine whether they contribute to a
growing obesity rate. These independent variables are per capita income,
unemployment rate, percent of high school graduates (25 years and older),
diabetes rate, population density, and percentage of uninsured individuals.
Evaluating the effect of each of these variables on the obesity rate should
indicate the degree to which they are associated with the obesity rate in the
U.S., if they have any effect at all. This research is relevant and should
prove valuable to doctors, dietitians, trainers, and health care insurers. It
may also prove valuable to those who are currently obese and are trying to
determine what factors are contributing to their condition. The results can
help all of these individuals understand obesity to a greater degree and may
change the action they take in trying to alleviate the condition.

II. Prior Research

Prior research has been conducted on potential causes of obesity. Some of this
research has proven to be more successful than others in determining what may
be contributing to a growing obesity rate. Below is a list of this prior
research detailing the independent variables used, with the corresponding
functional specifications and expected signs, and the resulting coefficient of
determination (R²).

- Analysis of Obesity Across the U.S.:

Obesity Rate = f (Unemployment Rate (+), Income (-))

R² = .363

- Analysis of Obesity Across the U.S.:

Obesity Rate = f (# fast food restaurants (+), commute time (+), % Bachelor degrees (-))

R² = .694

- Analysis of Obesity Across the U.S.:

Obesity Rate = f (per capita income (+), unemployment rate (+))

R² = .431

- Analysis of Obesity Across the U.S.:

Obesity Rate = f (% Bachelor Degrees (-), Age (-), Income (-))

R² = .575

By evaluating this prior research it is possible to build on what has already
been done or to attempt new independent variable combinations in the hopes of
increasing the coefficient of determination (R²).

III. Methodology

As previously mentioned, the research is a cross-sectional analysis evaluating
six independent variables that may contribute to obesity. The hypothesis is
that the relationship between the obesity rate and per capita income will be
negative; that is, as per capita income increases, the obesity rate will
decrease. The relationship of the obesity rate with the unemployment rate,
diabetes rate, and percent uninsured is expected to be positive: as these
independent variables increase, the obesity rate is expected to increase as
well. As with per capita income, the percent of high school graduates (25
years and older) and population density are expected to have a negative effect
on the obesity rate. The data for this research were obtained from
statemaster.com, the U.S. Department of Commerce, the Bureau of Labor
Statistics, the Centers for Disease Control and Prevention (CDC), and the U.S.
Census Bureau. A more detailed description of these sources can be found in
the appendix of this report. All of the data analysis was performed using
SPSS. The techniques used in this research are graphical presentations
(scatterplots and histograms), descriptive statistics, correlation, and
regression analysis.

The functional specification for this research, with the expected sign of each
variable shown in parentheses, is as follows:

Eqn. 1

Obesity Rate = f (Per Capita Income (-), Unemployment % (+), % Grads HS (-),
Diabetes Rate (+), Population Density (-), % Uninsured (+))

IV. Results

Figure 1- Histogram of Obesity Rate

Figure 1, below, shows a histogram of the dependent variable, the Obesity
Rate. The distribution appears to be approximately normal, with a slight skew
to the left.

[Figure 1: Histogram of Obesity Rate across the U.S. Horizontal axis: Obesity
Rate; vertical axis: Frequency. Mean = 25.656, Std. Dev. = 2.8188, N = 50.]

Table 1- Descriptive Statistics

Table 1, below, confirms what was shown in the histogram: the dependent
variable, Obesity Rate, is slightly skewed to the left, with a skewness of
-.194. Also, the kurtosis for population density (5.89) shows that the data
are leptokurtic, meaning that the distribution is sharply peaked with heavy
tails: most states cluster at low population density, while a few states have
very high population density.
Variable                               Mean       StdDev     Variance       Skewness   Kurtosis
Obesity Rate                           25.66      2.82       7.95           -.194      -.243
Per Capita Income                      35328.66   5155.68    26581060.00    .898       .674
Unemployment Rate                      4.39       1.10       1.215          1.027      1.44
% Grad from HS (25 years and older)    85.28      3.89       15.13          -.430      -.985
Diabetes Rate                          6.84       1.25       1.56           .452       1.452
Population Density                     181.90     250.15     62577.43       2.44       5.89
Percent Uninsured                      14.20      3.99       15.98          .491       -.277
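The statistics reported in Table 1 can be computed from raw data along the
following lines. This is a minimal sketch, not the SPSS procedure itself: the
function name `describe` and the sample values are hypothetical, and the
skewness and kurtosis formulas are the bias-adjusted sample versions, which
are assumed here to correspond to what SPSS reports.

```python
import math


def describe(values):
    """Return mean, sample std. dev., variance, skewness and excess kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # n - 1 denominator
    if variance == 0:
        raise ValueError("all observations are identical")
    sd = math.sqrt(variance)
    z = [(x - mean) / sd for x in values]  # standardized deviations
    # Bias-adjusted sample skewness (assumed to match the SPSS definition).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Bias-adjusted excess kurtosis: 0 for a normal distribution.
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "mean": mean,
        "std_dev": sd,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical illustrative samples, not the state-level data from the study.
symmetric = describe([1, 2, 3, 4, 5])
right_skewed = describe([1, 1, 1, 2, 10])
```

A negative skewness, as for the Obesity Rate, indicates a longer left tail; a
large positive kurtosis, as for Population Density, indicates heavier tails
than a normal distribution. As a quick consistency check on Table 1, each
variance is approximately the square of the standard deviation (for example,
2.82 squared is about 7.95).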

Table 2- Correlation Matrix

Table 2, below, shows the correlations among the six independent variables and
the dependent variable, the Obesity Rate. All of the correlations with the
obesity rate agree in sign with Eqn. 1, the functional specification. Per
capita income has a moderately strong negative correlation with the obesity
rate at -.542. The unemployment rate has a moderately weak positive
correlation with the obesity rate at .413. The percent of graduates from HS
(25 years and older) has a moderately strong negative correlation at -.513.
The diabetes rate has the strongest correlation with the obesity rate of all
the independent variables evaluated, at .685. Population density has a weak
negative correlation with the obesity rate at -.321. The percent uninsured
also has a rather weak, but positive, correlation with the obesity rate at
.237. It is important to note that high multicollinearity exists in a few
places in the correlation matrix, as indicated with an asterisk (*).
Multicollinearity occurs when there is high correlation between the
independent variables. It does not bias the coefficients, but it may make the
estimated coefficients of the sample regression line unstable and inflate
their standard errors.
                      Obesity   Per Capita   Unemployment   % Grads    Diabetes   Population   Percent
                      Rate      Income       Rate           from HS    Rate       Density      Uninsured
Obesity Rate          1         -.542        .413           -.513      .685       -.321        .237
Per Capita Income     -.542     1            -.136          .396       -.385      .661*        -.293
Unemployment Rate     .413      -.136        1              -.342      .260       .083         .190
% Grads from HS       -.513     .396         -.342          1          -.721*     .002         -.568*
Diabetes Rate         .685      -.385        .260           -.721*     1          .040         .243
Population Density    -.321     .661*        .083           .002       .040       1            -.262
Percent Uninsured     .237      -.293        .190           -.568*     .243       -.262        1

% Grads from HS = percent of graduates from high school (25 years and older).
* High correlation between independent variables (multicollinearity).
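The two checks described for Table 2 (agreement of signs with Eqn. 1 and
flagging of highly correlated independent variables) can be sketched as
follows. The correlation values are copied from Table 2; the names
`signs_agree` and `flag_collinear` are hypothetical, and the 0.5 cutoff is an
assumption chosen only because it reproduces the asterisks in the table, since
the paper does not state the threshold it used.

```python
# Expected sign of each independent variable from Eqn. 1 (+1 or -1).
EXPECTED_SIGN = {
    "Per Capita Income": -1,
    "Unemployment Rate": +1,
    "% Grads from HS": -1,
    "Diabetes Rate": +1,
    "Population Density": -1,
    "Percent Uninsured": +1,
}

# Correlation of each independent variable with the Obesity Rate (Table 2).
R_WITH_OBESITY = {
    "Per Capita Income": -0.542,
    "Unemployment Rate": 0.413,
    "% Grads from HS": -0.513,
    "Diabetes Rate": 0.685,
    "Population Density": -0.321,
    "Percent Uninsured": 0.237,
}

# Pairwise correlations between independent variables (Table 2).
PAIRWISE_R = {
    ("Per Capita Income", "Unemployment Rate"): -0.136,
    ("Per Capita Income", "% Grads from HS"): 0.396,
    ("Per Capita Income", "Diabetes Rate"): -0.385,
    ("Per Capita Income", "Population Density"): 0.661,
    ("Per Capita Income", "Percent Uninsured"): -0.293,
    ("Unemployment Rate", "% Grads from HS"): -0.342,
    ("Unemployment Rate", "Diabetes Rate"): 0.260,
    ("Unemployment Rate", "Population Density"): 0.083,
    ("Unemployment Rate", "Percent Uninsured"): 0.190,
    ("% Grads from HS", "Diabetes Rate"): -0.721,
    ("% Grads from HS", "Population Density"): 0.002,
    ("% Grads from HS", "Percent Uninsured"): -0.568,
    ("Diabetes Rate", "Population Density"): 0.040,
    ("Diabetes Rate", "Percent Uninsured"): 0.243,
    ("Population Density", "Percent Uninsured"): -0.262,
}

COLLINEARITY_THRESHOLD = 0.5  # assumed cutoff; not stated in the paper


def signs_agree(expected, observed):
    """Map each variable to True if the observed r has the expected sign."""
    return {name: (observed[name] > 0) == (expected[name] > 0) for name in expected}


def flag_collinear(pairs, threshold=COLLINEARITY_THRESHOLD):
    """Return the variable pairs whose absolute correlation meets the threshold."""
    return sorted(pair for pair, r in pairs.items() if abs(r) >= threshold)


agreement = signs_agree(EXPECTED_SIGN, R_WITH_OBESITY)
flagged = flag_collinear(PAIRWISE_R)
```

Under these assumptions every sign agrees with Eqn. 1, and the flagged pairs
are the same three that carry an asterisk in Table 2.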

Figure 2- Scatterplot of Obesity Rate v. Per Capita Income

Figure 2, below, presents a scatterplot of the obesity rate v. per capita
income. The scatterplot appears to possess a moderately strong, negative,
linear relationship.

[Figure 2: Scatterplot of Obesity Rate v. Per Capita Income, r = -.542.
Horizontal axis: Per Capita Income; vertical axis: Obesity Rate.]

Figure 3- Scatterplot of Obesity Rate v. Unemployment Rate

Figure 3, below, presents the scatterplot of the obesity rate v. the
unemployment rate. The scatterplot appears to possess a moderately weak,
positive, linear relationship.

[Figure 3: Scatterplot of Obesity Rate v. Unemployment Rate, r = .413.
Horizontal axis: Unemployment Rate; vertical axis: Obesity Rate.]

Figure 4- Scatterplot of Obesity Rate v. % High School Grads (25 years and older)

Figure 4, below, presents the scatterplot of the obesity rate v. percent of
graduates from High School (25 years and older). The scatterplot appears to
possess a moderately strong, negative, linear relationship.

[Figure 4: Scatterplot of Obesity Rate v. High School Grads (25 yrs. and
older), r = -.513. Horizontal axis: % Grads from HS (25 years and older);
vertical axis: Obesity Rate.]

Figure 5- Scatterplot of Obesity Rate v. Diabetes Rate

Figure 5, below, presents the scatterplot of the obesity rate v. the diabetes
rate. The scatterplot appears to possess a moderately strong, positive, linear
relationship.

[Figure 5: Scatterplot of Obesity Rate v. Diabetes Rate, r = .685. Horizontal
axis: Diabetes Rate; vertical axis: Obesity Rate.]

Figure 6- Scatterplot of Obesity Rate v. Population Density

Figure 6, below, presents the scatterplot of the obesity rate v. population
density. The scatterplot appears to possess a weak, negative, linear
relationship.

[Figure 6: Scatterplot of Obesity Rate v. Population Density, r = -.321.
Horizontal axis: Population Density; vertical axis: Obesity Rate.]

Figure 7- Scatterplot of Obesity Rate v. Percent of Uninsured

Figure 7, below, presents the scatterplot of the obesity rate v. percent
uninsured. The scatterplot appears to possess a weak, positive, linear
relationship.

[Figure 7: Scatterplot of Obesity Rate v. Percentage Uninsured, r = .237.
Horizontal axis: Percentage Uninsured; vertical axis: Obesity Rate.]

Table 3, below, shows the regression analysis for the research. The
independent variables were entered stepwise with the probability to enter set
at .200 and the probability to remove set at .250. After the stepwise
procedure, the independent variables that remained were the diabetes rate,
population density, and the unemployment rate. The variables that were removed
were therefore per capita income, % grads from HS (25 years and older), and
the percent uninsured. The resulting R Square is moderately strong at .663.

Table 3- Regression Results

Eqn. 2   Y = 13.59 + 1.413*Diabetes Rate - .004*Population Density + .719*Unemployment Rate

              Constant   Diabetes Rate   Population Density   Unemployment Rate
Coefficient   13.59      1.413           -.004                .719
t-stat        (9.14)     (7.07)**        (-4.302)**           (3.17)**
p-value       (.000)     (.000)          (.000)               (.003)
r                        (.626)          (-.369)              (.281)

n = 50   R-Sq. = .663   F = 30.23   F-Prob. = .000   SE = 1.688

** Significant at the 1% level of significance
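The coefficients in Eqn. 2 come from SPSS and the state-level data, which are
not reproduced in this report. As an illustration only, the sketch below shows
how ordinary least squares coefficients can be obtained by solving the normal
equations. It does not implement the stepwise selection procedure; the
function names and the six data rows are hypothetical, and the response values
are generated exactly from the Eqn. 2 coefficients so that the fit can be
checked against them.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] that minimize the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Coefficient of determination: 1 - SSE / SST."""
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical rows: (diabetes rate, population density, unemployment rate).
rows = [
    (5.0, 50.0, 3.5),
    (6.5, 200.0, 4.0),
    (7.2, 90.0, 5.1),
    (8.0, 400.0, 4.4),
    (9.1, 30.0, 6.0),
    (6.0, 1000.0, 3.9),
]
TRUE_COEFS = [13.59, 1.413, -0.004, 0.719]  # coefficients from Eqn. 2
y = [TRUE_COEFS[0] + sum(b * x for b, x in zip(TRUE_COEFS[1:], r)) for r in rows]

coefs = fit_ols(rows, y)
fit_quality = r_squared(rows, y, coefs)
```

Because the hypothetical responses contain no noise, the fitted coefficients
should match Eqn. 2 up to rounding error; with real data the fit would be
imperfect, as reflected in the reported R-Sq. of .663.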

From the regression results, the R-Sq., which is the coefficient of
determination, is equal to .663. This means that 66.3% of the variation in the
obesity rate can be explained by or attributed to variation in the diabetes
rate, population density, and the unemployment rate.
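The reported R-Sq. can be cross-checked against other figures in this report.
The sketch below, under the assumption that the standard error of the estimate
(SE = 1.688) uses n - k - 1 degrees of freedom and that the standard deviation
of the obesity rate (2.8188, from the histogram of the obesity rate) uses
n - 1, recovers the coefficient of determination as 1 - SSE/SST. The function
name is hypothetical.

```python
N_STATES = 50        # sample size
K_PREDICTORS = 3     # diabetes rate, population density, unemployment rate
SD_OBESITY = 2.8188  # sample standard deviation of the obesity rate
SE_ESTIMATE = 1.688  # standard error of the estimate from the regression


def r_squared_from_summary(n, k, sd_y, se):
    """Recover R-squared from the sample std. dev. of y and the regression SE."""
    sst = (n - 1) * sd_y ** 2    # total sum of squares
    sse = (n - k - 1) * se ** 2  # residual (error) sum of squares
    return 1 - sse / sst


r2_check = r_squared_from_summary(N_STATES, K_PREDICTORS, SD_OBESITY, SE_ESTIMATE)
print(round(r2_check, 3))
```

Under these assumptions the recovered value agrees with the reported .663 to
three decimal places.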

T-Statistics

Each independent variable was tested for significance with the following null
and alternative hypotheses (results are shown in the table above):

H0: B = 0

Ha: B > 0 or B < 0, based on the functional specification

The null hypothesis was rejected in favor of the alternative for each of the
independent variables, as the one-tailed p-value (p-value/2) is approximately
.00 for each. These independent variables are significant at the 1% level of
significance.
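The one-tailed decision rule described here can be sketched as follows. The
coefficients and two-tailed p-values are taken from Table 3 (a p-value printed
as .000 is entered as 0.000, although the true value is merely smaller than
.0005); the function name and dictionary layout are hypothetical.

```python
ALPHA = 0.01  # 1% level of significance

# Coefficient, two-tailed p-value, and expected sign for each retained variable.
RESULTS = {
    "Diabetes Rate": {"coef": 1.413, "p_two_tailed": 0.000, "expected_sign": +1},
    "Population Density": {"coef": -0.004, "p_two_tailed": 0.000, "expected_sign": -1},
    "Unemployment Rate": {"coef": 0.719, "p_two_tailed": 0.003, "expected_sign": +1},
}


def one_tailed_significant(coef, p_two_tailed, expected_sign, alpha=ALPHA):
    """Reject H0 only if the sign is as hypothesized and p/2 is below alpha."""
    sign_matches = (coef > 0) == (expected_sign > 0)
    return sign_matches and (p_two_tailed / 2) < alpha


decisions = {
    name: one_tailed_significant(r["coef"], r["p_two_tailed"], r["expected_sign"])
    for name, r in RESULTS.items()
}
```

Halving the p-value is only appropriate when the estimated coefficient has the
hypothesized sign, which is why the sketch checks the sign first.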

The interpretation of the equation is as follows:

For each one-percentage-point increase in the diabetes rate, the obesity rate
would increase by 1.413 percentage points, on average, all else being equal.

For each additional person per square mile of population density, the obesity
rate would decrease by .004 percentage points, on average, all else being
equal.

For each one-percentage-point increase in the unemployment rate, the obesity
rate would increase by .719 percentage points, on average, all else being
equal.
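As a worked illustration of this interpretation, the sketch below evaluates
Eqn. 2 at the sample means reported in Table 1 and then raises the diabetes
rate by one point. The function name `predict_obesity_rate` is hypothetical;
the coefficients and means are those reported in this paper.

```python
def predict_obesity_rate(diabetes_rate, population_density, unemployment_rate):
    """Predicted obesity rate (%) from the estimated regression line, Eqn. 2."""
    return (
        13.59
        + 1.413 * diabetes_rate
        - 0.004 * population_density
        + 0.719 * unemployment_rate
    )


# Sample means from Table 1.
MEAN_DIABETES = 6.84
MEAN_DENSITY = 181.90
MEAN_UNEMPLOYMENT = 4.39

at_means = predict_obesity_rate(MEAN_DIABETES, MEAN_DENSITY, MEAN_UNEMPLOYMENT)

# Marginal effect: raise the diabetes rate by one point, hold the rest fixed.
effect_of_diabetes = (
    predict_obesity_rate(MEAN_DIABETES + 1, MEAN_DENSITY, MEAN_UNEMPLOYMENT)
    - at_means
)
```

The prediction at the means is about 25.68, close to the mean obesity rate of
25.66 in Table 1; the small gap is consistent with rounding of the published
coefficients. The marginal effect equals the diabetes coefficient, 1.413.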

F- Statistic

The regression as a whole appears to be statistically significant at the 1%
level, given that the F-statistic is equal to 30.23 and its significance is
equal to .000, where:

H0: B_DiabetesRate = B_PopulationDensity = B_UnemploymentRate = 0

Ha: at least one B is not equal to zero.

The null hypothesis is rejected in favor of the alternative that at least one
B is not equal to zero, given that the F significance is equal to .000.
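The F-statistic is tied to the coefficient of determination through the
standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below
applies it to the rounded R-Sq. of .663 with n = 50 and k = 3; the function
name is hypothetical.

```python
def f_statistic(r_squared, n, k):
    """Overall F-statistic for a regression with k predictors and n observations."""
    explained = r_squared / k                # explained variation per predictor
    unexplained = (1 - r_squared) / (n - k - 1)  # residual variation per d.f.
    return explained / unexplained


f_check = f_statistic(0.663, n=50, k=3)
print(round(f_check, 2))
```

The result is roughly 30.2, which is close to the reported 30.23; the small
difference is consistent with R-Sq. having been rounded to three decimals.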

Figure 8- Histogram of Residuals

Figure 8, below, presents the histogram of the residuals. The distribution
appears to be approximately normal.

[Figure 8: Histogram of Residuals. Horizontal axis: residuals (RES_1);
vertical axis: Frequency. Mean = 6.3976602E-15, Std. Dev. = 1.63519572,
N = 50.]
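The residual statistics shown with the histogram can be related to the
standard error of the estimate reported with the regression results (1.688).
The sketch below assumes that the histogram's standard deviation uses n - 1
degrees of freedom while the regression standard error uses n - k - 1; the
function name is hypothetical.

```python
import math

RESIDUAL_MEAN = 6.3976602e-15  # effectively zero, as expected for OLS residuals
RESIDUAL_SD = 1.63519572       # standard deviation of the residuals
N_OBS = 50
K_VARS = 3


def se_from_residual_sd(resid_sd, n, k):
    """Rescale the residual std. dev. (n - 1 d.f.) to the regression SE (n - k - 1 d.f.)."""
    return resid_sd * math.sqrt((n - 1) / (n - k - 1))


se_check = se_from_residual_sd(RESIDUAL_SD, N_OBS, K_VARS)
print(round(se_check, 3))
```

Under these assumptions the rescaled value agrees with the reported standard
error of 1.688, and the near-zero residual mean is what a regression with an
intercept should produce.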

Figure 9- Scatterplot of Actual and Predicted Values


Figure 9, below, presents the scatterplot of the dependent variable, Obesity
Rate, and the predicted value. The relationship appears to be positive and
linear, with no outliers.

[Figure 9: Scatterplot of Actual v. Predicted. Horizontal axis: predicted
value (PRE_1); vertical axis: Obesity Rate.]

Figure 10- Scatterplot of Residuals v. Per Capita Income

Figure 10, below, presents the scatterplot of the residuals v. per capita
income. Correlation appears to exist, as there appears to be a linear
relationship with no visible curves.

[Figure 10: Scatterplot of Residuals v. Per Capita Income. Horizontal axis:
Per Capita Income; vertical axis: residuals (RES_1).]

Figure 11- Scatterplot of Residuals v. Unemployment Rate

Figure 11, below, presents the scatterplot of the residuals v. the
unemployment rate. There appears to be a linear relationship with a "cluster"
of points. There also appears to be one possible outlier.

[Figure 11: Scatterplot of Residuals v. Unemployment Rate. Horizontal axis:
Unemployment Rate; vertical axis: residuals (RES_1).]

Figure 12- Scatterplot of Residuals v. Percent Grads from HS (25 years and older)

Figure 12, below, presents the scatterplot of the residuals v. % HS grads (25
years and older). There appears to be a linear relationship with no visible
curves.

[Figure 12: Scatterplot of Residuals v. Percent of High School Grads (25 years
and older). Horizontal axis: % Grads from HS (25 years and older); vertical
axis: residuals (RES_1).]

Figure 13- Scatterplot of Residuals v. Diabetes Rate

Figure 13, below, presents the scatterplot of the residuals v. diabetes rate.
There appears to be a linear relationship with no curves and two potential
outliers.

[Figure 13: Scatterplot of Residuals v. Diabetes Rate. Horizontal axis:
Diabetes Rate; vertical axis: residuals (RES_1).]

Figure 14- Scatterplot of Residuals v. Population Density

Figure 14, below, presents the scatterplot of the residuals v. population
density. There appears to be a discontinuous, random, linear relationship with
a few potential outliers.

[Figure 14: Scatterplot of Residuals v. Population Density. Horizontal axis:
Population Density; vertical axis: residuals (RES_1).]

Figure 15- Scatterplot of Residuals v. Percent Uninsured

Figure 15, below, presents the scatterplot of the residuals v. the percent
uninsured. There appears to be a random, linear relationship.

[Figure 15: Scatterplot of Residuals v. Percentage Uninsured. Horizontal axis:
Percentage Uninsured; vertical axis: residuals (RES_1).]

V. Conclusions

The research presented was fairly successful, but may need some changes before
being presented to a panel of professionals. The explanatory power (R² = .663)
is moderately strong, which lends the model some validity. The variable with
the greatest effect on obesity in this research proved to be the diabetes
rate. This may warrant further investigation, as there may be a question of
causality: is it diabetes that increases obesity, or does obesity increase
diabetes? This is an issue that may be of some interest to healthcare
professionals, and they may need to do further research to draw any definitive
conclusions. The multicollinearity present in the correlation matrix may have
made the coefficient estimates in Eqn. 2 unstable and inflated their standard
errors. Therefore the interpretation of this sample regression line may not be
very accurate. The research may be improved by investigating other independent
variables that were used neither in this research nor in the prior research
outlined in Section II- Prior Research. This research can be used as a
starting point for healthcare professionals in further investigating the link
between diabetes and obesity. Also, government and public policy advocates may
have an interest in the link between the unemployment rate and the resulting
increase in obesity.