
Mini Project 2

Name: Sruti Paku


UTD ID: 2021204550

(a) Use R to make three maps of the states in the USA. The first map should plot state level
income share of the top 1% of income earners in 2012. The second map should plot the
same variable but for 1999. The third map should plot the 2012 -1999 difference of the
variable.

Resulting maps:
Observations:

In the 1999 map, New York, Connecticut and Washington have among the highest top 1%
income shares and are therefore the darkest states on the map.
In the 2012 map, New York and Connecticut still top the ranking of top 1% income
share, whereas Washington has dropped many ranks.
The map of the 2012 - 1999 difference shows that North Dakota had the largest increase,
i.e. the income share of its top 1% was much higher in 2012 than in 1999.
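
The ranking behind the difference map can be checked numerically as well as visually. The following is a minimal sketch, assuming a data frame with the columns Year, state and Top1_adj used in the appendix; the three states and their values are made up purely for illustration and are not taken from the income data set.

```r
# Hypothetical toy data: states "A", "B", "C" and their shares are invented.
shares <- data.frame(
  state    = rep(c("A", "B", "C"), times = 2),
  Year     = rep(c(1999, 2012), each = 3),
  Top1_adj = c(20, 15, 10, 22, 14, 18)
)

# Put the two years side by side, matching rows by state name.
wide <- merge(
  subset(shares, Year == 2012, select = c(state, Top1_adj)),
  subset(shares, Year == 1999, select = c(state, Top1_adj)),
  by = "state", suffixes = c("_2012", "_1999")
)

# Change in the top 1% income share between the two years.
wide$change <- wide$Top1_adj_2012 - wide$Top1_adj_1999

# Sort so that the state with the largest increase comes first.
ranked <- wide[order(-wide$change), ]
print(ranked)
```

Matching by state name, rather than relying on row order, keeps the subtraction correct even if the two years list the states in a different order.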

(b) Examine the distribution of the HPI variable graphically. What would be
appropriate measures of center and spread of this distribution: (mean, SD) or
(median, IQR)? Justify your answers.

Happy Planet Index: The Happy Planet Index (HPI) is an index of human well-being and
environmental impact that was introduced by the New Economics Foundation (NEF) in July 2006.
The index is weighted to give progressively higher scores to nations with lower ecological footprints.

HPI is calculated in two ways:

1. The average of subjective life satisfaction, life expectancy at birth and ecological
footprint per capita.

2. The product of life satisfaction and life expectancy divided by the ecological footprint.
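
The second formulation can be sketched in a few lines of R. This is only an illustration of the ratio as described above: the function name hpi_ratio and the input values are hypothetical, and the sketch ignores any scaling or adjustment that the published index may apply.

```r
# Sketch of the ratio form: (life satisfaction x life expectancy) / footprint.
# hpi_ratio is a hypothetical helper, not part of any package.
hpi_ratio <- function(life_satisfaction, life_expectancy, footprint) {
  stopifnot(all(footprint > 0))  # a zero footprint would divide by zero
  (life_satisfaction * life_expectancy) / footprint
}

# Same satisfaction and life expectancy, different (made-up) footprints.
low_footprint  <- hpi_ratio(6, 75, 2)
high_footprint <- hpi_ratio(6, 75, 6)

# The country with the smaller footprint gets the higher score.
print(low_footprint > high_footprint)
```

Because the footprint is in the denominator, a smaller footprint raises the score when the other two components are held fixed, which matches the weighting described above.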

Fig: Box Plot of Happy Planet Index


Fig: Histogram of Happy Planet Index

Fig: Normal Q-Q plot of HPI



I chose the mean and standard deviation as the appropriate measures of center and spread. The box plot
and histogram show that the distribution is roughly symmetric and approximately normal, and the median
nearly coincides with the mean. In the normal Q-Q plot the points lie close to the qqline, which also
indicates that the distribution is approximately normal.
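
The graphical judgment can be backed up with a few numbers. The sketch below is an assumption-laden illustration: it uses simulated values as a stand-in for the HPI column (not the real data), and the Shapiro-Wilk test is an extra check that is not part of the original analysis.

```r
set.seed(1)
# Simulated stand-in for the HPI column; these are not the real HPI values.
hpi <- rnorm(150, mean = 40, sd = 9)

# Both candidate pairs of summaries side by side.
center_spread <- c(
  mean   = mean(hpi),
  sd     = sd(hpi),
  median = median(hpi),
  iqr    = IQR(hpi)
)
print(round(center_spread, 2))

# For a normal distribution the IQR is about 1.35 times the SD,
# so a ratio near that value is consistent with approximate normality.
ratio <- unname(center_spread["iqr"] / center_spread["sd"])
print(ratio)

# Shapiro-Wilk test: a small p-value would be evidence against normality.
sw <- shapiro.test(hpi)
print(sw$p.value)
```

If the mean and median are close and the IQR/SD ratio is near 1.35, (mean, SD) is a reasonable summary; a clearly skewed distribution would instead favor (median, IQR).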

(c) Make scatterplots of HPI against each of the three variables on which the index is
based. Comment on what you see. Will it be appropriate to use correlation to
summarize the relationship of HPI with the other three variables? If yes, provide

Life expectancy, well-being and ecological footprint are the three components of the Happy Planet
Index. A scatter plot shows how one variable varies with another. Correlation measures the strength
and direction of the linear relationship between two variables.

The following are the scatter plots:


1. The correlation coefficient of life expectancy and HPI is 0.5111565, which means they are
strongly positively correlated.

2. The correlation coefficient of well-being and HPI is 0.4510568, which means they are
moderately positively correlated.

3. The correlation coefficient of ecological footprint and HPI is -0.2380059, which means
they are weakly negatively correlated.
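
Whether correlation is an appropriate summary depends on the relationship being roughly linear. The sketch below shows one way to check this, assuming simulated stand-in variables (not the real HPI data): it compares the Pearson coefficient with the rank-based Spearman coefficient and reports the squared correlation. For reference, a correlation of about 0.51 corresponds to a squared correlation of about 0.26.

```r
set.seed(42)
# Simulated stand-ins for two columns; these are not the real data.
life_expectancy <- runif(100, 50, 85)
hpi <- 10 + 0.4 * life_expectancy + rnorm(100, sd = 6)

# Pearson measures linear association; Spearman measures monotone association.
pearson  <- cor(life_expectancy, hpi, method = "pearson")
spearman <- cor(life_expectancy, hpi, method = "spearman")

# cor.test() adds a confidence interval and p-value for the Pearson coefficient.
test <- cor.test(life_expectancy, hpi)

# Squared correlation: share of the variance in one variable that a
# straight-line fit on the other accounts for.
r_squared <- pearson^2

print(c(pearson = pearson, spearman = spearman, r_squared = r_squared))
print(test$conf.int)
```

A large gap between the Pearson and Spearman values would suggest a curved or outlier-driven relationship, in which case the Pearson coefficient alone would be a poor summary.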

Appendix:

R Code for Question 1:


# Loading of Libraries:

library(raster) # to get map shape file


library(ggplot2) # for plotting and miscellaneous things
library(ggmap) # for plotting
library(plyr) # for merging datasets
library(scales) # to get nice looking legends
library(maps)

#Defining get_map_data function to get the USA map shape and state details:
get_map_data <- function(){
  #Get map data
  usa.df <- map_data("state")

  # str(usa.df)

  #Rename column region (column 5) as state
  colnames(usa.df)[5] <- "state"
  usa.df$state <- as.factor(usa.df$state)
  usa.df$group <- as.numeric(usa.df$group)
  return(usa.df)
}

#Defining get_file_data function to get the input file containing the top 1% income share data:

get_file_data <- function(){
  #Read data from a csv file chosen interactively, keeping only the needed columns
  usa.dat <- read.csv(file.choose(), header = TRUE, fill = TRUE)[ , c('Year','id','state','Top1_adj')]
  return(usa.dat)
}

#Defining plot_income function to plot the income share of US states on the map.
#With one year it maps that year's share; with two years it maps the difference year - year1.

plot_income <- function(usa.df, usa.dat, year, year1){

  if(missing(year1)){
    disp <- paste("Top 1% of income earners in", year, sep = " ")
    file_name <- paste(year, "Income map.pdf", sep = "_")

    #Remove the redundant data by taking a subset of the data
    usa.dats <- subset(usa.dat, Year == year)
    usa.dats$state <- tolower(usa.dats$state)
    colnames(usa.dats)[colnames(usa.dats) == "Top1_adj"] <- "plot"
    # str(usa.dats)
  }
  else{
    year_name <- paste(year, year1, sep = "-")
    disp <- paste("Top 1% of income earners in", year_name, sep = " ")
    year_name <- paste(year, year1, sep = "_")
    file_name <- paste(year_name, "Income map.pdf", sep = "_")

    #Remove the redundant data by taking subsets of the data
    usa.year <- subset(usa.dat, Year == year)
    usa.year1 <- subset(usa.dat, Year == year1)
    #Align the rows of the second year with the states of the first year before subtracting
    usa.year1 <- usa.year1[match(usa.year$state, usa.year1$state), ]
    usa.year$plot <- usa.year$Top1_adj - usa.year1$Top1_adj
    usa.dats <- usa.year
    usa.dats$state <- tolower(usa.dats$state)
    # str(usa.dats)
  }

  #Join the file data with map data
  usa.df <- join(usa.df, usa.dats, by = "state", type = "inner")
  # str(usa.df)

  #Fixed break points for the legend
  brk <- c(0,3,6,9,12,15,18,21,24,27,30,33,36)

  #Adding state labels (state.center and state.abb are built-in R datasets)
  states.df <- data.frame(state.center, state.abb)

  #Plot the file data on the map with color varying according to the data
  p <- ggplot() +
    geom_polygon(data = usa.df, aes(x = long, y = lat, group = group, fill = plot),
                 color = "black", size = 0.25) +
    geom_text(aes(x = x, y = y, label = state.abb), data = states.df, size = 2) +
    scale_fill_distiller(palette = "Blues", breaks = brk) + theme_nothing(legend = TRUE) +
    labs(title = disp, fill = "")

  #Display map with file data
  plot(p)

  #Save the resulting map into a pdf file
  ggsave(filename = file_name, plot = p)
}

#Calling of Defined Functions:


#Get map data and file data
Map_Data = get_map_data()

File_Data = get_file_data()
#Plotting the map for 2012, 1999 and 2012- 1999

x = plot_income(Map_Data,File_Data,2012)
x = plot_income(Map_Data,File_Data,1999)
x = plot_income(Map_Data,File_Data,2012,1999)

R Code for Question 2:


# Reading the input file (comma-separated) into data1
data1 <- read.table(file.choose(), sep = ",", header = TRUE)

#Attaching data1 so that its columns can be referenced directly by name


attach(data1)

#Scatter plot of Life Expectancy and Happy Planet Index

plot(Life.Expectancy, Happy.Planet.Index)


#Correlation coefficient of Life Expectancy and HPI

cor(Life.Expectancy, Happy.Planet.Index)
# [1] 0.5111565

#Regression line of HPI on Life Expectancy added to the scatter plot


abline(lm(Happy.Planet.Index ~ Life.Expectancy))

#Scatter plot of Well Being and HPI


plot(Well.being.0.10., Happy.Planet.Index)

#Correlation coefficient of Well Being and HPI


cor(Well.being.0.10., Happy.Planet.Index)
# [1] 0.4510568

#Regression line of HPI on Well Being added to the scatter plot

abline(lm(Happy.Planet.Index ~ Well.being.0.10.))

#Scatter plot of Ecological Footprint and HPI


plot(Footprint.gha.capita., Happy.Planet.Index)

#Correlation coefficient of Ecological Footprint and HPI


cor(Footprint.gha.capita., Happy.Planet.Index)
# [1] -0.2380059

#Regression line of HPI on Ecological Footprint added to the scatter plot


abline(lm(Happy.Planet.Index ~ Footprint.gha.capita.))

#Histogram of HPI
hist(Happy.Planet.Index ,prob = TRUE, main="Happy Planet Index")
x <- Happy.Planet.Index
lines(density(x))

#Box plot of HPI


boxplot(Happy.Planet.Index, main="Happy Planet Index")

#Normal Q-Q plot of HPI with reference line
qqnorm(Happy.Planet.Index, main = "Happy Planet Index")
qqline(Happy.Planet.Index)
