
Advanced Econometrics Assignment

Submitted by:

Sharvari Parikh

PRN-17060242036

Assignment

The dataset contains the incomes of 16 persons along with information on their age, gender
and political party affiliation. The task is to build an econometric model that explains the
income of a person with the help of the remaining variables. For this purpose, the following
questions are to be answered:

1) Create a dummy variable for the gender of the persons.
2) Create dummy variable(s) for the political affiliations of the persons.
3) Set up a multiple regression model to explain the income with the help of age, gender and
their political affiliation. Interpret the regression coefficients.
4) Also check the model diagnostics with respect to multicollinearity, heteroscedasticity, etc.
5) Predict the income of a 30-year-old woman who is a Democrat.
6) Produce the necessary R code.

1) Create a dummy variable for the gender of the persons.

Answer:

In regression analysis, a dummy variable is a numerical variable used to represent subgroups of
the sample in a study. It is often used to distinguish different treatment groups. Each dummy
variable takes a value of either 0 or 1.

R code:
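# Load the dataset; file.choose() opens a dialog to select the CSV file
sharvari <- read.csv(file.choose())
sharvari

# Make the columns (Age, Party, Gender, Income) available by name
attach(sharvari)

# Female.Dummy is 1 for a female respondent and 0 otherwise
Female.Dummy <- ifelse(Gender == "Female", 1, 0)
Female.Dummy

# Append the dummy to the original data
sharvari1 <- data.frame(sharvari, Female.Dummy)
sharvari1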

Data with the dummy variable for female:

   Age Party Gender Income Female.Dummy
1   20   Rep   Male  45000            0
2   25   Dem   Male  39000            0
3   45   Ind   Male  56000            0
4   35   Rep Female  49000            1
5   50   Dem Female  41000            1
6   55   Ind Female  42000            1
7   39   Rep   Male  58000            0
8   48   Dem   Male  55000            0
9   30   Ind   Male  46000            0
10  27   Rep Female  42000            1
11  47   Dem Female  37000            1
12  21   Ind Female  25000            1
13  48   Rep   Male  75000            0
14  24   Ind   Male  43000            0
15  28   Ind Female  40000            1
16  40   Dem Female  31000            1

Here Female.Dummy is the dummy variable for the female category, i.e. Female.Dummy is 1 when
the person is female and 0 otherwise. Male is the benchmark category.
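
As a quick check on this coding, the same 0/1 indicator can be built from a logical comparison and cross-tabulated against the original labels. The sketch below is illustrative only: the `gender` vector is transcribed by hand from the Gender column of the table, and the names `gender` and `female_dummy` are assumptions that do not appear in the original script.

```r
# Gender column transcribed from the table above (rows 1 to 16)
gender <- c("Male", "Male", "Male", "Female", "Female", "Female",
            "Male", "Male", "Male", "Female", "Female", "Female",
            "Male", "Male", "Female", "Female")

# A logical comparison coerced to integer gives the same 0/1 coding as ifelse()
female_dummy <- as.integer(gender == "Female")

# Cross-tabulate the dummy against the labels to confirm the coding
print(table(gender, female_dummy))
```

Every "Female" row should fall in the column for 1 and every "Male" row in the column for 0.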

2) Create dummy variable(s) for the political affiliations of the persons.

Answer:

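R code:

# Dem.dummy is 1 for Democrats and 0 otherwise
Dem.dummy <- ifelse(sharvari1$Party == "Dem", 1, 0)
Dem.dummy

# Ind.dummy is 1 for Independents and 0 otherwise
Ind.dummy <- ifelse(sharvari1$Party == "Ind", 1, 0)
Ind.dummy

# Append both party dummies to the data
sharvari2 <- data.frame(sharvari1, Ind.dummy, Dem.dummy)
sharvari2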

Data with the dummy variables for the political parties:

   Age Party Gender Income Female.Dummy Ind.dummy Dem.dummy
1   20   Rep   Male  45000            0         0         0
2   25   Dem   Male  39000            0         0         1
3   45   Ind   Male  56000            0         1         0
4   35   Rep Female  49000            1         0         0
5   50   Dem Female  41000            1         0         1
6   55   Ind Female  42000            1         1         0
7   39   Rep   Male  58000            0         0         0
8   48   Dem   Male  55000            0         0         1
9   30   Ind   Male  46000            0         1         0
10  27   Rep Female  42000            1         0         0
11  47   Dem Female  37000            1         0         1
12  21   Ind Female  25000            1         1         0
13  48   Rep   Male  75000            0         0         0
14  24   Ind   Male  43000            0         1         0
15  28   Ind Female  40000            1         1         0
16  40   Dem Female  31000            1         0         1

Here Ind.dummy is the dummy variable for Independents and Dem.dummy is the dummy variable for
Democrats.

Ind.dummy is 1 when the person is an Independent and 0 otherwise; similarly, Dem.dummy is 1
when the person is a Democrat and 0 otherwise. Rep (i.e. Republican) is the benchmark category.
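
Three party categories need only two dummies: a third dummy would make the dummy columns sum to the intercept column, so the model could not be estimated. The sketch below shows how R produces the same two columns from a factor whose first level is the benchmark. The `party` vector is transcribed by hand from the table, and the names `party`, `party_f`, `X`, `dem_dummy` and `ind_dummy` are assumptions used for illustration.

```r
# Party column transcribed from the table above (rows 1 to 16)
party <- c("Rep", "Dem", "Ind", "Rep", "Dem", "Ind", "Rep", "Dem",
           "Ind", "Rep", "Dem", "Ind", "Rep", "Ind", "Ind", "Dem")

# Put "Rep" first so that R treats it as the reference (benchmark) level
party_f <- factor(party, levels = c("Rep", "Dem", "Ind"))

# model.matrix() expands the factor into an intercept column plus one
# 0/1 column for each non-reference level
X <- model.matrix(~ party_f)
print(head(X))

# Compare the generated columns with hand-made dummies
dem_dummy <- ifelse(party == "Dem", 1, 0)
ind_dummy <- ifelse(party == "Ind", 1, 0)
print(all(X[, "party_fDem"] == dem_dummy))
print(all(X[, "party_fInd"] == ind_dummy))
```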

3) Set up a multiple regression model to explain the income with the help of age, gender
and their political affiliation. Interpret the regression coefficients.

Answer:

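R code:

# Rename the variables to match the model notation
y  <- sharvari2$Income        # dependent variable
x  <- sharvari2$Age
x1 <- sharvari2$Female.Dummy
x2 <- sharvari2$Dem.dummy
x3 <- sharvari2$Ind.dummy

# Fit the multiple regression by ordinary least squares
sh <- lm(y ~ x + x1 + x2 + x3)
summary(sh)
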
The multiple regression model explaining income by age, gender and political affiliation is

$$y = \alpha + \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \mu$$

Here:

$y$ = income
$\alpha$ = intercept
$\beta_0$ = coefficient of age
$\beta_1$ = coefficient of the dummy variable for gender (female)
$\beta_2$ = coefficient of the dummy variable for Democrat
$\beta_3$ = coefficient of the dummy variable for Independent
$x_0$ = age (named x in the R code)
$x_1$ = Female.Dummy, i.e. the dummy variable for female
$x_2$ = Dem.dummy, i.e. the dummy variable for Democrat
$x_3$ = Ind.dummy, i.e. the dummy variable for Independent
$\mu$ = error term

Call:
lm(formula = y ~ x + x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-6403.4 -2318.9   244.2  1259.5  8488.2

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  38125.3     4353.6   8.757 2.74e-06 ***
x              625.6      112.0   5.585 0.000164 ***
x1          -13677.5     2384.8  -5.735 0.000131 ***
x2          -15594.5     3127.2  -4.987 0.000411 ***
x3          -10453.1     2848.8  -3.669 0.003694 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4688 on 11 degrees of freedom
Multiple R-squared:  0.8829,  Adjusted R-squared:  0.8403
F-statistic: 20.74 on 4 and 11 DF,  p-value: 4.408e-05

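Age has a positive effect on income: the coefficient of age is 625.6, so each additional year
of age is associated with an increase of about 625.6 in mean income, holding gender and party
affiliation constant. The mean income of Democrats is lower than that of Republicans (the
benchmark category) by 15594.5, while the mean income of Independents is lower than that of
Republicans by 10453.1, holding the other variables constant. The mean income of females is
lower than that of males by 13677.5. All coefficients are statistically significant at the 1%
level.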
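
The same model can also be specified with factors, letting R build the dummies internally. The sketch below is an illustration under stated assumptions: the data frame `dat` is transcribed by hand from the table in the answer to question 2, and the names `dat` and `fit` are not part of the original script. Because the design matrix is the same as with the hand-made dummies, the estimates should agree with the summary above, provided the transcription is correct.

```r
# Data transcribed from the table in the answer to question 2
dat <- data.frame(
  Age    = c(20, 25, 45, 35, 50, 55, 39, 48, 30, 27, 47, 21, 48, 24, 28, 40),
  Party  = c("Rep", "Dem", "Ind", "Rep", "Dem", "Ind", "Rep", "Dem",
             "Ind", "Rep", "Dem", "Ind", "Rep", "Ind", "Ind", "Dem"),
  Gender = c("Male", "Male", "Male", "Female", "Female", "Female",
             "Male", "Male", "Male", "Female", "Female", "Female",
             "Male", "Male", "Female", "Female"),
  Income = c(45000, 39000, 56000, 49000, 41000, 42000, 58000, 55000,
             46000, 42000, 37000, 25000, 75000, 43000, 40000, 31000)
)

# Set the benchmark categories explicitly: Male and Rep come first
dat$Gender <- factor(dat$Gender, levels = c("Male", "Female"))
dat$Party  <- factor(dat$Party,  levels = c("Rep", "Dem", "Ind"))

# R expands the factors into 0/1 dummies for the non-benchmark levels
fit <- lm(Income ~ Age + Gender + Party, data = dat)
print(coef(fit))
```

The coefficient labelled GenderFemale corresponds to x1, PartyDem to x2 and PartyInd to x3.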

4) Also check the model diagnostics with respect to multicollinearity, heteroscedasticity, etc.

Answer:

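Tests for multicollinearity and heteroscedasticity

R code:

# Install once if the packages are not yet available
install.packages("lmtest")
install.packages("car")

library(lmtest)  # provides bptest()
library(car)     # provides vif()

# Variance inflation factors for the regressors
vif(sh)

# Breusch-Pagan test for heteroscedasticity
bptest(sh)
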
To check for multicollinearity, we use the VIF, i.e. the variance inflation factor. As a rule of
thumb, a VIF greater than 4 calls for further investigation, and a VIF greater than 10
indicates high multicollinearity.

       x       x1       x2       x3
1.140509 1.035011 1.529495 1.384677

All VIF values are below 4, so multicollinearity is not a concern in this model.
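
To make the VIF concrete, it can be computed by hand as 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing regressor j on all the other regressors. The sketch below uses base R only. The regressor values are transcribed by hand from the table in the answer to question 2, and the names `X` and `manual_vif` are assumptions used for illustration.

```r
# Regressors transcribed from the table in the answer to question 2
X <- data.frame(
  x  = c(20, 25, 45, 35, 50, 55, 39, 48, 30, 27, 47, 21, 48, 24, 28, 40),  # age
  x1 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1),                  # female
  x2 = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1),                  # Democrat
  x3 = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0)                   # Independent
)

# For each regressor: regress it on the remaining regressors and
# convert the auxiliary R-squared into a variance inflation factor
manual_vif <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})
print(round(manual_vif, 6))
```

A VIF of 1 means the regressor is uncorrelated with the others; larger values mean that more of its variation is explained by the other regressors.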

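For heteroscedasticity, we use the BP test, i.e. the Breusch-Pagan test. The null hypothesis of
this test is that the errors are homoscedastic.

studentized Breusch-Pagan test

data:  sh
BP = 3.2371, df = 4, p-value = 0.519

Since the p-value (0.519) is greater than 0.05, we fail to reject the null hypothesis. There is
no evidence of heteroscedasticity in the model.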
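
The studentized Breusch-Pagan statistic can also be obtained by hand: regress the squared residuals on the regressors and multiply the resulting R-squared by the sample size. The sketch below uses base R only. The data are transcribed by hand from the table in the answer to question 2, and the names `dat`, `fit`, `aux`, `bp_stat`, `bp_df` and `p_value` are assumptions used for illustration.

```r
# Data transcribed from the table in the answer to question 2
dat <- data.frame(
  y  = c(45000, 39000, 56000, 49000, 41000, 42000, 58000, 55000,
         46000, 42000, 37000, 25000, 75000, 43000, 40000, 31000),          # income
  x  = c(20, 25, 45, 35, 50, 55, 39, 48, 30, 27, 47, 21, 48, 24, 28, 40),  # age
  x1 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1),                  # female
  x2 = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1),                  # Democrat
  x3 = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0)                   # Independent
)

# Main regression
fit <- lm(y ~ x + x1 + x2 + x3, data = dat)

# Auxiliary regression: squared residuals on the same regressors
dat$e2 <- resid(fit)^2
aux <- lm(e2 ~ x + x1 + x2 + x3, data = dat)

# Studentized (Koenker) statistic: n * R^2 of the auxiliary regression,
# chi-squared with 4 degrees of freedom (one per regressor) under the null
bp_stat <- nrow(dat) * summary(aux)$r.squared
bp_df   <- 4
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(BP = bp_stat, df = bp_df, p.value = p_value))
```

A large statistic (small p-value) would mean that the regressors help explain the size of the squared residuals, which is what heteroscedasticity looks like.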

5) Predict the income of a 30-year-old woman who is a Democrat.

Answer:

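For a 30-year-old female Democrat, age = 30, Female.Dummy = 1, Dem.dummy = 1 and Ind.dummy = 0:

$$\widehat{\text{Income}} = 38125.3 + 625.6 \times 30 - 13677.5 \times 1 - 15594.5 \times 1 - 10453.1 \times 0 = 27621.3$$

Therefore the predicted income of the given person is 27621.3.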
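
The arithmetic can be reproduced in R by multiplying the estimated coefficients by the regressor values and summing. In this minimal sketch the coefficients are copied from the regression summary rather than taken from a fitted object, and the names `beta`, `x_new` and `predicted_income` are assumptions used for illustration.

```r
# Coefficient estimates copied from the regression summary
beta <- c(intercept = 38125.3, age = 625.6, female = -13677.5,
          dem = -15594.5, ind = -10453.1)

# Regressor values for a 30-year-old female Democrat
x_new <- c(intercept = 1, age = 30, female = 1, dem = 1, ind = 0)

# Predicted income = sum of coefficient * regressor value
predicted_income <- sum(beta * x_new)
print(predicted_income)
```

With the fitted model object sh available, predict(sh, newdata = data.frame(x = 30, x1 = 1, x2 = 1, x3 = 0)) should give the same value up to the rounding of the printed coefficients.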
