
Wroblewski JM 20346484

Finding the most significant determinants of income inequality By Joanne Wroblewski


Table of Contents
Introduction
1.1 Problem Statement
1.2 Background
1.3 Motivation
1.4 Description of objectives
1.5 Description of method of study
2.1 Descriptive statistics
2.3 Transforming the Data
2.4 Testing for most significant variables
2.4.1 Multicollinearity
2.4.2 Model 1
2.4.3 Model 2
2.4.4 Final Model
Results
Conclusion
Recommendations
Appendix

Introduction:

1.1 Problem Statement:
The objective of this study is to examine a set of variables and their influence on the GINI coefficient, and to determine the most significant determinants of income inequality in a cross section of 40 countries. The variables are: gross national product (GNP) per capita, growth in gross domestic product (GDP), the population growth rate, the percentage of the population that lives in urban areas, the percentage of the population that can read and write, secondary school enrolment as a percentage of the school-age population, agriculture as a percentage of GDP, and whether the country is socialist.

1.2 Background:
Income inequality plays a vital role in the economic growth of a country. Very high income inequality can negatively influence the political stability of a country, its level of poverty and its quality of life. Yet if inequality is too low, it can cause low economic development, because it lowers the incentives that people need to participate in economic activities, as is evident in many socialist countries (Easterly, 2002). Income inequality can be expressed with the GINI coefficient, which ranges from 0 (perfect equality) to 1 (perfect inequality) and is based on the Lorenz curve. There are many determinants that influence income inequality, and obtaining empirical evidence is difficult, since many factors influence the accuracy of the data. In this study the different determinants of income inequality in a cross section of 40 countries were examined to determine which have the more significant influence on income inequality.

1.3 Motivation:
Income inequality plays a major role in the economy of a country and is especially important in developing countries. The degree of inequality is a determinant of poverty. ( ) Determining the most significant determinants of income inequality will enable economists and policy makers to create specific strategies to decrease income inequality in countries where it hinders further economic growth and development.

1.4 Description of objectives:


The main objective of this study is to examine the different determinants of income inequality in a cross section of 40 countries and determine which have the most significant influence on income inequality.

1.5 Description of method of study:
This will be done by gathering empirical evidence from other sources, regressing the GINI coefficient on the determinants and analysing the results. Various methods will be used to determine the best possible regression model.
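The least squares principle behind the method can be illustrated with a minimal sketch. The study itself estimates multiple regressions in EViews; the one-regressor case below only shows the idea. The function name `simple_ols` and the four data points are hypothetical and are not taken from the study's data set.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical illustration data (NOT from the study):
# population growth rate in % and the GINI coefficient.
pop_growth = [1.0, 2.0, 3.0, 4.0]
gini_values = [0.30, 0.40, 0.50, 0.60]

intercept, slope = simple_ols(pop_growth, gini_values)
print(f"GINI = {intercept:.2f} + {slope:.2f} * population growth")
```

With several regressors the same criterion (minimising the sum of squared residuals) applies, but the coefficients are solved jointly rather than one at a time.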
2. The Model:

2.1 Descriptive statistics:
Y (the GINI coefficient) is calculated from the Lorenz curve. It has a maximum value of 1, indicating perfect inequality, and a minimum value of 0, indicating perfect equality. In the original regression model the dependent variable had a minimum value of 0.18, a maximum value of 0.66 and an average value of approximately 0.44.
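To make the 0 to 1 scale concrete, the following sketch computes a GINI coefficient from a list of individual incomes using the standard rank-weighted formula for a piecewise-linear Lorenz curve. It is an illustration only: the study uses published country-level coefficients, and the function name `gini` and the income lists are hypothetical.

```python
def gini(incomes):
    """GINI coefficient of a list of non-negative incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted sum: the i-th smallest income gets weight i (1-based).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes (NOT from the study).
equal_society = [5, 5, 5, 5]       # everyone earns the same
unequal_society = [0, 0, 0, 10]    # one person earns everything

print(gini(equal_society))
print(gini(unequal_society))
```

In a finite sample of n people the largest attainable value is (n - 1)/n, so the upper bound of 1 is only approached as the population grows.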

Agriculture accounts for an average of 24.15% of GDP, with a maximum of 66% and a minimum of 4%. A higher share of agriculture in GDP is associated with a poorer, less developed country, where people living on farms often consume what they produce.

Secondary school enrolment has an average of 27.82% of the school-age population, with a maximum of 74% and a minimum of zero. The more educated people a country has, the less income inequality there should be, because the majority of the population will be able to get better jobs. Yet a measurement of skilled versus unskilled people would have been a better indicator.

On average, 66.55% of the population can read and write. The more literate people a country has, the less income inequality there should be, since literate people often have better jobs and a better understanding of finance and banking.

The average population growth rate equals 2.1%. The higher the population growth rate, the higher the income inequality will be, due to more births amongst the poorer communities of a country, where inequality is more evident.

On average, 42.57% of the population lives in urban areas, with a minimum of 5% and a maximum of 78%. According to economic theory the influence of the urban population varies from country to country. In some countries people living in urban areas are better off and income inequality is decreased. In other countries the effect is the opposite: many poor people move to urban areas in search of a better life but fail to find good jobs with their limited schooling and skills. This causes income inequality to rise.

GNP, the gross national product per capita, has an average value of 634.24, a minimum value of 66 and a maximum value of 3603. The higher GNP per capita, the lower the level of income inequality in a country. An increase in output decreases unemployment; therefore more people receive an income, which narrows the distribution gap. ( )

GDP, the growth in gross domestic product, has an average of 5.11%, with a minimum of 0.3% and a maximum of 10.5%. The higher the growth in GDP, the higher the income inequality, because growth in GDP usually results in the rich getting richer and the poor getting poorer if income is distributed unequally.

2.2 The following data resulted from the model:
ORIGINAL PREDICTED MODEL (INCLUDING ALL THE VARIABLES):

$$Y = \beta_0 + \beta_1 X_1 - \beta_2 X_2 - \beta_3 X_3 - \beta_4 X_4 - \beta_5 X_5 + \beta_6 X_6 - \beta_7 X_7 + \beta_8 X_8$$

X1 = Agriculture % of GDP
X2 = % of population in secondary education enrolments
X3 = Gross National Product per capita
X4 = Growth in Gross Domestic Product
X5 = Literacy level
X6 = Population growth
X7 = Socialist country or not
X8 = % of population living in urban areas

The estimated original model has several problems (Appendix A). According to economic theory many of the parameters have the wrong sign. For example, income inequality should increase as the agriculture % of GDP increases, but according to the EViews model income inequality would decrease when the agriculture % of GDP increases. Most of the parameters are statistically insignificant, except for agriculture and population growth. The R2 of the original regression equals 0.65, which indicates that the model explains about 65% of the variation in the GINI coefficient.

2.3 Transforming the Data:
The following variables were transformed by taking logarithms: URB, POP, GDP, GNP, AGR and LIT. This compresses the scale of the data and therefore makes it easier to plot. The EViews results (Appendix B) show a slightly better model when using the logged data.
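The effect of the transformation can be sketched with the GNP figures reported in section 2.1. The sketch assumes natural logarithms, which is an inference from the variable means in the Appendix (for example a mean LGNP of about 5.93) rather than something the text states; the helper name `log_transform` is hypothetical.

```python
import math


def log_transform(values):
    """Natural log of each value; raises if a value is not strictly positive."""
    for v in values:
        if v <= 0:
            raise ValueError(f"cannot take the log of {v}")
    return [math.log(v) for v in values]


# Minimum, mean and maximum GNP per capita as reported in section 2.1.
gnp = [66, 634.24, 3603]
lgnp = log_transform(gnp)

raw_ratio = gnp[-1] / gnp[0]     # how many times larger the maximum is
log_range = lgnp[-1] - lgnp[0]   # the same spread on the log scale

print(f"max/min in levels: {raw_ratio:.1f}")
print(f"max - min in logs: {log_range:.2f}")
```

Because the logarithm is undefined at zero, a variable with a minimum of zero cannot be transformed this way without adjustment. This may be why EDU, whose minimum is zero, appears in levels in Appendix B, although the text does not say so.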


2.4 Testing for most significant variables:

2.4.1 MULTICOLLINEARITY:
The original model showed a possibility of multicollinearity: we are working with cross-sectional data, and R2 was high while most of the variables were statistically insignificant at both the 5% and 10% levels. This was confirmed with a correlation matrix, which showed evidence of multicollinearity between EDU, AGR, LIT and GNP (figure C-i). The steps to correct multicollinearity include dropping a variable, transforming the data, estimating a different functional form and obtaining a different sample. Since the data had already been transformed and obtaining a different sample was not possible, the only option was to drop a variable. Regressing these variables on each other (figure C-ii) showed that GNP was the most significant variable. Scatter graphs confirmed that EDU, AGR and LIT were all correlated with GNP (figure C-iii). It was therefore decided to drop EDU, AGR and LIT from the regression model and keep GNP.
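The screening step can be sketched by scanning the correlation matrix of figure C-i for large pairwise correlations. The values below are copied from the Appendix; the cut-off of 0.6 and the function name `high_pairs` are assumptions made for this illustration, since the study does not state the threshold it used.

```python
from itertools import combinations

# Correlation matrix as reported in Appendix C-i.
names = ["LGNP", "LAGR", "EDU", "LGDP", "LLIT", "SOC"]
corr = [
    [1.000000, -0.820541, 0.837673, 0.205396, 0.782157, 0.160580],
    [-0.820541, 1.000000, -0.657261, -0.155510, -0.634073, 0.090892],
    [0.837673, -0.657261, 1.000000, 0.154551, 0.666641, 0.125086],
    [0.205396, -0.155510, 0.154551, 1.000000, 0.344408, -0.001318],
    [0.782157, -0.634073, 0.666641, 0.344408, 1.000000, 0.215096],
    [0.160580, 0.090892, 0.125086, -0.001318, 0.215096, 1.000000],
]

THRESHOLD = 0.6  # assumed cut-off, not stated in the study


def high_pairs(names, corr, threshold):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i][j]) > threshold:
            pairs.append((names[i], names[j], corr[i][j]))
    return pairs


flagged = high_pairs(names, corr, THRESHOLD)
for a, b, r in flagged:
    print(f"{a:5s} {b:5s} {r:+.3f}")
```

Under this cut-off every flagged pair involves only LGNP, LAGR, EDU and LLIT, which matches the group of variables identified in the text.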

2.4.2 MODEL 1
A new regression model was estimated with the remaining variables (Appendix D). The results were as follows: URB, POP and GNP were all statistically significant, but GDP and SOC were statistically insignificant. The R2 of the model was 0.62, which indicates that the model explains about 62% of the variation in the GINI coefficient. Multicollinearity between SOC and GDP was a possibility, but no evidence of it was found. According to PIET, socialist countries have very low levels of income inequality because, instead of income, people receive social grants and production is divided equally. Although this reduces income inequality, it often dampens economic growth, because the lack of incentives discourages participation in economic activities (Easterly, 2002). In non-socialist countries, by contrast, income inequality is higher for various reasons, and the government attempts to redistribute income by taxing the rich and giving grants to the poor. This lowers inequality and also the growth rate of the economy. It is therefore insignificant whether a country follows a socialist structure or not, since governments of non-socialist countries follow redistribution policies that lower income inequality in much the same way that a socialist country deliberately creates low income inequality. For this reason SOC will be dropped from the regression model.

2.4.3 MODEL 2:
Regression model 2 was estimated with the remaining variables (Appendix E). The results were as follows: URB, POP and GNP were all statistically significant, but GDP was still statistically insignificant. The R2 of the model was 0.60, which indicates that the model explains about 60% of the variation in the GINI coefficient. In Ravallion and Chen's data set, measures of inequality show no tendency to get either better or worse with economic growth. Therefore the variable GDP should have no influence on income inequality. In regression model 2 it is the only variable that is still statistically insignificant, while no multicollinearity, autocorrelation or heteroscedasticity is present. It shall therefore be dropped from the model, since the effect of excluding it on the model in general is minimal.
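The significance judgement for Model 2 can be reproduced from the coefficients and standard errors in Appendix E. The sketch below compares each t-statistic with an approximate two-tailed 5% critical value of 2.03 for 35 degrees of freedom (40 observations less 5 estimated parameters). That critical value is taken from standard t tables and is an assumption of this illustration; EViews reports exact p-values instead. The names `MODEL_2` and `t_stat` are hypothetical.

```python
# Approximate two-tailed 5% critical value for 35 degrees of freedom
# (assumption: looked up from standard t tables, not reported in the study).
CRITICAL_T = 2.03

# (coefficient, standard error) pairs as reported in Appendix E.
MODEL_2 = {
    "LGNP": (-0.060656, 0.023529),
    "LGDP": (-0.010127, 0.022811),
    "LPOP": (0.103669, 0.023802),
    "LURB": (0.077351, 0.030226),
}


def t_stat(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


significant = {
    name: abs(t_stat(coef, se)) > CRITICAL_T
    for name, (coef, se) in MODEL_2.items()
}

for name, (coef, se) in MODEL_2.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_stat(coef, se):+.3f} ({verdict})")
```

Only LGDP fails the test, which is consistent with the decision to drop it.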

2.4.4 FINAL MODEL:


After GDP was dropped from the regression model, a new model was estimated (Appendix F). All the variables were statistically significant, the signs were as anticipated and R2 was 0.60 (0.5986 before rounding). This model will be analysed and interpreted.
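Because R2 can only fall when variables are removed, the four models are easier to compare on adjusted R2, which penalises extra regressors. The sketch below recomputes it from the R2 values in the Appendix using the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1), where k is the number of regressors excluding the constant. The function name `adjusted_r2` and the dictionary layout are hypothetical.

```python
N_OBS = 40  # included observations in every regression in the Appendix


def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared; n_regressors excludes the constant."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# (R-squared, number of regressors) as reported in Appendix A, D, E and F.
models = {
    "Original model": (0.646861, 8),
    "Model 1": (0.616493, 5),
    "Model 2": (0.600856, 4),
    "Final model": (0.598609, 3),
}

adjusted = {name: adjusted_r2(r2, N_OBS, k) for name, (r2, k) in models.items()}
best = max(adjusted, key=adjusted.get)

for name, value in adjusted.items():
    print(f"{name}: adjusted R2 = {value:.6f}")
print("Highest adjusted R2:", best)
```

The recomputed values agree with the adjusted R2 figures printed by EViews, and the final model has the highest adjusted R2 of the four even though its plain R2 is the lowest.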


Results:
THE FINAL MODEL

$$Y = 0.480 + 0.100X_1 + 0.075X_2 - 0.061X_3 + U_t$$

X1 = Population growth (logged)
X2 = Percentage of population living in urban areas (logged)
X3 = GNP per capita (logged)
Ut = Error term

The R2 is 0.60, indicating that the three variables jointly explain about 60% of the variation in income inequality, which is relatively good for a model of this type. Adjusted R2 equals 0.57. X1 has a positive sign and is statistically significant, which means that when the population growth rate rises, inequality rises as well. X2 has a positive sign and is statistically significant, which indicates that growth in the population living in urban areas will increase income inequality. X3 has a negative sign and is statistically significant, indicating that a rise in GNP per capita will decrease income inequality. The error term captures the influence of factors not included in the model; because so many variables were dropped, the model may be under-fitted.
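As a worked illustration of how the final model is read, the sketch below evaluates it at the sample averages reported in section 2.1, using the unrounded coefficients from Appendix F. It assumes that the regressors are natural logs of the variables in their reported units (percent for population growth and urban share), which is inferred from the Appendix means and not stated in the text. The function name `predict_gini` is hypothetical.

```python
import math

# Coefficients as reported in Appendix F.
COEF = {"C": 0.479521, "LPOP": 0.100338, "LURB": 0.075343, "LGNP": -0.061792}


def predict_gini(pop_growth, urban_share, gnp_per_capita):
    """Fitted GINI from the final model (assumes natural-log regressors)."""
    return (
        COEF["C"]
        + COEF["LPOP"] * math.log(pop_growth)
        + COEF["LURB"] * math.log(urban_share)
        + COEF["LGNP"] * math.log(gnp_per_capita)
    )


# Sample averages from section 2.1: 2.1% growth, 42.57% urban, GNP 634.24.
at_means = predict_gini(2.1, 42.57, 634.24)

# Effect of doubling GNP per capita with the other variables held fixed.
doubled_gnp = predict_gini(2.1, 42.57, 2 * 634.24)
effect_of_doubling = doubled_gnp - at_means

print(f"Fitted GINI at the sample averages: {at_means:.3f}")
print(f"Change when GNP per capita doubles: {effect_of_doubling:+.3f}")
```

The fitted value at the averages comes out close to the sample mean of the GINI coefficient (0.4385). Because GNP enters in logs, doubling GNP per capita lowers the fitted GINI by the LGNP coefficient times ln 2, roughly 0.04, whatever the starting level.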

Conclusion:


This study attempted to determine the most significant determinants of income inequality by testing and analysing cross-sectional data for 40 countries. According to economic theory most of the variables should have an influence on income inequality, yet even after the data had been transformed most of the variables were statistically insignificant. Because we worked with cross-sectional data, the possibility of multicollinearity was high. This was confirmed by a correlation matrix, which showed that GNP is correlated with EDU, AGR and LIT. Economic theory suggests that literacy and education are related, since more people are literate if they receive an education. Agriculture also plays a role, considering that children on farms often live too far from a school to attend and usually end up working on the farm. SOC and GDP were also dropped from the model, since both were statistically insignificant. Although tests could not detect multicollinearity, empirical evidence indicates that growth in GDP is an insignificant variable, since it does not influence the inequality of a given country (Ravallion and Chen, 2000). SOC also showed no influence on the model and was therefore dropped as well. The final model consists of URB, POP and GNP. Although excluding so many variables may leave the model under-fitted, it matters more to the study to obtain a statistically sound model. The model shows the relationship between URB, POP and GNP and Y (the GINI coefficient). All the determinants are statistically significant and R2 is relatively good. This shows that URB, POP and GNP all play a big role in determining the income inequality of a country. Economists and policymakers can therefore focus on these variables to decrease income inequality that hinders economic growth.


Recommendations:
Income inequality plays a major role in the growth and development of countries, especially developing countries. There are many variables which have an influence on income inequality, but when they are all used together the results may not be as anticipated, because of the relationships they have with each other. The most effective way to decrease income inequality would be to group variables according to the relationships they have with each other and then establish policies which incorporate these variables, so that they work together to decrease income inequality.


Appendix
A) Original model with all the determinants
Dependent Variable: Y
Method: Least Squares
Date: 10/22/08   Time: 16:20
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
C              0.522465     0.099275     5.262809       0.0000
AGR           -0.003082     0.001390    -2.217052       0.0341
POP            0.050917     0.019751     2.577912       0.0149
URB           -0.000596     0.001058    -0.563232       0.5773
LIT           -0.000101     0.000953    -0.106362       0.9160
EDU           -0.001139     0.001296    -0.878876       0.3862
GNP           -4.19E-05     2.74E-05    -1.531092       0.1359
GDP           -0.003203     0.007722    -0.414838       0.6811
SOC           -0.080330     0.042678    -1.882231       0.0692

R-squared             0.646861    Mean dependent var       0.438500
Adjusted R-squared    0.555728    S.D. dependent var       0.115527
S.E. of regression    0.077003    Akaike info criterion   -2.094841
Sum squared resid     0.183813    Schwarz criterion       -1.714843
Log likelihood        50.89682    F-statistic              7.098004
Durbin-Watson stat    1.433537    Prob(F-statistic)        0.000026

B) Logged Data
Dependent Variable: Y
Method: Least Squares
Date: 11/02/08   Time: 20:13
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
LGDP          -0.010089     0.023183    -0.435195       0.6663
LGNP          -0.046017     0.027662    -1.663553       0.1060
LLIT           0.041454     0.031717     1.306983       0.2005
LPOP           0.076432     0.028109     2.719134       0.0105
LURB           0.054742     0.034351     1.593617       0.1209
EDU           -0.001787     0.001130    -1.581940       0.1235
SOC           -0.070545     0.041317    -1.707426       0.0974
C              0.380538     0.108444     3.509083       0.0014

R-squared             0.660007    Mean dependent var       0.438500
Adjusted R-squared    0.585633    S.D. dependent var       0.115527
S.E. of regression    0.074366    Akaike info criterion   -2.182778
Sum squared resid     0.176970    Schwarz criterion       -1.845002
Log likelihood        51.65555    F-statistic              8.874212
Durbin-Watson stat    1.455752    Prob(F-statistic)        0.000005

Not significant to model, possibility of multicollinearity

C) Testing for most significant variables

i. Correlation matrix:

             LGNP        LAGR         EDU        LGDP        LLIT         SOC
LGNP     1.000000   -0.820541    0.837673    0.205396    0.782157    0.160580
LAGR    -0.820541    1.000000   -0.657261   -0.155510   -0.634073    0.090892
EDU      0.837673   -0.657261    1.000000    0.154551    0.666641    0.125086
LGDP     0.205396   -0.155510    0.154551    1.000000    0.344408   -0.001318
LLIT     0.782157   -0.634073    0.666641    0.344408    1.000000    0.215096
SOC      0.160580    0.090892    0.125086   -0.001318    0.215096    1.000000

High pairwise correlation between variables

ii. Plotting the variables with high pairwise correlation against each other:
Dependent Variable: EDU
Method: Least Squares
Date: 11/02/08   Time: 21:13
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
LAGR           2.725792     4.724685     0.576926       0.5676
LGNP           18.10410     3.995916     4.530651       0.0001
LLIT           0.782351     4.110005     0.190353       0.8501
C             -90.64595     31.69988    -2.859504       0.0070

R-squared             0.704764    Mean dependent var       27.82500
Adjusted R-squared    0.680161    S.D. dependent var       21.34441
S.E. of regression    12.07119    Akaike info criterion    7.914159
Sum squared resid     5245.689    Schwarz criterion        8.083047
Log likelihood       -154.2832    F-statistic              28.64543
Durbin-Watson stat    1.786662    Prob(F-statistic)        0.000000

Dependent Variable: LLIT
Method: Least Squares
Date: 11/02/08   Time: 21:18
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
EDU            0.001285     0.006752     0.190353       0.8501
LAGR           0.021386     0.192347     0.111186       0.9121
LGNP           0.552191     0.180879     3.052821       0.0042
C              0.638572     1.419298     0.449921       0.6555

R-squared             0.612342    Mean dependent var       4.009402
Adjusted R-squared    0.580038    S.D. dependent var       0.754975
S.E. of regression    0.489258    Akaike info criterion    1.502786
Sum squared resid     8.617443    Schwarz criterion        1.671674
Log likelihood       -26.05572    F-statistic              18.95515
Durbin-Watson stat    1.475314    Prob(F-statistic)        0.000000

Dependent Variable: LAGR
Method: Least Squares
Date: 11/02/08   Time: 21:16
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
EDU            0.003361     0.005825     0.576926       0.5676
LGNP          -0.625408     0.141591    -4.416998       0.0001
LLIT           0.016051     0.144366     0.111186       0.9121
C              6.506712     0.586836     11.08778       0.0000

R-squared             0.676432    Mean dependent var       2.958981
Adjusted R-squared    0.649468    S.D. dependent var       0.715919
S.E. of regression    0.423865    Akaike info criterion    1.215836
Sum squared resid     6.467813    Schwarz criterion        1.384724
Log likelihood       -20.31671    F-statistic              25.08648
Durbin-Watson stat    2.162241    Prob(F-statistic)        0.000000

Dependent Variable: LGNP
Method: Least Squares
Date: 11/02/08   Time: 21:17
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
EDU            0.020058     0.004427     4.530651       0.0001
LAGR          -0.561980     0.127231    -4.416998       0.0001
LLIT           0.372414     0.121990     3.052821       0.0042
C              5.536714     0.717414     7.717599       0.0000

R-squared             0.864969    Mean dependent var       5.925100
Adjusted R-squared    0.853716    S.D. dependent var       1.050530
S.E. of regression    0.401797    Akaike info criterion    1.108899
Sum squared resid     5.811863    Schwarz criterion        1.277787
Log likelihood       -18.17798    F-statistic              76.86839
Durbin-Watson stat    2.246187    Prob(F-statistic)        0.000000

With GNP as the dependent variable, all the other variables are statistically significant.

iii. Scatter Diagrams
[Scatter diagram: EDU (vertical axis) against LGNP (horizontal axis)]


EDU and GNP show clear positive correlation. If EDU increases GNP will also increase.

[Scatter diagram: LLIT (vertical axis) against LGNP (horizontal axis)]

As LIT increases GNP also increases, which is a clear sign of correlation.

[Scatter diagram: LAGR (vertical axis) against LGNP (horizontal axis)]

There is a clear negative correlation between AGR and GNP.

D) Model 1
Dependent Variable: Y
Method: Least Squares
Date: 11/02/08   Time: 23:19
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
LURB           0.077193     0.030061     2.567918       0.0148
LPOP           0.090774     0.026082     3.480293       0.0014
LGNP          -0.062772     0.023469    -2.674714       0.0114
LGDP          -0.006747     0.022867    -0.295078       0.7697
SOC           -0.048316     0.041036    -1.177395       0.2472
C              0.500507     0.091850     5.449171       0.0000

R-squared             0.616493    Mean dependent var       0.438500
Adjusted R-squared    0.560095    S.D. dependent var       0.115527
S.E. of regression    0.076623    Akaike info criterion   -2.162346
Sum squared resid     0.199619    Schwarz criterion       -1.909014
Log likelihood        49.24691    F-statistic              10.93109
Durbin-Watson stat    1.470391    Prob(F-statistic)        0.000003

E) Model 2
Dependent Variable: Y
Method: Least Squares
Date: 11/02/08   Time: 23:58
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
LGNP          -0.060656     0.023529    -2.577959       0.0143
LGDP          -0.010127     0.022811    -0.443947       0.6598
LPOP           0.103669     0.023802     4.355442       0.0001
LURB           0.077351     0.030226     2.559105       0.0150
C              0.479079     0.090525     5.292263       0.0000

R-squared             0.600856    Mean dependent var       0.438500
Adjusted R-squared    0.555240    S.D. dependent var       0.115527
S.E. of regression    0.077045    Akaike info criterion   -2.172383
Sum squared resid     0.207758    Schwarz criterion       -1.961273
Log likelihood        48.44765    F-statistic              13.17194
Durbin-Watson stat    1.524263    Prob(F-statistic)        0.000001

F) Final model
Dependent Variable: Y
Method: Least Squares
Date: 10/22/08   Time: 17:16
Sample: 1 40
Included observations: 40

Variable    Coefficient   Std. Error  t-Statistic        Prob.
LPOP           0.100338     0.022335     4.492314       0.0001
LURB           0.075343     0.029550     2.549652       0.0152
LGNP          -0.061792     0.023127    -2.671912       0.0113
C              0.479521     0.089504     5.357548       0.0000

R-squared             0.598609    Mean dependent var       0.438500
Adjusted R-squared    0.565160    S.D. dependent var       0.115527
S.E. of regression    0.076181    Akaike info criterion   -2.216767
Sum squared resid     0.208928    Schwarz criterion       -2.047879
Log likelihood        48.33535    F-statistic              17.89602
Durbin-Watson stat    1.529399    Prob(F-statistic)        0.000000
