Statistics ECA Report

BUS105 STATISTICS
End-of-Course Assignment January Semester Assessment
Submitted by: Muhamad Fauzi Bin Mat Isa 14 March 2009
Table of Contents
Question 1
Question 2
Question 3
Question 4
References
Qn1 a.
[Figure: scatterplot of Family_Income (vertical axis, 37.0 to 41.0) against House_Value (horizontal axis, 100 to 200)]
The scatterplot shows that families who own higher-valued houses tend to have higher family incomes. The two variables appear to have a strong positive (direct) linear relationship.
b.
[Figure: scatterplot of Family_Income (vertical axis, 37.0 to 41.0) against Age_Head (horizontal axis, 40 to 55)]
The scatterplot shows a wide spread of points. It suggests that family income tends to increase with the age of the head of the household. The two variables appear to have a weak positive linear relationship.
c.
[Figure: scatterplot of Family_Income (vertical axis, 37.0 to 41.0) against Mortgage_Payment (horizontal axis, 200 to 500)]
The scatterplot shows that family income tends to decrease as the current monthly mortgage payment increases. The two variables appear to have a weak negative (inverse) linear relationship.
Qn2 a.
Correlations
                                      Family_Income   House_Value
Family_Income   Pearson Correlation   1               .720(**)
                Sig. (2-tailed)                       .000
                N                     25              25
House_Value     Pearson Correlation   .720(**)        1
                Sig. (2-tailed)       .000
                N                     25              25
** Correlation is significant at the 0.01 level (2-tailed).
From the SPSS output, the correlation coefficient r = 0.720 indicates a positive (direct) relationship between the value of the house and the family income. The coefficient of correlation ranges from -1.00 to +1.00, where values close to either extreme indicate a strong correlation and exactly -1.00 or +1.00 indicates a perfect one. A value of 0.720 is therefore evidence of a moderately strong association between the two variables: as the value of the house increases, the family income tends to increase.
b.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .720(a)   .518       .497                .7457
a Predictors: (Constant), House_Value
From the SPSS output, the coefficient of determination, \(r^2\), is 0.518. It can be verified manually as \((0.720)^2 = 0.518\). This means that about 51.8% of the variation in family income is explained, or accounted for, by the variation in the value of the house.
c.
A test of significance for the coefficient of correlation is used to determine whether the computed r could have occurred in a population in which the two variables are not related, i.e. a population with zero correlation.
Step 1: Stating the hypothesis:
H0: \(\rho = 0\) (The correlation in the population is zero)
H1: \(\rho \neq 0\) (The correlation in the population is not zero)
The null hypothesis H0 is that there is no correlation in the population, and the alternate H1 is that there is a correlation. A two-tailed test is used because of the way H1 is stated.
Step 2: Stating the level of significance:
The test will be done using the 0.05 significance level, \(\alpha = 0.05\).
Step 3: Using the appropriate test statistic:
The test statistic follows the t distribution with n - 2 degrees of freedom, and the formula is
\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \]
Step 4: Stating the decision rule:
H0 is rejected if \(t > t_{\alpha/2,\,n-2}\) or \(t < -t_{\alpha/2,\,n-2}\), that is, if \(t > t_{0.025,\,23}\) or \(t < -t_{0.025,\,23}\), i.e. t > 2.069 or t < -2.069.
[Figure: t distribution with the critical values -2.069 and 2.069 marked]
Step 5: Calculation of the test value and decision making:
Computing t:
\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.720\sqrt{25-2}}{\sqrt{1-0.720^2}} = 4.976 \]
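The calculation above can be reproduced with a short Python sketch. It uses only the values quoted in this report (r = 0.720, n = 25 and the critical value 2.069); the function name `correlation_t_statistic` is illustrative and not part of the SPSS output.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.720, 25
t_critical = 2.069  # two-tailed, alpha = 0.05, df = 23, as quoted in the report

t_value = correlation_t_statistic(r, n)
reject_h0 = abs(t_value) > t_critical  # two-tailed decision rule

print(f"r^2 = {r ** 2:.3f}")
print(f"t = {t_value:.3f}, reject H0: {reject_h0}")
```

The decision rule compares the absolute value of t with the critical value, which is equivalent to checking both tails.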
The computed t (4.976) falls within the rejection region. Thus, H0 is rejected: the correlation in the population is not zero. There is a correlation between the value of the house and the family income.
d.
To find the expected family income when the value of the house is $175,000, the following formula is used:
General Form of Linear Regression: \(\hat{Y} = a + bX\)
To use the equation, the following formulas give the values of a and b:
Slope of the regression line: \(b = r\,\frac{s_y}{s_x}\)
Y-Intercept: \(a = \bar{Y} - b\bar{X}\)
The values needed to compute a and b are taken from the SPSS output.
Descriptive Statistics
                     N    Sum     Mean     Std. Deviation
Family_Income        25   998.2   39.928   1.0514
House_Value          25   3849    153.96   28.841
Valid N (listwise)   25
Slope of the regression line
\[ b = r\,\frac{s_y}{s_x} = 0.720 \times \frac{1.0514}{28.841} = 0.026 \]
Y-Intercept
\[ a = \bar{Y} - b\bar{X} = 39.928 - 0.026(153.96) = 35.925 \]
Thus, the expected family income when the value of the house is $175,000 is found by substituting X = 175, since both variables are recorded in thousands of dollars:
\[ \hat{Y} = a + bX = 35.925 + 0.026(175) = 40.475 \]
Hence, the expected family income is $40,475 when the value of the house is $175,000.
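As a cross-check on the hand calculation, the following minimal Python sketch recomputes the slope, intercept and prediction from the summary statistics quoted above. The function and variable names are illustrative only. The sketch also carries the slope at full precision, which suggests why an intercept computed by hand from the rounded slope 0.026 can differ slightly from a value produced by software.

```python
def slope(r: float, s_y: float, s_x: float) -> float:
    """Slope of the least squares line: b = r * s_y / s_x."""
    return r * s_y / s_x


def intercept(mean_y: float, mean_x: float, b: float) -> float:
    """Intercept of the least squares line: a = mean_y - b * mean_x."""
    return mean_y - b * mean_x


# Summary statistics quoted in the report (both variables in thousands of dollars)
r = 0.720
mean_y, s_y = 39.928, 1.0514   # Family_Income
mean_x, s_x = 153.96, 28.841   # House_Value
x_new = 175                    # house valued at $175,000

b_full = slope(r, s_y, s_x)
b_rounded = round(b_full, 3)   # the value used in the hand calculation

a_rounded = intercept(mean_y, mean_x, b_rounded)
a_full = intercept(mean_y, mean_x, b_full)

y_hat_rounded = a_rounded + b_rounded * x_new
y_hat_full = a_full + b_full * x_new

print(f"rounded slope:  b = {b_rounded:.3f}, a = {a_rounded:.3f}, Y-hat = {y_hat_rounded:.3f}")
print(f"full precision: b = {b_full:.5f}, a = {a_full:.3f}, Y-hat = {y_hat_full:.3f}")
```

With the rounded slope the sketch reproduces a = 35.925 and a prediction of 40.475; at full precision the intercept comes out near 35.887.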
Coefficients(a)
                Unstandardized         Standardized                     95% Confidence Interval for B
Model           B        Std. Error    Beta           t        Sig.     Lower Bound   Upper Bound
1 (Constant)    35.889   .826                         43.440   .000     34.180        37.598
  House_Value   .026     .005          .720           4.971    .000     .015          .037
a Dependent Variable: Family_Income
From the SPSS output, the values of a and b in the estimated linear regression equation are found in the B column (highlighted in the blue rectangle box), where a = 35.889 and b = 0.026. Thus, the estimated regression equation is \(\hat{Y} = 35.889 + 0.026X\). The intercept differs slightly between the manual computation and SPSS because of rounding.
Qn3
MULTIPLE REGRESSION ANALYSIS
From the SPSS output, the values of a and b1 to b4 in the estimated multiple regression equation are found in the B column, highlighted inside the blue rectangle box:
Coefficients(a)
                     Unstandardized         Standardized
Model                B        Std. Error    Beta           t        Sig.
1 (Constant)         35.635   1.345                        26.490   .000
  House_Value        .025     .005          .677           4.540    .000
  Age_Head           .007     .027          .037           .273     .788
  Mortgage_Payment   -.001    .001          -.081          -.557    .584
  Gender             .716     .285          .345           2.507    .021
a Dependent Variable: Family_Income
After substituting the values of a and b1 to b4, the multiple regression equation is as follows:
\[ \hat{Y} = 35.635 + 0.025X_1 + 0.007X_2 - 0.001X_3 + 0.716X_4 \]
Where:
\(\hat{Y}\) = the family income (S$)
X1 = the value of the house
X2 = the age of the head of the household
X3 = the current monthly mortgage payment
X4 = 0 if a female is the head of the household, 1 if a male is the head of the household
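The equation can be applied as in the following minimal Python sketch. The function name `predict_income` and the example household (house value 175, age 45, mortgage payment 300) are hypothetical illustrations and are not taken from the data set. Income and house value are assumed to be in thousands of dollars, as in the $175,000 example earlier in the report.

```python
def predict_income(house_value: float, age_head: float,
                   mortgage_payment: float, male_head: int) -> float:
    """Estimated family income from the fitted multiple regression equation.

    male_head is the dummy variable X4: 0 for a female head, 1 for a male head.
    """
    return (35.635
            + 0.025 * house_value
            + 0.007 * age_head
            - 0.001 * mortgage_payment
            + 0.716 * male_head)


# Hypothetical household, used only to illustrate the calculation
income_male = predict_income(175, 45, 300, male_head=1)
income_female = predict_income(175, 45, 300, male_head=0)

print(f"male head:   {income_male:.3f}")
print(f"female head: {income_female:.3f}")
print(f"difference:  {income_male - income_female:.3f}")
```

Holding the other inputs fixed, the two predictions differ by exactly the Gender coefficient, which is how a dummy variable shifts the intercept.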
Interpretations of the coefficients
b1: The coefficient for the value of the house (X1) is positive, indicating a direct relationship. When the value of the house increases, the family income increases as well. For each additional $1,000 in the value of the house, the family income is expected to increase by $25 when the rest of the variables are held constant.
b2: The coefficient for the age of the head of the household (X2) is positive, indicating a direct relationship. With an older head, the family income also increases. For each additional year of age of the head, the family income is expected to increase by $7, provided the rest of the variables are held constant.
b3: The coefficient for the current monthly mortgage payment (X3) is negative, indicating an inverse relationship. As the current monthly mortgage payment increases, the family income decreases. With income recorded in thousands of dollars and assuming the payment is recorded in dollars, an increase of $100 in the mortgage payment corresponds to an estimated decrease of 0.001 x 100 = 0.1, i.e. about $100, in the family income when the other variables are held constant.
b4: With income recorded in thousands of dollars, the income of a family headed by a male is on average 0.716, i.e. about $716, higher than the income of a family headed by a female when the other variables are held constant.
MODEL FIT
The model is fitted using the least squares criterion to develop the following equation:
\[ \hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k \]
Because the calculation is tedious by hand, the SPSS package is used to perform it. From the SPSS output, the ANOVA table gives the following values:
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   17.357           4    4.339         9.461   .000(a)
  Residual     9.173            20   .459
  Total        26.530           24
a Predictors: (Constant), Gender, Mortgage_Payment, Age_Head, House_Value
b Dependent Variable: Family_Income
The SSR (the sum of squares due to regression) and SST (the total sum of squares) are used to compute the multiple coefficient of determination (\(R^2\)):
\[ R^2 = \frac{SSR}{SST} = \frac{17.357}{26.530} = 0.654 \]
The computed result agrees with the R Square value in the Model Summary of the SPSS output below.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .809(a)   .654       .585                .6772
a Predictors: (Constant), Gender, Mortgage_Payment, Age_Head, House_Value
The adjusted \(R^2\) (\(R_a^2\)) is computed manually to confirm the value generated by the SPSS package, using the following formula:
\[ R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} \]
where n = number of observations, and p = number of independent variables.
\[ R_a^2 = 1 - (1 - 0.654)\,\frac{25-1}{25-4-1} \approx 0.585 \]
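Both goodness-of-fit measures can be recomputed from the ANOVA sums of squares, as in this minimal Python sketch. The function names are illustrative; the inputs are the SSR, SST, n and p values quoted in the report.

```python
def r_squared(ssr: float, sst: float) -> float:
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


ssr, sst = 17.357, 26.530  # from the ANOVA table
n, p = 25, 4

r2 = r_squared(ssr, sst)
adj_r2 = adjusted_r_squared(r2, n, p)

print(f"R^2 = {r2:.3f}")
print(f"adjusted R^2 = {adj_r2:.3f}")
```

The adjustment multiplies the unexplained share (1 - R^2) by (n - 1)/(n - p - 1), so it penalises the model more as variables are added relative to the sample size.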
From the result, the computed \(R_a^2\) value agrees with the SPSS output above. Based on the \(R^2\) value, 65.4% of the variability in family income is explained by the estimated multiple regression equation, with the value of the house, the age of the head of the household, the current monthly mortgage payment and the gender of the head of the household as the independent variables. After adjusting the coefficient of determination for the number of independent variables in the model, the percentage of variability explained by the model is still moderately high (58.5%). Thus, on this basis (without performing residual analysis), the estimated multiple regression equation fits fairly well.
GLOBAL TEST (F-TEST)
The F-test is used to investigate whether any of the independent variables have significant coefficients. The null and alternate hypotheses are:
H0: \(\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\)
H1: not all \(\beta\)s equal 0
Under the null hypothesis, all the regression coefficients are zero, which would mean that the independent variables are of no use in estimating the dependent variable. If the null hypothesis is rejected in favour of the alternate hypothesis, at least one of the coefficients is not zero, and it can be concluded that at least one of the variables is significant in explaining the family income. The F distribution is used for the test statistic, and the value of F is found by the following equation, where SSE is the residual sum of squares and k is the number of independent variables:
\[ F = \frac{SSR/k}{SSE/(n-(k+1))} \]
Step 1: Stating the hypothesis:
H0: \(\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\)
H1: not all \(\beta\)s equal 0
Step 2: Stating the level of significance:
The test will be done using the 0.05 significance level, \(\alpha = 0.05\).
k is the degrees of freedom for the numerator = 4
n - (k+1) is the degrees of freedom for the denominator = 25 - (4+1) = 20
From Appendix B.4 of the text book, the F critical value is 2.87.
Step 3: Using the appropriate test statistic:
The F distribution is used for the test statistic, and the test is one-tailed.
Step 4: Stating the decision rule:
H0 is rejected if the p-value < 0.05 or if F > 2.87.
Step 5: Determine whether to reject H0
From the SPSS data in the ANOVA table, the computed F value is:
\[ F = \frac{SSR/k}{SSE/(n-(k+1))} = \frac{17.357/4}{9.173/20} = 9.461 \]
[Figure: F distribution with df = (4, 20) and the critical value 2.87 marked]
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   17.357           4    4.339         9.461   .000(a)
  Residual     9.173            20   .459
  Total        26.530           24
a Predictors: (Constant), Gender, Mortgage_Payment, Age_Head, House_Value
b Dependent Variable: Family_Income
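The global test can be re-run from the ANOVA figures with a minimal Python sketch. The function name `global_f_statistic` is illustrative, and the critical value 2.87 is the one quoted from the text book's Appendix B.4 rather than computed here.

```python
def global_f_statistic(ssr: float, sse: float, n: int, k: int) -> float:
    """F = (SSR / k) / (SSE / (n - (k + 1))) for k independent variables."""
    mean_square_regression = ssr / k
    mean_square_error = sse / (n - (k + 1))
    return mean_square_regression / mean_square_error


ssr, sse = 17.357, 9.173  # from the ANOVA table
n, k = 25, 4
f_critical = 2.87         # df = (4, 20), alpha = 0.05, as quoted in the report

f_value = global_f_statistic(ssr, sse, n, k)
reject_h0 = f_value > f_critical  # one-tailed (upper tail) decision rule

print(f"F = {f_value:.3f}, reject H0: {reject_h0}")
```

Only the upper tail is checked because F is a ratio of mean squares and large values are the ones that count against H0.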
Since the F value of 9.461 (both computed and from the SPSS output) exceeds the critical value of 2.87, and the p-value (.000) is less than 0.05, H0 is rejected and H1 is accepted. In conclusion, at least one of the regression coefficients is not equal to zero.
TESTING OF INDIVIDUAL REGRESSION COEFFICIENTS
The purpose of this test is to determine whether any of the independent variables are unimportant and can be dropped from the regression model. The t distribution is used for the test statistic, and the test is two-tailed. The null and alternate hypotheses are as follows:
H0: \(\beta_1 = 0\)   H1: \(\beta_1 \neq 0\)
H0: \(\beta_2 = 0\)   H1: \(\beta_2 \neq 0\)
H0: \(\beta_3 = 0\)   H1: \(\beta_3 \neq 0\)
H0: \(\beta_4 = 0\)   H1: \(\beta_4 \neq 0\)
For each variable, the null hypothesis is rejected when the test indicates that its coefficient is not equal to zero. This implies that the alternate hypothesis is accepted and the variable has a significant relationship with the dependent variable, i.e. the family income. Conversely, the variable should be dropped from the regression model if the null hypothesis is not rejected. The formula for testing an individual regression coefficient is as follows:
\[ t = \frac{b_i - \beta_i}{s_{b_i}} \]
Step 1: Stating the hypothesis:
H0: \(\beta_1 = 0\)   H1: \(\beta_1 \neq 0\)
H0: \(\beta_2 = 0\)   H1: \(\beta_2 \neq 0\)
H0: \(\beta_3 = 0\)   H1: \(\beta_3 \neq 0\)
H0: \(\beta_4 = 0\)   H1: \(\beta_4 \neq 0\)
Step 2: Stating the level of significance:
The test will be done using the 0.05 significance level, \(\alpha = 0.05\).
Step 3: Using the appropriate test statistic:
\[ t = \frac{b_i - 0}{s_{b_i}} \]
The two-tailed test is used. The degrees of freedom are n - (k+1) = 20. Thus, from Appendix B.2 of the text book, the critical value of t is 2.086.
Step 4: Stating the decision rule:
H0 is rejected if t > 2.086 or t < -2.086.
Step 5: Calculation of the test values and decision making:
The values of t1, t2, t3 and t4 are derived from the SPSS output highlighted in the blue rectangle box.
Coefficients(a)
                     Unstandardized         Standardized
Model                B        Std. Error    Beta           t        Sig.
1 (Constant)         35.635   1.345                        26.490   .000
  House_Value        .025     .005          .677           4.540    .000
  Age_Head           .007     .027          .037           .273     .788
  Mortgage_Payment   -.001    .001          -.081          -.557    .584
  Gender             .716     .285          .345           2.507    .021
a Dependent Variable: Family_Income
The t-value for the value of the house:
\[ t_1 = \frac{b_1 - 0}{s_{b_1}} = \frac{0.025 - 0}{0.005} = 5.0 \]
The t-value for the age of the head of the household:
\[ t_2 = \frac{b_2 - 0}{s_{b_2}} = \frac{0.007 - 0}{0.027} = 0.259 \]
The t-value for the current monthly mortgage payment:
\[ t_3 = \frac{b_3 - 0}{s_{b_3}} = \frac{-0.001 - 0}{0.001} = -1.0 \]
The t-value for the gender of the head of the household:
\[ t_4 = \frac{b_4 - 0}{s_{b_4}} = \frac{0.716 - 0}{0.285} = 2.51 \]
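The four t-ratios and the resulting decisions can be checked with a minimal Python sketch. The dictionary layout and variable names are illustrative; the coefficients and standard errors are the rounded values shown in the Coefficients table, and 2.086 is the critical value quoted from the text book.

```python
# (b_i, s_bi) pairs as displayed, rounded to three decimals, in the Coefficients table
coefficients = {
    "House_Value": (0.025, 0.005),
    "Age_Head": (0.007, 0.027),
    "Mortgage_Payment": (-0.001, 0.001),
    "Gender": (0.716, 0.285),
}
T_CRITICAL = 2.086  # two-tailed, alpha = 0.05, df = 20, as quoted in the report

# t = (b_i - 0) / s_bi for each independent variable
t_ratios = {name: (b - 0) / se for name, (b, se) in coefficients.items()}

# A variable is retained when its t-ratio falls in either rejection region
retained = [name for name, t in t_ratios.items() if abs(t) > T_CRITICAL]
dropped = [name for name in t_ratios if name not in retained]

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
print("retain:", retained)
print("drop:", dropped)
```

Because the displayed coefficients and standard errors are rounded, these ratios differ somewhat from the t column in the SPSS table (for example 5.0 against 4.540 for House_Value), but the retain or drop decisions are the same.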
Based on the SPSS output and the computed values (t1 = 5.0, t2 = 0.259, t3 = -1.0, t4 = 2.51), the t-ratios for the value of the house and the gender of the head of the household exceed the critical t value of 2.086, whereas those for the age of the head of the household and the current monthly mortgage payment do not fall in the rejection region. This indicates that the independent variables value of the house and gender of the head of the household should be retained and the other two variables should be dropped.
RESIDUAL ANALYSIS
Besides using the coefficient of multiple determination (\(R^2\)) to assess the fit of the model, a more effective method, residual analysis, is used to validate the model. A high \(R^2\) does not guarantee that the model fits the data well, and a model that does not fit the data well cannot answer the underlying questions at hand. A residual scatterplot can be used to assess both linearity and homoscedasticity. From the SPSS output, the following scatterplot is generated.
[Figure: Scatterplot, Dependent Variable: Family_Income. Regression Standardized Residual (vertical axis) against Regression Standardized Predicted Value (horizontal axis)]
The points in the plot appear to fluctuate randomly within a horizontal band around zero. The residual plot shows a random distribution of positive and negative values across the entire range of the variable plotted on the horizontal axis. The points are scattered and there is no obvious pattern. Thus, the plot supports the assumption of linearity. The scatterplot also does not suggest any violation of the assumptions of zero mean and constant variance of the random errors.
[Figure: Histogram, Dependent Variable: Family_Income. Frequency (vertical axis) against Regression Standardized Residual (horizontal axis). Mean = -9.19E-15, Std. Dev. = 0.913, N = 25]
The SPSS package also generated a histogram to check whether the residuals are normally distributed. The histogram shows that the residuals are roughly symmetrically distributed and clustered around zero, which is consistent with an approximately normal distribution. However, one residual is unusually large and should be investigated.
Qn 4
The findings suggest that the gender of the head of the household is a significant variable in predicting the family income. With income recorded in thousands of dollars, the coefficient of 0.716 means that a household headed by a male earns on average about $716 more than one headed by a female. Its regression coefficient is the largest among the variables. The analysis reflects the present environment, in which the male workforce earns more than the female workforce. Although the Bank should focus its mortgage loans business on males, it is also prudent to target females, as there is a significant rise in the female workforce and their earnings are getting closer to those of males.
The next significant variable is the value of the house that a family owns. From the regression coefficient, the family income increases by $25 for every $1,000 increase in the value of the house. Unsurprisingly, a family that pays a higher monthly mortgage tends to own a house of significant value. The Bank should target families that are scouting for their dream home. However, families may not want to incur any more debt in the current economic climate.
Notwithstanding this, the figures produced are based on very basic parameters. Factors such as the type of home, the location and the number of people in the household are important information that would improve the quality of the data and thus increase the accuracy of the tests. Such information is vital, as it would make the tests more comprehensive in indicating reliable predictors and help the Bank come up with a solid business plan for its mortgage loans activities. Another aspect to be considered is that these tests determine the possible existence of a relationship between the independent and dependent variables. The tests are not a causal analysis but merely tools to determine correlation between the variables.
Word Count : 300
References
Lind, D.A., Marchal, W.G., & Wathen, S.A. (2008), Basic Statistics for Business and Economics, 6th International Edition, McGraw-Hill Companies Inc., New York, USA