
# STATISTICAL RELATIONSHIP BETWEEN INCOME AND EXPENDITURES (EXPENDITURES = DEPENDENT VARIABLE & INCOME = INDEPENDENT VARIABLE)

A Project Presented By Rehan Ehsan

Contact: +92 321 8880397, rehan.ehsan89@gmail.com

To Dr. Naheed Sultana

In partial fulfillment of the requirements for course completion of ECONOMETRICS

## LAHORE SCHOOL OF ACCOUNTING & FINANCE

The University of Lahore

Acknowledgement

To say that this project is by Rehan Ehsan alone overstates the case. Without the significant contributions of other people, this project would not exist. I would like to thank the members of the general public who helped me by filling in questionnaires about their income and expenses, for their cooperation, and my colleagues, who helped me complete this project.

ABSTRACT

We found that monthly expenditures depend on total monthly income, and that the contribution of population is very low in this regard. Because a person who earns an income also spends and saves the surplus, total monthly income can be broken down into expenditures and savings.

TABLE OF CONTENTS

- Introduction
- Data table
- Descriptive statistics
- Frequency table
- Histogram
- Simple linear regression function
- Regression analysis
- Problems of regression analysis
- Ordinary least squares method
- Test of regression estimates
- F-test
- ANOVA
- Reliability
- Models of ANOVA
  - I. Fixed effects model
  - II. Random effects model
  - III. Mixed effects model
- Assumptions
- Means
- Goodness of fit
- Chi-square goodness of fit
- Correlation
- Correlation coefficient
- T-test
- Uses of the t-test
- Types of t-test
- Summary
- Conclusion

INTRODUCTION: I surveyed members of the general public and asked them about their monthly income and expenses. From the data gathered, I rounded the income figures to levels from 5,000 to 150,000 (in steps of 5,000) and rounded the corresponding expenditures to the nearest figures found in my research. This project shows the relationship between monthly income and monthly expenditures.

DATA TABLE:

| Sr# | Income | Expenditure |
|---|---|---|
| 1 | 5,000 | 5,000 |
| 2 | 10,000 | 9,500 |
| 3 | 15,000 | 14,500 |
| 4 | 20,000 | 18,500 |
| 5 | 25,000 | 19,000 |
| 6 | 30,000 | 27,000 |
| 7 | 35,000 | 30,500 |
| 8 | 40,000 | 35,000 |
| 9 | 45,000 | 39,000 |
| 10 | 50,000 | 45,500 |
| 11 | 55,000 | 49,500 |
| 12 | 60,000 | 52,000 |
| 13 | 65,000 | 55,000 |
| 14 | 70,000 | 59,000 |
| 15 | 75,000 | 64,000 |
| 16 | 80,000 | 69,500 |
| 17 | 85,000 | 73,000 |
| 18 | 90,000 | 78,500 |
| 19 | 95,000 | 81,000 |
| 20 | 100,000 | 84,700 |
| 21 | 105,000 | 90,000 |
| 22 | 110,000 | 90,000 |
| 23 | 115,000 | 90,500 |
| 24 | 120,000 | 93,000 |
| 25 | 125,000 | 94,800 |
| 26 | 130,000 | 95,750 |
| 27 | 135,000 | 98,000 |
| 28 | 140,000 | 100,000 |
| 29 | 145,000 | 104,590 |
| 30 | 150,000 | 110,000 |
| Total | 2,325,000 | 1,876,340 |

DESCRIPTIVE STATISTICS:

Descriptive Statistics

| | Std. Deviation | Std. Error of Mean | Variance |
|---|---|---|---|
| INCOME | 44017.042 | 8036.376 | 1937500000.000 |
| EXPENDITURE | 32055.690 | 5852.542 | 1027567267.126 |
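The descriptive statistics above can be reproduced from the data table with a few lines of Python. This is a minimal sketch using only the standard library; it assumes, as SPSS does, that the variance is the sample variance (divided by n - 1) and that the standard error of the mean is the standard deviation divided by the square root of n. The lists `INCOME` and `EXPENDITURE` are copied from the data table, and the helper name `describe` is illustrative.

```python
import math
import statistics

# Data copied from the DATA TABLE (monthly figures).
INCOME = [5000 * k for k in range(1, 31)]  # 5,000 to 150,000 in steps of 5,000
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def describe(values):
    """Mean, sample standard deviation, sample variance and standard error of the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    sd = math.sqrt(variance)
    return {"mean": mean, "sd": sd, "variance": variance, "se_mean": sd / math.sqrt(n)}


income_stats = describe(INCOME)
expenditure_stats = describe(EXPENDITURE)

for name, s in (("INCOME", income_stats), ("EXPENDITURE", expenditure_stats)):
    print(f"{name}: mean={s['mean']:.2f} sd={s['sd']:.3f} "
          f"variance={s['variance']:.3f} se_mean={s['se_mean']:.3f}")
```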

Test Statistics

| | INCOME | EXPENDITURE |
|---|---|---|
| Chi-Square (a, b) | .000 | .933 |
| df | 29 | 28 |
| Asymp. Sig. | 1.000 | 1.000 |

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

INCOME

| Income | Observed N | Expected N | Residual |
|---|---|---|---|
| Each of the 30 values from 5000 to 150000 in steps of 5000 | 1 | 1 | 0 |
| Total | 30 | | |

EXPENDITURE

| Expenditure | Observed N | Expected N | Residual |
|---|---|---|---|
| Each of the other 28 values: 5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000 | 1 | 1 | 0 |
| 90000 | 2 | 1 | 1 |
| Total | 30 | | |

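FREQUENCY TABLE:

INCOME

| Income | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 5000 | 1 | 3.3 | 3.3 | 3.3 |
| 10000 | 1 | 3.3 | 3.3 | 6.7 |
| 15000 | 1 | 3.3 | 3.3 | 10.0 |
| 20000 | 1 | 3.3 | 3.3 | 13.3 |
| 25000 | 1 | 3.3 | 3.3 | 16.7 |
| 30000 | 1 | 3.3 | 3.3 | 20.0 |
| 35000 | 1 | 3.3 | 3.3 | 23.3 |
| 40000 | 1 | 3.3 | 3.3 | 26.7 |
| 45000 | 1 | 3.3 | 3.3 | 30.0 |
| 50000 | 1 | 3.3 | 3.3 | 33.3 |
| 55000 | 1 | 3.3 | 3.3 | 36.7 |
| 60000 | 1 | 3.3 | 3.3 | 40.0 |
| 65000 | 1 | 3.3 | 3.3 | 43.3 |
| 70000 | 1 | 3.3 | 3.3 | 46.7 |
| 75000 | 1 | 3.3 | 3.3 | 50.0 |
| 80000 | 1 | 3.3 | 3.3 | 53.3 |
| 85000 | 1 | 3.3 | 3.3 | 56.7 |
| 90000 | 1 | 3.3 | 3.3 | 60.0 |
| 95000 | 1 | 3.3 | 3.3 | 63.3 |
| 100000 | 1 | 3.3 | 3.3 | 66.7 |
| 105000 | 1 | 3.3 | 3.3 | 70.0 |
| 110000 | 1 | 3.3 | 3.3 | 73.3 |
| 115000 | 1 | 3.3 | 3.3 | 76.7 |
| 120000 | 1 | 3.3 | 3.3 | 80.0 |
| 125000 | 1 | 3.3 | 3.3 | 83.3 |
| 130000 | 1 | 3.3 | 3.3 | 86.7 |
| 135000 | 1 | 3.3 | 3.3 | 90.0 |
| 140000 | 1 | 3.3 | 3.3 | 93.3 |
| 145000 | 1 | 3.3 | 3.3 | 96.7 |
| 150000 | 1 | 3.3 | 3.3 | 100.0 |
| Total | 30 | 100.0 | 100.0 | |

EXPENDITURE

| Expenditure | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 5000 | 1 | 3.3 | 3.3 | 3.3 |
| 9500 | 1 | 3.3 | 3.3 | 6.7 |
| 14500 | 1 | 3.3 | 3.3 | 10.0 |
| 18500 | 1 | 3.3 | 3.3 | 13.3 |
| 19000 | 1 | 3.3 | 3.3 | 16.7 |
| 27000 | 1 | 3.3 | 3.3 | 20.0 |
| 30500 | 1 | 3.3 | 3.3 | 23.3 |
| 35000 | 1 | 3.3 | 3.3 | 26.7 |
| 39000 | 1 | 3.3 | 3.3 | 30.0 |
| 45500 | 1 | 3.3 | 3.3 | 33.3 |
| 49500 | 1 | 3.3 | 3.3 | 36.7 |
| 52000 | 1 | 3.3 | 3.3 | 40.0 |
| 55000 | 1 | 3.3 | 3.3 | 43.3 |
| 59000 | 1 | 3.3 | 3.3 | 46.7 |
| 64000 | 1 | 3.3 | 3.3 | 50.0 |
| 69500 | 1 | 3.3 | 3.3 | 53.3 |
| 73000 | 1 | 3.3 | 3.3 | 56.7 |
| 78500 | 1 | 3.3 | 3.3 | 60.0 |
| 81000 | 1 | 3.3 | 3.3 | 63.3 |
| 84700 | 1 | 3.3 | 3.3 | 66.7 |
| 90000 | 2 | 6.7 | 6.7 | 73.3 |
| 90500 | 1 | 3.3 | 3.3 | 76.7 |
| 93000 | 1 | 3.3 | 3.3 | 80.0 |
| 94800 | 1 | 3.3 | 3.3 | 83.3 |
| 95750 | 1 | 3.3 | 3.3 | 86.7 |
| 98000 | 1 | 3.3 | 3.3 | 90.0 |
| 100000 | 1 | 3.3 | 3.3 | 93.3 |
| 104590 | 1 | 3.3 | 3.3 | 96.7 |
| 110000 | 1 | 3.3 | 3.3 | 100.0 |
| Total | 30 | 100.0 | 100.0 | |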
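The frequency table can be rebuilt by counting each distinct value and accumulating percentages. This sketch assumes that percentages are taken over all 30 cases (there are no missing values, so Percent and Valid Percent coincide); `frequency_table` is an illustrative helper, not an SPSS function.

```python
from collections import Counter

# Expenditure values copied from the DATA TABLE.
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows in ascending order of value."""
    n = len(values)
    counts = Counter(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows


rows = frequency_table(EXPENDITURE)
for value, freq, pct, cum_pct in rows:
    print(f"{value:>7} {freq:>3} {pct:6.1f} {cum_pct:6.1f}")
print(f"{'Total':>7} {sum(r[1] for r in rows):>3} {100.0:6.1f}")
```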

Statistics

| | INCOME | EXPENDITURE |
|---|---|---|
| N: Valid | 30 | 30 |
| N: Missing | 0 | 0 |
| Mean | 77500.00 | 62544.67 |
| Std. Error of Mean | 8036.376 | 5852.542 |
| Median | 77500.00 (a) | 66750.00 (a) |
| Mode | 5000 (b) | 90000 |
| Std. Deviation | 44017.042 | 32055.690 |
| Variance | 1937500000.000 | 1027567267.126 |
| Skewness | .000 | -.310 |
| Std. Error of Skewness | .427 | .427 |
| Range | 145000 | 105000 |
| Minimum | 5000 | 5000 |
| Maximum | 150000 | 110000 |
| Sum | 2325000 | 1876340 |
| Percentile 25 | 40000.00 (c) | 35000.00 (c) |
| Percentile 50 | 77500.00 | 66750.00 |
| Percentile 75 | 115000.00 | 90500.00 |

a) Calculated from grouped data.

b) Multiple modes exist. The smallest value is shown.

c) Percentiles are calculated from grouped data.
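The median and skewness in the Statistics table can be checked as follows. This sketch assumes that SPSS reports the adjusted Fisher-Pearson skewness coefficient (G1) and its usual standard-error formula; for these data the ordinary (ungrouped) median happens to give the same values as the grouped-data medians in the table, 77500 and 66750. The helper name `skewness` is illustrative.

```python
import math
import statistics

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def skewness(values):
    """Adjusted Fisher-Pearson skewness G1 and its standard error (assumed SPSS convention)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    adjusted = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    std_error = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return adjusted, std_error


income_skew, skew_se = skewness(INCOME)
expenditure_skew, _ = skewness(EXPENDITURE)

print("median:", statistics.median(INCOME), statistics.median(EXPENDITURE))
print("range:", max(INCOME) - min(INCOME), max(EXPENDITURE) - min(EXPENDITURE))
print(f"skewness: {income_skew:.3f} {expenditure_skew:.3f} (std. error {skew_se:.3f})")
```

A negative skewness for EXPENDITURE matches its median lying above its mean: expenditures bunch toward the top of their range.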

Ratio Statistics for INCOME / EXPENDITURE

| Statistic | Value |
|---|---|
| Mean | 1.197 |
| 95% Confidence Interval for Mean: Lower Bound | 1.156 |
| 95% Confidence Interval for Mean: Upper Bound | 1.238 |
| Median | 1.169 |
| 95% Confidence Interval for Median (a): Lower Bound | 1.148 |
| 95% Confidence Interval for Median (a): Upper Bound | 1.222 |
| 95% Confidence Interval for Median (a): Actual Coverage | 95.7% |
| Weighted Mean | 1.239 |
| 95% Confidence Interval for Weighted Mean: Lower Bound | 1.196 |
| 95% Confidence Interval for Weighted Mean: Upper Bound | 1.282 |
| Minimum | 1.000 |
| Maximum | 1.400 |
| Std. Deviation | .110 |
| Range | .400 |
| Price Related Differential | .966 |
| Coefficient of Dispersion | .071 |
| Coefficient of Variation: Median Centered | 9.7% |

a) The confidence interval for the median is constructed without any distribution assumptions. The actual coverage level may be greater than the specified level. Other confidence intervals are constructed by assuming a Normal distribution for the ratios.
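The ratio statistics are simple functions of the 30 individual ratios INCOME / EXPENDITURE. In this sketch the weighted mean is the ratio of the two totals, the price related differential is the mean ratio divided by the weighted mean, and the coefficient of dispersion is the average absolute deviation from the median ratio divided by the median ratio; these definitions are assumed to match the SPSS Ratio Statistics procedure.

```python
import statistics

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

# One INCOME / EXPENDITURE ratio per respondent.
ratios = [x / y for x, y in zip(INCOME, EXPENDITURE)]

mean_ratio = statistics.fmean(ratios)
median_ratio = statistics.median(ratios)
weighted_mean = sum(INCOME) / sum(EXPENDITURE)  # ratio of the totals
prd = mean_ratio / weighted_mean                # price related differential
cod = statistics.fmean([abs(r - median_ratio) for r in ratios]) / median_ratio  # coefficient of dispersion

print(f"mean={mean_ratio:.3f} median={median_ratio:.3f} weighted mean={weighted_mean:.3f}")
print(f"min={min(ratios):.3f} max={max(ratios):.3f} PRD={prd:.3f} COD={cod:.3f}")
```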


## HISTOGRAM WITH NORMAL CURVE:

[Figure: Histogram of INCOME with normal curve (Mean = 77500, Std. Dev. = 44017.042, N = 30)]

[Figure: Histogram of EXPENDITURE with normal curve (Mean = 62544.67, Std. Dev. = 32055.69, N = 30)]

SIMPLE REGRESSION FUNCTION: In statistics, simple linear regression is the least squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through a set of n points so that the sum of squared residuals of the model (that is, the vertical distances between the data points and the fitted line) is as small as possible. In this project the dependent variable is EXPENDITURE and the single predictor is INCOME.

REGRESSION ANALYSIS: Regression analysis includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile or another location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.
In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

PROBLEMS IN REGRESSION ANALYSIS:

MULTICOLLINEARITY: Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

HETEROSCEDASTICITY: In statistics, a sequence of random variables is heteroscedastic (also spelled heteroskedastic) if the random variables have different variances. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion'). In contrast, a sequence of random variables is called homoscedastic if it has constant variance.

ORDINARY LEAST SQUARES METHOD: Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. The method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side. The OLS estimator is consistent when the regressors are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics) and electrical engineering (control theory and signal processing), among many other areas of application.

TEST OF REGRESSION ESTIMATES: In a simple regression, to test whether one variable significantly predicts another we only need to test whether the correlation between the two variables is significantly different from zero. In regression, a significant prediction means that a significant proportion of the variability in the predicted variable can be accounted for by (or "attributed to", "explained by", or "associated with") the predictor variable.
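The OLS formulas and the slope test described above can be sketched for this data set as follows. The report does not print a coefficients table, so the intercept and slope below are computed from the data table rather than quoted from the author's output; they are consistent with the predicted values in the Residuals Statistics table (10252.15 at INCOME = 5000 and 114837.18 at INCOME = 150000). The helper name `ols` is illustrative.

```python
import math

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def ols(x, y):
    """Fit y = a + b * x by ordinary least squares and test H0: b = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)            # estimated error variance, n - 2 degrees of freedom
    se_b = math.sqrt(mse / sxx)    # standard error of the slope
    return a, b, se_b, b / se_b    # last value: t statistic for the slope


a, b, se_b, t_b = ols(INCOME, EXPENDITURE)
print(f"EXPENDITURE = {a:.2f} + {b:.4f} * INCOME")
print(f"slope std. error = {se_b:.5f}, t = {t_b:.2f}, df = {len(INCOME) - 2}")
```

With a single regressor, the squared t statistic of the slope equals the regression F statistic, so this test and the F-test lead to the same conclusion.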

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| INCOME | 77500.00 | 44017.042 | 30 |
| EXPENDITURE | 62544.67 | 32055.690 | 30 |

Valid N (listwise): 30

Model Fit

| Fit Statistic | Mean | SE | Minimum | Maximum | Percentile 5 | Percentile 10 | Percentile 25 | Percentile 50 | Percentile 75 | Percentile 90 | Percentile 95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Stationary R-squared | .428 | . | .428 | .428 | .428 | .428 | .428 | .428 | .428 | .428 | .428 |
| R-squared | .997 | . | .997 | .997 | .997 | .997 | .997 | .997 | .997 | .997 | .997 |
| RMSE | 1882.231 | . | 1882.231 | 1882.231 | 1882.231 | 1882.231 | 1882.231 | 1882.231 | 1882.231 | 1882.231 | 1882.231 |
| MAPE | 3.282 | . | 3.282 | 3.282 | 3.282 | 3.282 | 3.282 | 3.282 | 3.282 | 3.282 | 3.282 |
| MaxAPE | 16.348 | . | 16.348 | 16.348 | 16.348 | 16.348 | 16.348 | 16.348 | 16.348 | 16.348 | 16.348 |
| MAE | 1439.577 | . | 1439.577 | 1439.577 | 1439.577 | 1439.577 | 1439.577 | 1439.577 | 1439.577 | 1439.577 | 1439.577 |
| MaxAE | 4395.076 | . | 4395.076 | 4395.076 | 4395.076 | 4395.076 | 4395.076 | 4395.076 | 4395.076 | 4395.076 | 4395.076 |
| Normalized BIC | 15.307 | . | 15.307 | 15.307 | 15.307 | 15.307 | 15.307 | 15.307 | 15.307 | 15.307 | 15.307 |

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 29230939495.261 | 1 | 29230939495.261 | 1439.666 | .000 (a) |
| Residual | 568511251.405 | 28 | 20303973.264 | | |
| Total | 29799450746.667 | 29 | | | |

a) Predictors: (Constant), INCOME

b) Dependent Variable: EXPENDITURE
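The regression ANOVA table above can be reproduced by splitting the total sum of squares of EXPENDITURE into a part explained by the fitted line and a residual part. This is a minimal sketch; `regression_anova` is an illustrative name, and the degrees of freedom (1 for the regression, n - 2 for the residual) assume a single predictor plus a constant.

```python
INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def regression_anova(x, y):
    """Sums of squares, mean squares and F for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                      # total variation
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the line
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # left unexplained
    df_regression, df_residual = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "ss_regression": ss_regression, "ss_residual": ss_residual, "ss_total": ss_total,
        "ms_regression": ms_regression, "ms_residual": ms_residual,
        "f": ms_regression / ms_residual,
    }


table = regression_anova(INCOME, EXPENDITURE)
for key, value in table.items():
    print(f"{key:>14}: {value:,.3f}")
```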

F-TEST: An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. In the regression ANOVA table above, F = 1439.666 with 1 and 28 degrees of freedom (Sig. .000), so the null hypothesis that INCOME has no linear effect on EXPENDITURE is rejected.

ANOVA: Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would increase the chance of committing a type I error. For this reason, ANOVA is useful for comparing two, three or more means.

RELIABILITY:
Case Processing Summary

| Cases | N | % |
|---|---|---|
| Valid | 30 | 100.0 |
| Excluded (a) | 0 | .0 |
| Total | 30 | 100.0 |

a) Listwise deletion based on all variables in the procedure.

Reliability Statistics

Inter-Item Covariance Matrix

| | INCOME | EXPENDITURE |
|---|---|---|
| INCOME | 1937500000.000 | 1397472413.793 |
| EXPENDITURE | 1397472413.793 | 1027567267.126 |

Inter-Item Correlation Matrix

| | INCOME | EXPENDITURE |
|---|---|---|
| INCOME | 1.000 | .990 |
| EXPENDITURE | .990 | 1.000 |

Summary Item Statistics

| | Mean | Minimum | Maximum | Range | Maximum / Minimum | Variance | N of Items |
|---|---|---|---|---|---|---|---|
| Item Means | 70022.333 | 62544.667 | 77500.000 | 14955.333 | 1.239 | 111830997.556 | 2 |
| Item Variances | 1482533633.563 | 1027567267.126 | 1937500000.000 | 909932732.874 | 1.886 | 413988789177375200.000 | 2 |
| Inter-Item Covariances | 1397472413.793 | 1397472413.793 | 1397472413.793 | .000 | 1.000 | .000 | 2 |
| Inter-Item Correlations | .990 | .990 | .990 | .000 | 1.000 | .000 | 2 |

Item-Total Statistics

| | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Squared Multiple Correlation | Cronbach's Alpha if Item Deleted |
|---|---|---|---|---|---|
| VAR00001 | 1.1186 | 2.439 | .302 | .091 | . (a) |
| VAR00002 | .6402 | .176 | .302 | .091 | . (a) |

a) The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.

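Scale Statistics

| Mean | Variance | Std. Deviation | N of Items |
|---|---|---|---|
| 140044.67 | 5760012094.713 | 75894.744 | 2 |

ANOVA

| | | Sum of Squares | df | Mean Square | F | Sig |
|---|---|---|---|---|---|---|
| Between People | | 83520175373.333 | 29 | 2880006047.356 | | |
| Within People | Between Items | 3354929926.667 | 1 | 3354929926.667 | 39.441 | .000 |
| | Residual | 2466775373.333 | 29 | 85061219.770 | | |
| | Total | 5821705300.000 | 30 | 194056843.333 | | |
| Total | | 89341880673.333 | 59 | 1514269163.955 | | |

Grand Mean = 70022.33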
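The reliability output treats INCOME and EXPENDITURE as two "items". Assuming the standard Cronbach's alpha formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score), the sketch below computes alpha and the between-people, between-items and residual decomposition behind the reliability ANOVA table. The alpha it prints is computed here, not a value reported in the project, and the helper names are illustrative.

```python
import statistics

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def cronbach_alpha(items):
    """Cronbach's alpha for k item columns measured on the same cases."""
    k = len(items)
    item_variances = sum(statistics.variance(col) for col in items)
    totals = [sum(case) for case in zip(*items)]  # scale score per case
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))


def reliability_anova(items):
    """Mean squares of a two-way layout without replication (people x items)."""
    k, n = len(items), len(items[0])
    grand = sum(sum(col) for col in items) / (k * n)
    cases = list(zip(*items))
    ss_people = k * sum((sum(case) / k - grand) ** 2 for case in cases)
    ss_items = n * sum((sum(col) / n - grand) ** 2 for col in items)
    ss_total = sum((v - grand) ** 2 for col in items for v in col)
    ss_residual = ss_total - ss_people - ss_items
    ms_people = ss_people / (n - 1)
    ms_items = ss_items / (k - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return grand, ms_people, ms_items, ms_residual


items = [INCOME, EXPENDITURE]
alpha = cronbach_alpha(items)
grand, ms_people, ms_items, ms_residual = reliability_anova(items)
f_items = ms_items / ms_residual

print(f"alpha = {alpha:.3f}, grand mean = {grand:.2f}, F (between items) = {f_items:.3f}")
```

The two routes agree because, for this layout, alpha also equals 1 - MS(residual) / MS(between people).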

MODELS:

## (Model 1) FIXED EFFECTS MODEL

The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

## (Model 2) RANDOM EFFECTS MODEL

Random effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from ANOVA model 1.

## (Model 3) MIXED EFFECTS MODEL

A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types. Most random-effects or mixed-effects models are not concerned with making inferences about the particular values of the random effects that happen to have been sampled. For example, consider a large manufacturing plant in which many machines produce the same product, and suppose three of them are sampled at random for study. The statistician studying this plant would have very little interest in comparing those three particular machines to each other. Rather, the inferences of interest are those that apply to all machines, such as their variability and their mean. However, if one is interested in the realized value of a random effect, best linear unbiased prediction can be used to obtain a "prediction" for that value.

ASSUMPTIONS OF ANOVA: The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Even when the statistical model is nonlinear, it can be approximated by a linear model for which an analysis of variance may be appropriate. The assumptions are:

- Independence of cases: this is an assumption of the model that simplifies the statistical analysis.
- Normality: the distributions of the residuals are normal.
- Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design.

MEANS:
Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| EXPENDITURE * INCOME | 30 | 100.0% | 0 | .0% | 30 | 100.0% |

Report

GOODNESS OF FIT: The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing.

CHI-SQUARE AS A GOODNESS-OF-FIT TEST: When an analyst fits a statistical model to observed data, he or she may wonder how well the model actually reflects the data: how "close" are the observed values to those that would be expected under the fitted model? One statistical test that addresses this question is the chi-square goodness-of-fit test.
Test Statistics

| | INCOME | EXPENDITURE |
|---|---|---|
| Chi-Square (a, b) | .000 | .933 |
| df | 29 | 28 |
| Asymp. Sig. | 1.000 | 1.000 |

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

INCOME

| Income | Observed N | Expected N | Residual |
|---|---|---|---|
| Each of the 30 values from 5000 to 150000 in steps of 5000 | 1 | 1 | 0 |
| Total | 30 | | |

EXPENDITURE

| Expenditure | Observed N | Expected N | Residual |
|---|---|---|---|
| Each of the other 28 values: 5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000 | 1 | 1 | 0 |
| 90000 | 2 | 1 | 1 |
| Total | 30 | | |
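The chi-square statistics in the Test Statistics table compare each variable's observed frequencies with equal expected frequencies across its distinct values. This sketch assumes that SPSS default (all categories equally likely); `chi_square_uniform` is an illustrative name. Every expected frequency here is about 1, far below the usual minimum of 5 mentioned in the table footnotes, so the chi-square approximation and its Asymp. Sig. of 1.000 should be read with caution.

```python
from collections import Counter

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def chi_square_uniform(values):
    """Chi-square goodness-of-fit statistic against equal expected counts per observed category."""
    counts = Counter(values)
    k = len(counts)              # number of distinct categories
    expected = len(values) / k   # equal expected frequency per category
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts.values())
    return chi2, k - 1, expected


for name, data in (("INCOME", INCOME), ("EXPENDITURE", EXPENDITURE)):
    chi2, df, expected = chi_square_uniform(data)
    print(f"{name}: chi-square = {chi2:.3f}, df = {df}, expected count per cell = {expected:.1f}")
```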

CORRELATION: Dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; in general, however, correlation alone does not establish causation.

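Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| INCOME | 77500.00 | 44017.042 | 30 |
| EXPENDITURE | 62544.67 | 32055.690 | 30 |

Correlations

| | | INCOME | EXPENDITURE |
|---|---|---|---|
| INCOME | Pearson Correlation | 1 | .990 (**) |
| | Sig. (2-tailed) | | .000 |
| | Sum of Squares and Cross-products | 56187500000.000 | 40526700000.000 |
| | Covariance | 1937500000.000 | 1397472413.793 |
| | N | 30 | 30 |
| EXPENDITURE | Pearson Correlation | .990 (**) | 1 |
| | Sig. (2-tailed) | .000 | |
| | Sum of Squares and Cross-products | 40526700000.000 | 29799450746.667 |
| | Covariance | 1397472413.793 | 1027567267.126 |
| | N | 30 | 30 |

(**) Correlation is significant at the 0.01 level (2-tailed).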

CORRELATION COEFFICIENT: Correlation coefficient may refer to:

- Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r: a measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
- Correlation and dependence: a broad class of statistical relationships between two or more random variables or observed data values.
- Goodness of fit: any of several measures of how well a statistical model fits observations, summarizing the discrepancy between observed values and the values expected under the model in question.
- Coefficient of determination: a measure of the proportion of variability in a data set that is accounted for by a statistical model; often called R²; in a single-variable linear regression it equals the square of Pearson's product-moment correlation coefficient.
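The Pearson correlation coefficient and the coefficient of determination defined above can be computed directly from the data table. This sketch computes r as the sample covariance divided by the product of the sample standard deviations, and R² as the explained share of the total sum of squares from the fitted regression line, to show that R² = r² in a single-variable regression.

```python
import math

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

n = len(INCOME)
x_bar = sum(INCOME) / n
y_bar = sum(EXPENDITURE) / n

# Sample covariance and standard deviations (divisor n - 1, as in the Correlations table).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(INCOME, EXPENDITURE)) / (n - 1)
sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in INCOME) / (n - 1))
sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in EXPENDITURE) / (n - 1))
r = cov_xy / (sd_x * sd_y)

# Coefficient of determination from the fitted line: explained share of the total sum of squares.
b = cov_xy / sd_x ** 2
a = y_bar - b * x_bar
ss_total = sum((y - y_bar) ** 2 for y in EXPENDITURE)
ss_regression = sum((a + b * x - y_bar) ** 2 for x in INCOME)
r_squared = ss_regression / ss_total

print(f"r = {r:.3f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```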

Coefficient Correlations (a)

| Model | | | INCOME |
|---|---|---|---|
| 1 | Correlations | INCOME | 1.000 |
| | Covariances | INCOME | .000 |

a) Dependent Variable: EXPENDITURE

Collinearity Diagnostics (a)

| Model | Dimension | Eigenvalue | Condition Index | Variance Proportions: (Constant) | Variance Proportions: INCOME |
|---|---|---|---|---|---|
| 1 | 1 | 1.873 | 1.000 | .06 | .06 |
| | 2 | .127 | 3.842 | .94 | .94 |

a) Dependent Variable: EXPENDITURE

RESIDUALS:

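Residuals Statistics (a)

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 10252.15 | 114837.18 | 62544.67 | 31748.440 | 30 |
| Residual | -7624.422 | 7620.241 | .000 | 4427.622 | 30 |
| Std. Predicted Value | -1.647 | 1.647 | .000 | 1.000 | 30 |
| Std. Residual | | | | .983 | 30 |

a) Dependent Variable: EXPENDITURE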
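The Residuals Statistics table can be reproduced from the fitted line. In this sketch, standardized residuals divide each residual by the square root of the residual mean square (n - 2 degrees of freedom), which is assumed to be the convention SPSS uses for Std. Residual; that is why their standard deviation is slightly below 1 (about .983).

```python
import math
import statistics

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

n = len(INCOME)
x_bar = statistics.fmean(INCOME)
y_bar = statistics.fmean(EXPENDITURE)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(INCOME, EXPENDITURE))
     / sum((x - x_bar) ** 2 for x in INCOME))
a = y_bar - b * x_bar

predicted = [a + b * x for x in INCOME]
residuals = [y - p for y, p in zip(EXPENDITURE, predicted)]
mse = sum(e ** 2 for e in residuals) / (n - 2)  # residual mean square

p_mean, p_sd = statistics.fmean(predicted), statistics.stdev(predicted)
std_predicted = [(p - p_mean) / p_sd for p in predicted]
std_residuals = [e / math.sqrt(mse) for e in residuals]

for name, values in (("Predicted Value", predicted), ("Residual", residuals),
                     ("Std. Predicted Value", std_predicted), ("Std. Residual", std_residuals)):
    print(f"{name:>20}: min={min(values):.3f} max={max(values):.3f} "
          f"mean={statistics.fmean(values):.3f} sd={statistics.stdev(values):.3f}")
```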

CHARTS:
[Figure: Histogram (axis: Frequency)]

[Figure: Cumulative probability plot (axis: Observed Cum Prob; Transforms: natural log)]

[Figure: Chart of EXPENDITURE against INCOME (Dot/Lines show Modes)]

## CLASSICAL NORMAL LINEAR REGRESSION MODEL:

Econometrics is largely about causality. Economics is full of theories of how one thing causes another: increases in prices cause demand to decrease, better education causes people to become richer, and so on. To test these theories, economists collect data (such as the price and quantity of a good, or records of a population's education and wealth levels). Plotted, such data usually look like a cloud of points, and without proper techniques it is impossible to tell whether the cloud contains any useful information. Econometrics is a tool for establishing correlation and, ideally, causality from collected data points. We do this by estimating an explanatory function from the data. Here the function is a linear model, estimated by minimizing the sum of squared vertical distances from the data points to the line; each such distance is treated as an error term. This is the process of linear regression.

ASSUMPTIONS UNDERLYING THE CLASSICAL NORMAL LINEAR REGRESSION MODEL: There are five critical assumptions relating to the CLRM. These assumptions are required to show that the estimation technique, Ordinary Least Squares (OLS), has a number of desirable properties, and so that hypothesis tests regarding the coefficient estimates can validly be conducted.

CRITICAL ASSUMPTIONS:

1. The errors have zero mean.
2. The variance of the errors is constant and finite over all values of X.
3. The errors are statistically independent of one another.
4. There is no relationship between the error and the corresponding X.
5. The error term is normally distributed.

DETAILED ASSUMPTIONS:

1. The regression model is linear in the parameters.
2. The values of the regressors, the Xs (independent variables), are fixed in repeated samples.
3. For given values of the Xs, the mean value of the errors equals zero.
4. For given values of the Xs, the variance of the errors is constant.
5. For given values of the Xs, there is no autocorrelation in the errors.
6. If the Xs are stochastic, the errors and the Xs are not correlated.
7. The number of observations is greater than the number of independent variables.
8. There is sufficient variability in the values of the Xs.
9. The regression model is correctly specified.
10. The error term is normally distributed.
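In symbols, the classical normal linear regression model for this project and the OLS estimators that minimize the squared errors can be written as below. The notation (Y for EXPENDITURE, X for INCOME, u for the error term, beta_1 and beta_2 for the intercept and slope) is introduced here for illustration and is not used in the original.

```latex
\begin{aligned}
&\text{Model:} && Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad i = 1, \dots, n \\
&\text{Assumptions:} && u_i \sim N(0, \sigma^2), \quad \operatorname{Cov}(u_i, u_j) = 0 \ (i \neq j), \quad \operatorname{Cov}(u_i, X_i) = 0 \\
&\text{OLS:} && (\hat\beta_1, \hat\beta_2) = \arg\min_{\beta_1, \beta_2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 \\
&\text{Solution:} && \hat\beta_2 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \hat\beta_1 = \bar Y - \hat\beta_2 \bar X
\end{aligned}
```

With the sums of squares and cross-products in the Correlations table, the slope works out to 40526700000 / 56187500000, or about 0.7213.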

T-TEST: A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.

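One-Sample Statistics

| | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| INCOME | 30 | 77500.00 | 44017.042 | 8036.376 |
| EXPENDITURE | 30 | 62544.67 | 32055.690 | 5852.542 |

One-Sample Test (Test Value = 0)

| | t | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper |
|---|---|---|---|---|---|---|
| INCOME | | 29 | .000 | 77500.000 | 61063.77 | |
| EXPENDITURE | | 29 | .000 | 62544.667 | 50574.88 | |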
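The one-sample t-test against a test value of 0 divides the mean difference by its standard error and builds the confidence interval from a Student's t critical value. This sketch uses only the standard library, so it hard-codes the two-sided 95% critical value for 29 degrees of freedom (about 2.04523, from standard t tables); with SciPy available, `scipy.stats.t.ppf(0.975, 29)` would give the same value. The helper name `one_sample_t` is illustrative.

```python
import math
import statistics

INCOME = [5000 * k for k in range(1, 31)]
EXPENDITURE = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

# Assumption: two-sided 95% critical value of Student's t with 29 df, taken from t tables.
# It is only valid for samples of 30 cases.
T_CRIT_95_DF29 = 2.04523


def one_sample_t(values, test_value=0.0, t_crit=T_CRIT_95_DF29):
    """t statistic, df, mean difference and 95% confidence interval of the difference."""
    n = len(values)
    diff = statistics.fmean(values) - test_value
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return diff / se, n - 1, diff, diff - t_crit * se, diff + t_crit * se


for name, data in (("INCOME", INCOME), ("EXPENDITURE", EXPENDITURE)):
    t, df, diff, lower, upper = one_sample_t(data)
    print(f"{name}: t = {t:.3f}, df = {df}, mean difference = {diff:.3f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f})")
```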

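ANOVA: EXPENDITURE

| | | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|---|
| Between Groups | (Combined) | | 29799450746.667 | 29 | 1027567267.126 | . | . |
| | Linear Term | Contrast | 29230939495.261 | 1 | 29230939495.261 | . | . |
| | | Deviation | 568511251.405 | 28 | 20303973.264 | . | . |
| Within Groups | | | .000 | 0 | . | | |
| Total | | | 29799450746.667 | 29 | | | |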

USES: Among the most frequently used t-tests are:

- A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
- A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
- A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.
- A test of whether the slope of a regression line differs significantly from 0.

TYPES: UNPAIRED AND PAIRED TWO-SAMPLE T-TESTS: Two-sample t-tests for a difference in means can be either unpaired or paired. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.

The unpaired, or "independent samples", t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here: if we contacted 100 people by phone, obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.

Dependent samples (or "paired") t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical example of the repeated measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure lowering medication. A dependent t-test based on a "matched-pairs sample" results from an unpaired sample that is subsequently used to form a paired sample, by using additional variables that were measured along with the variable of interest. The matching is carried out by identifying pairs of values consisting of one observation from each of the two samples, where the pair is similar in terms of other measured variables. This approach is often used in observational studies to reduce or eliminate the effects of confounding factors.

SUMMARY:

Case Processing Summary (a)

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| INCOME | 30 | 100.0% | 0 | .0% | 30 | 100.0% |
| EXPENDITURE | 30 | 100.0% | 0 | .0% | 30 | 100.0% |

a) Limited to first 100 cases.

Case Summaries (a)

| Case Number | INCOME | EXPENDITURE |
|---|---|---|
| 1 | 5000 | 5000 |
| 2 | 10000 | 9500 |
| 3 | 15000 | 14500 |
| 4 | 20000 | 18500 |
| 5 | 25000 | 19000 |
| 6 | 30000 | 27000 |
| 7 | 35000 | 30500 |
| 8 | 40000 | 35000 |
| 9 | 45000 | 39000 |
| 10 | 50000 | 45500 |
| 11 | 55000 | 49500 |
| 12 | 60000 | 52000 |
| 13 | 65000 | 55000 |
| 14 | 70000 | 59000 |
| 15 | 75000 | 64000 |
| 16 | 80000 | 69500 |
| 17 | 85000 | 73000 |
| 18 | 90000 | 78500 |
| 19 | 95000 | 81000 |
| 20 | 100000 | 84700 |
| 21 | 105000 | 90000 |
| 22 | 110000 | 90000 |
| 23 | 115000 | 90500 |
| 24 | 120000 | 93000 |
| 25 | 125000 | 94800 |
| 26 | 130000 | 95750 |
| 27 | 135000 | 98000 |
| 28 | 140000 | 100000 |
| 29 | 145000 | 104590 |
| 30 | 150000 | 110000 |
| Mean | 77500.00 | 62544.67 |
| Minimum | 5000 | 5000 |
| Maximum | 150000 | 110000 |
| Range | 145000 | 105000 |
| Variance | 1937500000.000 | 1027567267.126 |
| N | 30 | 30 |

a) Limited to first 100 cases.

CONCLUSION: From all of the above discussion, we found that monthly expenditures depend on total monthly income (the Pearson correlation between them is .990, significant at the 0.01 level), and that the contribution of population is very low in this regard. Because a person who earns an income also spends and saves the surplus, total monthly income can be broken down into expenditures and savings.
