Fitting Regression Model Using Statistical Tools

Jastini Mohd Jamil
Zahayu Md Yusof
Izwan Nizal Mohd Shaharanee
School Of Quantitative Sciences
Universiti Utara Malaysia

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
2014

FITTING REGRESSION MODEL USING STATISTICAL TOOLS

1.0 What is regression analysis?
2.0 Linear regression analysis
3.0 Fitting regression model using Microsoft Excel
4.0 Fitting regression model using SPSS
5.0 Fitting regression model using SAS
6.0 Regression results
7.0 How to handle missing values


1.0 What is regression analysis?

Regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for modeling
and analyzing several variables when the focus is on the relationship
between a dependent variable and one or more independent variables.
A regression model is a mathematical equation that describes the
relationship between two or more variables.
A simple regression model includes only two variables: one independent
and one dependent. The dependent variable is the one being explained,
and the independent variable is the one used to explain the variation in
the dependent variable.
Multiple regression, by contrast, produces an equation that represents the
best prediction of a dependent variable from several independent
variables.
Regression analysis is used when independent variables are correlated
with one another and with the dependent variable.
Independent variables can be either continuous or categorical. However,
in the latter case these variables must be coded as dummy variables.
In contrast, the dependent variable must be measured on a continuous
scale. If the dependent variable is not continuous, discriminant
function analysis is appropriate.
The purpose of the regression model is to enable the researcher to see the
trend and make predictions on the basis of the data.
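
The dummy coding mentioned above can be sketched in a few lines. This is only an illustration in Python, which the workshop itself does not use; the `region` variable, its values, and the helper name `dummy_code` are hypothetical and are not part of the workshop data. A categorical variable with k categories is turned into k - 1 columns of 0/1 values, with one category kept as the reference.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical categorical predictor (not part of the workshop data).
region = ["north", "south", "east", "south", "north"]

# "north" is the reference category, so it gets no column of its own.
dummies = dummy_code(region, reference="north")
for level, column in dummies.items():
    print(level, column)
```

Each resulting column can then enter the regression as an ordinary independent variable.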

2.0 Linear regression analysis
Linear regression analysis examines the strength of the linear relationship
between the independent variables and the dependent variable.

MULTIPLE REGRESSION: y = b0 + b1X1 + b2X2

where
   y       is the expected value of y (the outcome)
   b0      is the intercept term
   b1, b2  are the coefficients
   X1, X2  are the predictor variables
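
To make the equation concrete, the following sketch estimates b0, b1 and b2 by ordinary least squares, solving the normal equations directly. It is written in Python for illustration only and is not how Excel, SPSS or SAS are implemented. The data are hypothetical and noise-free, generated from y = 1 + 2X1 + 3X2, so the fit should recover those three numbers. The helper names `solve` and `fit_ols` are assumptions made for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(x1, x2, y):
    """Least-squares estimates [b0, b1, b2] from the normal equations X'X b = X'y."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # leading 1.0 is the intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2*X1 + 3*X2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_ols(x1, x2, y)
print(f"b0={b0:.4f}  b1={b1:.4f}  b2={b2:.4f}")
```

With real data the points do not lie exactly on a plane, and the same calculation returns the coefficients that minimize the sum of squared residuals.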

3.0 Fitting regression model using Microsoft Excel

1. Consider the following data relating family size and income to food expenditures.

2. Click File > Options > Excel Options.


3. In the Excel Options dialog box, click Add-ins.
In the Add-ins box, select Analysis ToolPak and click Go.

4. In the Add-ins available box, check Analysis ToolPak and then click OK.
If Analysis ToolPak is not listed in the Add-ins available box, click Browse to locate it.


5. Click Data > Data Analysis > Regression > OK.

6. The pop-up input dialog box is shown below.

The Input Y Range refers to the spreadsheet cells containing the dependent variable y
(Food).
The Input X Range refers to the cells containing the independent variables x (Income and
Family Size).


7. The regression output has the following format:

This table gives the regression coefficients. The regression line has the form

y = b0 + b1X1 + b2X2

and, based on this table, the fitted equation is:

y = -1.118 + 0.148(Income) + 0.793(Family Size)
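
Once the coefficients are known, prediction is simple arithmetic. The sketch below, in Python for illustration only, plugs a household into the fitted equation using the rounded coefficients from the coefficient table: intercept -1.118 (note the negative sign), 0.148 for Income and 0.793 for Family Size. The function name `predict_food` and the example household are hypothetical. Values are in the same units as the data set.

```python
def predict_food(income, family_size):
    """Predicted food expenditure from the fitted equation (rounded coefficients)."""
    return -1.118 + 0.148 * income + 0.793 * family_size


# Hypothetical household: income 50 and family size 3, in the units of the data set.
prediction = predict_food(50, 3)
print(round(prediction, 3))
```

Holding income fixed, each additional family member raises the predicted food expenditure by 0.793. Holding family size fixed, each additional unit of income raises it by 0.148.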


4.0 Fitting regression model using SPSS

1. Open IBM SPSS Statistics, then click File > Open > Desktop > 2Sept2014_Workshop >
Multiple Regression Data.

2. Click Analyze > Regression > Linear.

3. In the main dialog box, input the dependent variable and independent variables. In this
case, we want to predict food expenditure (Food$). We are going to use two independent
variables: Income$ and FamilySize.

Leave the Method drop-down menu set to its default value (Enter).

4. Click the Statistics button to open its dialog box. Check the Descriptives box to get
descriptive statistics for the variables in the equation, then click Continue > OK.

5. Here is the output.

These are the descriptive statistics, based on the option that we selected.

Descriptive Statistics

               Mean        Std. Deviation   N
Food $         7.965000    4.6642284        20
Income $       45.50       23.955           20
Family Size    2.95        1.605            20

The Descriptives option also gives a correlation matrix, showing the Pearson
correlation between the variables (in the top part of the table).

Correlations

                                     Food $    Income $   Family Size
Pearson Correlation   Food $         1.000     .946       .787
                      Income $       .946      1.000      .676
                      Family Size    .787      .676       1.000
Sig. (1-tailed)       Food $         .         .000       .000
                      Income $       .000      .          .001
                      Family Size    .000      .001       .
N                     Food $         20        20         20
                      Income $       20        20         20
                      Family Size    20        20         20

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .967(a)   .935       .927                1.2610135
a. Predictors: (Constant), Family Size, Income $

The summary table tells what percentage of the variability in the dependent variable is
accounted for by all of the independent variables together. Here R Square = .935, so about
93.5% of the variability in food expenditure is explained. The footnote on this table tells
which variables were included in this equation.

ANOVA(b)

Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   386.313          2    193.156       121.470   .000(a)
    Residual     27.033           17   1.590
    Total        413.346          19
a. Predictors: (Constant), Family Size, Income $
b. Dependent Variable: Food $

This table gives an F-test of whether the model as a whole fits the data significantly
better than a model with no predictors. With F = 121.470 and a p-value below .001, it does.
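
The entries of the Model Summary and ANOVA tables are tied together by a few identities, so recomputing them is a useful sanity check. The sketch below, in Python for illustration only, takes the sums of squares and degrees of freedom reported in the ANOVA table and derives F, R Square, Adjusted R Square and the standard error of the estimate. Because the inputs are rounded to three decimals, the results are expected to agree with the SPSS output only up to rounding.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, df_regression = 386.313, 2
ss_residual, df_residual = 27.033, 17
ss_total = ss_regression + ss_residual      # 413.346
df_total = df_regression + df_residual      # 19

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                       # F in the ANOVA table
r_square = ss_regression / ss_total                        # R Square
adj_r_square = 1 - ms_residual / (ss_total / df_total)     # Adjusted R Square
std_error = math.sqrt(ms_residual)                         # Std. Error of the Estimate

print(f"F = {f_stat:.2f}")
print(f"R Square = {r_square:.3f}")
print(f"Adjusted R Square = {adj_r_square:.3f}")
print(f"Std. Error of the Estimate = {std_error:.4f}")
```

The adjusted value is slightly smaller than R Square because it penalizes the model for the number of predictors relative to the sample size.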

Coefficients(a)

                   Unstandardized Coefficients   Standardized Coefficients
Model              B          Std. Error         Beta                        t         Sig.
1   (Constant)     -1.118     .655                                           -1.708    .106
    Income $       .148       .016               .761                        9.049     .000
    Family Size    .793       .244               .273                        3.245     .005
a. Dependent Variable: Food $

This table gives the regression coefficients. The equation for the regression line is
built from the unstandardized coefficients (column B):

y = -1.118 + 0.148(Income) + 0.793(Family Size)
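
The Beta column of the Coefficients table can be related to the B column through the standard deviations in the Descriptive Statistics table: Beta = B x (standard deviation of the predictor) / (standard deviation of the dependent variable). The sketch below, in Python for illustration only, applies this identity to the rounded values printed in the SPSS output, so agreement with the Beta column is expected only to about two decimals.

```python
# Standard deviation of the dependent variable (Food $), from Descriptive Statistics.
sd_food = 4.6642284

# (unstandardized B, standard deviation of the predictor), both from the SPSS output.
predictors = {
    "Income $": (0.148, 23.955),
    "Family Size": (0.793, 1.605),
}

# Standardized coefficient: change in y, in standard deviations of y,
# per one standard deviation change in the predictor.
betas = {name: b * sd_x / sd_food for name, (b, sd_x) in predictors.items()}

for name, beta in betas.items():
    print(f"{name}: Beta = {beta:.3f}")
```

Because the standardized coefficients are unit-free, they can be compared directly. Here Income has the larger Beta, so a change of one standard deviation in Income moves predicted food expenditure more than a change of one standard deviation in Family Size.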


5.0 Fitting regression model using SAS

Creating a Project and Importing Data into SAS Format
1. Open SAS 9.3.
2. Click Solutions > Analysis > Enterprise Miner.
3. Select File > New > Project. Create a new project by giving it a name (Regression),
specifying the location of the file, and then selecting Create.
4. Rename the diagram: right-click the diagram, select Rename, and type the name of the
diagram (multiplereg).
5. Click File > Import Data.

6. Tick Standard data source, make sure that Microsoft Excel Workbook (*.xls *.xlsb *.xlsm
*.xlsx) is chosen in the drop-down list, and click Next.

7. Browse to the location of the Excel workbook (foodexpenditure.xlsx), choose the
foodexpenditure.xlsx file, and click Open and then OK.

8. Choose the Sheet1$ table from the drop-down menu and click Next.

9. Choose the EMDATA library from the drop-down menu.

10. Type foodex as the new name in the Member box and click Finish.

Setting Up the Input Data Source Node
11. To add the food expenditure data set (foodex) to the multiplereg diagram, click and drag
the Input Data Source node into the workspace.

12. To open the Input Data Source node, double-click it; a new window will open.

13. To load the foodex data set, click Select, browse to the foodex data set in the EMDATA
library, and click OK.

14. The Input Data Source node will display information about the foodex data set. Change the
Metadata sample to Use complete data as a sample.

Building the Regression Model and Interpreting the Results
15. Open the food data set using the Input Data Source node. Click the Variable tab. Set the role
for FAMILY to ID, for FOOD to Target, and for INCOME and FAMILY SIZE to Input.

16. Close the Input Data Source node and save the changes.

17. Add the Regression node to the diagram workspace as shown below.


18. Double-click the Regression node to examine the properties of the foodex data set.

19. Click the Model Option tab to view the regression type.

Note: Because the target variable is continuous, the Regression node will perform
a multiple linear regression analysis.

20. Click the Selection Method tab to choose a suitable regression method. For this analysis we
keep the default setting.

21. Click the Output tab and tick the Training, Validation, and Test boxes for Process or Score.
When prompted to save changes, select Yes.

22. Run the Regression node

23. Save and name the regression model


24. The regression result using SAS Enterprise Miner


6.0 Regression Results
EXCEL OUTPUT

               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept      -1.11829469    0.654852438      -1.70770486   0.105886403   -2.499912564   0.263323184   -2.499912564   0.263323184
X Variable 1   0.148211726    0.016378632      9.049090678   6.55886E-08   0.113655833    0.182767618   0.113655833    0.182767618
X Variable 2   0.793105484    0.244441129      3.244566438   0.004767364   0.277379782    1.308831186   0.277379782    1.308831186

X Variable 1 is Income and X Variable 2 is Family Size. The equation for the regression line is:
y = -1.118 + 0.148(Income) + 0.793(Family Size)

SPSS OUTPUT

Coefficients(a)

                   Unstandardized Coefficients   Standardized Coefficients
Model              B          Std. Error         Beta                        t         Sig.
1   (Constant)     -1.118     .655                                           -1.708    .106
    Income $       .148       .016               .761                        9.049     .000
    Family Size    .793       .244               .273                        3.245     .005
a. Dependent Variable: Food $

y = -1.118 + 0.148(Income) + 0.793(Family Size)

SAS OUTPUT

y = -1.118 + 0.148(Income) + 0.793(Family Size)
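
The t Stat and 95% limits in the Excel output can be reproduced from the Coefficients and Standard Error columns alone. The sketch below, in Python for illustration only, computes t = coefficient / standard error and the interval coefficient ± t_crit x standard error. The critical value 2.1098 is an assumption taken from a Student's t table for 17 degrees of freedom (20 observations minus 2 predictors minus 1); it is not printed in the output itself.

```python
# Two-sided 95% critical value of Student's t with 17 degrees of freedom
# (assumed from a t table; df = 20 observations - 2 predictors - 1).
T_CRIT_17 = 2.1098

# (coefficient, standard error) copied from the Excel output.
rows = {
    "Intercept": (-1.11829469, 0.654852438),
    "X Variable 1 (Income)": (0.148211726, 0.016378632),
    "X Variable 2 (Family Size)": (0.793105484, 0.244441129),
}

results = {}
for name, (coef, se) in rows.items():
    t_stat = coef / se
    lower = coef - T_CRIT_17 * se
    upper = coef + T_CRIT_17 * se
    results[name] = (t_stat, lower, upper)
    print(f"{name}: t = {t_stat:.3f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval for the intercept contains zero, which is consistent with its p-value of about 0.106. The intervals for Income and Family Size exclude zero.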

7.0 How to handle missing values
1. Using the same Regression project, create a new diagram: right-click, select New Diagram,
and give the new diagram a name (missing).

2. Click File > Import Data.

3. Tick Standard data source, make sure that Microsoft Excel Workbook (*.xls *.xlsb *.xlsm
*.xlsx) is chosen in the drop-down list, and click Next.

4. Browse to the location of the Excel workbook (foodexpmissing.xlsx), choose the
foodexpmissing.xlsx file, and click Open and then OK.

5. Choose the Sheet1$ table from the drop-down menu and click Next.

6. Choose the EMDATA library from the drop-down menu.

7. Type foodmissing as the new name in the Member box and click Finish.

Setting Up the Input Data Source Node
8. To add the foodexpmissing data set to the missing diagram, click and drag the Input Data
Source node into the workspace.

9. To open the Input Data Source node, double-click it; a new window will open.

10. To load the foodexpmissing data set, click Select, browse to the foodmissing data
set in the EMDATA library, and click OK.

11. The Input Data Source node will display information about the foodexpmissing data set.
Change the Metadata sample to Use complete data as a sample.

12. Open the food data set using the Input Data Source node. Click the Variable tab. Set the role for
FAMILY to ID, for FOOD to Target, and for INCOME and FAMILY SIZE to Input.

13. Click the Interval Variable tab. Here you can inspect the number of levels and the percentage
of missing values; the variable INCOME has 20% missing values. When prompted to save changes,
select Yes.


14. Add a Replacement node by dragging the node from the Tools tab into the diagram workspace
and connect it to the Input Data Source node.

15. Open the Replacement node by double-clicking it, select the Imputation Methods tab, open the
drop-down menu, and select distribution-based for interval variables. You can also try other
methods for handling missing values. When prompted to save changes, select Yes. Then right-click
the Replacement node and select Run.


16. View the results. You will get the complete data set.
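
As a rough illustration of what a replacement step does, the sketch below reports the percentage of missing values in a column and fills the gaps with the mean of the observed values. This is simple mean imputation, one of the "other methods" mentioned in step 15, and not the distribution-based method selected in Enterprise Miner. It is written in Python for illustration only, and the income values are hypothetical, not the contents of foodexpmissing.xlsx.

```python
from statistics import mean

# Hypothetical income column; None marks a missing value (not the workshop data).
income = [30.0, None, 45.0, 60.0, None, 25.0, 50.0, 40.0, 55.0, 35.0]

# Percentage of missing values, as inspected in step 13.
missing_pct = 100 * sum(v is None for v in income) / len(income)

# Mean imputation: replace each missing value with the mean of the observed ones.
observed_mean = mean(v for v in income if v is not None)
imputed = [observed_mean if v is None else v for v in income]

print(f"Missing: {missing_pct:.0f}%")
print(f"Mean of observed values: {observed_mean}")
print(imputed)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason to compare it with other imputation methods before refitting the regression.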
