Meister
meister_hilmi@yahoo.com
Table of Contents
1.0 TASK 1
1.1 Introduction
1.2 SAS Macro Programming
a) Substituting text with %LET
b) Creating Modular Code with Macros
c) Adding Parameters to Macros
2.0 TASK 2
2.1 Introduction
2.1.1 Source of Dataset
2.1.2 Description of Dataset
2.2 Analysis of Variance (ANOVA)
2.2.1 Descriptive Analysis
2.2.2 Analysis of Variance (ANOVA)
2.2.3 Report on Analysis of Variance (ANOVA)
2.3 Regression Analysis
2.3.1 Model Adequacy Checking
2.3.2 Regression Model
2.3.3 Correlation Analysis
2.3.4 Multicollinearity Test
2.3.5 Stepwise Analysis
2.3.6 Lack of Fit Test
2.3.7 Report on Regression Analysis
2.4 Correlation Analysis using Macro
2.4.1 Report on Correlation Analysis
3.0 TASK 3
3.1 Question 1
3.2 Question 2
4.0 References
1.0 TASK 1
1.1 Introduction
The data set records the sales of a mini supermarket chain across its branches. It contains 16 observations and 6 variables: Branch, State, Date, NumWorker, SaleYear and Manager (read as ManagerName in the code). The variables are described in the table below.
Variable    Description
Branch      Branch area in Malaysia
State       State in which the branch is located
Date        Date the branch was opened
NumWorker   Number of workers at the branch
SaleYear    Sales per year
Manager     Name of the manager in charge of the branch
libname indi 'C:\Users\Meister\Documents\SAS\individual project';

/* Read the fixed-column text file into a permanent data set */
data indi.task;
    infile 'C:\Users\Meister\Documents\SAS\individual project\data\task1.txt';
    input Branch $ 1-16
          State $ 17-32
          Date 33-48
          NumWorker 49-56
          SaleYear 57-65
          ManagerName $ 66-75;
run;

/* Print the data with currency and date formats applied */
proc print data=indi.task;
    title 'Meister Supermarket Sales';
    format SaleYear dollar9. Date date9.;
run;
The data:
2.0 TASK 2
2.1 Introduction
2.1.1 Source of Dataset
The data come from a database hosted by the New York University Stern School of Business (NYU Stern). The data set is titled Movie Buzz Data.
The six qualitative variables are MPRating, Sequel, Action, Comedy, Animated and Horror. MPRating is the MPAA rating code. Sequel indicates whether the movie is a sequel. Action, Comedy, Animated and Horror indicate whether the movie is an action, comedy, animated or horror film respectively.
Variable   Description
MPRating   MPAA rating: 1 = G (general), 2 = PG (parental guidance), 3 = PG-13 (parental guidance; may not be appropriate for children under 13), 4 = R (restricted)
Sequel     Sequel movie: 1 = sequel, 2 = not a sequel
Action     Action movie: 1 = action film, 2 = not an action film
Comedy     Comedy movie: 1 = comedy film, 2 = not a comedy film
Animated   Animated movie: 1 = animated film, 2 = not an animated film
Horror     Horror movie: 1 = horror film, 2 = not a horror film
1. Normality analysis for variable horror
Based on the Kolmogorov-Smirnov test, the p-value is 0.15, which is greater than α = 0.05. We do not reject the hypothesis of normality and conclude that the variable horror is normal.

2. Normality analysis for variable comedy
Based on the Kolmogorov-Smirnov test, the p-value is 0.15, which is greater than α = 0.05. We do not reject the hypothesis of normality and conclude that the variable comedy is normal.

3. Normality analysis for variable animated
Based on the Kolmogorov-Smirnov test, the p-value is 0.15, which is greater than α = 0.05. We do not reject the hypothesis of normality and conclude that the variable animated is normal.

4. Normality analysis for variable action
Based on the Kolmogorov-Smirnov test, the p-value is 0.0851, which is greater than α = 0.05. We do not reject the hypothesis of normality and conclude that the variable action is normal.
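The report does not show the command behind these normality results. A minimal sketch of how such a Kolmogorov-Smirnov test could be requested is given below. The data set name indi.movie, the response name Sales and the choice to test the response within each level of Horror are assumptions made for illustration only.

```sas
/* Sketch only: indi.movie, Sales and the grouping by Horror are assumed,
   not taken from the report */
proc univariate data=indi.movie normal;
    class Horror;                  /* one set of tests per level of Horror */
    var Sales;                     /* variable whose distribution is checked */
    ods select TestsForNormality;  /* show only the normality tests table */
run;
```

The NORMAL option prints several tests together, including Kolmogorov-Smirnov; replacing Horror with Comedy, Animated or Action would give the other three analyses.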
Based on the analysis of variance, the p-value is 0.5102, which is greater than α = 0.05. We fail to reject the null hypothesis and conclude that there is no evidence of a relationship between sales and the type of movie.
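The ANOVA command itself is not reproduced in the report. One way to fit such a model is sketched below, assuming a data set named indi.movie with a response Sales and the four genre indicators as classification variables; the report may have coded movie type differently.

```sas
/* Sketch only: data set name and model specification are assumptions */
proc glm data=indi.movie;
    class Action Comedy Animated Horror;          /* treat genre codes as categories */
    model Sales = Action Comedy Animated Horror;  /* overall F test appears in the ANOVA table */
run;
quit;
```

The overall F test in the output gives a single p-value for the model, which is the kind of value compared with α above.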
Before transformation: the Kolmogorov-Smirnov p-value is less than alpha. We can conclude that the distribution is not normal, so a transformation is needed.
After transformation: the Kolmogorov-Smirnov p-value is greater than alpha. We can conclude that the distribution is normal, so the transformation succeeded.
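The report does not state which transformation was applied. As an illustration only, the sketch below assumes a natural-log transformation of a response named Sales in a data set named indi.movie, then repeats the normality test on the transformed variable.

```sas
/* Sketch only: the log transformation and all names are assumptions */
data work.movie_t;
    set indi.movie;
    if Sales > 0 then LogSales = log(Sales);  /* natural log; left missing for non-positive values */
run;

proc univariate data=work.movie_t normal;
    var LogSales;
    ods select TestsForNormality;             /* compare the new p-value with alpha */
run;
```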
The model is significant because the p-value (0.0001) is less than α = 0.05.
Regression model

Sales = 15.25374 - 0.21222 MPRating + 0.00515 Budget - 0.00471 StarPowr + 0.39390 Sequel - 0.74909 Action - 0.00164 Comedy - 0.82118 Animated + 0.43770 Horror + 0.00002216 Addict - 0.00013618 CmngSoon + 0.00020521 Fandango + 3.29154 CntWait
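A model of this form can be fitted with PROC REG. The sketch below uses the variable names from the equation; the data set name indi.movie is an assumption, since the report does not show the command.

```sas
/* Sketch only: indi.movie is an assumed data set name */
proc reg data=indi.movie;
    model Sales = MPRating Budget StarPowr Sequel Action Comedy
                  Animated Horror Addict CmngSoon Fandango CntWait;
run;
quit;
```

The parameter estimates table in the output lists the intercept and one coefficient per predictor, in the order given on the MODEL statement.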
The correlation matrix above shows the Pearson correlation coefficients between the quantitative variables in the data set. Five predictor variables are significantly correlated with the dependent variable (Sales): BUDGET, ADDICT, CMNGSOON, FANDANGO and CNTWAIT, with correlations of 0.45708, 0.43750, 0.0133, 0.37974 and 0.65501 respectively. Some predictor variables are also correlated with one another. BUDGET and STARPOWR are significantly correlated, which indicates that movies with higher STARPOWR tend to have larger budgets.
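A Pearson correlation matrix of this kind can be produced with PROC CORR. The sketch below assumes the data set is named indi.movie and takes the quantitative variable names from the regression model.

```sas
/* Sketch only: indi.movie is an assumed data set name */
proc corr data=indi.movie pearson;
    var Sales Budget StarPowr Addict CmngSoon Fandango CntWait;
run;
```

Each cell of the output shows the correlation coefficient together with the p-value for the test that the correlation is zero, which is what the significance statements above rely on.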
Interpretation of Collinearity Statistics

Variable      Tolerance   VIF     Interpretation
MPAA Rating   0.602       1.660   No multicollinearity (tolerance > 0.2, VIF < 10)
Budget        0.401       2.495   No multicollinearity (tolerance > 0.2, VIF < 10)
StarPower     0.547       1.827   No multicollinearity (tolerance > 0.2, VIF < 10)
Sequel        0.544       1.837   No multicollinearity (tolerance > 0.2, VIF < 10)
Action        0.498       2.009   No multicollinearity (tolerance > 0.2, VIF < 10)
Comedy        0.515       1.940   No multicollinearity (tolerance > 0.2, VIF < 10)
Animated      0.523       1.912   No multicollinearity (tolerance > 0.2, VIF < 10)
Horror        0.617       1.621   No multicollinearity (tolerance > 0.2, VIF < 10)
Addict        0.465       2.150   No multicollinearity (tolerance > 0.2, VIF < 10)
CmngSoon      0.376       2.660   No multicollinearity (tolerance > 0.2, VIF < 10)
Fandango      0.570       1.754   No multicollinearity (tolerance > 0.2, VIF < 10)
CntWait       0.440       2.273   No multicollinearity (tolerance > 0.2, VIF < 10)
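Tolerance and VIF carry the same information, because one is the reciprocal of the other. Writing R_j^2 for the R-square obtained when predictor j is regressed on all the other predictors (notation introduced here for illustration), the relationship and a check against the first reported pair of values are:

```latex
\[
\mathrm{Tolerance}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{Tolerance}_j}
\]
\[
\mathrm{Tolerance} = 0.602 \;\Rightarrow\; \mathrm{VIF} = \frac{1}{0.602} \approx 1.66
\]
```

In PROC REG both statistics are requested with the TOL and VIF options of the MODEL statement.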
1. Steps of stepwise

Step 1
1. Variable CNTWAIT has the highest R-square value, 0.4290, so it is selected as the first variable to enter the model.

Step 2
1. Variable ACTION gives the highest R-square value, 0.4856, so it is selected to enter the model.
2. The two variables are tested and no variable is removed.

Step 3
1. Variable ADDICT gives the highest R-square value, 0.5217, so it is selected to enter the model.
2. The three variables are tested and no variable is removed.

Step 4
1. Variable SEQUEL gives the highest R-square value, 0.5420, so it is selected to enter the model.
2. The four variables are tested and no variable is removed.
Of the 12 candidate variables, 4 are selected for the final model: CNTWAIT, ADDICT, SEQUEL and ACTION. The final model is therefore

y = 14.56468 + 0.41590 SEQUEL - 0.69464 ACTION + 0.00002895 ADDICT - 3.81397 CNTWAIT

Test of the model
Hypothesis: H0: β1 = β2 = β3 = β4 = 0
            H1: at least one βi is not equal to zero
Significance level: α = 0.05
Test statistic: p-value = 0.0001
Decision: Since the p-value = 0.0001 < α = 0.05, reject H0.
Conclusion: The model is significant.
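The stepwise command is not shown in the report. A minimal sketch of a stepwise selection over the 12 candidate predictors is given below, assuming the data set is named indi.movie and that the default entry and stay thresholds were used; the thresholds actually used are not stated.

```sas
/* Sketch only: indi.movie and the use of default thresholds are assumptions */
proc reg data=indi.movie;
    model Sales = MPRating Budget StarPowr Sequel Action Comedy
                  Animated Horror Addict CmngSoon Fandango CntWait
          / selection=stepwise;   /* PROC REG defaults: SLENTRY=0.15, SLSTAY=0.15 */
run;
quit;
```

The summary table at the end of the output lists, for each step, the variable entered or removed together with the partial and model R-square values.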
Full model
Hypothesis: H0: There is no lack of fit
            H1: There is lack of fit
Significance level: α = 0.05
Test statistic: p-value = 0.0001
Decision: Since the p-value = 0.0001 < α = 0.05, reject H0.
Conclusion: The model has lack of fit.

Hypothesis: H0: There is no lack of fit
            H1: There is lack of fit
Significance level: α = 0.05
Test statistic: p-value = 0.0001
Decision: Since the p-value = 0.0001 < α = 0.05, reject H0.
Conclusion: The model has lack of fit.
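A lack-of-fit test can be requested in PROC REG with the LACKFIT option of the MODEL statement. The sketch below applies it to the four variables selected by the stepwise procedure; the data set name indi.movie is an assumption, and the report does not show which command produced the results above.

```sas
/* Sketch only: indi.movie is an assumed data set name */
proc reg data=indi.movie;
    model Sales = Sequel Action Addict CntWait / lackfit;
run;
quit;
```

The test splits the error sum of squares into pure error and lack of fit, so it needs observations that share identical predictor values; without such replicates there is no pure-error estimate to compare against.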
3.0 TASK 3
3.1 Question 1
Coding
libname indi 'C:\Users\Meister\Documents\SAS\individual project';

/* Import the tab-delimited file, replacing the data set if it already exists */
proc import out = indi.project
    datafile = "C:\Users\Meister\Documents\SAS\individual project\data\country.txt"
    dbms = tab replace;
run;

proc print data = indi.project;
run;
Partial output
3.2 Question 2
a) Total population of the world
SAS Command:
proc sql;
    select sum(population) as TotalPopulation
    from indi.project;
quit;
Result:
j) GDP per capita of countries with a population of more than 200 million
SAS Command:
proc sql;
    title 'List of Countries per Capita';
    title2 'Population more than 200 million';
    /* GDP per capita = gdp / population; ge keeps populations of 200 million or more */
    select name, (gdp/population) as capita
    from indi.project
    where population ge 200000000;
quit;
Result:
4.0 References
Books:
Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining (2006), Introduction to Linear Regression Analysis.
Michael H. Kutner, Christopher J. Nachtsheim, John Neter and William Li, Applied Linear Statistical Models.