
Statistics DGGB 6820: Summer 2010

Excel Techniques Data Analysis


Wei Wang
Graduate Assistant
goldblade84@hotmail.com
YouTube Channel: www.youtube.com/user/fordhamstats
Tutoring Hours: Tues, Wed, Thurs: 8pm - 10pm (By Appointment)

I've decided to shift the tutoring to evening hours, but only by appointment. Please e-mail me 48 hours before the desired date. However, I suggest sending questions to me via e-mail as a faster alternative.

Installing the Data Analysis Package:

Office 2003:
How to check if you have it:
o If [Data Analysis] is in your [Tools] menu, then you have it.
Installing the Data Analysis Toolpak:
o On the [Tools] menu, click [Add-Ins].
o In the Add-Ins available box, select the check box next to [Analysis Toolpak] and [Analysis Toolpak VBA] (if it's there), and then click OK.
o If you see a message that tells you the Analysis Toolpak is not currently installed on your computer, click [Yes] to install it.
o Click [Tools] on the menu bar. When you load the Analysis Toolpak, the [Data Analysis] command is added to the [Tools] menu.
o Note: You may need the internet or the Office Install CD to download or install this package.

Office 2007:
How to check if you have it:
o In the Ribbon (the tabs on the top) click: [Data]
o On the Data Ribbon, check for a button called [Data Analysis] under a box with the label Analysis (usually the furthermost right of all the boxes).
o If you have it, ignore this section. If you do not have it, read on.
Installing the Data Analysis ToolPak:
o Click the top left [Office Button]
o Click the button on the bottom called: [Excel Options]
o Click the Tab on the left side called [Add-Ins]
o Make sure the selection next to the [Go] button on the bottom says [Excel Add-Ins]
o Click [Go] at the bottom
o Check the Box for [Analysis ToolPak] and [Analysis ToolPak VBA]
o Click [Ok]
o Note: You may need the internet or the Office Install CD to download or install this package.

Office 2008 - Mac:
How to check if you have it:
o Um, you don't have to check 'cause I know you don't have it!
o Aha! Macs aren't perfect lol (I'm a PC, can you tell?)
StatPlus LE:
o Instead you go here: http://www.analystsoft.com/en/products/statplusmacle/
o Download the free LE version of the software.
o The LE version should do everything we need for the purpose of this class.

Note: I'm not going to cover the use of the StatPlus LE program. It is 3rd party, but more importantly, I don't have a Mac to play around with the damn thing, so read the instructions and figure it out. It is supposed to be pretty much the same thing as the PC version but in a different program.

SPSS:
o The following link is to the $35 6 month student license of the most basic version, which (as far as I can see) has all that you need for this class.
o http://e5.onthehub.com/WebStore/OfferingDetails.aspx?ws=49c547ba-f56d-dd11-bb6c-0030485a6b08&vsro=8&o=e9fcd8b9-15c3-de11-886d-0030487d8897

Statistics DGGB 6820 - Excel Techniques (Summer 2010)


Sheet 1 - In Cell Functions
Wei Wang - Graduate Assistant - goldblade84@hotmail.com
YouTube: www.youtube.com/user/fordhamstats

Dataset   Z-Score
1
4
2
2
6
7
7
8
9
3

In Cell Function     Result
Count
Mean
Median
Mode
Percentile
Quartile
Sample STDEV
Population STDEV
Standardize
Sum
Sample VAR
Population VAR



Sheet 2 - In Cell Functions - Results
Dataset   Z-Score
1         -1.37032
4         -0.31623
2         -1.01896
2         -1.01896
6         0.386501
7         0.737865
7         0.737865
8         1.089229
9         1.440593
3         -0.66759

In Cell Function     Result
Count                10
Mean                 4.9
Median               5
Mode                 2
Percentile           5
Quartile             7
Sample STDEV         2.846
Population STDEV     2.7
Standardize          -1.37
Sum                  49
Sample VAR           8.1
Population VAR       7.29
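To double-check the in-cell results outside of Excel, here is a minimal Python sketch using only the standard library. Python and all variable names are assumptions of this example, not part of the workbook. The comments name the classic Excel functions that are believed to correspond to each line.

```python
import statistics

# Dataset from Sheet 1 / Sheet 2
data = [1, 4, 2, 2, 6, 7, 7, 8, 9, 3]

count = len(data)                             # COUNT
mean = statistics.mean(data)                  # AVERAGE
median = statistics.median(data)              # MEDIAN
total = sum(data)                             # SUM
sample_sd = statistics.stdev(data)            # STDEV  (divides by n - 1)
population_sd = statistics.pstdev(data)       # STDEVP (divides by n)
sample_var = statistics.variance(data)        # VAR
population_var = statistics.pvariance(data)   # VARP

# STANDARDIZE(x, mean, sd): the Z-Score column uses the sample standard deviation
z_scores = [(x - mean) / sample_sd for x in data]

print(count, mean, median, total)
print(round(sample_sd, 3), round(population_sd, 3), sample_var, population_var)
print([round(z, 5) for z in z_scores])
```

Note that the Z-Scores on the sheet only match if the sample (not population) standard deviation is used.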



Sheet 3 - Descriptive Statistics
Describe Dataset A, and find the Z score for each value of the Dataset.

Dataset A   Z-Score
1
4
2
2
6
7
7
8
9
3



Sheet 4 - Descriptive Statistics - Results
Dataset A   Z-Score
1           -1.37032
4           -0.31623
2           -1.01896
2           -1.01896
6           0.386501
7           0.737865
7           0.737865
8           1.089229
9           1.440593
3           -0.66759

Dataset A
Mean                  4.9
Standard Error        0.9
Median                5
Mode                  2
Standard Deviation    2.846049894
Sample Variance       8.1
Kurtosis              -1.672146377
Skewness              0.017351318
Range                 8
Minimum               1
Maximum               9
Sum                   49
Count                 10
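The less familiar rows of the Descriptive Statistics output (Standard Error, Kurtosis, Skewness) can be reproduced by hand. The sketch below assumes the bias-corrected sample formulas that Excel documents for SKEW and KURT. Python and the variable names are assumptions of this example.

```python
import math
import statistics

data = [1, 4, 2, 2, 6, 7, 7, 8, 9, 3]
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)          # sample standard deviation

# Standard Error of the mean = s / sqrt(n)
standard_error = s / math.sqrt(n)

# Standardized values, used by both skewness and kurtosis
z = [(x - mean) / s for x in data]

# Sample skewness (bias-corrected)
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Sample excess kurtosis (bias-corrected)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

value_range = max(data) - min(data)

print(standard_error, skewness, kurtosis, value_range)
```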



Sheet 5 - Histogram
Create a Histogram of Dataset A. What can you tell me based on the Histogram?

Dataset A: 1, 3, 2, 4, 5, 6, 5, 8, 9, 10, 12, 23, 21, 15, 24, 33, 19, 29, 56
Bin:  1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56
Bin2: 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58



Sheet 6 - Histogram - Results
Bin     Frequency
1       1
6       6
11      3
16      2
21      2
26      2
31      1
36      1
41      0
46      0
51      0
56      1
More    0

Bin2    Frequency
1       1
4       3
7       3
10      3
13      1
16      1
19      1
22      1
25      2
28      0
31      1
34      1
37      0
40      0
43      0
46      0
49      0
52      0
55      0
58      1
More    0

[Chart: Histogram - Frequency by Bin]
[Chart: Histogram - Frequency by Bin2]

The Dataset is positively skewed.
Most of the data is between 0 and 26.
There is an outlier towards the right side of the distribution (56).
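The frequency tables above follow the convention that each bin value is an inclusive upper bound, with anything larger than the last bin landing in "More". A minimal Python sketch of that counting rule is shown here. The function name `histogram` is hypothetical and Python itself is an assumption of this example.

```python
import bisect

# Dataset A from Sheet 5
data = [1, 3, 2, 4, 5, 6, 5, 8, 9, 10, 12, 23, 21, 15, 24, 33, 19, 29, 56]

def histogram(values, bins):
    """Count values per bin, treating each bin as an inclusive upper bound.

    The returned list has one extra slot at the end for "More"
    (values greater than the last bin).
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left finds the first bin whose upper bound is >= v
        counts[bisect.bisect_left(bins, v)] += 1
    return counts

bin1 = list(range(1, 57, 5))   # 1, 6, 11, ..., 56
bin2 = list(range(1, 59, 3))   # 1, 4, 7, ..., 58

for label, bins in (("Bin", bin1), ("Bin2", bin2)):
    print(label)
    for upper, freq in zip(bins + ["More"], histogram(data, bins)):
        print(f"  {upper!s:>4}  {freq}")
```

Comparing the two bin widths shows why the choice of bins matters: the narrower Bin2 spreads the same 19 values over more, mostly empty, bins.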



Sheet 7 - Correlation Coefficient
How strongly is Dataset A correlated to Dataset B, C, D, and E?

Dataset A   Dataset B   Dataset C   Dataset D   Dataset E
1           1           10          1           9
2           2           9           2           7
3           3           8           3           6
4           4           7           4           8
5           5           6           2           5
6           6           5           5           3
7           7           4           6           5
8           8           3           3           2
9           9           2           9           3
10          10          1           8           1



Sheet 8 - Correlation Coefficient - Results
             Dataset A   Dataset B   Dataset C   Dataset D   Dataset E
Dataset A    1
Dataset B    1           1
Dataset C    -1          -1          1
Dataset D    0.831954    0.831954    -0.83195    1
Dataset E    -0.90926    -0.90926    0.909262    -0.65672    1

A and B: Perfect positive correlation
A and C: Perfect negative correlation
A and D: Strong positive correlation
A and E: Strong negative correlation

Note: Dataset D and E have a moderate negative correlation (-0.65672), weaker than either one's correlation with Dataset A.
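Each entry of the correlation table is a Pearson correlation coefficient. A minimal Python sketch of the calculation follows. The function name `pearson` and the variable names are assumptions of this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Datasets from Sheet 7
a = list(range(1, 11))
b = list(range(1, 11))
c = list(range(10, 0, -1))
d = [1, 2, 3, 4, 2, 5, 6, 3, 9, 8]
e = [9, 7, 6, 8, 5, 3, 5, 2, 3, 1]

for name, other in (("B", b), ("C", c), ("D", d), ("E", e)):
    print(f"A and {name}: {pearson(a, other):.6f}")
print(f"D and E: {pearson(d, e):.6f}")
```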



Sheet 9 - t-Test: Single Sample
Recently, School X published that the average GPA for the graduating class of 2009 was 3.6.
Is there a significant difference between this year's GPA scores (Dataset A) and the class average of last year?

Dataset A   2009 Avg.
4.0         3.6
2.0         3.6
3.0         3.6
4.0         3.6
3.2         3.6
3.8         3.6
3.5         3.6
3.4         3.6
3.4         3.6
4.0         3.6



Sheet 10 - t-Test: Single Sample - Results
t-Test: Two-Sample Assuming Unequal Variances

                                Dataset A       2009 Avg.
Mean                            3.43            3.6
Variance                        0.377888889     2.19128E-31
Observations                    10              10
Hypothesized Mean Difference    0
df                              9
t Stat                          -0.874514189
P(T<=t) one-tail                0.202285205
t Critical one-tail             1.833112923
P(T<=t) two-tail                0.404570411
t Critical two-tail             2.262157158

Because the absolute value of the t Stat (0.875) is smaller than t Critical two-tail (2.262),
or, equivalently, because the two-tail P-value (0.405) is not smaller than Alpha,
we fail to reject the Null Hypothesis: there is no statistically significant difference between Dataset A and the 2009 Avg.
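Because the 2009 Avg. column is constant, the two-sample output above reduces to a single-sample t-test against 3.6. A minimal Python sketch of that t Stat is shown here. Python and the variable names are assumptions of this example, and the critical value is simply copied from the Excel output rather than computed.

```python
import math
import statistics

# Dataset A from Sheet 9 and the published 2009 average
gpa = [4.0, 2.0, 3.0, 4.0, 3.2, 3.8, 3.5, 3.4, 3.4, 4.0]
mu0 = 3.6

n = len(gpa)
df = n - 1
standard_error = statistics.stdev(gpa) / math.sqrt(n)
t_stat = (statistics.mean(gpa) - mu0) / standard_error

# Two-tail critical value taken from the Excel output (df = 9)
t_critical_two_tail = 2.262157158
reject_null = abs(t_stat) > t_critical_two_tail

print(f"t Stat = {t_stat:.6f}, df = {df}, reject null: {reject_null}")
```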



Sheet 11 - t-Test: Paired Two Sample
The Sales Manager of Company X decides to try a new training program for the sales representatives.
Compare the average customer satisfaction score for each sales rep. before and after the training program.
Is there a statistically significant difference? And if so, did the program improve the quality of service these sales reps provided?

Before   After
4.2      7.3
6.3      8.2
5.7      5.9
4.8      6.4
3.5      6.3
3.2      5.7
4.4      8.2
5.2      6.4
3.9      5.1
4.3      4.1



Sheet 12 - t-Test: Paired Two Sample - Results
t-Test: Paired Two Sample for Means

                                Before          After
Mean                            4.55            6.36
Variance                        0.936111111     1.667111111
Observations                    10              10
Pearson Correlation             0.396685546
Hypothesized Mean Difference    0
df                              9
t Stat                          -4.507970748
P(T<=t) one-tail                0.000735997
t Critical one-tail             1.833112923
P(T<=t) two-tail                0.001471993
t Critical two-tail             2.262157158

Because the absolute value of the t Stat (4.508) is greater than t Critical two-tail (2.262),
or, equivalently, because the two-tail P-value (0.0015) is smaller than Alpha,
we can reject the Null Hypothesis that there is no statistical difference between the two datasets.
Yes, there is a significant difference between the before and after scores.
Yes, the training program improved the quality of the customer service (the After mean is higher).
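A paired t-test is a single-sample t-test on the per-rep differences. The following minimal Python sketch reproduces the t Stat above. Python and the variable names are assumptions of this example.

```python
import math
import statistics

# Satisfaction scores from Sheet 11, one pair per sales rep
before = [4.2, 6.3, 5.7, 4.8, 3.5, 3.2, 4.4, 5.2, 3.9, 4.3]
after = [7.3, 8.2, 5.9, 6.4, 6.3, 5.7, 8.2, 6.4, 5.1, 4.1]

# Difference for each rep (Before minus After, matching the sign of the Excel output)
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
df = n - 1
mean_diff = statistics.mean(diffs)
t_stat = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))

print(f"mean difference = {mean_diff:.2f}, t Stat = {t_stat:.6f}, df = {df}")
```

A negative t Stat here means the After scores are higher on average.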



Sheet 13 - t-Test: Two Sample
You are a district manager for a line of products. You have collected sales figures from Area A and Area B.
Is there a significant difference between the sales in Area A and Area B?
If so, which area is doing better in Sales? And is there any reason why?

Product #   Area A     Area B
1           $30,000    $28,000
2           $22,000    $17,000
3           $15,000    $11,000
4           $28,000    $24,000
5           $44,000    $43,000
6           $38,000    $37,000
7           $17,000    $15,000
8           $18,000    $98,000
9           $30,000    $130,000
10          $15,000    $150,000



Sheet 14 - t-Test: Two Sample - Results
t-Test: Two-Sample Assuming Equal Variances

                                Area A          Area B
Mean                            25700           55300
Variance                        100677777.8     2626233333
Observations                    10              10
Pooled Variance                 1363455556
Hypothesized Mean Difference    0
df                              18
t Stat                          -1.792487849
P(T<=t) one-tail                0.044939316
t Critical one-tail             1.734063592
P(T<=t) two-tail                0.089878633
t Critical two-tail             2.100922037

Based on the Two Tail, there is no significant difference.
Since the comparison has no significant difference, neither area is doing much better than the other.

However, notice that Area B has 3 outliers at the bottom (98k, 130k, and 150k).
So do consider the actual data and not only the analysis when you're making decisions.
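The Pooled Variance and t Stat rows can be reproduced with the equal-variance formula. This is a minimal Python sketch. Python and the variable names are assumptions of this example.

```python
import math
import statistics

# Sales figures from Sheet 13 (dollars)
area_a = [30000, 22000, 15000, 28000, 44000, 38000, 17000, 18000, 30000, 15000]
area_b = [28000, 17000, 11000, 24000, 43000, 37000, 15000, 98000, 130000, 150000]

na, nb = len(area_a), len(area_b)
mean_a, mean_b = statistics.mean(area_a), statistics.mean(area_b)
var_a, var_b = statistics.variance(area_a), statistics.variance(area_b)

# Pooled variance weights each sample variance by its degrees of freedom
df = na + nb - 2
pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df

t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

print(f"pooled variance = {pooled_var:.0f}, t Stat = {t_stat:.6f}, df = {df}")
print(f"variance ratio B/A = {var_b / var_a:.1f}")
```

The last line prints how different the two sample variances are, which is worth a look before relying on the equal-variance version of the test.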



Sheet 15 - ANOVA - Single Factor
You are a pharmaceutical company, and you're testing the efficacy of a new drug.
You bring in 18 people for this test: 6 use your drug (Drug X), 6 use the competition (Drug Y), and 6 use a placebo (Drug Z).
The drugs are rated from 1 to 10, with 10 being very effective.
What can you conclude from the findings? How well might your drug perform?

Drug X   Drug Y   Drug Z
6        5        2
8        3        1
7        2        3
4        1        2
8        4        1
9        3        1



Sheet 16 - ANOVA - Single Factor - Results
Anova: Single Factor

SUMMARY
Groups    Count   Sum   Average       Variance
Drug X    6       42    7             3.2
Drug Y    6       18    3             2
Drug Z    6       10    1.666666667   0.666666667

ANOVA
Source of Variation   SS            df   MS            F             P-value       F crit
Between Groups        92.44444444   2    46.22222222   23.63636364   2.30914E-05   3.682320344
Within Groups         29.33333333   15   1.955555556
Total                 121.7777778   17

The F value is well above the F Critical value.
The P-value reflects this finding by being significantly smaller than 0.05.
Thus we reject the Null Hypothesis, and conclude that there is a significant difference between the groups.

Although both drugs performed better than the placebo, Drug X clearly performed better than Drug Y.
It is a good idea at this time to perform a t-Test: Two Sample between Drug X and Drug Y.
But based on the ANOVA test, I feel safe in saying that Drug X will outperform Drug Y and will be a big hit on the market.
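The SS, df, MS and F columns of the single-factor table can be rebuilt from the raw ratings. This is a minimal Python sketch. Python and the variable names are assumptions of this example, and the P-value is left to Excel because the standard library has no F distribution.

```python
# Ratings from Sheet 15
groups = {
    "Drug X": [6, 8, 7, 4, 8, 9],
    "Drug Y": [5, 3, 2, 1, 4, 3],
    "Drug Z": [2, 1, 3, 2, 1, 1],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Between Groups: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# Within Groups: spread of each rating around its own group mean
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.4f}")
print(f"F = {f_stat:.4f} with df = ({df_between}, {df_within})")
```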



Sheet 17 - ANOVA - Two Factor without Replication
You are a Drill Sergeant for Phase Two of basic training for the Marines on the East Coast.
You've trained the new recruits and they are now being tested in three events:
Swim Qualification (SQ), Rifle Qualification (RQ), and Team Qualification (TQ).
Each recruit is tested in all three events, and all events were scored out of 35.
What can you conclude from the results?

Recruit   SQ   RQ   TQ
1         19   8    21
2         18   10   31
3         25   10   26
4         20   18   28
5         17   7    14
6         21   16   24



Sheet 18 - ANOVA - Two Factor without Replication - Results
Anova: Two-Factor Without Replication

SUMMARY   Count   Sum   Average     Variance
1         3       48    16          49
2         3       59    19.66667    112.3333333
3         3       61    20.33333    80.33333333
4         3       66    22          28
5         3       38    12.66667    26.33333333
6         3       61    20.33333    16.33333333

SQ        6       120   20          8
RQ        6       69    11.5        19.9
TQ        6       144   24          35.6

ANOVA
Source of Variation   SS         df   MS         F             P-value       F crit
Rows                  181.8333   5    36.36667   2.680589681   0.086617815   3.325834529
Columns               489        2    244.5      18.02211302   0.000483197   4.102821015
Error                 135.6667   10   13.56667
Total                 806.5      17

Recruit 4 was the best performing recruit within the group.
However, there is no statistically significant difference between the recruits (as indicated by the data for Rows).
There is, however, a statistically significant difference between the scores of the three tests.
Looking at the test means, we can see that the Rifle Qualification's scores are much lower than the other two.
Thus, either the Rifle Qualification is too hard, or the recruits are better suited to be Life Guards rather than Marines.
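In the two-factor table without replication, Rows are the recruits and Columns are the tests, and whatever variation is left over becomes Error. A minimal Python sketch of that decomposition follows. Python and the variable names are assumptions of this example.

```python
# Scores from Sheet 17: one row per recruit, columns are SQ, RQ, TQ
scores = [
    [19, 8, 21],
    [18, 10, 31],
    [25, 10, 26],
    [20, 18, 28],
    [17, 7, 14],
    [21, 16, 24],
]

r, c = len(scores), len(scores[0])
grand = sum(map(sum, scores)) / (r * c)

row_means = [sum(row) / c for row in scores]
col_means = [sum(row[j] for row in scores) / r for j in range(c)]

ss_rows = c * sum((m - grand) ** 2 for m in row_means)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((v - grand) ** 2 for row in scores for v in row)
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error

f_rows = (ss_rows / df_rows) / ms_error
f_cols = (ss_cols / df_cols) / ms_error

print(f"Rows:    SS = {ss_rows:.4f}, F = {f_rows:.4f}")
print(f"Columns: SS = {ss_cols:.4f}, F = {f_cols:.4f}")
print(f"Error:   SS = {ss_error:.4f}, df = {df_error}")
```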



Sheet 19 - ANOVA - Two Factor with Replication
You are the Guy who bosses the Drill Sergeant around.
You are comparing the East Coast basic training program and the West Coast basic training program for the Marines.
You're looking at Phase Two of the basic training program scores for Swimming Qualification (SQ), Rifle Qualification (RQ) and Team Qualification (TQ).
What can you conclude from the results?

Group   SQ   RQ   TQ
East    19   8    21
        18   10   31
        25   10   26
        20   18   28
        17   7    14
        21   16   24
West    15   23   31
        26   32   31
        17   26   26
        14   16   20
        17   27   14
        13   18   9



Sheet 20 - ANOVA - Two Factor with Replication - Results
Anova: Two-Factor With Replication

SUMMARY     SQ             RQ             TQ             Total
East
Count       6              6              6              18
Sum         120            69             144            333
Average     20             11.5           24             18.5
Variance    8              19.9           35.6           47.44117647

West
Count       6              6              6              18
Sum         102            142            131            375
Average     17             23.66666667    21.83333333    20.83333333
Variance    22             35.46666667    82.96666667    49.67647059

Total
Count       12             12             12
Sum         222            211            275
Average     18.5           17.58333333    22.91666667
Variance    16.09090909    65.53787879    55.17424242

ANOVA
Source of Variation   SS            df   MS            F             P-value       F crit
Sample                49            1    49            1.441647597   0.239267892   4.170876757
Columns               195.1666667   2    97.58333333   2.871036286   0.072297426   3.315829501
Interaction           436.1666667   2    218.0833333   6.41631252    0.004788489   3.315829501
Within                1019.666667   30   33.98888889
Total                 1700          35

East or West by itself does not affect the test scores. In other words:
There is no significant difference when only considering East or West by itself (as shown by the results from the Sample).
There is no significant difference when only considering the different test types (as shown by the results from the Columns).
There is, however, a significant difference when considering East and West and its relationship to the test types (as shown by the results from the Interaction).

We can conclude that West Coast basic training will produce better Marines.
Also, based on the last test (ANOVA w/o Rep), we can now conclude that the Rifle Qualification test wasn't too hard, and that the East Coast recruits need to improve their skills or become Life Guards.
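With replication, the total variation splits into Sample (East vs. West), Columns (test type), Interaction, and Within. A minimal Python sketch of that split is shown here. Python, the `cells` dictionary layout and the other names are assumptions of this example.

```python
from statistics import mean

# Scores from Sheet 19, keyed by (group, test), 6 recruits per cell
cells = {
    ("East", "SQ"): [19, 18, 25, 20, 17, 21],
    ("East", "RQ"): [8, 10, 10, 18, 7, 16],
    ("East", "TQ"): [21, 31, 26, 28, 14, 24],
    ("West", "SQ"): [15, 26, 17, 14, 17, 13],
    ("West", "RQ"): [23, 32, 26, 16, 27, 18],
    ("West", "TQ"): [31, 31, 26, 20, 14, 9],
}
samples = ["East", "West"]
columns = ["SQ", "RQ", "TQ"]
n = 6  # replicates per cell

all_values = [v for vals in cells.values() for v in vals]
grand = mean(all_values)

sample_means = [mean([v for c in columns for v in cells[(s, c)]]) for s in samples]
column_means = [mean([v for s in samples for v in cells[(s, c)]]) for c in columns]

ss_sample = len(columns) * n * sum((m - grand) ** 2 for m in sample_means)
ss_columns = len(samples) * n * sum((m - grand) ** 2 for m in column_means)
ss_within = sum((v - mean(vals)) ** 2 for vals in cells.values() for v in vals)
ss_total = sum((v - grand) ** 2 for v in all_values)
# Interaction is what remains after the two main effects and the within-cell spread
ss_interaction = ss_total - ss_sample - ss_columns - ss_within

df_sample = len(samples) - 1
df_columns = len(columns) - 1
df_interaction = df_sample * df_columns
df_within = len(samples) * len(columns) * (n - 1)
ms_within = ss_within / df_within

f_sample = (ss_sample / df_sample) / ms_within
f_columns = (ss_columns / df_columns) / ms_within
f_interaction = (ss_interaction / df_interaction) / ms_within

print(f"Sample F = {f_sample:.4f}, Columns F = {f_columns:.4f}, "
      f"Interaction F = {f_interaction:.4f}")
```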



Sheet 21 - Regression
You just got your GMAT score back and it says 510 (sad face).
But because you're just that awesome, you got accepted into Fordham GBA's MBA program (happy face).
You found some data about GMAT scores and graduating GPA at Fordham GBA's MBA program.
Based on your GMAT score, you'd like to predict what GPA you'll end up with.

Student   GPA    GMAT
1         3.25   580
2         3.90   510
3         4.00   680
4         3.33   620
5         2.87   530
6         4.00   680
7         2.76   510
8         3.91   580
9         3.55   630
10        3.40   520
11        2.44   550
12        3.85   670
13        3.22   620
14        3.44   560
15        3.60   610



Sheet 22 - Regression - Results
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.559872976
R Square             0.31345775
Adjusted R Square    0.260646807
Standard Error       0.407902864
Observations         15

ANOVA
             df   SS            MS            F             Significance F
Regression   1    0.987571627   0.987571627   5.935469729   0.029975415
Residual     13   2.163001706   0.166384747
Total        14   3.150573333

            Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept   0.822980159    1.077158069      0.764029145   0.458493457   -1.504078366   3.150038684   -1.504078366   3.150038684
GMAT        0.004426587    0.001816944      2.436281948   0.029975415   0.000501319    0.008351856   0.000501319    0.008351856

RESIDUAL OUTPUT
Observation   Predicted GPA   Residuals
1             3.390400794     -0.140400794
2             3.080539683     0.819460317
3             3.833059524     0.166940476
4             3.567464286     -0.237464286
5             3.169071429     -0.299071429
6             3.833059524     0.166940476
7             3.080539683     -0.320539683
8             3.390400794     0.519599206
9             3.611730159     -0.061730159
10            3.124805556     0.275194444
11            3.257603175     -0.817603175
12            3.788793651     0.061206349
13            3.567464286     -0.347464286
14            3.301869048     0.138130952
15            3.523198413     0.076801587

[Chart: GMAT Line Fit Plot - GPA and Predicted GPA vs. GMAT]

Based on the regression, I can conclude that because I have a GMAT score of 510, I will get roughly a 3.1 GPA when I graduate.
However, because the R Square is low, and the Intercept P-value isn't statistically significant,
I have serious doubts as to how well this regression actually predicts the GPA outcome based on GMAT scores.
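The Coefficients, R Square and the roughly 3.1 prediction can be checked with the ordinary least-squares formulas for one predictor. This is a minimal Python sketch. Python and the variable names are assumptions of this example.

```python
# Data from Sheet 21
gmat = [580, 510, 680, 620, 530, 680, 510, 580, 630, 520, 550, 670, 620, 560, 610]
gpa = [3.25, 3.90, 4.00, 3.33, 2.87, 4.00, 2.76, 3.91, 3.55, 3.40,
       2.44, 3.85, 3.22, 3.44, 3.60]

n = len(gmat)
mean_x, mean_y = sum(gmat) / n, sum(gpa) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gmat, gpa))
sxx = sum((x - mean_x) ** 2 for x in gmat)
syy = sum((y - mean_y) ** 2 for y in gpa)

slope = sxy / sxx                      # GMAT coefficient
intercept = mean_y - slope * mean_x    # Intercept coefficient
r_squared = sxy ** 2 / (sxx * syy)     # share of GPA variation explained by GMAT

predicted_gpa = intercept + slope * 510

print(f"GPA = {intercept:.6f} + {slope:.6f} * GMAT, R Square = {r_squared:.4f}")
print(f"Predicted GPA for a GMAT of 510: {predicted_gpa:.2f}")
```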



Sheet 23 - Multiple Regression
You are an MBA student taking a Stats class.
You have a Final in two days, but there's an awesome party tomorrow night.
You've gathered some previous data about a similar situation to help you do a little planning:
1. How many beers you had the night before (1 - 10)
2. How much sleep you got the night before the test (1 - 10)
3. How many coffees you drank before the test (1 - 5)
and previous test scores on a 4.0 GPA scale.
After doing the analysis, what might you do?

GPA    Beers   Sleep   Coffee
3.25   4       6       2
3.90   2       7       1
4.00   1       8       1
3.33   5       10      5
2.87   6       6       3
4.00   2       9       1
2.76   8       5       1
3.91   2       7       2
3.55   3       7       3
3.40   3       6       4
2.44   10      4       0
3.85   3       8       1
3.22   4       6       3
3.44   4       7       0
3.60   3       6       1



Sheet 24 - Multiple Regression - Results
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.984678962
R Square             0.969592659
Adjusted R Square    0.961299748
Standard Error       0.093322782
Observations         15

ANOVA
             df   SS            MS            F             Significance F
Regression   3    3.054772775   1.018257592   116.9182494   1.26267E-08
Residual     11   0.095800558   0.008709142
Total        14   3.150573333

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept   3.628345838    0.178036846      20.37974676    4.36734E-10   3.236489382    4.020202295    3.236489382    4.020202295
Beers       -0.158584431   0.01298053       -12.21709997   9.67282E-08   -0.187154385   -0.130014478   -0.187154385   -0.130014478
Sleep       0.084918308    0.021603502      3.930765923    0.002348982   0.037369321    0.132467294    0.037369321    0.132467294
Coffee      -0.073278181   0.018110098      -4.046260836   0.001927797   -0.113138238   -0.033418124   -0.113138238   -0.033418124

RESIDUAL OUTPUT
Observation   Predicted GPA   Residuals
1             3.356961597     -0.106961597
2             3.832326948     0.067673052
3             4.075829686     -0.075829686
4             3.318215853     0.011784147
5             2.966514553     -0.096514553
6             4.002163563     -0.002163563
7             2.710983744     0.049016256
8             3.759048767     0.150951233
9             3.527186155     0.022813845
10            3.368989666     0.031010334
11            2.382174755     0.057825245
12            3.758660824     0.091339176
13            3.283683416     -0.063683416
14            3.588436266     -0.148436266
15            3.588824209     0.011175791

[Chart: Beers Line Fit Plot - GPA and Predicted GPA vs. Beers; Linear (Predicted GPA) trendline: y = -0.1862x + 4.1795, R^2 = 0.9309]
[Chart: Sleep Line Fit Plot - GPA and Predicted GPA vs. Sleep; Poly. (Predicted GPA) trendline: y = -0.0903x^2 + 1.4896x - 2.3226, R^2 = 0.8319]
[Chart: Coffee Line Fit Plot - GPA and Predicted GPA vs. Coffee; Poly. (Predicted GPA) trendline: y = 0.0604x^3 - 0.4957x^2 + 1.0448x + 3.0194, R^2 = 0.2502]

All three independent variables significantly influence the GPA.
Because the R Square is high, and
because all P-values are smaller than 0.05,
I am very confident in the belief that:
If I want to do very well on the Final, I should have
o Only 1 Beer at the party
o No more than 8.5 hours of sleep
o No more than 2 cups of coffee
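The Coefficients column of the multiple regression can be recovered by solving the least-squares normal equations. The sketch below does this with plain Gaussian elimination so that it needs no third-party packages. This is an assumption of the example rather than how Excel computes it, and the helper name `solve` is hypothetical.

```python
# Data from Sheet 23
gpa = [3.25, 3.90, 4.00, 3.33, 2.87, 4.00, 2.76, 3.91, 3.55, 3.40,
       2.44, 3.85, 3.22, 3.44, 3.60]
beers = [4, 2, 1, 5, 6, 2, 8, 2, 3, 3, 10, 3, 4, 4, 3]
sleep = [6, 7, 8, 10, 6, 9, 5, 7, 7, 6, 4, 8, 6, 7, 6]
coffee = [2, 1, 1, 5, 3, 1, 1, 2, 3, 4, 0, 1, 3, 0, 1]

# Design matrix: a leading 1.0 in each row gives the intercept
X = [[1.0, b, s, c] for b, s, c in zip(beers, sleep, coffee)]
k = len(X[0])

# Normal equations: (X'X) beta = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, gpa)) for i in range(k)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for j in range(col, size + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * size
    for i in reversed(range(size)):
        x[i] = (m[i][size] - sum(m[i][j] * x[j] for j in range(i + 1, size))) / m[i][i]
    return x

intercept, b_beers, b_sleep, b_coffee = solve(xtx, xty)

# Predicted GPA for each observation, as in the RESIDUAL OUTPUT table
predicted = [intercept + b_beers * b + b_sleep * s + b_coffee * c
             for b, s, c in zip(beers, sleep, coffee)]
residuals = [y - p for y, p in zip(gpa, predicted)]

print(f"GPA = {intercept:.4f} + ({b_beers:.4f})*Beers "
      f"+ ({b_sleep:.4f})*Sleep + ({b_coffee:.4f})*Coffee")
```

The signs of the coefficients carry the planning advice: each extra beer or coffee lowers the predicted GPA, while each extra hour of sleep raises it.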