
Project Learning About Stat 201 Students

Summer Session
August 6, 2010
Misty Davis
Sonya Newman
Hunter Somerville
Paul Toney

Part 1, Question 1
Percent of Males and Females that took Statistics 201 in Spring 2010

[Figures: pie chart and bar chart titled "Gender by percent": female 40.0%, male 60.0%]

The categorical data collected from the survey of Stat 201 students who took the course in
Spring 2010 show that 60% of the sample is male and 40% is female. The graphs follow the
area principle and display the information clearly and accurately.

Part 1, Question 2
Born in Tennessee and whether UT is first choice for college.

a) We chose the categorical variables Born in Tennessee and UT as first choice. We
expected an association because most people would want to remain in Tennessee to attend
school and to attend a major university.
b) Mosaic Plot

[Mosaic plot: 9 UT First Choice? by 7 Born in TN?]

Contingency Table
7 Born in TN? By 9 UT First Choice?
Each cell lists Count, Total %, Col %, Row %.

Born in TN?   UT First Choice? = No        UT First Choice? = Yes       Total
No            11, 13.75, 50.00, 39.29      17, 21.25, 29.31, 60.71      28 (35.00%)
Yes           11, 13.75, 50.00, 21.15      41, 51.25, 70.69, 78.85      52 (65.00%)
Total         22 (27.50%)                  58 (72.50%)                  80

Tests
N     DF    -LogLike     RSquare (U)
80    1     1.4618801    0.0311

Test               ChiSquare    Prob>ChiSq
Likelihood Ratio   2.924        0.0873
Pearson            3.001        0.0832

Fisher's Exact Test   Prob     Alternative Hypothesis
Left                  0.9758   Prob(9 UT First Choice?=Yes) is greater for 7 Born in TN?=No than Yes
Right                 0.0721   Prob(9 UT First Choice?=Yes) is greater for 7 Born in TN?=Yes than No
2-Tail                0.1155   Prob(9 UT First Choice?=Yes) is different across 7 Born in TN?

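c) The plot suggests a relationship. Of the students born in Tennessee, 78.85% chose UT as
their first choice, compared with 60.71% of those born elsewhere (72.5% of all students chose
UT first). Thus, the graph does support our expectations, although the association is not
statistically significant at the 0.05 level (Pearson chi-square p = 0.0832).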
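
As a rough check on the Tests output, the Pearson chi-square statistic can be recomputed by hand from the four cell counts. The sketch below is ours, not part of the JMP output; the variable names are illustrative, and the p-value shortcut applies only to 1 degree of freedom.

```python
import math

# Observed counts from the contingency table.
# Rows: Born in TN? (No, Yes); columns: UT First Choice? (No, Yes).
observed = [[11, 17],
            [11, 41]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# With 1 degree of freedom, the upper-tail chi-square probability
# equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

# Row percentages: share choosing UT first within each birthplace group.
pct_ut_first = [100 * row[1] / total for row, total in zip(observed, row_totals)]

print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
print(f"UT first: not born in TN {pct_ut_first[0]:.2f}%, born in TN {pct_ut_first[1]:.2f}%")
```

The row percentages, not the overall 72.5%, are the figures to compare when asking whether birthplace and first choice are associated.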

Part 1, Question 3
GPA

Normal (3.26875,0.37447)

Quantiles
100.0%   maximum    4.0000
99.5%               4.0000
97.5%               3.9588
90.0%               3.7570
75.0%    quartile   3.5800
50.0%    median     3.2100
25.0%    quartile   3.0025
10.0%               2.7820
2.5%                2.5818
0.5%                2.0900
0.0%     minimum    2.0900

Moments
Mean             3.26875
Std Dev          0.3744747
Std Err Mean     0.0418675
Upper 95% Mean   3.3520853
Lower 95% Mean   3.1854147
N                80

Part 1, Question 3 (cont.)


Fitted Normal
Parameter Estimates
Type         Estimate    Lower 95%   Upper 95%
Location     3.26875     3.1854147   3.3520853
Dispersion   0.3744747   0.3240902   0.4435551

-2log(Likelihood) = 68.8732157776458

Goodness-of-Fit Test
Shapiro-Wilk W Test
W          Prob<W
0.979791   0.2367

Note: Ho = The data is from the Normal distribution. Small p-values reject Ho.

a) The histogram is unimodal and symmetric, with one outlier and a gap between 2 and 2.5.
The mean and the standard deviation will be used since the histogram is symmetric.
b) The mean, the average of the data, is 3.27, and the standard deviation, a measure of how
far data values typically fall from the mean, is 0.37. The boxplot agrees with the histogram.
c) The points in the normal probability plot follow a relatively straight line, which indicates
that the distribution is approximately normal and appropriate for the Empirical Rule. The
Shapiro-Wilk goodness-of-fit test gives a p-value of 0.2367, which is greater than 0.05, so we
fail to reject the hypothesis that the data come from a normal distribution.
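
To make the Empirical Rule concrete, the short sketch below applies it to the reported GPA mean and standard deviation and also rechecks the standard error of the mean. It is only an illustration under the normal model; the variable names are ours.

```python
import math

# Moments reported in the GPA output.
mean, std_dev, n = 3.26875, 0.3744747, 80

# Standard error of the mean: s / sqrt(n).
std_err = std_dev / math.sqrt(n)

# Empirical Rule: roughly 68%, 95%, and 99.7% of values are expected
# within 1, 2, and 3 standard deviations of the mean.
intervals = {}
for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    intervals[k] = (mean - k * std_dev, mean + k * std_dev)
    print(f"about {share} of GPAs between {intervals[k][0]:.2f} and {intervals[k][1]:.2f}")

print(f"standard error of the mean = {std_err:.7f}")
```

Note that the upper ends of the 2 and 3 standard deviation intervals exceed 4.0, the maximum possible GPA, so the normal model is only an approximation in the upper tail.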

Part 1, Question 4
a) Comparison of Number of Text Messages sent by Gender.
Distributions 2 Gender=female
21 Texts Sent Per Day

[Histogram: 21 Texts Sent Per Day, female]

Quantiles
100.0%   maximum    500
99.5%               500
97.5%               500
90.0%               300
75.0%    quartile   60
50.0%    median     45
25.0%    quartile   26.25
10.0%               13
2.5%                3
0.5%                3
0.0%     minimum    3

Moments
Mean             83.6875
Std Dev          116.42094
Std Err Mean     20.580509
Upper 95% Mean   125.66172
Lower 95% Mean   41.713276
N                32

Distributions 2 Gender=male
21 Texts Sent Per Day

[Histogram: 21 Texts Sent Per Day, male]

Quantiles
100.0%   maximum    300
99.5%               300
97.5%               277.5
90.0%               127.5
75.0%    quartile   60
50.0%    median     30
25.0%    quartile   11.25
10.0%               7.7
2.5%                3.45
0.5%                3
0.0%     minimum    3

Moments
Mean             50.125
Std Dev          57.138736
Std Err Mean     8.2472661
Upper 95% Mean   66.716359
Lower 95% Mean   33.533641
N                48

Oneway Analysis of 21 Texts Sent Per Day By 2 Gender

[Plot: 21 Texts Sent Per Day (0 to 500) by 2 Gender (female, male)]

Part 1, Question 4 (cont.)
b) In both histograms of the number of texts sent per day, for males and for females, the
shape is skewed to the right. The centers of the distributions are roughly similar, with a median
of 30 for males and 45 for females. The range for males (3 to 300) is narrower than that for
females (3 to 500). The mean is 50.13 for males and 83.69 for females.
c) Males send fewer text messages than females. There are a few high outliers that pull the
means well above the medians. If the extreme high values were excluded, the
data sets would be roughly similar and have similar shapes, centers and spreads.
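
One common way to decide which values count as outliers is the 1.5 x IQR rule used by boxplots. The sketch below applies it to the reported quartiles; it assumes that rule was the criterion, which the report does not state.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) boxplot fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles taken from the Quantiles output for each gender.
female_fences = outlier_fences(q1=26.25, q3=60)
male_fences = outlier_fences(q1=11.25, q3=60)

print(f"female: values above {female_fences[1]} texts per day are flagged")
print(f"male:   values above {male_fences[1]} texts per day are flagged")
```

Under this rule the female maximum of 500 and the male maximum of 300 both lie far above the upper fences, which is consistent with the right skew described in part b.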

Part 1, Question 5
Comparison of Gender with Age at which you hope to be married and Age at which you hope to
have your first child.
a) Females
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.9068
18 Age Hope to Have First Child   0.9068                      1.0000

The correlations are estimated by REML method.

Males
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.8835
18 Age Hope to Have First Child   0.8835                      1.0000

The correlations are estimated by REML method.

It appears that females want to get married and have children at a younger age than males.
Essentially, the correlations are the same. The female correlation is slightly stronger. We did
not find this surprising.

b)
Multivariate 2 Gender=female
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.8173
18 Age Hope to Have First Child   0.8173                      1.0000

The correlations are estimated by REML method.

[Scatterplot Matrix: 17 Age Hope to be Married vs. 18 Age Hope to Have First Child, female]

Multivariate 2 Gender=male
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.7056
18 Age Hope to Have First Child   0.7056                      1.0000

The correlations are estimated by REML method.

[Scatterplot Matrix: 17 Age Hope to be Married vs. 18 Age Hope to Have First Child, male]

c) The assumptions were met. The data are quantitative, since they are ages, and the
relationship is linear and moderately strong, with outliers.
When the outliers were removed, the correlations weakened (females: 0.9068 to 0.8173;
males: 0.8835 to 0.7056), and additional points then appeared as outliers.
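
An extreme point that lies along the overall trend can inflate a correlation, which is one way removing outliers can weaken it. The sketch below illustrates this with a small set of hypothetical ages; these are made-up numbers, not the survey data, and `pearson_r` is our own helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages (NOT the survey data); the last pair is an extreme point
# that follows the same upward trend as the rest.
married = [22, 24, 25, 26, 27, 28, 40]
first_child = [25, 26, 28, 27, 30, 29, 45]

r_with = pearson_r(married, first_child)
r_without = pearson_r(married[:-1], first_child[:-1])

print(f"r with the extreme point:    {r_with:.3f}")
print(f"r without the extreme point: {r_without:.3f}")
```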

Part 1, Question 6
Bivariate Fit of 24 $ Spent on Haircut By 36 Number Body Piercings

[Scatterplot with fitted line: 24 $ Spent on Haircut vs. 36 Number Body Piercings]

a) The equation of the least squares regression line is
predicted $ spent on haircut = 15.791 + 3.665(x), where x is the number of body piercings.

[Residual plot: Residual (-50 to 70) vs. 36 Number Body Piercings]

The assumptions for linear regression are that the data are quantitative, that the relationship
is straight enough (linear), and that no outliers are present. The data are quantitative. However,
the points are scattered around the regression line and do not follow a very straight pattern,
and several outliers are present.

Linear Fit
24 $ Spent on Haircut = 15.790629 + 3.6645273*36 Number Body Piercings

Summary of Fit
RSquare                      0.156731
RSquare Adj                  0.14592
Root Mean Square Error       18.65974
Mean of Response             21.15
Observations (or Sum Wgts)   80

b) R Square is 0.156731, which means that only about 15.7% of the variation in the amount
spent on a haircut is explained by the number of body piercings. The linear relationship
between the two variables is weak (r is about 0.40).
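
The R Square value can be traced back to the sums of squares in the Analysis of Variance output for this fit. The sketch below recomputes R Square, adjusted R Square, the F ratio, and r from those sums; the variable names are ours.

```python
import math

# Sums of squares and sample size from the Analysis of Variance output.
ss_model = 5047.703
ss_error = 27158.497
ss_total = 32206.200
n = 80
n_predictors = 1

# R^2 is the share of total variation explained by the model.
r_squared = ss_model / ss_total

# Adjusted R^2 penalizes for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)

# F ratio = model mean square / error mean square.
f_ratio = (ss_model / n_predictors) / (ss_error / (n - n_predictors - 1))

# With one predictor, |r| is the square root of R^2; the slope is positive,
# so r is positive.
r = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.5f}")
print(f"F = {f_ratio:.4f}, r = {r:.3f}")
```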
Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio
Lack Of Fit   7    9449.829         1349.98       5.4125
Pure Error    71   17708.668        249.42
Total Error   78   27158.497

Prob > F   <.0001*
Max RSq    0.4501

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model      1    5047.703         5047.70       14.4972
Error      78   27158.497        348.19
C. Total   79   32206.200

Prob > F   0.0003*

Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|
Intercept                  15.790629   2.516664    6.27      <.0001*
36 Number Body Piercings   3.6645273   0.962447    3.81      0.0003*

Bivariate Fit of 24 $ Spent on Haircut By 36 Number Body Piercings

[Scatterplot with fitted line: 24 $ Spent on Haircut vs. 36 Number Body Piercings]

Linear Fit
24 $ Spent on Haircut = 15.790629 + 3.6645273*36 Number Body Piercings

Summary of Fit
RSquare                      0.156731
RSquare Adj                  0.14592
Root Mean Square Error       18.65974
Mean of Response             21.15
Observations (or Sum Wgts)   80

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio
Lack Of Fit   7    9449.829         1349.98       5.4125
Pure Error    71   17708.668        249.42
Total Error   78   27158.497

Prob > F   <.0001*
Max RSq    0.4501

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model      1    5047.703         5047.70       14.4972
Error      78   27158.497        348.19
C. Total   79   32206.200

Prob > F   0.0003*

Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|
Intercept                  15.790629   2.516664    6.27      <.0001*
36 Number Body Piercings   3.6645273   0.962447    3.81      0.0003*

[Residual plot: Residual vs. 36 Number Body Piercings]

c) The equation remains 15.790629 + 3.6645273(x). The data is more linear and has no
outliers. R-squared also remained the same.

Part 2, Question 7
What type of computer do you use most often?

Frequencies
Level   Count   Prob
PC      54      0.67500
Mac     24      0.30000
Other   2       0.02500
Total   80      1.00000

N Missing 0
3 Levels

Confidence Intervals
Level   Count   Prob      Lower CI   Upper CI   1-Alpha
PC      54      0.67500   0.584368   0.754182   0.900
Mac     24      0.30000   0.223401   0.389684   0.900
Other   2       0.02500   0.008308   0.07277    0.900
Total   80

Note: Computed using score confidence intervals.

a) The sample proportion for PC users is .675. The confidence interval is (.58, .75). We are
90% confident that the population proportion of PC users is in this interval.
b) Both np̂ = 54 and n(1 - p̂) = 26 are at least 10, so the normal model is appropriate
for these data.
c) The population proportion is .6334. This is within the 90% confidence interval.
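
The output notes that the intervals are score confidence intervals. A minimal sketch of the Wilson score formula, which we assume is the score interval meant here, reproduces the PC limits from the count and sample size. The function name is ours.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence):
    """Wilson score confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 54 PC users out of 80 students, 90% confidence.
lower, upper = wilson_interval(54, 80, 0.90)

# Population proportion quoted in part c.
population_p = 0.6334
contains_population_p = lower <= population_p <= upper

print(f"90% score interval for PC: ({lower:.6f}, {upper:.6f})")
print(f"contains {population_p}: {contains_population_p}")
```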

Part 2, Question 8
Fastest Driven in a Car
a) Outliers Included
Distributions
20 Fastest Driven in a Car (MPH)

[Histogram: 20 Fastest Driven in a Car (MPH), 70 to 170]

Quantiles
100.0%   maximum    170
99.5%               170
97.5%               156.825
90.0%               134
75.0%    quartile   115
50.0%    median     105
25.0%    quartile   95
10.0%               85
2.5%                80.075
0.5%                80
0.0%     minimum    80

Moments
Mean             106.0875
Std Dev          17.782222
Std Err Mean     1.9881129
Upper 95% Mean   110.04474
Lower 95% Mean   102.13026
N                80

a) Outliers Excluded
Distributions
20 Fastest Driven in a Car (MPH)

[Histogram: 20 Fastest Driven in a Car (MPH), 80 to 140]

Quantiles
100.0%   maximum    135
99.5%               135
97.5%               135
90.0%               122.9
75.0%    quartile   110
50.0%    median     102
25.0%    quartile   92.75
10.0%               85
2.5%                80
0.5%                80
0.0%     minimum    80

Moments
Mean             103.55263
Std Dev          14.017508
Std Err Mean     1.6079184
Upper 95% Mean   106.75577
Lower 95% Mean   100.34949
N                76

b) The shape meets the Nearly Normal Condition: it is unimodal and symmetric.
c) The sample mean is 103.55 miles per hour. For this sample, the 98% confidence interval
is (99.81, 107.29). This means we are 98% confident that the population mean of the fastest
speed driven is between 99.81 and 107.29 miles per hour.
d) The population mean was found to be 105 mph. Yes, it is within the interval.
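
The interval in part c can be reproduced from the outliers-excluded moments using a normal critical value. The report does not say which critical value was used, so the z-based calculation below is an assumption that happens to match the reported limits.

```python
from statistics import NormalDist

# Moments from the outliers-excluded output.
mean = 103.55263
std_err = 1.6079184
confidence = 0.98

# Two-sided critical value from the standard normal distribution (assumed).
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * std_err
lower, upper = mean - margin, mean + margin

# Population mean quoted in part d.
population_mean = 105
contains_population_mean = lower <= population_mean <= upper

print(f"z* = {z_star:.3f}, margin of error = {margin:.2f}")
print(f"98% interval: ({lower:.2f}, {upper:.2f})")
```

A t critical value with 75 degrees of freedom (roughly 2.38) would give a slightly wider interval, and it would still contain 105 mph.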

Part 2, Question 9
2 Sample Test with Extremes
Oneway Analysis of 15 Hours Study Per Day By 8 Frat/Sorority?

[Plot: 15 Hours Study Per Day (0 to 25) by 8 Frat/Sorority? (No, Yes)]

t Test
Yes-No
Assuming unequal variances
Difference      0.4202    t Ratio      0.337197
Std Err Dif     1.2462    DF           43.41902
Upper CL Dif    2.9328    Prob > |t|   0.7376
Lower CL Dif   -2.0924    Prob > t     0.3688
Confidence      0.95      Prob < t     0.6312

2 Sample Test without Extremes
Oneway Analysis of 15 Hours Study Per Day By 8 Frat/Sorority?

[Plot: 15 Hours Study Per Day (0 to 9) by 8 Frat/Sorority? (No, Yes)]

Excluded Rows 11

t Test
Yes-No
Assuming unequal variances
Difference     -0.1020    t Ratio      -0.19272
Std Err Dif     0.5294    DF           41.85247
Upper CL Dif    0.9665    Prob > |t|   0.8481
Lower CL Dif   -1.1705    Prob > t     0.5759
Confidence      0.95      Prob < t     0.4241

a) Yes, the histograms are normal enough to perform a 2 Sample Test.


b) The hypotheses are as follows:
H0: μ(yes) - μ(no) = 0. There is no difference in mean study time between students in and
out of a fraternity or sorority.
HA: μ(yes) - μ(no) > 0. Mean study time is greater for members of a fraternity
or sorority.
Means and Std Deviations
Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
No      47       3.17021   2.07542   0.30273        2.5608      3.7796
Yes     22       3.06818   2.03713   0.43432        2.1650      3.9714

The difference of the means (Yes - No) is -0.1020. The standard error of the difference is
0.5294. The p-value, 0.5759, is greater than 0.05. In this case we fail to reject the null
hypothesis: there is not enough evidence to conclude that students who are in fraternities or
sororities spend more hours studying.
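
The difference, its standard error, the t ratio, and the degrees of freedom in the test without extremes can be recomputed from the Means and Std Deviations table. The sketch below uses the unequal-variances (Welch) formulas, matching the "Assuming unequal variances" label in the output; the variable names are ours, and the p-value itself is taken from the output rather than recomputed.

```python
import math

# Summary statistics from the Means and Std Deviations table.
n_no, mean_no, sd_no = 47, 3.17021, 2.07542
n_yes, mean_yes, sd_yes = 22, 3.06818, 2.03713

# Difference in sample means, Yes minus No.
difference = mean_yes - mean_no

# Each group's squared standard error of the mean.
var_no = sd_no**2 / n_no
var_yes = sd_yes**2 / n_yes

# Standard error of the difference and the t ratio.
std_err_dif = math.sqrt(var_no + var_yes)
t_ratio = difference / std_err_dif

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (var_no + var_yes) ** 2 / (var_no**2 / (n_no - 1) + var_yes**2 / (n_yes - 1))

# One-sided p-value (Prob > t) as reported in the output.
p_value_reported = 0.5759
reject_null = p_value_reported < 0.05

print(f"difference = {difference:.4f}, SE = {std_err_dif:.4f}")
print(f"t = {t_ratio:.5f}, df = {df:.2f}, reject H0: {reject_null}")
```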
