Stats 201 Project Summer 2010

Project Learning About Stat 201 Students
Summer Session
August 6, 2010
Misty Davis
Sonya Newman
Hunter Somerville
Paul Toney
Part 1, Question 1
Percent of Males and Females that took Statistics 201 in Spring 2010
[Figure: Gender by percent, shown as two charts: female 40.0%, male 60.0%]
The categorical sample data collected from the survey of Stat 201 students who took the course in Spring 2010 show that 60% of the students are male and 40% are female. The graphs satisfy the area principle and display the information clearly and accurately.
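The percentages in the graphs can be checked directly from the group counts. This minimal Python sketch assumes the group sizes reported as N in the Question 4 output (32 females and 48 males) and that every surveyed student is counted in exactly one group.

```python
# Gender counts taken from the N values reported in Question 4
# (assumed: 32 female, 48 male, every student in exactly one group)
counts = {"female": 32, "male": 48}
total = sum(counts.values())

# Percent of the sample in each category
percents = {gender: 100 * n / total for gender, n in counts.items()}

for gender, pct in percents.items():
    print(f"{gender}: {pct:.1f}%")
```

Running it prints 40.0% for females and 60.0% for males, matching the charts.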
Part 1, Question 2
Born in Tennessee and whether UT is first choice for college.
a) We chose the categorical variables Born in Tennessee and UT as First Choice. We expected an association because most students born in Tennessee would want to stay in the state and attend a major university.
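b) Mosaic Plot
[Figure: mosaic plot of 9 UT First Choice? by 7 Born in TN?]
Contingency Table: 7 Born in TN? By 9 UT First Choice?
Each cell shows Count / Total % / Col % / Row %.
Born in TN?   UT First Choice? = No        UT First Choice? = Yes       Total
No            11 / 13.75 / 50.00 / 39.29   17 / 21.25 / 29.31 / 60.71   28 (35.00%)
Yes           11 / 13.75 / 50.00 / 21.15   41 / 51.25 / 70.69 / 78.85   52 (65.00%)
Total         22 (27.50%)                  58 (72.50%)                  80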
Tests
N    DF   -LogLike    RSquare (U)
80   1    1.4618801   0.0311
Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   2.924       0.0873
Pearson            3.001       0.0832
Fisher's Exact Test   Prob     Alternative Hypothesis
Left                  0.9758   Prob(9 UT First Choice?=Yes) is greater for 7 Born in TN?=No than Yes
Right                 0.0721   Prob(9 UT First Choice?=Yes) is greater for 7 Born in TN?=Yes than No
2-Tail                0.1155   Prob(9 UT First Choice?=Yes) is different across 7 Born in TN?
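c) The data suggest a relationship: 78.85% of students born in Tennessee chose UT as their first choice, compared with 60.71% of students not born in Tennessee, so the graph supports our expectations. However, the chi-square p-values (0.0832 Pearson, 0.0873 likelihood ratio) are above 0.05, so the association is not statistically significant at the 5% level.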
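The Pearson and likelihood-ratio statistics in the Tests table can be reproduced from the four counts alone. This is a minimal stdlib-only Python sketch; it uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail equals erfc(sqrt(x/2)). It also computes the row percentages used in c).

```python
import math

# Observed counts from the contingency table:
# rows = 7 Born in TN? (No, Yes); columns = 9 UT First Choice? (No, Yes)
observed = [[11, 17],
            [11, 41]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0  # Pearson statistic: sum of (O - E)^2 / E
g2 = 0.0       # likelihood-ratio statistic: 2 * sum of O * ln(O / E)
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # E = row total * column total / n
        pearson += (obs - expected) ** 2 / expected
        g2 += 2 * obs * math.log(obs / expected)

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_pearson = math.erfc(math.sqrt(pearson / 2))
p_g2 = math.erfc(math.sqrt(g2 / 2))

# Row percentages: share of each birthplace group whose first choice is UT
pct_ut_first = [100 * row[1] / total for row, total in zip(observed, row_totals)]

print(f"Pearson ChiSquare = {pearson:.3f}, p = {p_pearson:.4f}")
print(f"Likelihood Ratio ChiSquare = {g2:.3f}, p = {p_g2:.4f}")
print(f"UT first: not born in TN {pct_ut_first[0]:.2f}%, born in TN {pct_ut_first[1]:.2f}%")
```

The printed statistics (3.001 and 2.924) and p-values (0.0832 and 0.0873) agree with the JMP output. Fisher's exact test is not reproduced here.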
Part 1, Question 3
GPA
[Figure: histogram, box plot and normal probability plot of GPA, with fitted Normal(3.26875, 0.37447) curve]
Quantiles
100.0%  maximum   4.0000
99.5%             4.0000
97.5%             3.9588
90.0%             3.7570
75.0%   quartile  3.5800
50.0%   median    3.2100
25.0%   quartile  3.0025
10.0%             2.7820
2.5%              2.5818
0.5%              2.0900
0.0%    minimum   2.0900
Moments
Mean             3.26875
Std Dev          0.3744747
Std Err Mean     0.0418675
Upper 95% Mean   3.3520853
Lower 95% Mean   3.1854147
N                80
Part 1, Question 3 (cont.)

Fitted Normal
Parameter Estimates
Type         Parameter   Estimate    Lower 95%   Upper 95%
Location     μ           3.26875     3.1854147   3.3520853
Dispersion   σ           0.3744747   0.3240902   0.4435551
-2log(Likelihood) = 68.8732157776458
Goodness-of-Fit Test
Shapiro-Wilk W Test
W          Prob<W
0.979791   0.2367
Note: Ho = The data is from the Normal distribution. Small p-values reject Ho.
a) The histogram is unimodal and symmetric, with one low outlier (the minimum, 2.09) and a gap between 2.0 and 2.5. Because the distribution is symmetric, we use the mean and standard deviation to summarize it.
b) The mean (the average GPA) is 3.27, and the standard deviation (roughly the typical distance of a GPA from the mean) is 0.37. The boxplot is consistent with the histogram.
c) The normal probability plot is close to a straight line, which indicates that the distribution is approximately normal, so the Empirical Rule is appropriate. The Shapiro-Wilk goodness-of-fit test gives an objective check: its p-value is 0.2367, well above 0.05, so we fail to reject the hypothesis that GPA is normally distributed. This is consistent with normality, although it does not prove it.
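Applying the Empirical Rule only needs the mean and standard deviation from the Moments table. The Python sketch below computes the 1, 2 and 3 standard deviation intervals, assuming GPA is close enough to normal (as argued in c), and also reproduces the Std Err Mean as s / sqrt(n).

```python
import math

# Summary statistics from the Moments table for GPA
mean = 3.26875
sd = 0.3744747
n = 80

# Standard error of the mean: s / sqrt(n)
std_err = sd / math.sqrt(n)

# Empirical Rule (68-95-99.7) intervals: mean +/- k standard deviations
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, share in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    low, high = intervals[k]
    print(f"about {share} of GPAs within {k} SD: ({low:.2f}, {high:.2f})")
print(f"standard error of the mean: {std_err:.7f}")
```

The intervals are (2.89, 3.64), (2.52, 4.02) and (2.15, 4.39). Because the maximum GPA is 4.0, the upper ends of the 2 and 3 SD intervals fall outside the possible range, a reminder that the normal model is only an approximation near the top of the scale.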
Part 1, Question 4
a) Comparison of Number of Text Messages sent by Gender.
Distributions 2 Gender=female
21 Texts Sent Per Day
[Figure: histogram of 21 Texts Sent Per Day, females]
Quantiles
100.0%  maximum   500
99.5%             500
97.5%             500
90.0%             300
75.0%   quartile  60
50.0%   median    45
25.0%   quartile  26.25
10.0%             13
2.5%              3
0.5%              3
0.0%    minimum   3
Moments
Mean             83.6875
Std Dev          116.42094
Std Err Mean     20.580509
Upper 95% Mean   125.66172
Lower 95% Mean   41.713276
N                32
Distributions 2 Gender=male
21 Texts Sent Per Day
[Figure: histogram of 21 Texts Sent Per Day, males]
Quantiles
100.0%  maximum   300
99.5%             300
97.5%             277.5
90.0%             127.5
75.0%   quartile  60
50.0%   median    30
25.0%   quartile  11.25
10.0%             7.7
2.5%              3.45
0.5%              3
0.0%    minimum   3
Moments
Mean             50.125
Std Dev          57.138736
Std Err Mean     8.2472661
Upper 95% Mean   66.716359
Lower 95% Mean   33.533641
N                48
Oneway Analysis of 21 Texts Sent Per Day By 2 Gender
[Figure: 21 Texts Sent Per Day (0 to 500) by 2 Gender (female, male)]
Part 1, Question 4 (cont.)
b) The histograms of texts sent per day for both males and females are skewed to the right, and in both groups the mean is larger than the median, as expected for right-skewed data. The centers are roughly similar: the median is 30 for males and 45 for females. By range, the spread for males (3 to 300) is smaller than for females (3 to 500), but the IQR is larger for males (60 - 11.25 = 48.75) than for females (60 - 26.25 = 33.75). The mean is 50.13 for males and 83.69 for females.
c) Males tend to send fewer text messages than females. A few extreme high values pull the means up; the medians are much less affected. If these extreme values were excluded, the two data sets would be roughly similar in shape, center and spread.
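Whether the extreme values count as outliers can be checked with the common 1.5 × IQR rule, using the quartiles from the Quantiles tables. This Python sketch assumes that rule (a textbook convention, not necessarily what the project used) and JMP's reported quartiles.

```python
# Quartiles and maxima from the Quantiles tables for 21 Texts Sent Per Day
summary = {
    "female": {"q1": 26.25, "q3": 60.0, "max": 500.0},
    "male": {"q1": 11.25, "q3": 60.0, "max": 300.0},
}

fences = {}
for gender, q in summary.items():
    iqr = q["q3"] - q["q1"]
    # 1.5 x IQR rule: values above Q3 + 1.5 * IQR are flagged as high outliers
    upper_fence = q["q3"] + 1.5 * iqr
    fences[gender] = (iqr, upper_fence)
    print(f"{gender}: IQR = {iqr}, upper fence = {upper_fence}, "
          f"maximum flagged: {q['max'] > upper_fence}")
```

For females the 90th percentile (300) is already above the fence of 110.625, so several high values are flagged. For males the 90th percentile (127.5) is below the fence of 133.125, so only the top few values are flagged.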
Part 1, Question 5
Comparison by gender of the age at which students hope to be married and the age at which they hope to have their first child.
a) Females
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.9068
18 Age Hope to Have First Child   0.9068                      1.0000
The correlations are estimated by REML method.
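Males
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.8835
18 Age Hope to Have First Child   0.8835                      1.0000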
It appears that females want to get married and have children at a younger age than males. The correlations are essentially the same (0.9068 for females and 0.8835 for males), with the female correlation slightly stronger. We did not find this surprising.
b)
Multivariate 2 Gender=female
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.8173
18 Age Hope to Have First Child   0.8173                      1.0000
[Figure: scatterplot matrix of 17 Age Hope to be Married and 18 Age Hope to Have First Child, females]
Multivariate 2 Gender=male
Correlations
                                  17 Age Hope to be Married   18 Age Hope to Have First Child
17 Age Hope to be Married         1.0000                      0.7056
18 Age Hope to Have First Child   0.7056                      1.0000
[Figure: scatterplot matrix of 17 Age Hope to be Married and 18 Age Hope to Have First Child, males]
c) The conditions for correlation were met: the variables are quantitative (ages), and the relationship is linear and strong, although there are outliers. When the outliers were removed, the correlations weakened (from 0.9068 to 0.8173 for females and from 0.8835 to 0.7056 for males), and other points then stood out as outliers.
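One way to see how much the correlations weakened is to square them: r² is the share of variation in one age accounted for by a linear relationship with the other. This Python sketch assumes, as c) describes, that the part b) correlations are the ones computed after the outliers were removed.

```python
# Correlations from part a) (all data) and part b), assumed here to be the
# values after the outliers described in c) were removed
correlations = {
    "female": (0.9068, 0.8173),
    "male": (0.8835, 0.7056),
}

r_squared = {}
for gender, (r_all, r_reduced) in correlations.items():
    # r^2: share of variation in one age explained by a linear fit on the other
    r_squared[gender] = (r_all**2, r_reduced**2)
    print(f"{gender}: r^2 = {r_all**2:.4f} with all data, "
          f"{r_reduced**2:.4f} after removal")
```

For males the explained share drops from about 78% to about 50%, a larger drop than for females (about 82% to 67%).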
Part 1, Question 6
Bivariate Fit of 24 $ Spent on Haircut By 36 Number Body Piercings
[Figure: scatterplot of 24 $ Spent on Haircut vs. 36 Number Body Piercings with the least squares line]
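a) The equation of the least squares regression line is ŷ = 15.791 + 3.665x, where x is the number of body piercings and ŷ is the predicted amount spent on a haircut (in dollars).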
[Figure: residual plot, Residual vs. 36 Number Body Piercings]
The conditions for linear regression are that the variables are quantitative, the relationship is straight enough (linear), and there are no outliers. The data are quantitative. However, the points are widely scattered around the regression line, the pattern is not very straight, and there are several outliers.
Linear Fit
24 $ Spent on Haircut = 15.790629 + 3.6645273*36 Number Body Piercings
Summary of Fit
RSquare                      0.156731
RSquare Adj                  0.14592
Root Mean Square Error       18.65974
Mean of Response             21.15
Observations (or Sum Wgts)   80
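b) R Square is 0.156731, so only about 15.7% of the variation in the amount spent on a haircut is explained by the linear relationship with the number of body piercings. The corresponding correlation, r = √0.156731 ≈ 0.40, is weak and positive.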
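The regression output is internally consistent, and the fitted line can be used for predictions. This Python sketch takes the values from the Linear Fit, Analysis of Variance and Parameter Estimates tables; the input of 2 piercings is a hypothetical example, not a value from the survey.

```python
import math

# Values from the Linear Fit, Analysis of Variance and Parameter Estimates output
intercept, slope = 15.790629, 3.6645273
ss_model, ss_error, ss_total = 5047.703, 27158.497, 32206.200
df_model, df_error = 1, 78
slope_se = 0.962447

def predict(piercings: float) -> float:
    """Predicted dollars spent on a haircut for a given number of body piercings."""
    return intercept + slope * piercings

r_squared = ss_model / ss_total                            # share of variation explained
r = math.sqrt(r_squared)                                   # positive because the slope is positive
f_ratio = (ss_model / df_model) / (ss_error / df_error)   # Mean Square Model / Mean Square Error
t_slope = slope / slope_se                                 # in simple regression, t^2 equals F

print(f"prediction for 2 piercings (hypothetical input): ${predict(2):.2f}")
print(f"R^2 = {r_squared:.6f}, r = {r:.3f}")
print(f"F = {f_ratio:.4f}, t = {t_slope:.2f}, t^2 = {t_slope**2:.4f}")
```

The sketch recovers RSquare 0.156731, F Ratio 14.4972 and the slope's t Ratio 3.81, and shows that t² matches F for a single predictor.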
Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F   Max RSq
Lack Of Fit    7         9449.829       1349.98    5.4125    <.0001*    0.4501
Pure Error    71        17708.668        249.42
Total Error   78        27158.497
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1         5047.703       5047.70   14.4972    0.0003*
Error      78        27158.497        348.19
C. Total   79        32206.200
Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|
Intercept                  15.790629   2.516664    6.27      <.0001*
36 Number Body Piercings   3.6645273   0.962447    3.81      0.0003*
Bivariate Fit of 24 $ Spent on Haircut By 36 Number Body Piercings
[Figure: scatterplot of 24 $ Spent on Haircut vs. 36 Number Body Piercings with the least squares line]
Linear Fit
24 $ Spent on Haircut = 15.790629 + 3.6645273*36 Number Body Piercings
Summary of Fit
RSquare                      0.156731
RSquare Adj                  0.14592
Root Mean Square Error       18.65974
Mean of Response             21.15
Observations (or Sum Wgts)   80
Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F   Max RSq
Lack Of Fit    7         9449.829       1349.98    5.4125    <.0001*    0.4501
Pure Error    71        17708.668        249.42
Total Error   78        27158.497
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1         5047.703       5047.70   14.4972    0.0003*
Error      78        27158.497        348.19
C. Total   79        32206.200
Parameter Estimates
Term                       Estimate    Std Error   t Ratio   Prob>|t|
Intercept                  15.790629   2.516664    6.27      <.0001*
36 Number Body Piercings   3.6645273   0.962447    3.81      0.0003*
[Figure: residual plot, Residual vs. 36 Number Body Piercings]
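c) After we removed the outliers, the plot looked more linear, but the equation remains 15.790629 + 3.6645273(x) and R-squared remains 0.156731. The output still reports 80 observations, which suggests the outliers were hidden in the plot but not excluded from the fit.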
Part 2, Question 7
What type of computer do you use most often?
Frequencies
Level   Count   Prob
PC      54      0.67500
Mac     24      0.30000
Other   2       0.02500
Total   80      1.00000
N Missing   0
3 Levels
Confidence Intervals
Level   Count   Prob      Lower CI   Upper CI   1-Alpha
PC      54      0.67500   0.584368   0.754182   0.900
Mac     24      0.30000   0.223401   0.389684   0.900
Other   2       0.02500   0.008308   0.07277    0.900
Total   80
Note: Computed using score confidence intervals.
a) The sample proportion of PC users is 0.675, and the 90% confidence interval is (0.584, 0.754). We are 90% confident that the population proportion of PC users lies in this interval.
b) The success/failure condition holds: n·p̂ = 54 and n(1 - p̂) = 26 are both greater than 10, so the normal model is appropriate.
c) The population proportion is 0.6334, which lies within the 90% confidence interval.
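The JMP note says the intervals are score confidence intervals. The usual score interval for one proportion is the Wilson interval, and this Python sketch assumes that is what JMP computed; under that assumption it reproduces the PC and Mac rows and checks the population proportion from c).

```python
import math
from statistics import NormalDist

def score_interval(successes, n, confidence):
    """Wilson score confidence interval for one proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.645 for 90% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

n = 80
pc_low, pc_high = score_interval(54, n, 0.90)
mac_low, mac_high = score_interval(24, n, 0.90)
print(f"PC:  ({pc_low:.6f}, {pc_high:.6f})")
print(f"Mac: ({mac_low:.6f}, {mac_high:.6f})")

# Part c): is the population proportion of PC users inside the interval?
population_p = 0.6334
inside = pc_low <= population_p <= pc_high
print("population proportion inside interval:", inside)
```

Unlike the simpler p̂ ± z·sqrt(p̂(1 - p̂)/n) interval, the score interval is not centered exactly at p̂, which is why JMP's bounds are slightly asymmetric around 0.675.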
Part 2, Question 8
Fastest Driven in a Car
a) Outliers Included
Distributions
20 Fastest Driven in a Car (MPH)
[Figure: histogram of 20 Fastest Driven in a Car (MPH), outliers included]
Quantiles
100.0%  maximum   170
99.5%             170
97.5%             156.825
90.0%             134
75.0%   quartile  115
50.0%   median    105
25.0%   quartile  95
10.0%             85
2.5%              80.075
0.5%              80
0.0%    minimum   80
Moments
Mean             106.0875
Std Dev          17.782222
Std Err Mean     1.9881129
Upper 95% Mean   110.04474
Lower 95% Mean   102.13026
N                80
a) Outliers Excluded
Distributions
20 Fastest Driven in a Car (MPH)
[Figure: histogram of 20 Fastest Driven in a Car (MPH), outliers excluded]
Quantiles
100.0%  maximum   135
99.5%             135
97.5%             135
90.0%             122.9
75.0%   quartile  110
50.0%   median    102
25.0%   quartile  92.75
10.0%             85
2.5%              80
0.5%              80
0.0%    minimum   80
Moments
Mean             103.55263
Std Dev          14.017508
Std Err Mean     1.6079184
Upper 95% Mean   106.75577
Lower 95% Mean   100.34949
N                76
b) The shape meets the Nearly Normal Condition: it is unimodal and symmetric.
c) The sample mean is 103.55 miles per hour. Using the t-distribution with 75 degrees of freedom (t* ≈ 2.377), the 98% confidence interval is (99.73, 107.37). We are 98% confident that the true mean fastest speed driven by students in the population is between 99.73 and 107.37 miles per hour.
d) The population mean is 105 mph, which lies within the interval.
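The 98% interval for the mean is x̄ ± (critical value) × SE, using the Moments table with outliers excluded. With σ unknown the usual procedure uses t with n - 1 = 75 degrees of freedom. Python's standard library has no t quantile function, so this sketch hardcodes t* ≈ 2.3771 as an assumed table value (scipy.stats.t.ppf(0.99, 75) would compute it). The normal value z* ≈ 2.326 is shown for comparison: it gives (99.81, 107.29), slightly narrower than the t-interval (99.73, 107.37).

```python
from statistics import NormalDist

# Moments for fastest speed driven, outliers excluded
mean, std_err, n = 103.55263, 1.6079184, 76
confidence = 0.98

# Assumed t critical value for df = n - 1 = 75 at 98% confidence (from a t-table)
t_star = 2.3771
# Normal critical value, for comparison only
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

t_interval = (mean - t_star * std_err, mean + t_star * std_err)
z_interval = (mean - z_star * std_err, mean + z_star * std_err)
print("t interval:", tuple(round(x, 2) for x in t_interval))
print("z interval:", tuple(round(x, 2) for x in z_interval))
```

Either way, the population mean of 105 mph lies inside the interval.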
Part 2, Question 9
2 Sample Test with Extremes
Oneway Analysis of 15 Hours Study Per Day By 8 Frat/Sorority?
[Figure: 15 Hours Study Per Day (0 to 25) by 8 Frat/Sorority? (No, Yes)]
t Test
Yes-No
Assuming unequal variances
Difference      0.4202    t Ratio      0.337197
Std Err Dif     1.2462    DF           43.41902
Upper CL Dif    2.9328    Prob > |t|   0.7376
Lower CL Dif   -2.0924    Prob > t     0.3688
Confidence      0.95      Prob < t     0.6312
2 Sample Test without Extremes
Oneway Analysis of 15 Hours Study Per Day By 8 Frat/Sorority?
[Figure: 15 Hours Study Per Day (0 to 9) by 8 Frat/Sorority? (No, Yes)]
Excluded Rows   11
t Test
Yes-No
Assuming unequal variances
Difference     -0.1020    t Ratio     -0.19272
Std Err Dif     0.5294    DF           41.85247
Upper CL Dif    0.9665    Prob > |t|   0.8481
Lower CL Dif   -1.1705    Prob > t     0.5759
Confidence      0.95      Prob < t     0.4241
a) Yes, the histograms are normal enough to perform a 2 Sample Test.

b) The hypotheses are as follows:
H0: μ(yes) - μ(no) = 0. There is no difference in study time between students in and out of a fraternity or sorority.
HA: μ(yes) - μ(no) > 0. Students in a fraternity or sorority study more hours per day.
Means and Std Deviations
Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
No      47       3.17021   2.07542   0.30273        2.5608      3.7796
Yes     22       3.06818   2.03713   0.43432        2.1650      3.9714
The difference of the means (Yes - No) is -0.1020, and the standard error of the difference is 0.5294. The one-sided p-value (Prob > t) is 0.5759, which is greater than 0.05. Therefore we fail to reject the null hypothesis: the data give no evidence that students in fraternities or sororities spend more hours studying.
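The unequal-variances (Welch) t statistic and its degrees of freedom can be rebuilt from the Means and Std Deviations table. This Python sketch computes the difference, standard error, t Ratio and Welch-Satterthwaite DF; the p-value needs the t-distribution CDF (for example scipy.stats.t.sf), which is not included here, so compare with JMP's Prob > t of 0.5759.

```python
import math

# Summary statistics from the Means and Std Deviations table (extremes excluded)
n_no, mean_no, sd_no = 47, 3.17021, 2.07542
n_yes, mean_yes, sd_yes = 22, 3.06818, 2.03713

# Welch (unequal-variance) two-sample t statistic for Yes - No
v_no = sd_no**2 / n_no
v_yes = sd_yes**2 / n_yes
diff = mean_yes - mean_no
se_diff = math.sqrt(v_no + v_yes)
t_ratio = diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v_no + v_yes) ** 2 / (v_no**2 / (n_no - 1) + v_yes**2 / (n_yes - 1))

print(f"difference = {diff:.4f}, SE = {se_diff:.4f}, t = {t_ratio:.4f}, df = {df:.2f}")
```

The results (-0.1020, 0.5294, -0.1927, 41.85) match the JMP t Test output, confirming that the standard error of the difference is 0.5294.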