Stats Semester Project

Project Part 2
The qualitative variable is the color of each Skittle. The individual for the qualitative variable is one Skittle.
The quantitative variable is the number of Skittles per bag. The individual for the quantitative variable is one bag.
The sample size for the color of each Skittle was 3,551 Skittles. In total, there were 726 (or .2044) yellow Skittles, 716 (or .2016) red, 710 (or .1999) green, 701 (or .1974) purple, and 698 (or .1966) orange Skittles.
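The relative frequencies above are each color count divided by the class total. A minimal Python sketch, using only the counts stated in the text:

```python
# Class color counts from Part 2.
counts = {"yellow": 726, "red": 716, "green": 710, "purple": 701, "orange": 698}

total = sum(counts.values())  # should equal the 3,551 Skittles in the sample
proportions = {color: n / total for color, n in counts.items()}

print(total)  # 3551
for color, p in proportions.items():
    print(f"{color}: {p:.4f}")
```

Summing the counts is also a quick check that no color was left out: the five counts add up exactly to 3,551.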
For the quantitative variable, the sample size was 60 bags of candy. The numerical summaries
are as follows:
Based on this information, the lower fence is 52.75 and the upper fence is 66.75. This means that there are 2 outliers (the bag with 50 and the bag with 52 Skittles), since both fall below the lower fence. My bag of Skittles was not an outlier, because its 64 Skittles fall between the two fences.
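The quartiles themselves are not listed here, but assuming the usual 1.5 × IQR rule, the stated fences imply Q1 = 58 and Q3 = 61.5 (IQR = 3.5). The sketch below uses those inferred quartiles, which are an assumption rather than values copied from the summary table, to reproduce the fences and the outlier check:

```python
def iqr_fences(q1, q3):
    """Lower and upper fences using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles inferred from the stated fences (assumption: 1.5 x IQR rule).
lower, upper = iqr_fences(58, 61.5)
print(lower, upper)  # 52.75 66.75

bags = [50, 52, 64]  # the two low bags and my bag (Skittles per bag)
outliers = [b for b in bags if b < lower or b > upper]
print(outliers)  # [50, 52]
```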
For the qualitative variable, there is no distribution shape, because the colors have no natural order and the bars of a graph can be arranged in any order. For the quantitative variable, the shape is skewed left. We know this because, when the data are placed in order, the longer tail extends to the left, toward the smaller counts.
Project Part 3 Regression/Correlation
I do not think there will be a significant relationship between a person's height and the number of Skittles per package. I think this because each person chooses a bag at random, and the buyer's height does not affect how many Skittles are in the bag: the company packages approximately the same number of Skittles in every bag regardless of who buys it.
For this example, the x variable is height in inches and the y variable is the number of Skittles per bag.
The correlation coefficient was .17042887, which is close to zero. The critical value for a sample size of 60 is about .254. Since the absolute value of the correlation coefficient (.170) is less than the critical value, there is not a significant linear relationship between height and the number of Skittles per bag. I wasn't expecting there to be a relationship, so this wasn't surprising to me.
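The critical value of r comes from the t distribution with n - 2 degrees of freedom, through the identity r_crit = t / sqrt(t² + n - 2). A minimal sketch, assuming a two-tailed significance level of 0.05 and the t table value 2.0017 for 58 degrees of freedom (a table lookup, not computed here):

```python
import math

def critical_r(t_crit, n):
    """Critical |r| for testing rho = 0, from a two-tailed t value with n - 2 df."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

r, n = 0.17042887, 60
t_crit = 2.0017  # assumed t table value: alpha = 0.05 (two-tailed), df = 58
r_crit = critical_r(t_crit, n)

print(round(r_crit, 3))  # 0.254
print("significant" if abs(r) > r_crit else "not significant")  # not significant
```

Whatever table is used, |r| = .170 falls below the critical value for n = 60, so the conclusion of no significant linear relationship is unchanged.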
The regression equation is ŷ = 50.7137 + 0.1288x. Based on this equation, the predicted number of candies in a bag bought by someone who is 63.5 inches tall is approximately 58.9 Skittles. However, it is not appropriate to use the regression equation for this prediction, because there is no significant linear relationship between the two variables. In that case, the best predicted value is simply the mean number of candies per bag (59.18).
R² is .029046 (that is, .17042887²). This means that only about 2.9% of the variation in the number of candies per bag can be explained by the linear relationship with the height of the person buying it.
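The prediction and the coefficient of determination are simple arithmetic on the fitted line and r. A short sketch that reuses only the coefficients reported above:

```python
def predict(height_in):
    """Predicted Skittles per bag from the Part 3 regression line (x = height in inches)."""
    return 50.7137 + 0.1288 * height_in

r = 0.17042887
print(round(predict(63.5), 1))  # 58.9
print(round(r ** 2, 6))         # 0.029046
```

Because the correlation is not significant, the 58.9 is shown only to confirm the arithmetic; the sample mean remains the better point estimate.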
It would not be appropriate to predict the number of candies in a bag bought by Yao Ming, because his height of 90 inches is far outside the range of heights in our data. Using the regression equation to predict outside the range of the observed x values is extrapolation, and there is no reason to expect the relationship to hold there.
With the new data set, which has a sample size of 6, the correlation coefficient changes to .14567 and the regression equation changes to ŷ = 52.96 + 0.077x. The critical value for a sample size of 6 is .811. Since the absolute value of the correlation coefficient (.14567) is still lower than the critical value, there is still no significant relationship between a person's height and the number of Skittles in the bag, even with the smaller sample size.
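The same t-to-r identity reproduces the critical value for the smaller sample. This sketch assumes a two-tailed significance level of 0.05 and the t table value 2.776 for 4 degrees of freedom:

```python
import math

def critical_r(t_crit, n):
    """Critical |r| for testing rho = 0, from a two-tailed t value with n - 2 df."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

t_crit_df4 = 2.776  # assumed t table value: alpha = 0.05 (two-tailed), df = 4
r_crit = critical_r(t_crit_df4, 6)

print(round(r_crit, 3))       # 0.811
print(abs(0.14567) > r_crit)  # False, so not significant
```

With only 6 points, r must be very large before it counts as significant, which is why the critical value jumps so sharply.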
Project Part 4 Probability

Problem 1: Suppose you are going to randomly select two Skittles from the bag YOU
purchased.
My bag: 9 red, 9 orange, 11 yellow, 18 green, 17 purple, 64 total
(a) What is the probability that both Skittles are purple if you select them with replacement?
Give your answer correct to four decimal places.
(17/64)^2 = .0706
(b) What is the probability that both Skittles are purple if you select them without replacement?
Give your answer correct to four decimal places.
(17/64)(16/63)= .0675
(c) What is the probability that at least one Skittle is purple if you select them with replacement?
1-(1-17/64)^2=.4607
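All three answers can be checked with exact fractions, which avoids rounding until the end. A minimal sketch using the purple count and bag total stated above:

```python
from fractions import Fraction

purple, total = 17, 64  # my bag: 17 purple out of 64

with_repl = Fraction(purple, total) ** 2
without_repl = Fraction(purple, total) * Fraction(purple - 1, total - 1)
at_least_one = 1 - (1 - Fraction(purple, total)) ** 2  # complement of "no purple"

for label, p in [("both purple, with replacement", with_repl),
                 ("both purple, without replacement", without_repl),
                 ("at least one purple, with replacement", at_least_one)]:
    print(f"{label}: {float(p):.4f}")
```

The only difference between (a) and (b) is the second factor: without replacement, one purple Skittle and one Skittle overall have been removed, giving 16/63.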
Problem 2: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select one Skittle.
(a) What is the probability that you select a green Skittle?
710/3551= .1999
(b) What is the probability that you select a Skittle that is NOT green?
3551 - 710 = 2841, and 2841/3551 = .8001
(c) What is the probability that you select a Skittle that is red OR yellow?
Red Skittles = 716, Yellow Skittles = 726
(716/3551) + (726/3551) = .4061 (a Skittle cannot be both colors, so the probabilities add)
(d) What is the probability that you select a Skittle that is orange GIVEN that it is a secondary color? (Secondary colors are green, orange and purple.)
Green = 710, Orange = 698, Purple = 701, so there are 710 + 698 + 701 = 2109 secondary-color Skittles.
698/2109 = .3310
Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select ten Skittles with replacement and count how many are
yellow.
(a) Show that this meets the requirements of the binomial probability distribution and identify n and p.
There is a fixed number of trials (10); the trials are independent (since we are replacing the Skittles, the outcome of one draw does not affect another); each trial has 2 disjoint outcomes (yellow or not yellow); and the probability of success is the same for each trial (since we are replacing the Skittles, the likelihood of drawing a yellow Skittle stays the same each time). n = 10, p = 726/3551 = .2044
(b) What is the probability that exactly 4 of the 10 Skittles are yellow?
binompdf(10, .2044, 4) = .0930
(c) For samples of size 10, what is the expected value and standard deviation for the number of
yellow skittles that will be included?
Expected value: μ = np = 10 × .2044 = 2.044
Standard deviation: σ = sqrt(np(1 - p)) = sqrt(10 × .2044 × .7956) = 1.2752
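The calculator's binompdf result can be reproduced from the binomial formula C(n, k) p^k (1 - p)^(n - k). A sketch using p = .2044 (726/3551 rounded, as in the text):

```python
import math

n, p = 10, 0.2044  # ten draws with replacement; P(yellow) rounded as in the text

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

mean = n * p
sd = math.sqrt(n * p * (1 - p))
print(f"P(X = 4) = {binom_pmf(4, n, p):.4f}")
print(f"mean = {mean:.3f}, sd = {sd:.4f}")
```

`math.comb` requires Python 3.8 or later.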
Problem 4: For this problem, treat a 2.17 ounce bag of Skittles as an individual. Suppose the values for our class data are the parameter values for all 2.17 ounce bags of Skittles. In other words, assume μ = mean number of candies per bag in our class data set and σ = standard deviation of number of candies per bag in our class data set (you computed these values in Part 2).
(a) Describe the sampling distribution for the mean number of candies per bag for samples of 32
bags. Include center, spread and shape. Note: The shape of the SAMPLING DISTRIBUTION is
different from the shape of the population, which you determined in Part 2 of the project.
Center: μx̄ = μ = 59.18 (the mean from Part 2)
Spread: σx̄ = σ/sqrt(n) = 3.11/sqrt(32) = .5498
Shape: approximately normal, by the Central Limit Theorem, since n = 32 > 30
(b) What is the probability that the mean number of candies per bag for a sample of 32 bags is
greater than 58.5?
P(x̄ > 58.5): z = (58.5 - 59.18)/.5498 = -.68/.5498 = -1.2368. For z = -1.24, the table gives .1075, so P(x̄ > 58.5) = 1 - .1075 = .8925.
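The sampling distribution can be modeled directly as a normal distribution with mean μ and standard deviation σ/sqrt(n). A sketch using Python's standard-library `statistics.NormalDist`, treating the class values as population parameters as the problem instructs:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 59.18, 3.11, 32  # class mean and SD treated as population values

se = sigma / sqrt(n)                # standard deviation of the sampling distribution of x-bar
sampling = NormalDist(mu, se)       # approximately normal by the Central Limit Theorem
p_greater = 1 - sampling.cdf(58.5)  # P(x-bar > 58.5)

print(f"SE = {se:.4f}")  # SE = 0.5498
print(f"P(x-bar > 58.5) = {p_greater:.4f}")
```

Because this skips rounding z to -1.24, it gives about .892, slightly below the table-based .8925.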
Project Part 5
Explain in general the purpose and meaning of a confidence interval.
o A confidence interval is an interval of numbers that gives a range of likely values for an unknown population parameter. It is helpful because it lets us state, with a given level of confidence (for example, 95%), that the interval contains the true value of the population parameter.
Identify the requirements for computing confidence intervals. List the requirements
separately for a confidence interval for a population proportion and for a population mean.
o The requirements for computing a confidence interval for a population proportion are:
- A simple random sample
- np̂(1 - p̂) ≥ 10
- n ≤ 0.05N
o The requirements for computing a confidence interval for a population mean are:
- A simple random sample
- n ≤ 0.05N
- Population normal or n ≥ 30
Using values for the class data that you computed in Part 2 of the project, construct a 99% confidence interval estimate for the true proportion of yellow candies using the class data as your sample. Remember that for this computation, n is the number of CANDIES for the entire class data.
o n = number of candies = 3551
number of yellow Skittles = 726
p̂ = 726/3551 = .204
α = .01
α/2 = .005
zα/2 = 2.575
Formula: p̂ ± zα/2 × sqrt(p̂(1 - p̂)/n)
Calculation: .204 ± 2.575 × sqrt(.204(1 - .204)/3551) = .204 ± .0174, which gives the confidence interval (.1866, .2214)
Give an appropriate interpretation of your interval
o With 99% confidence, the true proportion of yellow candies is between .1866 and
.2214
Based on your interval for the true proportion of yellow candies, was the proportion of yellow candies in the single bag of candy you purchased a likely value for the true population proportion? Explain how you know using actual values from your data and computations.
o No. The proportion of yellow candies in the bag that I purchased was 11/64 = .1719, which is below the lower limit of the confidence interval (.1866), so it is not a likely value for the true population proportion.
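The interval and the single-bag comparison can be reproduced in a few lines. This sketch uses p̂ rounded to .204 and z = 2.575, as in the computation above:

```python
from math import sqrt

def prop_ci(p_hat, n, z):
    """Confidence interval p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = prop_ci(0.204, 3551, 2.575)  # p-hat rounded as in the text; z for 99%
print(f"({low:.4f}, {high:.4f})")  # (0.1866, 0.2214)

my_bag = 11 / 64  # yellow proportion in my bag
print(f"{my_bag:.4f}", low <= my_bag <= high)  # 0.1719 False
```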
Using values you computed in Part 2 of the project, construct a 95% confidence interval
estimate for the true mean number of candies per bag using the class data as your sample,
but for this computation, n is the number of BAGS.
o x̄ = 59.18
s = 3.11
n = 60
df = 60 - 1 = 59
α = 1 - .95 = .05
tα/2 = 2.000
Formula: x̄ ± tα/2 × s/sqrt(n)
Calculation: 59.18 ± 2.000(3.11/sqrt(60)) = 59.18 ± .803, which gives the confidence interval (58.38, 59.98)
Give an appropriate interpretation of your interval.
o With 95% confidence, the true mean number of candies per bag is between 58.38
and 59.98
Based on your interval for the true mean number of candies per bag, was the total number of candies in the single bag you purchased a likely value for the population mean? Explain how you know using actual values from your data and computations.
o No. The number of candies in my bag was 64, which is above the upper limit of the confidence interval (59.98), so it is not a likely value for the population mean.
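The t interval follows the same pattern with s/sqrt(n) as the standard error. This sketch plugs in the t value of 2.000 used above (a table lookup, not computed here):

```python
from math import sqrt

def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval x-bar +/- t * s/sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = mean_ci(59.18, 3.11, 60, 2.000)  # t value as read from the table
print(f"({low:.2f}, {high:.2f})")  # (58.38, 59.98)
print(low <= 64 <= high)  # my bag of 64: False
```

Using the slightly more precise t value for 59 degrees of freedom (about 2.001) rounds to the same interval.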
Project Part 6
Explain in general the purpose and meaning of a hypothesis test.

o Hypothesis testing is a procedure, based on sample results, that allows us to test a hypothesis about a population. Using the results, we reject or do not reject the null hypothesis by comparing the p-value to the significance level.
Using values for the class data that you computed in Part 2 of the project and a 0.05 significance
level, test the claim that 20% of all Skittles candies are red. Show all the steps (neatly written
and scanned, typed, or copied from StatCrunch) including:
1. the hypotheses with correct notation

H0: p = .2
H1: p ≠ .2
2. the conditions for performing the hypothesis test, along with checking that they are met (hint: they are not all met!)
SRS: not met. Each student simply picked a bag of Skittles, so the bags were not randomly sampled; this is a convenience sample.
np0(1 - p0) ≥ 10: 3551 × .2 × (1 - .2) = 568.16 ≥ 10, met
n ≤ 0.05N: met, since 3551 Skittles is far less than 5% of all Skittles
3. the test statistic

TI commands: 1-PropZTest p0: .2, x: 716, n: 3551, prop ≠ p0
z = .243
4. the p-value
TI commands: 1-PropZTest p0: .2, x: 716, n: 3551, prop ≠ p0
P-value = .808
5. the appropriate decision about the null hypothesis and an appropriate conclusion
Since the p-value (.808) is greater than the significance level (.05), we fail to reject H0. There is insufficient evidence to conclude that the proportion of red Skittles is not 20%.
6. Also describe the Type I and Type II errors for this test. (8 points)
Type I error: we reject H0 and conclude that the proportion of red Skittles is not 20% when it really is 20%.
Type II error: we fail to reject H0, so we do not conclude that the proportion of red Skittles is not 20%, when it really is not 20%.
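The calculator's 1-PropZTest output can be checked by hand: z = (p̂ - p0)/sqrt(p0(1 - p0)/n), and the two-tailed p-value is twice the upper-tail area beyond |z|. A standard-library sketch (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-tailed one-proportion z-test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(716, 3551, 0.20)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")  # z = 0.243, p-value = 0.808
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # fail to reject H0
```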
Using values for the class data that you computed in Part 2 of the project and a 0.01 significance
level, test the claim that the mean number of candies in a bag of Skittles is more than 58. Show
all the steps (neatly written and scanned, typed, or copied from StatCrunch) including:
1. the hypotheses with correct notation
H0: μ = 58
H1: μ > 58
2. the conditions for performing the hypothesis test, along with checking that they are met (hint: they are not all met!)
SRS: not met (convenience sample)
Sample size greater than 30: met, n = 60
Independence: met (60 bags is far less than 5% of all bags)
3. the test statistic
TI commands: T-Test μ0: 58, x̄: 59.18, Sx: 3.11, n: 60, μ: >μ0
t = 2.94
4. the p-value
TI commands: T-Test μ0: 58, x̄: 59.18, Sx: 3.11, n: 60, μ: >μ0
P-value = .0023
5. the appropriate decision about the null hypothesis and an appropriate conclusion
Since the p-value (.0023) is less than the significance level (.01), we reject H0. There is sufficient evidence to conclude that the true mean number of candies per bag is greater than 58.
6. Also interpret the p-value for this test. (4 points)
If the true mean number of candies per bag is 58, then the probability of getting a sample mean of 59.18 Skittles or greater in a sample of 60 bags is .0023.
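The t statistic is (x̄ - μ0)/(s/sqrt(n)), and the right-tailed p-value is the area under the t curve with n - 1 degrees of freedom beyond it. The Python standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; that numerical approach is a stand-in chosen here, not what the calculator does internally:

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=10_000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

x_bar, mu0, s, n = 59.18, 58, 3.11, 60
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)  # right-tailed test, H1: mu > 58

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.01 else "fail to reject H0")
```

If SciPy is available, `scipy.stats.t.sf(t_stat, n - 1)` gives the same tail area more directly.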
Project Part 7
Briefly explain the overall procedures and goals of the assignment in your own words. Do not
assume that your reader knows in advance what the assignment is about.
o For this assignment, we were each asked to buy a bag of Skittles. We then had to record how many of each color it contained, how many Skittles there were in total, and how tall we are in inches. Then, throughout the semester, we were asked to complete different pieces of the project based on what we had learned in class. We mainly used the class data set in order to have a large enough sample size for the tests, but in some cases we used our personal bag to compare it to the larger sample and see the difference that sample size can make.
Over the semester, using the Skittles and our heights, we made graphs, calculated numerical summaries, computed correlation coefficients and probabilities, constructed confidence intervals, and performed hypothesis tests. I think the goal of this project was to make sure we understood what we were learning, and to show us an example of how it can apply in the real world.
Reflection paper
When I first started the project, I was as nervous as can be. One of my friends took this class with Hilton before and told me about this huge, 7-piece project. She told me she spent many hours on it, that it took all semester, and that it had a lot of very hard problems. To say I was intimidated is an understatement. All I could think was that if she was saying this, and she is really good at math, it wasn't something I would ever be able to do. I thought that because, in most cases, I think I am very bad at anything above basic math.
When we were told to buy Skittles and record the information, along with our height, I was intrigued to see how height would tie in with the number of Skittles we purchased. I was not able to see any relation between the two at first, but when we got to the portion of the project that used height, I realized that it was okay that I didn't see it, because I was about to determine for myself whether there was a relationship.
At the beginning of each section of the project, when I would glance at the procedures, I would feel so overwhelmed. I would read the steps and think that there was no way I would be able to complete them without a lot of stress and hard work, so I always put it off until the last second. I shouldn't have waited to start the project, but I was happy that with each piece, once I actually got going, it was much simpler than I had thought it would be. Once I sat down, read it, and reviewed my class notes, most of the work became very simple.
Contrary to my prior worries, I really enjoyed this project. I think a large part of the reason that I usually struggle with math is that I can never tie it to real life. For example, other than in school, I don't think I am ever going to need to know half of the material we learn in Math 1050. However, largely with this project, but also with the examples Hilton used in class, I could tie the material to something that made sense, and that made all of this so much easier for me to grasp. Because of this, I was able to maintain a much higher grade in stats than I thought I would.
