Baylee Vario

Welcome to my Skittles Term Project! I wanted to explain a little bit

about the project and how I applied the concepts that we were learning in

class to the different project parts throughout the semester. To begin the

project (we'll call this Part 0.5), each student in my class purchased a 2.17 oz

bag of Skittles. We then sorted the candies and counted how many there were

of each color. We also included our heights in inches as part of the data.

In Part 1, I used the class data to create a couple of different graphs that

displayed quantitative and qualitative data. In Part 2, I analyzed the data for

number of candies per bag and height of the person who purchased the bag to

see whether they were correlated. Our research question was, "Can height be

used to predict the number of candies that will be in a bag of Skittles you

purchase?"

In Part 3, I computed different probabilities, such as "What is the

probability that you select a green Skittle? Or a Skittle that is not red?" In Part

4, I discussed confidence intervals and computed some confidence intervals

using the data. Confidence intervals can be used to compare the data from our

small math class to a bigger population, such as the United States or even the

entire world. The computed interval is a range of values that is likely to

contain the true population parameter.

In Part 5, I tested certain hypotheses that were given to me with regard to

the class data and how it may apply to a larger population. I think it is so cool

that you can form a hypothesis/claim and use data you have gathered from a

small group to test its truth/accuracy in a large population all through


mathematics! Anyway, check out all the information below to get an in-depth

look at my term project!

PART 1

Candy Color

Qualitative: it describes a quality of the data and can fit into a category.
Individual: Skittle candy
Sample size: 3,228 Skittles

[Pie chart: "Taste The Rainbow: What is the most common color in a bag of Skittles?"]

It is not appropriate to discuss the shape of the distribution when describing a
pie chart because the data is being evaluated as a whole.

[Pareto graph: "Taste The Rainbow: What is the most common color in a bag of Skittles?"]

It is not appropriate to describe the shape of the distribution when discussing a
Pareto graph because the data is always arranged in descending order.

Number of Candies per Bag

Quantitative (discrete): it counts the data.
Individual: bag of Skittles

[Boxplot: "How many Skittles are in a 2.17 oz. bag?"]

Shape of distribution: Skewed left


[Dotplot: "How many Skittles are in a 2.17 oz. bag?"]

Shape of distribution: Bell-shaped

Summary statistics:

Column            n    Mean     Std. dev.   Median   Min   Max   Range   Q1   Q3   IQR   Mode
Candies per bag   54   59.778   2.203       60       55    64    9       58   61   3     59

Lower fence: 53.5


Upper fence: 65.5

There are no outliers; therefore, the bag I purchased is not an outlier.
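
As a quick check, the fences can be reproduced with the usual 1.5 * IQR rule. This is only a sketch: the quartiles, minimum, and maximum come from the summary statistics, and the function name outlier_fences is made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Q1 = 58 and Q3 = 61 are taken from the summary statistics.
lower, upper = outlier_fences(58, 61)
print(lower, upper)  # 53.5 65.5

# The smallest (55) and largest (64) bags both fall inside the fences,
# so no bag in the class data counts as an outlier.
smallest, largest = 55, 64
print(lower <= smallest and largest <= upper)  # True
```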


PART 2

Can height be used to predict the number of candies that will be in a


bag of Skittles you purchase?

No, I do not think the height of a person can determine the number of
candies they receive in a Skittles bag. A person's height has virtually no effect
on which bag they choose. Most people do not even put any thought into which
bag of candy they pick up. In this research question, the height of a person is
the explanatory variable and the number of candies in a bag of Skittles is the
response variable.

Because the absolute value of the correlation coefficient, r=0.1665, is
less than the critical value of 0.361, there is no significant linear relationship
between x and y. Yes, this is what I expected when I hypothesized about the
results. I did not think there would be a linear relationship between the height
of a person and the number of candies in their Skittles bag because there is no
reason I can think of for the two variables to be linearly related. The regression
equation is y=0.0995x+53.1257. Using this equation, someone who is 63.5
inches tall would be predicted to get a bag with 59.4 candies. It is not
appropriate to use this equation to make a prediction because there is not a
significant linear relationship between x and y. R-sq: 0.0277, which means that
only 2.77% of the variation in the response variable can be explained by the
least-squares regression line.
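
The numbers in this paragraph can be reproduced from the reported slope, intercept, and r. The sketch below only evaluates the equation already given; the function name predict_candies is hypothetical, and it does not refit the regression from the class data.

```python
def predict_candies(height_in, slope=0.0995, intercept=53.1257):
    """Evaluate the reported least-squares line y = slope * x + intercept."""
    return slope * height_in + intercept


r = 0.1665               # reported correlation coefficient
critical_value = 0.361   # reported critical value

print(round(predict_candies(63.5), 1))  # 59.4
print(round(r ** 2, 4))                 # 0.0277
print(abs(r) < critical_value)          # True, so no significant linear relationship
```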
Even if there were a significant linear relationship between x and y, it
would not be appropriate to predict the number of candies Yao Ming would get
in his bag of Skittles because his height is too far outside the range of heights
used in the sample to create the regression equation. If we were to predict the
number of candies, this would be an example of extrapolation.
Using a systematic sample (2nd, 12th, 22nd, 32nd, 42nd, and 52nd rows of
data), the correlation coefficient is r=0.2806 and the regression equation is
y=0.2047x+46.0936. Using the Critical Values table, the critical value for the
sample is 0.811. Because the absolute value of r=0.2806 is less than the
critical value, there is no significant linear relationship between x and y.

PART 3

1. a. (11/63)^2 = 0.0305
b. (11/63)*(10/62) = 0.0282
c. 1 - (52/63)^2 = 0.3187
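
The three answers in question 1 can be checked with exact fractions. The questions themselves are not reproduced here, so the sketch below assumes that 11 of the 63 candies in the bag are the color of interest, that (a) is two draws with replacement, (b) is two draws without replacement, and (c) is at least one success in two draws with replacement. The variable names are for illustration only.

```python
from fractions import Fraction

favorable, total = 11, 63  # counts read off the expressions in 1a-1c

p = Fraction(favorable, total)

# (a) both draws succeed, replacing the first candy before the second draw
both_with_replacement = p ** 2

# (b) both draws succeed, without replacing the first candy
both_without_replacement = p * Fraction(favorable - 1, total - 1)

# (c) at least one success in two draws with replacement (complement rule)
at_least_one = 1 - (1 - p) ** 2

for value in (both_with_replacement, both_without_replacement, at_least_one):
    print(round(float(value), 4))
# 0.0305
# 0.0282
# 0.3187
```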

2. a. 650/3228 = 0.2014
b. 1 - (650/3228) = 0.7986
c. (662/3228) + (626/3228) = 0.3990
d. (633/1940) = 0.3263

3. a. Is there a fixed number of trials? Yes, 10. Are the trials independent of
each other? Yes, because we are replacing the Skittle that we take out. Are
there only 2 outcomes (a success and a failure)? Yes, it is a success if the
Skittle is yellow and a failure if it is not yellow. n=10 and p=(626/3228) =
0.1939
b. 0.0814 calculator commands: 2nd > DISTR > A: binompdf > trials: 10,
p: 0.1939, x value: 4 > paste > enter
c. Expected value is (n * p) = (10 * 0.1939) = 1.939. Standard deviation
is (square root of np(1-p)) = 1.25
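
The calculator results in question 3 can also be worked out from the binomial formula P(X = k) = C(n, k) * p^k * (1-p)^(n-k). This is a minimal sketch using the n and p above; binom_pmf is a helper name chosen here, not a calculator or library function.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.1939  # 10 draws with replacement, P(yellow) = 626/3228 rounded

print(round(binom_pmf(4, n, p), 4))      # 0.0814
print(round(n * p, 3))                   # 1.939 (expected value)
print(round(sqrt(n * p * (1 - p)), 2))   # 1.25 (standard deviation)
```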

4. a. Center: (mu) 59.8 Spread: (sigma/square root of 32) 0.4 Shape:
Approximately normal (bell-shaped).
b. 0.9994 calculator commands: 2nd > DISTR > 2: normalcdf > lower:
58.5, upper: 1E99, mu: 59.8, sigma: 0.4 > paste > enter
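
The normalcdf result can be reproduced with Python's standard library. The sketch assumes that the sigma behind the 0.4 spread is the class standard deviation of 2.203 (that value is not stated in this answer) and then uses the rounded spread of 0.4, as entered on the calculator.

```python
from math import sqrt
from statistics import NormalDist

mu, n = 59.8, 32
sigma = 2.203  # assumption: class standard deviation from Part 1

# Spread of the sampling distribution of the sample mean
standard_error = sigma / sqrt(n)
print(round(standard_error, 1))  # 0.4

# P(sample mean > 58.5), using the rounded spread entered on the calculator
sampling_dist = NormalDist(mu, 0.4)
p_above = 1 - sampling_dist.cdf(58.5)
print(round(p_above, 4))  # 0.9994
```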

PART 4

A confidence interval can be defined as a range of values used to
estimate the value of a population parameter. In statistics, we use a
confidence interval to describe the amount of uncertainty associated with an
estimate of a population parameter. The confidence interval is defined by the
point estimate plus or minus the margin of error. If we were trying to
estimate the mean of a population, first we would take a sample and
determine its mean. Then we would find the confidence interval around that
mean. If the confidence level were 98%, that would mean that about 98% of
the intervals built this way from repeated samples would contain the
population mean.

The requirements to express a confidence interval are a confidence level,
a sample statistic, and a margin of error (confidence interval = point estimate +/-
margin of error). To estimate the population proportion, use the sample
proportion, which is found by dividing the number of successes in the sample (x)
by the sample size (n). The population mean is found by the following formula:
$\mu = (\sum X)/N$, where $\sum$ means "the sum of," X = all the individual
items in the group, and N = the number of items in the group.

I used the TI-84 calculator to find a 99% confidence interval estimate for
the true proportion of yellow candies. These are the steps I used to find my
answer: STAT>TESTS> A: 1-PropZInt> x: 626, n: 3228, c-level: 0.99
>CALC>ENTER> The interval I found was: (0.1760 , 0.2119). This means that
we can be 99% confident that the true proportion of yellow candies is between
0.1760 and 0.2119.
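
For anyone without a TI-84, the same interval can be sketched in Python as point estimate plus or minus margin of error. The function name one_prop_z_interval is made up here; the formula is the standard one-proportion z-interval, which is what I understand 1-PropZInt to compute.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, confidence):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 626 yellow candies out of 3,228 in the class data, 99% confidence
low, high = one_prop_z_interval(626, 3228, 0.99)
print(f"({low:.4f}, {high:.4f})")  # (0.1760, 0.2119)

# The proportion in my own bag (8 yellow out of 63) falls below the interval.
print(8 / 63 < low)  # True
```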

Based on the interval mentioned above, the proportion of yellow candies
in the single bag of candy I purchased was not a likely value for the true
population proportion, much to my surprise! In my bag of 63 total candies,
only 8 were yellow, which means the sample proportion of yellow candies in
my bag was 0.1270.

To find the 95% confidence interval for the true mean number of candies
per bag I used the following TI-84 calculator steps: STAT>TESTS> 8: TInterval
>Stats> x bar: 59.78, Sx: 2.2, n: 54, c-level: 0.95>Calculate> The
interval I found was: (59.18 , 60.38). This means that we can be 95% confident
that the true mean number of candies per bag is between 59.18 and 60.38.

Again, I was surprised to see that based on the interval calculated above
for the true mean number of candies per bag, the total number of candies in
the bag I purchased was not a likely value for the true mean. I had 63 candies
in my bag, which is more than any value in the 95% confidence interval.

PART 5

A hypothesis test is a test used in statistics to determine whether
there is enough evidence in a sample of data to infer whether a certain
condition is true for the entire population. It examines two opposing
hypotheses about a population: the null hypothesis and the
alternative hypothesis. Usually the null hypothesis is a statement of "no
effect/difference", hence its name. Once a hypothesis test is completed, you
can either reject or fail to reject the null hypothesis. You do this by comparing
the p-value to the level of significance. We stay with the null
hypothesis unless there is enough evidence to support the alternative
hypothesis.

Part A

Claim: 20% of all Skittles are red.
H0: p=0.20
H1: p [does not equal] 0.20
Conditions for performing this hypothesis test:
1. Data is collected using simple random sampling or a randomized
experiment. No, our entire class was assigned to purchase a bag of Skittles,
which is an example of convenience sampling.
2. n*p0*(1-p0) [is greater than or equal to] 10. Yes, p0=0.20 and
n=sample size=3,228, so 3,228*0.20*(1-0.20)=516.48, which is greater than 10.
3. The sample size must be less than or equal to 5% of the population.
Yes, our sample size is only 3,228 Skittles. I am unsure how to find the actual
number of Skittles in the world, but according to wrigley.com over 200 million
Skittles are produced on a daily basis.

To find the test statistic and p-value I used the TI-84 calculator.
STAT>Tests>5:1-PropZTest>p0:0.20 x:662 n:3228 prop[does not equal]p0
>Calc> p=0.47052

P-value=0.4705 is greater than the level of significance=0.05; therefore, I
will not reject the null hypothesis. There is not enough evidence to reject
the claim that 20% of all Skittle candies are red.
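
The p-value from 1-PropZTest can be reproduced by hand with the usual one-proportion z statistic, z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n). The sketch below is an illustration of that formula under the same inputs (x = 662, n = 3228, p0 = 0.20); the function name one_prop_z_test is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0):
    """Two-sided one-proportion z-test; returns (z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_prop_z_test(662, 3228, 0.20)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = 0.72, p-value = 0.4705

alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```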

A Type I error would be rejecting the null hypothesis when the null hypothesis
is true. For example, if exactly 20% of all Skittles really are red but my
sample led me to reject the null hypothesis, I would make a Type I error. A
Type II error would be not rejecting the null hypothesis when the alternative
hypothesis is true. An example of that would be if the true proportion of red
Skittles is not 20% but I failed to reject the null hypothesis.

Part B

Claim: The mean number of candies in a bag of Skittles is more than 58.
H0: mu=58
H1: mu>58

Conditions for performing this hypothesis test:
1. Data is collected using simple random sampling or a randomized
experiment. No, our entire class was assigned to purchase a bag of
Skittles, which is an example of convenience sampling.
2. The population from which the sample is drawn is normally
distributed or the sample size n [is greater than or equal to] 30. Yes, the
sample size is n=54 bags.
3. The sample contains no outliers and the sample size is less than
or equal to 5% of the population. Yes, again, I am unsure how to find the
actual number of bags of Skittles in the world, but according to
wrigley.com over 200 million Skittles are produced on a daily basis and
our sample size was only 54 bags.

I used a calculator to find the test statistic and p-value: STAT>Tests>2:T-Test>
mu0:58 xbar:59.778 Sx:2.203 n:54 mu:>mu0 >Calc> t=5.93, p<0.0001

The p-value is less than the level of significance=0.01; therefore, I will
reject the null hypothesis. There is enough evidence to support the claim
that the mean number of candies in a bag of Skittles is more than 58.

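
As a check on the calculator inputs, the test statistic can be computed directly from t = (xbar - mu0) / (Sx / sqrt(n)). The sketch assumes mu0 = 58 (the value in H0) and the sample mean 59.778, Sx = 2.203, and n = 54 from the summary statistics. It stops at the statistic because Python's standard library has no t distribution for the p-value; the function name is made up for illustration.

```python
from math import sqrt


def one_sample_t_statistic(x_bar, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


# mu0 comes from H0: mu = 58; the other values are the class summary statistics.
t = one_sample_t_statistic(x_bar=59.778, mu0=58, s=2.203, n=54)
print(round(t, 2))  # 5.93

# A positive t means the sample mean is above 58, the direction H1 claims.
print(t > 0)  # True
```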
This project has taught me a lot about Statistics and math in general.

First, it taught me that there is a lot more theory and analysis that go into

Statistics than I thought there was. It also surprised me how much depth

there really is to the study of Statistics and how this class is a small

introduction to a vast field of study.

Mathematics and Statistics are all around us. I really enjoyed the Stat

Talk videos because they did a great job at showing how Statistics can be

applied to everyday life. Some of the subjects of the videos included length

of men's hair in the streets of New York, the weather in New York versus San

Francisco, restaurant meal prices and delivery times, and so on. All of this

data can be collected, organized, analyzed, and tested using Statistics.

I believe I will use Statistics often in my nursing career. A major time

that comes to mind is during research. Nursing is constantly changing and

improving by evidence-based practice, so being able to read and understand

research papers or conduct individual research projects is extremely

important. Collecting and organizing data isn't too difficult; the challenging

part is figuring out what the results are and what they mean. This class

helped me to develop those skills (although I feel I still have a long way to

go), as well as problem solving and critical thinking skills. All those skills are

so important in nursing! When I first registered for this class I thought, "What

does Statistics have to do with nursing? I already took College Algebra, so why

do I have to take this class too?" However, now I see why they require it and
I am appreciative of the knowledge this class offers. Of course, not many

people enjoy taking math classes, but I really do see the significance in this

one and how it can and will help me succeed in my nursing career.