Adriana Mulato


December 14, 2017

Skittles Project

For this assignment, each student counted the Skittles of each color in one small bag. Once every student had recorded their counts, the numbers were combined into a single class data set. That was the first part of the assignment. Below are the counts from my bag.

Number of red candies: 6

Number of orange candies: 16

Number of yellow candies: 7

Number of green candies: 11

Number of purple candies: 14

The following section shows the overall sample gathered by our class, broken down by color in a pie chart and a Pareto chart. The sample size is the total number of Skittles from all the students' bags.

Sample size: 3076

Looking at the data, the counts for each color are close to each other. For the class as a whole, the colors rank from most to least common as Red, Orange, Green, Purple, and Yellow. My own bag does not follow that order: it goes Orange, Purple, Green, Yellow, and Red. Although my order is different, a lot of my numbers are fairly close to the overall data. Every bag holds a different mix of colors, so each one will look different.
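
The ordering of my own bag can be reproduced with a short Python sketch. It uses only the counts from my bag listed earlier; the function and variable names are made up for illustration, and the class totals per color are left out because they appear only in the charts.

```python
# Counts from my own bag, as listed earlier in this report.
my_bag = {"Red": 6, "Orange": 16, "Yellow": 7, "Green": 11, "Purple": 14}


def rank_colors(counts):
    """Return the colors ordered from most to fewest candies."""
    return sorted(counts, key=counts.get, reverse=True)


def proportions(counts):
    """Return each color's share of the bag, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


total = sum(my_bag.values())
order = rank_colors(my_bag)
shares = proportions(my_bag)

print("Total candies in my bag:", total)
print("Order from most to fewest:", ", ".join(order))
for color in order:
    print(f"{color}: {my_bag[color]} ({shares[color]:.1%})")
```

Turning the counts into shares of the bag makes it easier to compare one bag of 54 candies with a class sample of 3076.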

This section presents a frequency histogram, a boxplot, and a five-number summary of the data. Here the focus is on the number of candies per bag.

Although the boxplot and the histogram are different graphs, they use the same data and show the same pattern: both are skewed left. The mean number of candies per bag was 60.3, with a standard deviation of 3.44. In the boxplot, the minimum number of Skittles per bag was 50 and the maximum was 65. The first line of the box is the first quartile (Q1), 59 Skittles; the second line is the median, 61 Skittles; and the third line is the third quartile (Q3), 62 Skittles. The five-number summary supports the shape seen in both graphs: the mean (60.3) is below the median (61), and the distance from the minimum to Q1 is much larger than the distance from Q3 to the maximum. Each person's bag may hold a different number of candies; mine had 54, which is toward the low end of the range.
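
One way to see how unusual the low bags are is the 1.5 x IQR rule of thumb for flagging outliers. That rule is not part of the assignment; it is brought in here only as an illustration, and the function name is made up. The sketch uses the five-number summary reported above and the total of 54 from my own bag.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summary reported for the class data.
minimum, q1, median, q3, maximum = 50, 59, 61, 62, 65
my_bag_total = 54  # total candies in my own bag

lower, upper = iqr_fences(q1, q3)
print("IQR:", q3 - q1)
print("Fences:", lower, "to", upper)

for label, value in [("minimum", minimum), ("my bag", my_bag_total), ("maximum", maximum)]:
    flagged = value < lower or value > upper
    print(f"{label} = {value}: {'outside' if flagged else 'inside'} the fences")
```

Under this rule the lower fence is 54.5, so both the minimum of 50 and my bag of 54 fall below it, while the maximum of 65 stays inside. That matches the long left tail in the graphs.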

This project uses both categorical and quantitative data. Counting how many candies are in each bag gives quantitative data, because each observation is a number that can be measured and averaged. Those counts also add up to the sample size. Sorting the candies into groups by color, as this project does, gives categorical data, because each candy falls into a named category. The categorical data is what the pie chart and Pareto chart are built from.

The next section builds a confidence interval. A confidence interval gives a range of values that is likely to contain the true population parameter. The higher the confidence level, the more confident we are that the interval captures the true value, at the cost of a wider interval. As an example, we can estimate the true proportion of yellow candies with 99% confidence. Of the 3076 Skittles in the class sample, 581 were yellow. Below is an image of the 99% confidence interval I worked out. After working the numbers both by hand and on the calculator, the conclusion is that we are 99% confident that the true proportion of yellow Skittles is between about 0.171 and 0.207, which corresponds to between 526 and 637 of the 3076 Skittles in the sample.
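
The interval for the yellow proportion can be checked with a short Python sketch. It is not the hand calculation from the image; it assumes the usual normal-approximation (z) interval for a proportion, and the function name is made up for illustration. The counts 581 and 3076 are the ones given above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation (z) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


yellow, total = 581, 3076  # class totals given above
low, high = proportion_ci(yellow, total, 0.99)

print(f"Sample proportion: {yellow / total:.3f}")
print(f"99% interval for the proportion: {low:.3f} to {high:.3f}")
print(f"Same interval as counts out of {total}: {low * total:.0f} to {high * total:.0f}")
```

Multiplying the two endpoints by 3076 converts the interval from a proportion into a count of yellow candies, which is the form the 526 to 637 figures take.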

Another example is a 95% confidence interval for the true mean number of candies per bag. Below is an image of the work by hand; I also checked it on the calculator. The mean number of Skittles for all classes is 615.2. After constructing the 95% confidence interval for the true mean number of candies per bag, we can say that we are 95% confident that the true mean number of Skittles is between 587.11 and 643.29.
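
A confidence interval for a mean can also be sketched from summary statistics in Python. This is not the hand calculation from the image. It assumes the per-bag mean and standard deviation reported earlier (60.3 and 3.44), assumes one bag for each of the 51 people mentioned at the end of this report, and takes the critical value 2.009 from a t table for 50 degrees of freedom. Because it works on the per-bag scale, its output will not match the 587.11 to 643.29 interval above.

```python
from math import sqrt


def mean_ci(mean, sd, n, t_star):
    """t-based confidence interval for a mean from summary statistics."""
    margin = t_star * sd / sqrt(n)
    return mean - margin, mean + margin


# Per-bag summary figures reported earlier in this report.
mean_per_bag = 60.3
sd_per_bag = 3.44
n_bags = 51     # ASSUMPTION: one bag for each of the 51 people in the class sample
t_star = 2.009  # t table value for 95% confidence with df = 50

low, high = mean_ci(mean_per_bag, sd_per_bag, n_bags, t_star)
print(f"95% interval for the mean per bag: {low:.2f} to {high:.2f}")
```
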

The next section focuses on hypothesis tests. A hypothesis test checks whether a claim about the population is consistent with the sample we have. Below is an image showing both hypothesis tests. The first test uses a significance level of 0.05 to test the claim that 20% of all Skittles candies are red. To begin, we write a null hypothesis and an alternative hypothesis. The null hypothesis always states that the proportion or mean equals the claimed value. The alternative states what we are looking for evidence of: that the parameter is greater than, less than, or not equal to that value. In this case the null hypothesis is H0: p = 0.20 and the alternative hypothesis is H1: p ≠ 0.20. Entering the values in the calculator gives a z value of 1.21 and a p-value of 0.23. Because the p-value of 0.23 is greater than the significance level of 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to reject the claim that 20% of Skittles are red.
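
The step from the z value to the decision can be sketched in Python. This is a minimal sketch of a two-sided one-proportion z-test, not the calculator's own routine, and the function names are made up for illustration. The class total of red candies is not stated in the text, so the z statistic function is defined but only the reported z value of 1.21 is used.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic for testing H0: p = p0 against a sample proportion."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
z_reported = 1.21  # z value from the calculator, as stated above
p_value = two_sided_p_value(z_reported)

print(f"p-value: {p_value:.2f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value is doubled because the alternative hypothesis is "not equal", so a result far from 20% in either direction counts as evidence against the null hypothesis.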

Another example uses the data and a significance level to test the claim that the mean number of candies in a bag of Skittles is 55. We follow the same process as above and begin with the null hypothesis and the alternative hypothesis. The null hypothesis is H0: μ = 55 and the alternative hypothesis is H1: μ ≠ 55. Entering the values in the calculator gives a t value of 11 and a p-value that rounds to 0.000. Because the p-value is less than the significance level, we reject the null hypothesis. There is sufficient evidence to conclude that the mean number of Skittles per bag is not 55.
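
The t value can be checked from the summary statistics with a short Python sketch. It assumes the per-bag mean of 60.3 and standard deviation of 3.44 reported earlier, one bag for each of the 51 people in the class sample, and a significance level of 0.05 with the matching t table value of 2.009 for 50 degrees of freedom. The function name is made up for illustration.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for testing H0: mu = mu0 from summary statistics."""
    return (sample_mean - mu0) / (sd / sqrt(n))


n_bags = 51         # ASSUMPTION: one bag for each of the 51 people
t_critical = 2.009  # ASSUMPTION: two-sided table value for alpha = 0.05, df = 50

t_stat = one_sample_t(60.3, 55, 3.44, n_bags)

print(f"t = {t_stat:.2f}")
print("Reject H0" if abs(t_stat) > t_critical else "Fail to reject H0")
```

Under these assumptions the statistic comes out at about 11, in line with the calculator, and it is far beyond the critical value.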

Before doing a hypothesis test, three conditions have to be met: the observations are independent, the quantity used for the sample-size check is at least 10, and the sample is less than 5% of the population. If these conditions are not met, the hypothesis test cannot be used. There are many places where errors could have been made during this project, although using a calculator reduces the arithmetic mistakes. There could have been errors when entering the data from each individual's bag of Skittles and when adding up the total number of Skittles. The sampling method could have been improved if each person had checked the recorded data against their own counts, but doing that with 51 people is a lot. This project let me apply the concepts learned in class to something real rather than just a story problem in the book, which made it interesting.