
Math 1040 Skittles Term Project

For this project, each of the 28 students in our class was asked to buy one
2.17oz. bag of Original Skittles candy and record the number of candies of each color
in the bag as well as the total number of candies in the bag. We were then asked to
submit our data to the instructor and he compiled it into a spreadsheet which was then
disseminated to the class.
Throughout the course of this project, I will determine the proportion of each color
in our class sample as well as calculate the mean number of candies per bag for the
class sample. I will also be calculating confidence intervals regarding true proportions
and true means and performing hypothesis testing on the data. To do the calculations I
will be using a TI-83 Plus calculator and will document my use of technology. I have
created charts, tables and graphs to display the categorical or qualitative (related to a
characteristic) data and the quantitative (numerical) data from my individual bag of
candies and from the class as a whole. I will discuss categorical vs. quantitative data
further in a later section.
The following tables contain the data for the bag of Skittles that I purchased and
the data for the class sample.
My Data (number of candies by color)

Red    Orange    Yellow    Green    Purple    Total
8      9         12        15       16        60

Class Data (number of candies by color)

Red    Orange    Yellow    Green    Purple    Total
361    338       351       312      361       1723

Next, I created a pie chart and a Pareto chart to display the proportion of each
color contained in our class sample of candies. Pie charts and Pareto charts are used
to display qualitative data. Pie charts are an excellent way to display parts of a whole
and Pareto charts are bar charts that are organized in descending order.
[Pie chart: Proportion of Colors of Skittles Candies in the Class Sample. Purple 361, Red 361, Yellow 351, Orange 338, Green 312.]

[Pareto chart: Proportion of Colors of Skittles Candies in the Class Sample. Vertical axis: Number of Candies. Purple 361 (.210), Red 361 (.210), Yellow 351 (.204), Orange 338 (.196), Green 312 (.181).]

The graphs reflect what I expected to see given the numbers of the class sample.
I was surprised that green candies made up the smallest proportion of the class
sample at 18%, whereas green candies made up 25% of my individual sample.
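
The proportions shown on the charts can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the counts come from the Class Data table, while the variable names are my own and are not part of the project.

```python
# Class color counts taken from the Class Data table.
class_counts = {"Red": 361, "Orange": 338, "Yellow": 351, "Green": 312, "Purple": 361}

total = sum(class_counts.values())  # total candies in the class sample

# Proportion of each color, rounded to three places like the chart labels.
proportions = {color: round(count / total, 3) for color, count in class_counts.items()}

# Pareto order: largest count first (ties keep their table order).
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7}{class_counts[color]:>5}{proportions[color]:>8.3f}")
```

Sorting the counts in descending order is what turns an ordinary bar chart into a Pareto chart.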

The next step of the project is to calculate the mean number of candies per bag
(x̄), the sample standard deviation (Sx) and the five number summary for the class
sample (minimum, Quartile 1, median, Quartile 3, maximum).
Using the TI-83 Plus calculator, I first entered all the data for the class sample into L1 by
pressing STAT, selecting EDIT and then entering the values. From there I selected
STAT → CALC → 1-Var Stats → 2nd → L1
Here are the results:
x̄ = 61.5   Sx = 1.86   min = 57   Q1 = 60   med = 62   Q3 = 63   max = 65   n = 28 bags
# of candies in my bag = 60
I have created a histogram and a box plot in order to display these values.
Histograms and box plots are useful tools when displaying quantitative data. The
histogram displays the frequency of the number of candies per bag and the box plot
displays the five number summary of the class sample. The box plot shows us that there
are no outliers, since no values fall more than 1.5 times the interquartile range below Q1
or above Q3.
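
One common way to check for outliers is the 1.5 × IQR fence rule. The sketch below assumes that rule and uses only the five number summary reported above. The variable names are my own.

```python
# Five number summary values reported by 1-Var Stats for the class sample.
minimum, q1, median, q3, maximum = 57, 60, 62, 63, 65

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # values below this would be flagged as outliers
upper_fence = q3 + 1.5 * iqr    # values above this would be flagged as outliers

# Only the extremes need checking: if min and max are inside the fences, every bag is.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence}), outliers: {has_outliers}")
```

Because the smallest bag (57) and the largest bag (65) both sit inside the fences, no bag in the class sample is flagged.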

[Histogram and box plot: Number of Candies per Bag. The histogram marks the mean and median; the box plot marks min, Q1, med, Q3 and max.]


The distribution appears to be approximately normal, possibly slightly skewed
left. My bag of Skittles falls exactly at Q1. It appears that this is in agreement with the
overall data collected by the class.
So, what exactly is the difference between categorical data and quantitative
data?

Qualitative data or categorical data sorts individuals according to a characteristic,
trait, or attribute, etc. For example, whether you have straight or crooked teeth, your
favorite flower, whether or not you like the smell of patchouli, what your favorite Big
Head Todd and the Monsters song is or, as it applies to this project, the color of the
Skittles.
Quantitative data is information that is measured and expressed numerically and
it makes sense to perform arithmetic operations on that data. Examples of quantitative
data include the age of the average student at SLCC, the percentage of time per day
my cockatoo spends screaming his head off, or the number of times the neighbor's cat
has bolted into my house this week. Quantitative data as it relates to the Skittles project
would include the number of each color per bag and the total number of candies per
bag.
Pie charts and bar charts are good choices for graphing qualitative or categorical
data. A pie chart can be very effective in showing parts of the whole. Bar charts can be
used to display large numbers, distributions and multiple distributions of a given
qualitative variable. Stem and leaf and box and whiskers charts are not effective ways
to display qualitative data since they both rely on numerical data where arithmetic
operations may apply.

Histograms, stem and leaf plots and box and whiskers plots can all be used to
display quantitative data effectively. Pie charts do not make sense with quantitative
data as they are best used to display part of a whole rather than comparison of data
across categories. With any graph, chart or plot it is important to pay attention to
graphics used, scale and consistency in order to achieve an accurate depiction of the
data.
For qualitative or categorical data, arithmetic operations such as addition,
subtraction, multiplication and division do not make sense. For example, adding Sister
Sweetly to Julianna and dividing by Bittersweet will not (cannot) produce any
meaningful results even though they are a great start to an awesome playlist. With
quantitative data we can build mathematical equations that will render meaningful
numerical results. For instance, the neighbor's cat ran into my house twice on Sunday
and once on Tuesday, Wednesday, Thursday and Friday for a total of 6 cat-in-the-house
episodes this week.

Next I will be creating confidence intervals using our class sample data on a
specified parameter of interest. Confidence intervals are used in inferential statistics. In
inferential statistics information gained from a sample of a population is used to make
inferences about a population as a whole. The general purpose of constructing a
confidence interval is to calculate a range that, with varying degrees of confidence
(hence the name confidence interval), will most likely include the unknown parameter of
interest.

99% confidence interval estimate for the true proportion of yellow candies
Using a TI-83 Plus calculator I entered:
STAT → TESTS → 1-PropZInt → x: 351 → n: 1723 → C-level: .99 → Calculate
(.17872, .22871)
p̂ = .204
n = 1723
This means that we can state with 99% confidence that the true proportion of yellow
candies lies between .179 and .229. The proportion of yellow candies for our class
sample was .204, and we can see that it does lie within the 99% confidence interval.
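
As a cross-check on the calculator, the same interval can be sketched in Python. This assumes the usual normal-approximation formula p̂ ± z*·sqrt(p̂(1 − p̂)/n), which I believe is what a one-proportion z-interval computes. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

x, n = 351, 1723          # yellow candies and total candies in the class sample
confidence = 0.99

p_hat = x / n                                            # sample proportion
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value, about 2.576
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # margin of error

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.3f}, 99% CI = ({lower:.5f}, {upper:.5f})")
```

The result agrees with the calculator interval to five decimal places.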
95% confidence interval for the true mean number of candies per bag
Using a TI-83 Plus calculator I selected STAT → EDIT and entered the values for the
number of candies per bag for each of the 28 bags obtained by the students in the class
into L1. Once finished entering the data, I entered:
STAT → TESTS → TInterval → Data → List: L1 → Freq: 1 → C-level: .95 → Calculate
(60.816, 62.255)
x̄ = 61.536
Sx = 1.856
n = 28
This means that we can state with 95% confidence that the true mean number of
candies per bag lies between 60.816 and 62.255. The sample mean is given as 61.536,
which matches the mean from the 1-Var Stats calculation in the previous section of this
project and falls within the boundaries of the interval.
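
The t-interval can also be approximated by hand from the summary statistics. The sketch below assumes the formula x̄ ± t*·Sx/sqrt(n) and a critical value of t* = 2.052 for 27 degrees of freedom, read from a standard t-table rather than computed. Because the inputs are rounded, the last digit can differ slightly from the calculator.

```python
from math import sqrt

x_bar, s, n = 61.536, 1.856, 28   # summary statistics reported by the calculator

# Assumption: two-sided 95% critical value for df = n - 1 = 27, from a t-table.
t_star = 2.052

standard_error = s / sqrt(n)      # estimated standard deviation of the sample mean
margin = t_star * standard_error  # margin of error

lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```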

The next step is hypothesis testing. Hypothesis testing is used to test claims or
statements made about a population parameter. In order to conduct a hypothesis test,
one must form a hypothesis, that is, make a claim or statement about the population of
interest, collect appropriate data which will be used to test the hypothesis, and interpret
the data to determine whether your claim will be rejected or whether you will fail to reject
your claim.
The null hypothesis, denoted as H0, contains the accepted information, the status
quo. The alternative hypothesis, or H1, is contrary to the null hypothesis. The level of
significance, denoted as α, is dependent on the severity of the consequences of making
a Type I error, which is defined as rejecting the null hypothesis when it is true.
Therefore, the more severe the consequences of rejecting the null when it is true, the
smaller the level of significance should be. Failing to reject the null hypothesis when the
alternative hypothesis is true is a Type II error, the probability of which is denoted as β.

Using a 0.05 level of significance (α), test the claim that 20% of all Skittles are red.

Claim: 20% of all Skittles are red
Counter Claim: less than 20% or greater than 20% of all Skittles are red
H0: p = .20 (null hypothesis)
H1: p ≠ .20 (alternative hypothesis)

This is a two-tailed test. This is determined by the alternative hypothesis. If H1 states that
p or μ is ≠ a value, it is a two-tailed test. If H1 states that p or μ is < a value, it is a
left-tailed test. If H1 states that p or μ is > a value, it is a right-tailed test.

STAT → TESTS → 1-PropZTest → p0: .20 → x: 361 → n: 1723 → prop ≠ p0 → Calculate or Draw

test statistic: z = .988
p-value = .323
α = 0.05

p-value > α, therefore my decision is to fail to reject the null hypothesis. There is not
sufficient evidence to reject the claim that 20% of all Skittles are red.
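
The test statistic and p-value can be checked outside the calculator. This is a minimal sketch, assuming the standard one-proportion z-test formula z = (p̂ − p0)/sqrt(p0(1 − p0)/n) with a two-tailed p-value. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

x, n = 361, 1723   # red candies and total candies in the class sample
p0 = 0.20          # proportion claimed by the null hypothesis
alpha = 0.05       # level of significance

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # standard error uses p0, not p_hat
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed: both tails count

reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Note that the standard error is built from the hypothesized proportion p0, because the test assumes the null hypothesis is true until the data say otherwise.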

Using a 0.01 level of significance (α), test the claim that the mean number of
candies in a bag of Skittles is 55.
Claim: the mean number of candies in a 2.17 oz. bag of Skittles is 55.
Counter Claim: the mean number of candies in a 2.17 oz. bag of Skittles is greater than
or less than 55.
H0: μ = 55
H1: μ ≠ 55

two-tailed test

STAT → TESTS → T-Test → Data → μ0: 55 → List: L1 → Freq: 1 → μ ≠ μ0 → Calculate or Draw

t = 18.638
p ≈ 0
α = 0.01

p-value < α, therefore my decision is to reject the null hypothesis. There is sufficient
evidence to warrant rejection of the claim that the mean number of candies in a 2.17 oz.
bag of Skittles is 55.
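
The t statistic can be approximated from the summary statistics. The sketch below assumes the formula t = (x̄ − μ0)/(Sx/sqrt(n)) and, instead of computing a p-value, compares t with a critical value of 2.771 for 27 degrees of freedom read from a standard t-table. Since the inputs are rounded, the result differs from the calculator in the third decimal place.

```python
from math import sqrt

x_bar, s, n = 61.536, 1.856, 28   # summary statistics reported by the calculator
mu0 = 55                          # mean claimed by the null hypothesis

t = (x_bar - mu0) / (s / sqrt(n))  # number of standard errors between x_bar and mu0

# Assumption: two-tailed critical value for alpha = 0.01 and df = 27, from a t-table.
t_critical = 2.771

reject_null = abs(t) > t_critical
print(f"t = {t:.3f}, critical value = {t_critical}, reject H0: {reject_null}")
```

A sample mean more than 18 standard errors away from 55 is far beyond the critical value, which is why the calculator reports a p-value of essentially zero.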

Samples must meet certain conditions in order to construct confidence interval
estimates and perform hypothesis tests. These conditions are as follows:
Confidence interval for proportions:
1) simple random sample
2) n ≤ 0.05N
3) np̂(1 − p̂) ≥ 10

Confidence interval for mean:
1) simple random sample
2) approximately normal distribution or n ≥ 30
3) σ is known (z-interval) or σ is not known (t-interval)

Hypothesis Testing:
1) simple random sample
2) np(1 − p) ≥ 10
3) n ≤ 0.05N

Our class sample did meet the conditions to construct the confidence intervals
as well as to perform the hypothesis tests. Although our sample of 28 bags was smaller
than 30, it did have an approximately normal distribution, and it is only required that the
data meet one of those two conditions.
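
The numerical conditions for the proportion procedures can be checked directly. This is a rough sketch using the yellow interval and the red test from this project. The population size N is not known, so the last check only reports how large N would need to be, which is an assumption rather than a measured fact.

```python
n = 1723                 # candies in the class sample

p_hat_yellow = 351 / n   # sample proportion used for the confidence interval
p0_red = 0.20            # hypothesized proportion used for the test

ci_condition = n * p_hat_yellow * (1 - p_hat_yellow)   # must be at least 10
test_condition = n * p0_red * (1 - p0_red)             # must be at least 10

# n <= 0.05N holds whenever the population has at least n / 0.05 candies.
minimum_population = n / 0.05

print(f"interval check: {ci_condition:.1f} >= 10 is {ci_condition >= 10}")
print(f"test check:     {test_condition:.2f} >= 10 is {test_condition >= 10}")
print(f"population must contain at least {minimum_population:.0f} candies")
```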
Some errors that could have been made using this data include the possibility of
students using a bigger or smaller package of Skittles. This could affect the mean
number of candies per bag as well as the proportion of each color and potentially create
an outlier. Students could miscount the total number of candies in their package or
miscount the numbers of each color.
The sampling method could be improved by increasing the sample size and
monitoring the size of bag used. In doing a bit of research I found that Skittles are
manufactured in multiple locations around the globe. I suppose it is possible that
different geographical locations could produce different proportions of colors of candies
per bag. This would require more investigation to determine if data would be
significantly altered if our sample were comprised of bags purchased from different
factories.
We all come into contact with statistical information all day, every day and we
may not realize it as such. From news stories to articles, politics to advertising,
statistics are everywhere. I have gained some insight into the power and usefulness of
statistics in things like pharmaceutical drug testing, consistency in manufacturing, crime
statistics and education. This class has also taught me that statistics can be faulty,
misleading, and possibly even dangerous if the standards of good statistics are not
adhered to. Particularly in politics and advertising, I have noticed that graphs can be
laid out and manipulated in such a way that the graphic does not accurately represent
the statistical information contained within it.
This has been a very challenging class for me but I have enjoyed learning the
material. I feel that it has helped me to improve my critical thinking skills and that that
will be beneficial to me in my day to day life as well as in any of my future classes.
