
Corey Seggerty

MATH-1040-403-F16
Term Project
Skittles

This group project for Intro to Statistics used Skittles to give us real-world
experience with statistical calculations and theories. We were required to work as a
group and help each other with different aspects of statistics. We used statistical
software to create an accurate representation of the data provided by the class. This
supplemented our homework with applicable knowledge. Each week's work built on
the previous assignments, becoming more complex as the work progressed. Even
though working with candy may have seemed somewhat silly at first, applying
descriptive statistics and graphical illustrations put our theoretical knowledge to the test and
into perspective.

Group 2
Intro to Stats
Term Project Part 2

Group Work
1. Determine the proportion of each color within the overall sample gathered by the class.
a. I would hypothesize that in the class's overall sample of Skittles, all five colors will
turn out to be roughly equal in proportion. I have no data to support this hypothesis as I am
not a fan of Skittles and haven't previously purchased much of the product. I would
assume that equal parts of each color are added into some sort of machine that
randomizes them before dispensing them into the package. Under this hypothesis, I think
a large enough sample from the same batch would yield an equal color distribution.
2.

3. No, the class data does not represent a random sample. The population would be all 2.17-ounce
bags of Original Skittles for sale. As we can assume all members of our class are located in
Utah, bags of Skittles located elsewhere did not have the same probability of being entered into
our data set, which makes a random sample impossible. This may not have a considerable effect
on our observed data should it prove true that the color distribution in each bag is
completely random. Still, there may be intervening factors that we would not be able to detect
if we are only able to obtain a sample in Utah. It is possible that only bags from a certain
factory are distributed in Utah. If we had observed that a certain color is overrepresented
in our sample, this effect could be due to a malfunction in that particular factory. Based
on this locally limited sample alone, we would not be able to draw any valid conclusions about the
population of Skittles for sale.

Corey Seggerty
Intro to Stats Term Project Part 2 Individual work

              Count Red   Count Orange   Count Yellow   Count Green   Count Purple   Total
My Bag               18              8             10            15             10      61
Class Counts        275            226            247           232            243    1223
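
The counts listed above can be turned into proportions, which makes the bag and the class sample easier to compare against the equal-split hypothesis from the group work. The following is a minimal sketch; the variable and function names are illustrative and not part of the original assignment.

```python
# Counts copied from the table in this section.
my_bag = {"red": 18, "orange": 8, "yellow": 10, "green": 15, "purple": 10}
class_counts = {"red": 275, "orange": 226, "yellow": 247, "green": 232, "purple": 243}


def proportions(counts):
    """Return each color's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


bag_props = proportions(my_bag)
class_props = proportions(class_counts)

# Print both sets of proportions side by side, one color per line.
for color in my_bag:
    print(f"{color:<7} bag {bag_props[color]:6.1%}   class {class_props[color]:6.1%}")
```

With five colors, an exactly equal split would put every color at 20%.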

1. I expected to see a more even distribution of colors. In my bag the red proportion was
substantially higher than the orange proportion. Surprisingly, that also holds true for the class
totals: red Skittles were observed the most overall, whereas orange Skittles were
observed the least. These counts may point to a trend, but much more data would have
to be collected to conclude that orange Skittles are overall less likely to be
encountered than red Skittles.
2. One bag of Skittles had only 29 candies while another had 80 candies. These two
bags represent outliers, as all other bags range from 45 to 63 Skittles. This could indicate that
the individual student bought the wrong size bag, or that those bags were somehow
different, which could also influence the colors of the candies contained in the bag. Outliers
generally can skew summary statistics. Other than those two bags with unusual
total counts, I could not find a significant outlier. If we had discovered a bag that was completely
missing a color or, even more extreme, a bag with just one color in it, that could have significantly
skewed both the graphs and the summary statistics.
3. The distribution in my bag does not exactly match the distribution in the class sample.
As mentioned above, red was observed the most and orange the least in both counts.
However, green was the second most common color in my bag while it ranked only fourth in the class
data.
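
One way to check whether the class counts really point to a trend is a chi-square goodness-of-fit statistic against the equal-proportions hypothesis. This test is not part of the original write-up; the sketch below is only an illustration, and the 0.05 significance level is an assumption. The test also presumes a random sample, which the group work argues the class data is not, so the result should be read with caution.

```python
# Class color counts from the table in this section.
observed = {"red": 275, "orange": 226, "yellow": 247, "green": 232, "purple": 243}

total = sum(observed.values())
expected = total / len(observed)  # equal-proportions hypothesis: same count for every color

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((n - expected) ** 2 / expected for n in observed.values())

# Standard table value for 4 degrees of freedom at the 0.05 level
# (the significance level is an assumption, not given in the project).
CRITICAL_VALUE = 9.488

print(f"chi-square = {chi_square:.2f}")
if chi_square > CRITICAL_VALUE:
    print("reject equal proportions")
else:
    print("cannot reject equal proportions")
```

Under these assumptions the statistic stays below the critical value, which fits the cautious reading in item 1: the class counts alone are not enough to conclude that orange is truly less common than red.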

Jonica Whitmore
Math 1040
9/15/2016
Term Project Part 3 Group 2 Portion

Mean and Standard Deviation of Total Candies


1. Mean: 58.2
2. Standard Deviation: 9
3. 5-Number Summary: 29, 57, 59, 62, 80
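
The five-number summary can also be used to flag outliers. The sketch below applies the common 1.5 × IQR rule, which is a convention assumed here and not stated in the project; the helper name `is_outlier` is illustrative.

```python
# Five-number summary reported above: minimum, Q1, median, Q3, maximum.
minimum, q1, median, q3, maximum = 29, 57, 59, 62, 80

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # counts below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr  # counts above this are flagged as outliers


def is_outlier(count):
    """Return True if a bag's candy count falls outside the 1.5 * IQR fences."""
    return count < lower_fence or count > upper_fence


print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
for count in (minimum, maximum):
    print(count, "outlier" if is_outlier(count) else "not an outlier")
```

Both the smallest bag (29) and the largest bag (80) fall outside the fences under this rule. Because the middle half of the bags is packed so tightly, the rule is strict here: a bag of 45 candies would also be flagged.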

Histogram

Box Plot

Corey Seggerty
Term Project Part 3
Individual Work

1. i. The shape of the distribution is unimodal (one distinct peak) and slightly skewed left
   due to the bags with fewer candies, which represent outliers.
   ii. As we were all asked to purchase bags of the same weight, I expected the candy
   counts in each bag to be much more uniform. I did expect slight deviations due to
   not all Skittles weighing the same, but I was surprised by the width of the distribution,
   which ranges from 29 candies in one bag to 80 candies in another. This
   almost makes me suspect that some students purchased smaller or bigger bags than
   required.
   iii. Overall my bag agrees with the numbers collected by the class. My bag contained 61
   Skittles whereas the class mean is just over 58, which places my result well
   within one standard deviation (9) of the mean.
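
The comparison of my bag with the class mean can be expressed as a z-score, the number of standard deviations a value lies from the mean. This is a small sketch using the figures reported in the group portion; the variable names are illustrative.

```python
mean = 58.2    # class mean of total candies per bag
std_dev = 9    # class standard deviation
my_total = 61  # candies in my bag

# z-score: distance from the mean in units of standard deviations.
z_score = (my_total - mean) / std_dev
within_one_sd = abs(z_score) <= 1

print(f"z = {z_score:.2f}, within one standard deviation: {within_one_sd}")
```

A z-score of roughly 0.3 means the bag sits about a third of a standard deviation above the class mean.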

2. Categorical data describes attributes of a variable that are not numerically measurable. In our specific
case, the color of the Skittles was a categorical variable. While it makes sense to count how many Skittles
per bag are a certain color, it would not make sense to order these colors or assign a hierarchy. It also
would not make sense to try to calculate a mean, as these colors do not have a number associated with
them. Categorical data can be useful to describe the distribution of attributes. Quantitative data, on the
other hand, allows a higher level of mathematical manipulation. The values of a quantitative variable,
such as the number of candies in each bag, can be ordered and measured. It makes sense to calculate a
mean, and other indicators such as the standard deviation and median tell us something useful
about the distribution of the data. In certain cases it may be possible to convert categorical data into
quantitative data, but not without careful manipulation of the variables.
For categorical data, pie charts and bar charts are very useful to visualize the distribution. Box plots, on
the other hand, have no meaning for categorical data, as measurements such as medians and quartiles
don't apply. For quantitative data, box plots, histograms and ogive graphs can be helpful as they
reveal outliers and the shape of a distribution. Pie charts, for example, are usually not useful to display
quantitative data.

Corey Seggerty
MATH-1040-403-F16
Term Project part 5 Reflection
Doing the group project gave me an opportunity to apply the theoretical material
conveyed in the lessons to a practical project. It was sometimes difficult to follow the
lessons by just solving the practice problems, but the group project gave me an opportunity to put it
all together in a fun way. The project had me create graphs and label them correctly,
which will come in handy in a variety of settings. For example, in the coming semesters I will be
taking classes that involve training and human resources material and will require in-class
presentations. Being able to present a variety of data in graphs that are accurate and appropriate
for the material will be essential. This skill will also be necessary when giving
presentations in real-world business settings and management meetings.
Additionally, I have been taking science classes this semester and the skills I have
acquired in statistics have come in handy in interpreting scientific sources for my biology and
geology classes. For example, for a paper on climate change, I cited the Intergovernmental Panel
on Climate Change report, which included meta-data on increasing levels of greenhouse gases in
the atmosphere. My statistical knowledge of linear equations and regression analysis helped me
make sense of the data presented. The work on the group project ultimately helped me
formulate a concise conclusion based on the data.
The project also taught me valuable lessons regarding the interpretation of data. In
today's world, survey and research data are presented to prove certain points on a daily basis. I
have already begun to notice when journalists use certain editing techniques to misrepresent data.
Examples include bar graphs with an inappropriately scaled x-axis or graphs that exaggerate
distances between data points. Graphs that are distorted in that way can mislead the public in
many different ways.
