
Kanyon King

The team project we worked on in this statistics class involved putting our learning to work with
something real. To start, everyone in the class bought their own 2.17-ounce package of
Skittles. We each analyzed the data from our individual bags. Afterwards, our professor compiled all
of the data into one sample for further analysis, putting some of the concepts we were
learning to work.
The first task was to analyze our own 2.17-ounce bag of Skittles:
Red: 6, Orange: 17, Purple: 15, Yellow: 9, Green: 15. Total: 62
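
The proportions used in the next part follow directly from these counts. As a minimal sketch (the variable names are my own, not part of the assignment), each color's share is its count divided by the bag total:

```python
# Color counts from the single 2.17 oz bag reported above.
bag = {"Red": 6, "Orange": 17, "Purple": 15, "Yellow": 9, "Green": 15}

total = sum(bag.values())

# Share of each color as a percentage, rounded to one decimal place.
proportions = {color: round(100 * count / total, 1) for color, count in bag.items()}

for color, pct in proportions.items():
    print(f"{color}: {pct}%")
print("Total:", total)
```

The rounded percentages agree with the "My Bag" row of the comparison table.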

The next individual task involved comparing our own bags to the sample data of the class.
Team Project Part 2 Individual

          Red         Orange      Yellow      Green       Purple      Total
          Proportion  Proportion  Proportion  Proportion  Proportion  Count
My Bag    9.7%        27.4%       14.5%       24.2%       24.2%       62
Class     21.3%       19.3%       19.6%       20.5%       19.2%       6255

After seeing the proportions of the class, it makes me wonder whether my bag was heavily
disproportionate or if there were a few people who just made up numbers that were close to a 20%
representation of each color. If the numbers are all (or mostly) true, it is interesting that the sample
showed an average representation of about 20% for each color. Most bags of Skittles that I have
eaten have had more of one color and usually one color with very few Skittles. If the numbers are
correct, then it would appear my orange and red counts are very much outliers. If enough individual
results were included, they would most likely balance out if the goal is a 20% representation of each
color in every bag, which seems to be what the data is showing.

Looking at just my Skittles, the outliers might have an impact on graphics or summary statistics,
depending on what measurements were used. In a graph, it may appear that the red and yellow Skittles
are underrepresented while the other colors are almost evenly represented. Just looking at the
summary of proportions here, the red and yellow Skittles look very low. But if we look at the
class sample, all colors appear to be represented very close to evenly.

The distribution of colors in the total class data does not match my single bag of candies. The
distribution in the total class data is pretty even and shows all colors to have a fairly equal
representation. My bag was rather low on red and yellow and had an abnormally high proportion
of the other three colors when compared to the sample data.

For the group portion of part 2, we were to organize and display the categorical data of colors.
This portion involved guessing and comparing actual proportions, creating charts, and creating
tables to represent the data.
In a 2.17 oz bag of Skittles, are the colors equally distributed? In an observational study, our
class sampled 104 bags of Skittles to determine the color proportions. Out of the 104 bags,
there were 6255 total Skittles candies. Our group, Team 6, believed we would see each color
appear at an equal percentage of 20%. We believed that each color would be represented
equally so that the candy appeals to customers’ tastes and is aesthetically pleasing. Here is
what we found:
                     Red     Orange  Yellow  Green   Purple
Total Candies        1333    1208    1228    1282    1204
Expected Proportion  20%     20%     20%     20%     20%
Observed Proportion  21.3%   19.3%   19.6%   20.5%   19.2%
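
The observed proportions can be recomputed from the class totals and compared with the 20% our team expected. This is only a sketch of that arithmetic; the names `class_counts` and `expected` are my own labels:

```python
# Class totals for each color, taken from the table above.
class_counts = {"Red": 1333, "Orange": 1208, "Yellow": 1228, "Green": 1282, "Purple": 1204}

expected = 0.20                     # Team 6's guess: every color at 20%
total = sum(class_counts.values())  # all candies in the 104 bags

# Observed proportion of each color in the class sample.
observed = {color: count / total for color, count in class_counts.items()}

for color, p in observed.items():
    print(f"{color:<7} observed {p:6.1%}  expected {expected:.0%}  difference {p - expected:+.1%}")
```

Every color lands within about 1.3 percentage points of the expected 20%.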

In a Pareto chart, the numbers would be represented like this:

[Figure: Pareto chart of candy counts by color]
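
A Pareto chart orders the categories from the largest count to the smallest and usually tracks the running (cumulative) percentage. The sketch below only prepares that ordering from the class totals; it does not draw the chart, and the variable names are my own:

```python
# Class totals for each color, as reported in the table of total candies.
counts = {"Red": 1333, "Orange": 1208, "Yellow": 1228, "Green": 1282, "Purple": 1204}

total = sum(counts.values())

# Pareto order: largest category first.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Each row holds (color, count, cumulative share of all candies so far).
pareto_rows = []
running = 0
for color, count in ordered:
    running += count
    pareto_rows.append((color, count, running / total))

for color, count, cumulative in pareto_rows:
    print(f"{color:<7} {count:5d}  cumulative {cumulative:6.1%}")
```

The bars of the chart would appear in the order of `pareto_rows`, starting with red and ending with purple.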


In a pie chart, our data would be represented like this:

[Figure: pie chart of color proportions]

In this study, the students of an online class were each asked to purchase a 2.17-ounce bag of
Skittles. We determined that the population would be all 2.17-ounce bags of Skittles. A simple
random sample would mean that every 2.17-ounce bag of Skittles had an equal chance of being chosen,
which does not seem to be the case, because each student's location determined the stores where the
Skittles were purchased. Ogden may have no representation, and New York most definitely does not.
There is absolutely no way all 2.17-ounce bags of Skittles in the world had an equal chance to be
picked, therefore it is not a simple random sample. We concluded it was convenience sampling:
the sample was picked just based on the class, and each student grabbed a package of
Skittles from the closest store. Our convenience sample consisted of 104 bags.

The individual work of the next part, part 3, involved discussing our findings about the variable
of total candies in each bag. We also were to discuss the differences between categorical and
quantitative data.
Looking at the histogram for the data, I would say that the shape of the distribution is skewed
right. In the boxplot, the distribution looks roughly bell shaped, or symmetric, because
the outliers are identified separately. I did expect to see something like this. It would make the most
sense for every bag of Skittles to be relatively close in total count of candies, with some outlying
bags that have quite a few more or quite a few less just because of the way the candies are
packaged. Out of 104 bags of Skittles, the median was 60, and my bag had 62 total Skittles, which
is very close.
Categorical data breaks the data into categories like colors or male/female. In a sense, it gives
meaning to the numerical data. Most of the time, categorical data is better displayed as a pie chart
or bar graph. Something like the number of yes responses and the number of no responses can be
graphed on a pie chart, and the categories would be ‘no’ and ‘yes’. Quantitative data, or
quantitative variables, are numeric. Quantitative data can be grouped into categories,
but the numerical value would still represent the quantitative data. Quantitative data can
generally be added together, as it is numerical data, whereas categorical data such as
‘yes’ and ‘no’, or ‘cat’ and ‘dog’, cannot be added together. Quantitative data would make the most
sense on a line graph, histogram, boxplot, or scatter plot, as all of these require numerical data. Just as
quantitative data would not make much sense as a pie chart, categorical data would not make
sense as a histogram or scatter plot.

The group portion for this part required a couple of things. First, we were to compute a few
measures for total candies in each bag such as mean, standard deviation, and a 5-number
summary. We were also to create a frequency histogram and a box plot.
1. Using the total number of candies in each bag in our class sample, compute the following
measures for the variable “Total candies in each bag”:

(a) mean number of candies per bag: 60.144231, rounded to 60.1

(b) standard deviation of the number of candies per bag: 3.6428365, rounded to 3.6

(c) 5-number summary for the number of candies per bag: Min: 45, Q1: 58, Median: 60, Q3: 62,
Max: 82

2. Create a frequency histogram for the variable “Total candies in each bag”.

[Figure: frequency histogram of total candies in each bag]

3. Create a box plot for the variable “Total candies in each bag”.

[Figure: box plot of total candies in each bag]

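
The reported summary values can be checked against each other. The sketch below recomputes the mean from the class totals (6255 candies in 104 bags) and applies the common 1.5 x IQR rule to the 5-number summary to see which bags a box plot would flag as outliers. Using the 1.5 x IQR rule is my assumption about how the box plot was drawn; the report does not say which rule was used.

```python
# Values reported in the class results.
total_candies = 6255
bags = 104
minimum, q1, median, q3, maximum = 45, 58, 60, 62, 82

# Mean number of candies per bag.
mean_per_bag = total_candies / bags

# Interquartile range and the usual 1.5 x IQR outlier fences (assumed rule).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"mean per bag: {mean_per_bag:.6f}")
print(f"IQR: {iqr}, fences: ({lower_fence}, {upper_fence})")
print("minimum is an outlier:", minimum < lower_fence)
print("maximum is an outlier:", maximum > upper_fence)
```

Under that rule, both the smallest bag (45) and the largest bag (82) fall outside the fences, which fits the outliers seen in the box plot.
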
Part 4 involved explaining the purpose and meaning of a confidence interval:

A confidence interval expresses how certain we can be about an estimate made from a sample
statistic. It gives a range of values that is likely to contain the true population
parameter. For example, if a survey was done and an interval estimate was obtained, a
confidence level could be used to describe the certainty of the estimate. A 95% confidence
level means that if we used the same sampling method to select many different samples and built
an interval from each one, we could expect about 95% of those intervals to contain the true
population parameter.
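
The "95% of the time" idea can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the population is taken to be normal with a made-up mean of 60 and standard deviation of 3.6 (chosen to resemble the bag counts, not measured), and each interval uses the normal critical value.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage(true_mean=60.0, true_sd=3.6, n=104, level=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the (hypothetical) true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical value for the chosen level
    hits = 0
    for _ in range(trials):
        # Draw one sample of n bags from the assumed population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        center = mean(sample)
        margin = z * stdev(sample) / n ** 0.5
        # Count the interval as a hit if it captures the true mean.
        if center - margin <= true_mean <= center + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should come out close to 0.95: most intervals capture the true mean, and a few miss it.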

The group portion of part 4 involved actually creating confidence intervals for the proportion of
yellow candies, as well as the population mean number of candies per bag. We were then to
interpret our findings.
Confidence Intervals
Construct a 99% confidence interval estimate for the population proportion of yellow candies.
The number of yellow candies was 1228 out of 6225 total candies. Since we are looking for the
proportion of yellow candies, we chose to construct a Z interval for a population proportion. We
treated the data as a simple random sample, although it was collected as a convenience sample.
The sample fits the normal model: np(1 - p) is greater than 10, and the sample is less than 5% of
the total population of Skittles candies. Here is what we estimate:
x = 1228, n = 6225, z value for 99% = 2.576, p = 1228/6225 = .197

Lower bound: p - z * sqrt(p(1 - p)/n) = .197 - .013 = .184
Upper bound: p + z * sqrt(p(1 - p)/n) = .197 + .013 = .210

So, our 99% confidence interval estimate is (.184, .210).

Our margin of error is +/- .0130.
We are 99% confident that the proportion of yellow candies among all 2.17 oz bags of Skittles is
between 18.4% and 21%.
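
The interval above can be reproduced with a few lines of Python. This is a sketch of the standard Z interval for a proportion, using x and n exactly as stated in this part (n = 6225; the class table earlier lists 6255 candies, which would shift each bound by about .001). The function name `proportion_z_interval` is my own.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_interval(x, n, level):
    """Return (lower, upper, margin) of a Z interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for a 99% level
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Yellow candies and total candies as stated in this part of the report.
lower, upper, margin = proportion_z_interval(1228, 6225, 0.99)
print(f"99% interval: ({lower:.3f}, {upper:.3f}), margin of error {margin:.4f}")
```
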
Construct a 90% confidence interval estimate for the population mean number of candies per
bag.
The mean number of candies per bag is 60.14 pieces. Since we are looking for a population
mean of candies per bag, we chose to construct a T interval for a population mean. We treated
the data as a simple random sample. It is quantitative, as it is a countable value. The sample
size is greater than 30, yet less than 5% of the total population of Skittles bags. Here is
what we estimate: x = 60.14 (sample mean), n = 104, sx = 3.643
critical value for 90% = 1.645 (the large-sample value; the exact t value for 103 degrees of
freedom is about 1.660)

Lower bound: x - t * sx/sqrt(n) = 60.14 - .588 = 59.55
Upper bound: x + t * sx/sqrt(n) = 60.14 + .588 = 60.73

So, our 90% confidence interval estimate is (59.55, 60.73).
Our margin of error is +/- .588.
We are 90% confident that the mean number of candies per 2.17 oz bag of Skittles is
between 59.55 and 60.73 pieces.
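
The same arithmetic can be sketched in Python. The function below takes the critical value as an input, and the call uses the 1.645 stated above; that number matches the normal critical value for a 90% level, so a value read from a t table for 103 degrees of freedom would be slightly larger and widen the interval a little. The function name `mean_interval` is my own.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(x_bar, s, n, critical_value):
    """Return (lower, upper, margin) of an interval for a population mean."""
    margin = critical_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Normal critical value for a 90% level, for comparison with the 1.645 used above.
z_90 = NormalDist().inv_cdf(0.95)

# Sample mean, sample standard deviation, and sample size as stated above.
lower, upper, margin = mean_interval(60.14, 3.643, 104, 1.645)
print(f"normal critical value for 90%: {z_90:.3f}")
print(f"90% interval: ({lower:.2f}, {upper:.2f}), margin of error {margin:.3f}")
```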

To Reflect:
The statistics skills applied in this project were skills that we learned in our homework
assignments. I think things like standard deviation, or the probability of an outcome occurring a
certain number of times, will come in handy during my journey to become a software developer. I am
not sure yet exactly how they will apply, but I could see how it would be useful to be able to determine
probabilities when writing code. I also got plenty of practice with StatCrunch and my TI-84
calculator, which are skills that could come in handy in my next math class or in any future
statistical problems I may run into.
It was interesting to directly use the skills we learned in a real-world problem. Applying the
homework lessons to something real definitely helped me to relate the work to possible real-world
applications. It was really quite something to learn that many statistics given in the news or in
magazines are not done properly or do not use a large enough sample to provide accurate data.
When looking at the proportions of Skittles, it was interesting to see how the proportions became
more normalized, or closer to their advertised values, as the sample size grew. We also used
graphs that were properly scaled; I learned that improper scaling is a common way to trick readers
into thinking the data is a lot more skewed or a lot less skewed than it actually is.
There were a lot of frustrating concepts in this project, and I think that the real-world application
of these concepts reinforced the learned material. As with anything difficult, I believe that this
project strengthened my problem-solving skills. Learning and overcoming obstacles or difficult
material is a great way to improve those skills. This project also gave me new ways to tackle
problems with statistics.
