For this project, each of the 28 students in our class was asked to buy one
2.17 oz. bag of Original Skittles candy and record the number of candies of each color
in the bag as well as the total number of candies in the bag. We then submitted our
data to the instructor, who compiled it into a spreadsheet and distributed it to the
class.
Throughout the course of this project, I will determine the proportion of each color
in our class sample as well as calculate the mean number of candies per bag for the
class sample. I will also be calculating confidence intervals regarding true proportions
and true means and performing hypothesis testing on the data. To do the calculations I
will be using a TI-83 Plus calculator and will document my use of technology. I have
created charts, tables and graphs to display the categorical, or qualitative, data
(related to a characteristic) and the quantitative (numerical) data from my individual
bag of candies and from the class as a whole. I will discuss categorical vs.
quantitative data further in a later section.
The following tables contain the data for the bag of Skittles that I purchased and
the data for the class sample.
My Data
Class Data
Next, I created a pie chart and a Pareto chart to display the proportion of each
color contained in our class sample of candies. Pie charts and Pareto charts are used
to display qualitative data. Pie charts are an excellent way to display parts of a whole
and Pareto charts are bar charts that are organized in descending order.
[Figures: pie chart and Pareto chart, "Proportion of Colors of Skittles Candies in the
Class Sample". Vertical axis: Number of Candies. Bars in descending order: Purple 361,
Red 361, Yellow 351, Orange 338 (.196), Green 312 (.181).]
The graphs reflect what I expected to see given the numbers of the class sample.
I was surprised, however, that green candies made up the smallest proportion of the
class sample at 18%, whereas they made up 25% of my individual sample.
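The proportions shown in the charts can be recomputed from the class counts. The following is a minimal sketch in Python rather than the TI-83 Plus used in this project; the pairing of counts to colors is taken from the chart labels and should be treated as an assumption.

```python
# Class counts by color. The color-to-count pairing is read off the chart
# labels (an assumption); yellow = 351 matches the value used for 1-PropZInt.
class_counts = {"Purple": 361, "Red": 361, "Yellow": 351, "Orange": 338, "Green": 312}

total = sum(class_counts.values())  # total candies in the class sample

# Proportion of each color = count for that color / total count
proportions = {color: count / total for color, count in class_counts.items()}

# Pareto order: categories sorted from largest to smallest count
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

for color in pareto_order:
    print(f"{color:<8}{class_counts[color]:>5}{proportions[color]:>8.3f}")
print(f"{'Total':<8}{total:>5}")
```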
The next step of the project is to calculate the mean number of candies per bag
(x̄), the sample standard deviation (Sx) and the five number summary for the class
sample (minimum, Quartile 1, median, Quartile 3, maximum).
Using the TI-83 Plus calculator, I first entered all the data for the class sample into L1 by
pressing STAT, selecting EDIT and then entering the values. From there I selected
STAT → CALC → 1-Var Stats → 2nd → L1
Here are the results:
x̄ = 61.5   Sx = 1.86   min = 57   Q1 = 60   med = 62   Q3 = 63   max = 65   n = 28 bags
Number of candies in my bag = 60
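For readers without a TI-83 Plus, the same summary can be sketched in Python. The function name `one_var_stats` and the list `example_bags` are hypothetical; the list holds made-up values for illustration only and is not the class data. The quartiles are taken as the medians of the lower and upper halves of the sorted data, which is my understanding of how the calculator computes them.

```python
import statistics

def one_var_stats(data):
    """Mean, sample standard deviation and five number summary of a list."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values below the median position
    upper = values[half + n % 2:]   # values above the median position
    return {
        "mean": statistics.mean(values),
        "Sx": statistics.stdev(values),    # sample (n - 1) standard deviation
        "min": values[0],
        "Q1": statistics.median(lower),
        "med": statistics.median(values),
        "Q3": statistics.median(upper),
        "max": values[-1],
        "n": n,
    }

# Made-up illustration values, NOT the class sample.
example_bags = [57, 60, 60, 61, 62, 63, 63, 65]
summary = one_var_stats(example_bags)
for name, value in summary.items():
    print(f"{name} = {value}")
```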
I have created a histogram and a box plot in order to display these values.
Histograms and box plots are useful tools when displaying quantitative data. The
histogram displays the frequency of the number of candies per bag and the box plot
displays the five number summary of the class sample. The box plot shows us that there
are no outliers, since no values fall below Q1 − 1.5(IQR) = 55.5 or above
Q3 + 1.5(IQR) = 67.5.
[Figures: histogram and box plot of the number of candies per bag. Labels: mean, median,
min, Q1, med, Q3, max.]
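The usual 1.5 × IQR rule for flagging outliers can be checked directly against the five number summary reported above. This is a small sketch in Python; the variable names are my own.

```python
# Five number summary values reported for the class sample
minimum, q1, q3, maximum = 57, 60, 63, 65

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

# If the minimum and maximum are inside the fences, no value is an outlier.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("Outliers present" if has_outliers else "No outliers")
```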
To display quantitative data, histograms, stem-and-leaf plots and box-and-whisker
plots can all be used effectively. Pie charts do not make sense with quantitative
data, as they are best used to display parts of a whole rather than the distribution
of numerical values. With any graph, chart or plot, it is important to pay attention to
the graphics used, the scale and consistency in order to achieve an accurate depiction
of the data.
For qualitative or categorical data, arithmetic operations such as addition,
subtraction, multiplication and division do not make sense. For example, adding Sister
Sweetly to Julianna and dividing by Bittersweet will not (cannot) produce any
meaningful results, even though they are a great start to an awesome playlist. With
quantitative data we can build mathematical equations that will render meaningful
numerical results. For instance, the neighbor's cat ran into my house twice on Sunday
and once each on Tuesday, Wednesday, Thursday and Friday, for a total of 6
cat-in-the-house episodes this week.
Next, I will create confidence intervals for a specified parameter of interest using
our class sample data. Confidence intervals are used in inferential statistics, in
which information gained from a sample of a population is used to make
inferences about the population as a whole. The general purpose of constructing a
confidence interval is to calculate a range that, at a chosen level of confidence
(hence the name confidence interval), is likely to include the unknown parameter of
interest.
99% confidence interval estimate for the true proportion of yellow candies
Using a TI-83 Plus calculator I entered:
STAT → TESTS → 1-PropZInt → x: 351 → n: 1723 → C-level: .99 → Calculate
(.17872, .22871)
p̂ = .204
n = 1723
This means that we can state with 99% confidence that the true proportion of yellow
candies lies between .179 and .229. The proportion of yellow candies for our class
sample was .204, and we can see that it does lie within the 99% confidence interval.
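The interval above can be reproduced by hand from the formula p̂ ± z·sqrt(p̂(1 − p̂)/n). The following Python sketch is my own check of the calculator output, assuming the standard one-proportion z interval; it uses only the values x = 351 and n = 1723 from above.

```python
from math import sqrt
from statistics import NormalDist

x, n, conf_level = 351, 1723, 0.99   # yellow candies, total candies, C-level

p_hat = x / n                                             # sample proportion
z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # critical z value
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)           # margin of error

lower, upper = p_hat - margin, p_hat + margin
print(f"p-hat = {p_hat:.3f}")
print(f"99% interval = ({lower:.5f}, {upper:.5f})")
```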
95% confidence interval for the true mean number of candies per bag
Using a TI-83 Plus calculator I selected STAT → EDIT and entered the values for the
number of candies per bag for each of the 28 bags obtained by the students in the class
into L1. Once finished entering the data, I entered:
STAT → TESTS → TInterval → Data → List: L1 → Freq: 1 → C-level: .95 → Calculate
(60.816, 62.255)
x̄ = 61.536
Sx = 1.856
n = 28
This means that we can state with 95% confidence that the true mean number of
candies per bag lies between 60.816 and 62.255. The sample mean is given as 61.536
and matches the calculation of the 5 number summary from the previous section of this
project and falls within the boundaries of the interval.
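As a check on the calculator, the interval can be approximated from the summary statistics with x̄ ± t·Sx/sqrt(n). This is a sketch only: the critical value 2.0518 is taken from a t table for 27 degrees of freedom (it is not computed here), and because x̄ and Sx are rounded, the endpoints may differ from the calculator in the last digit.

```python
from math import sqrt

x_bar, s, n = 61.536, 1.856, 28   # rounded summary statistics from above
t_star = 2.0518                   # t table value, df = n - 1 = 27, 95% level

standard_error = s / sqrt(n)          # estimated standard error of the mean
margin = t_star * standard_error      # margin of error

lower, upper = x_bar - margin, x_bar + margin
print(f"95% interval = ({lower:.3f}, {upper:.3f})")
```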
Using a 0.05 level of significance (α), test the claim that 20% of all Skittles are red.
α = 0.05
p-value > α, therefore my decision is to fail to reject the null hypothesis. There is not
sufficient evidence to reject the claim that 20% of all Skittles are red.
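The decision above can be illustrated with a one-proportion z test. This Python sketch is my own and rests on two assumptions: that the class count of red candies is 361 (read from the chart labels) and that the test is two-tailed with H0: p = 0.20.

```python
from math import sqrt
from statistics import NormalDist

red_count, n = 361, 1723   # red count is an assumption read from the chart
p0, alpha = 0.20, 0.05     # claimed proportion and level of significance

p_hat = red_count / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # test statistic under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
if p_value > alpha:
    print("Fail to reject the null hypothesis")
else:
    print("Reject the null hypothesis")
```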
Using a 0.01 level of significance (α), test the claim that the mean number of
candies in a bag of Skittles is 55.
Claim: the mean number of candies in a 2.17 oz. bag of Skittles is 55.
Counterclaim: the mean number of candies in a 2.17 oz. bag of Skittles is greater than
or less than 55.
H0: μ = 55
H1: μ ≠ 55
two-tailed test
t = 18.638
p ≈ 0
α = 0.01
p-value < α, therefore my decision is to reject the null hypothesis. There is sufficient
evidence to warrant rejection of the claim that the mean number of candies in a
2.17 oz. bag of Skittles is 55.
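The test statistic can be recomputed from the summary statistics with t = (x̄ − μ0)/(Sx/sqrt(n)). In this sketch the critical value 2.771 is taken from a t table (df = 27, two-tailed α = 0.01) rather than computed, and the result differs slightly from the calculator's 18.638 because x̄ and Sx are rounded.

```python
from math import sqrt

x_bar, s, n = 61.536, 1.856, 28   # rounded summary statistics from above
mu0 = 55                          # claimed mean under H0
t_crit = 2.771                    # t table value, df = 27, two-tailed alpha = 0.01

t_stat = (x_bar - mu0) / (s / sqrt(n))   # one-sample t test statistic

print(f"t = {t_stat:.3f}")
if abs(t_stat) > t_crit:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```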
2) n ≤ 0.05N
3) np(1 − p) ≥ 10
Hypothesis Testing:
1) simple random sample
2) np(1 − p) ≥ 10
3) n ≤ 0.05N
Our class sample did meet the conditions for constructing the confidence intervals
as well as for the hypothesis tests. Although our sample size of 28 bags was less than
30, the number of candies per bag had an approximately normal distribution, and the
data are only required to meet one of those two conditions.
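The numerical conditions for the proportion procedures can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it uses the claimed proportion p = 0.20 from the hypothesis test, and because the population size N (all Skittles) is not known, it only reports how large N would need to be for n ≤ 0.05N to hold.

```python
n = 1723    # total candies in the class sample
p = 0.20    # claimed proportion of red candies (from the hypothesis test)

# Condition: np(1 - p) must be at least 10
large_counts = n * p * (1 - p)
meets_large_counts = large_counts >= 10

# Condition: n <= 0.05N, which is the same as N >= n / 0.05
min_population = n / 0.05

print(f"np(1 - p) = {large_counts:.2f}, condition met: {meets_large_counts}")
print(f"n <= 0.05N holds if N is at least {min_population:.0f}")
```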
Some errors that could have been made using this data include the possibility of
students using a bigger or smaller package of Skittles. This could affect the mean
number of candies per bag as well as the proportion of each color and potentially create
an outlier. Students could miscount the total number of candies in their package or
miscount the numbers of each color.
The sampling method could be improved by increasing the sample size and
monitoring the size of bag used. In doing a bit of research I found that Skittles are
manufactured in multiple locations around the globe. I suppose it is possible that
different geographical locations could produce different proportions of colors of candies
per bag. It would take more investigation to determine whether the data would be
significantly altered if our sample were composed of bags produced in different
factories.
We all come into contact with statistical information all day, every day and we
may not realize it as such. From news stories to articles, politics to advertising,
statistics are everywhere. I have gained some insight into the power and usefulness of
statistics in things like pharmaceutical drug testing, consistency in manufacturing, crime
statistics and education. This class has also taught me that statistics can be faulty,
misleading, and possibly even dangerous if the standards of good statistics are not
adhered to. Particularly in politics and advertising, I have noticed that graphs can be
laid out and manipulated in such a way that the graphic does not accurately represent
the statistical information contained within it.
This has been a very challenging class for me, but I have enjoyed learning the
material. I feel that it has helped me to improve my critical thinking skills, which
will be beneficial to me in my day-to-day life as well as in any of my future classes.