
Skittles Term Project

Math 1040
Instructor: Ping Yu
Hannah Mamales

INTRODUCTION:

Have you ever wondered how many candies of each color are in a 2.17-ounce bag of Skittles? Is
the number of candies the same in every bag we buy, or does it vary? We generally assume that
every 2.17-ounce bag of Skittles contains the same number of candies, with an identical number
of each color. In reality this may not be true, because every manufacturer has to deal with the
variation that is inherent in any process. Some bags will always be slightly under-filled or
over-filled. This kind of variation cannot be removed entirely, but it can be reduced to a very
small degree.

This project examines the above statement using statistics, the science of dealing with
variation. To complete the project, everyone in the class purchased a 2.17-ounce bag of Skittles
and counted the candies of each color (red, purple, yellow, green and orange) and the total
number of candies in the bag. There were 30 students in the class, so there were 30 bags in
total. This sample may not represent the true population, but it can provide a good assessment
of the parameters of the manufacturing process. After counting the candies, the proportion of
candies of each color was also determined. For the entire class sample, the average and the
spread of the number of candies were assessed by numerical calculations and by plotting graphs.
In summary, this project involved organizing and analyzing the data using descriptive statistics
and using the concepts of inferential statistics to draw conclusions from the sample data.

Organizing and Displaying Categorical Data:

[Pie chart: Proportion of candies of each color (class data). Red 20%, Orange 20%, Yellow 19%,
Green 21%, Purple 20%.]

[Bar chart: Number of candies of each color in descending order (class data). Green 382,
Red 360, Purple 360, Orange 359, Yellow 343.]

Color     Class data   My data
Red       0.20         0.270
Orange    0.20         0.222
Yellow    0.19         0.159
Green     0.21         0.206
Purple    0.20         0.143

The table above shows that the class proportions for the different colors are very close to the
proportions in my own bag of Skittles, except for the yellow, purple and red candies. My bag had
a lower proportion of yellow and purple candies and a higher proportion of red candies.
The pie chart of the class data shows that the proportions of the different colors are evenly
distributed within a narrow range of 0.19 to 0.21, which suggests that the process of filling
the bags is under tight quality control with minimal variation. The Pareto chart shows that
green was the most common color, followed by red and purple. Yellow was the least common color.
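
As a rough check on the class proportions, the short Python sketch below recomputes them from the color counts. It assumes the five bar labels on the chart (382, 360, 360, 359, 343) belong to Green, Red, Purple, Orange and Yellow in that descending order. The variable names are illustrative and were not part of the original project.

```python
# Class color counts, assuming the bar labels map to the colors in descending order.
counts = {"Green": 382, "Red": 360, "Purple": 360, "Orange": 359, "Yellow": 343}

# Total number of candies across all 30 bags.
total = sum(counts.values())

# Proportion of each color, rounded to two decimals as in the table.
proportions = {color: round(n / total, 2) for color, n in counts.items()}

# Pareto ordering: colors sorted from most to least frequent.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for color, n in pareto:
    print(f"{color:<7} {n:>4}  {proportions[color]:.2f}")
```

The total of these counts is 1804, the same sample size used later for the proportion estimates.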
Organizing and Displaying Quantitative Data:

Summary of descriptive statistics for the total number of candies:

Statistic            Value
Mean                 60.1
Standard deviation   3.07
Min                  51
Q1                   58.3
Q2 (median)          60.5
Q3                   62
Max                  65

[Histogram: Number of candies per bag. x-axis: total number of candies (bins 50, 55, 60, 65, 70,
More); y-axis: frequency.]

From the histogram and the box plot, we can say that the distribution is skewed to the left.
This observation is supported by the quartiles: the distance between the first quartile and the
median (60.5 - 58.3 = 2.2) is larger than the distance between the median and the third quartile
(62 - 60.5 = 1.5). Moreover, the whisker on the left side is longer than the whisker on the right
side. The median (60.5) is also higher than the mean (60.1), which is consistent with a left skew.
The graphs reflect what I expected to see, because a sample of 30 bags is relatively small and
can easily produce a skewed distribution. A larger sample would show the shape of the underlying
distribution more reliably. Note that the central limit theorem describes the distribution of
sample means, which approaches a bell shape as the sample size increases. It does not by itself
make the distribution of individual bag counts symmetric.
My bag contained 63 candies, which is higher than the class mean of 60.1. It is also higher than
the second and third quartiles of the class data, but below the maximum value of 65.
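
The skew indicators discussed above can be checked directly from the summary statistics. The following Python sketch is only an illustration: it uses the rounded values from the summary table, and the variable names are hypothetical.

```python
# Summary statistics for the total number of candies per bag (class data).
mean, median = 60.1, 60.5
q1, q3 = 58.3, 62.0
minimum, maximum = 51, 65
my_bag = 63  # total number of candies in my own bag

# Distances from the median to each quartile.
lower_gap = median - q1   # Q1 to median
upper_gap = q3 - median   # median to Q3

# Two informal indicators of a left-skewed distribution.
mean_below_median = mean < median
lower_half_wider = lower_gap > upper_gap

# Where my bag falls relative to the class data.
above_q3 = my_bag > q3
below_max = my_bag < maximum

print(f"Q1 to median: {lower_gap:.1f}, median to Q3: {upper_gap:.1f}")
print(f"Left skew suggested: {mean_below_median and lower_half_wider}")
```

These indicators are informal. They describe this sample only and do not establish the shape of the population distribution.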
REFLECTION

Quantitative data are defined as data that consist of numerical measurements or counts. Some
examples of quantitative data are height, distance and weight. Categorical data consist of
attributes, labels or non-numerical categories. Some examples of categorical data are color,
gender and brand of product. Since quantitative data are numerical, we can perform meaningful
calculations on them. This is not possible for categorical data, which cannot be measured but
can be counted after classifying the data into categories.

For quantitative data it makes sense to calculate measures of center such as the mean, median
and mode. It is also possible to examine the spread of the data by calculating the standard
deviation, variance, range, quartiles and interquartile range. We can plot quantitative data
using histograms, box plots, dot plots, and stem-and-leaf plots. These provide a visual
assessment of how the data are spread around the center.

For categorical data, the two most common ways to graph the data are pie charts and bar charts.
There is no meaningful mean, and no measure of spread such as the standard deviation or
variance, for categorical data, since the data are not numerical and such calculations are not
feasible. Categorical data consist of categories which can be counted and summarized in a
frequency table. If the number of categories is small, the data can be visually assessed with a
pie chart or a bar chart. If the number of categories is very large, a bar chart is preferable
because a pie chart may not represent the categories clearly.
99% Confidence Interval Estimate for the Proportion
Sample proportion of yellow candies = 343/1804 = 0.19
99% confidence interval:
0.19 ± 2.575 × sqrt(0.19(1 - 0.19)/1804)
0.19 ± 0.0238
(0.166, 0.214)
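
The interval for the proportion can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. It takes the critical z value from `statistics.NormalDist` rather than from a table, so it uses about 2.576 instead of the rounded 2.575. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yellow, n = 343, 1804      # yellow candies and total candies in the class data
confidence = 0.99

p_hat = yellow / n         # sample proportion of yellow candies

# Two-sided critical value of the standard normal distribution (about 2.576).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error and interval limits for the proportion.
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, margin = {margin:.4f}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, the limits agree with the hand calculation of (0.166, 0.214).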

95% Confidence Interval Estimate for the Mean

The sample mean is 60.1 and the t value is 2.045 (df = 29).
95% confidence interval:
60.1 ± 2.045 × 3.07/sqrt(30)
60.1 ± 1.1462
(58.954, 61.246)
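
The same arithmetic for the mean can be sketched in Python. The critical t value of 2.045 is taken from the calculation above as a constant, because the standard library has no t distribution. The variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 30, 60.1, 3.07   # sample size, sample mean, sample standard deviation
t_crit = 2.045                 # t value for 95% confidence with df = 29 (from the text)

# Margin of error and interval limits for the mean.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin = {margin:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```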
98% Confidence Interval Estimate for the Standard Deviation
sqrt((30 - 1)(3.07)^2/49.588) < std dev < sqrt((30 - 1)(3.07)^2/14.256)
2.348 < std dev < 4.379
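
A short Python sketch of the standard deviation interval follows. The two chi-square critical values (49.588 and 14.256 for df = 29) are copied from the calculation above as constants, and the variable names are illustrative.

```python
from math import sqrt

n, s = 30, 3.07          # sample size and sample standard deviation
df = n - 1               # degrees of freedom

# Chi-square critical values for 98% confidence, df = 29 (from the text).
chi2_right = 49.588      # right-tail critical value
chi2_left = 14.256       # left-tail critical value

# The larger critical value gives the lower limit, and the smaller gives the upper limit.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"98% CI for the standard deviation: ({lower:.3f}, {upper:.3f})")
```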
Discussion
A confidence interval is a range of values used to estimate the true value of a population
parameter, such as a mean, proportion or standard deviation. Confidence intervals provide more
information than a point estimate because they also account for the sampling error involved. A
confidence interval is always associated with a confidence level, which shows how confident we
are about that estimate.

The 99% confidence interval estimate for the true proportion of yellow candies is (0.166, 0.214).
Therefore, we are 99% confident that the true population proportion of yellow candies is between
0.166 and 0.214. In other words, if we repeated this sampling procedure 100 times and built an
interval each time, we would expect about 99 of those intervals to contain the true proportion.
The 95% confidence interval estimate for the true mean number of candies per bag is
(58.954, 61.246). Therefore, we are 95% confident that the true mean is between 58.954 and
61.246. In other words, out of 100 intervals built from repeated samples, we would expect about
95 to contain the true mean.
The 98% confidence interval estimate for the standard deviation is 2.348 < standard deviation <
4.379. Therefore, we are 98% confident that the true standard deviation of the number of candies
per bag is inside this interval. In other words, out of 100 intervals built from repeated
samples, we would expect about 98 to contain the true standard deviation.

Hypothesis Tests
In statistics, hypothesis testing is used to test a claim about a property of a population. The
claim can be about a population mean, proportion, standard deviation or variance. For example, a
business that manufactures tires may claim that the average lifetime of its tires is around
20,000 miles. To verify this claim, the techniques of hypothesis testing can be used, and then a
decision can be made to reject or fail to reject the hypothesis. Using the value of a test
statistic, we can make inferences at a chosen level of significance.

Alpha = 0.05, p = 0.2 (for red candies), n = 1804

H0: p = 0.2
H1: p not equal to 0.2
Z = (phat - p)/sqrt(p(1 - p)/n)
= (0.2 - 0.2)/sqrt(0.2(1 - 0.2)/1804) = 0
Because the test is two-tailed, P-value = 2 × P(Z > 0) = 2 × 0.5 = 1.0

Since the P-value of 1.0 > 0.05, we fail to reject H0.

We do not have sufficient evidence to conclude that the proportion of red candies is different
from 0.2.
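
The test for the proportion of red candies can be sketched in Python as follows. Two assumptions apply: the red count of 360 is read from the class bar chart, and the sample proportion is left unrounded, so z comes out slightly below zero instead of exactly 0. The two-tailed P-value is computed as 2 × P(Z > |z|). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

red, n = 360, 1804          # red candies (assumed from the bar chart) and total candies
p0, alpha = 0.20, 0.05      # claimed proportion and significance level

p_hat = red / n             # unrounded sample proportion

# Test statistic for a one-proportion z test.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value: twice the area beyond |z| under the standard normal curve.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha
print(f"z = {z:.3f}, P-value = {p_value:.3f}, reject H0: {reject_h0}")
```

With either the rounded or the unrounded sample proportion, the P-value is far above 0.05, so the decision is to fail to reject H0.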
Hypothesis test for the mean number of candies

Alpha = 0.01
H0: mu = 55
H1: mu not equal to 55
n = 30, sample mean xbar = 60.1, sample standard deviation s = 3.07
T = (xbar - mu)/(s/sqrt(n)) = (60.1 - 55)/(3.07/sqrt(30)) = 9.098
P-value = 0.000 (to three decimal places)
Since the P-value (0.000) < alpha (0.01), we reject H0.
We have sufficient evidence to conclude that the mean number of Skittles candies per bag is
different from 55.
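
The test statistic for the mean can be checked with the Python sketch below. Since the standard library has no t distribution, the sketch compares the statistic with a critical value instead of computing a P-value. The value 2.756 is the two-tailed critical t for alpha = 0.01 and df = 29 from a standard t table. It does not appear in the original calculation and is included here as an assumption. The variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 30, 60.1, 3.07   # sample size, sample mean, sample standard deviation
mu0, alpha = 55, 0.01          # hypothesized mean and significance level

# Test statistic for a one-sample t test.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed critical value for alpha = 0.01, df = 29 (t table value, not in the original).
t_crit = 2.756

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

The statistic is more than three times the critical value, which is consistent with a P-value that rounds to 0.000.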

Reflection
Conditions for interval estimates and hypothesis tests for population proportion:
In this case, the normal distribution is used as an approximation to the binomial distribution,
since the proportion is treated as a binomial proportion. The conditions to check before using
this approximation are:
np ≥ 5 and nq ≥ 5, where n is the sample size, p is the proportion and q = 1 - p.
The sample should also be a simple random sample.
For our data, the sampling can be treated as random, since each student purchased a bag of
Skittles independently from a different location.
n = 1804 and p = 0.2, therefore np = 361, which is > 5, and nq = 1443, which is > 5. Therefore
the conditions were met for the estimation and testing.
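
The two counts in the condition check are quick to verify. In this small Python sketch, `np_expected` and `nq_expected` are illustrative names for np and nq.

```python
n, p = 1804, 0.2        # sample size and claimed proportion
q = 1 - p               # complement of p

np_expected = n * p     # expected number of successes
nq_expected = n * q     # expected number of failures

# Both values must be at least 5 for the normal approximation to be used.
conditions_met = np_expected >= 5 and nq_expected >= 5
print(f"np = {np_expected:.0f}, nq = {nq_expected:.0f}, conditions met: {conditions_met}")
```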

Conditions for interval estimates and hypothesis tests for population mean:
The conditions for this case are:
The sample must be a simple random sample. This requirement is met since we can assume that
each bag was randomly selected.
The population standard deviation is unknown, so we use the t distribution for both the estimate
and the hypothesis test. The requirement for the t distribution is that the population is
normally distributed or the sample size is n > 30.
The sample size for our class was n = 30, which is right at this threshold rather than above it.
We therefore proceed on the assumption that the population is approximately normally distributed.

Conditions for interval estimates for population standard deviation:

The conditions for this case are:
a. The sample should be a simple random sample. This requirement is met since we can
assume that each bag was randomly selected.
b. The population of candy counts per bag must be normally distributed, regardless of
sample size. We do not have information confirming that the population is normally
distributed, and a sample size of 30 does not by itself satisfy this requirement, so we
treat it as an assumption. Because Skittles are manufactured in very large quantities by
a controlled process, it is plausible that the parent population is approximately
normally distributed.

Possible Errors:
There can be sampling, measurement, calculation or counting errors. Some students may have
counted incorrectly or included broken candies in the data. Another possible error is a Type I
or Type II error in hypothesis testing.
The sampling method can be improved by taking larger samples from larger Skittles bags, which
would increase the reliability and validity of our estimates and reduce the errors.
Reflection:

After completing the projects in this class, I have come to understand that statistical tools
are very valuable for collecting, organizing and analyzing information, enabling us to make
better decisions about any kind of data, whether it is measured at the nominal, ordinal,
interval or ratio level. If the data can be classified as numerical or categorical, as in the
current project, then statistics that describe the data can be obtained for useful comparison
between different units. For example, the mean and median can be calculated to indicate the
center of the numerical data. In addition, the range and standard deviation can be determined to
assess how spread out the data are.

This project has helped me develop problem-solving skills that can be applied to performing and
analyzing any kind of observational or experimental study in real-world applications. I have
learned that sample data should be collected in an appropriate way, by a process of random
selection, and should be free from any kind of bias in order to obtain meaningful results. Once
the data are collected, careful organization and graphical displays of the data help me spot
patterns and trends before I perform any calculations. For example, simple charts such as pie
charts, scatter plots, bar charts or Pareto charts can show me patterns which can further guide
me in choosing the most appropriate methods.

For the current project, the tools of descriptive statistics proved very helpful in presenting
the Skittles data in tabular and graphic form for an easy and clear understanding of the data. I
was able to determine interval estimates at different confidence levels, which showed how
reliable the estimates were, and I was able to use the tools of inferential statistics to test
claims in a scientific manner at a stated level of significance. I firmly believe that any kind
of study or research done without statistical tools can become biased and full of errors, and
can give misleading results with incorrect conclusions. In real-world applications there is a
huge amount of information waiting to be analyzed. To make sense of all this information, I
strongly believe that statistical tools and ways of thinking are absolutely necessary. This
project taught me that in the real world we should not assume anything about data or information
given to us without a proper assessment using tools such as descriptive or inferential
statistics. Most kinds of data are subject to chance, randomness and variability, and should
therefore be analyzed by scientific methods such as statistics, which I think is the only tool
for understanding chance, randomness and variation in the real world.
