Math 1040
Instructor: Ping Yu
Hannah Mamales
INTRODUCTION:
Have you ever wondered how many candies of each color there are in a 2.17-ounce bag of
Skittles? Is the number of candies the same in every bag that we buy, or does it vary? We
generally assume that every 2.17-ounce bag of Skittles contains the same number of candies, with
an identical number of candies of each color. In reality this may not be true, because every
manufacturer has to deal with variation, which is inherent in any process. Some bags will always
be slightly under-filled or over-filled. This kind of variation cannot be removed entirely, but it
can be kept very small.
This project examines the above statement using statistics, the science of dealing with
variation. To complete the project, everyone in the class purchased a 2.17-ounce bag of Skittles
and counted the candies of each color (red, purple, yellow, green and orange) and the total number
of candies in the bag. There were 30 students in the class, so there were a total of 30 bags. This
sample may not represent the true population, but it can provide a good assessment of the
parameters of the manufacturing process. After counting the candies, the proportion of candies of
each color was also determined. For the entire class sample, the average and the spread of the
number of candies were assessed by numerical calculations and by plotting graphs. In summary,
this project included organizing and analyzing the data using descriptive statistics and using the
concepts of inferential statistics to draw conclusions about the population from the sample data.
Organizing and Displaying Categorical Data:
[Pie chart: proportion of candies by color, class data. Red 20%, Orange 20%, Yellow 19%, Green 21%, Purple 20%]
[Bar chart: number of candies by color in descending order, class data. Green 382, Red 360, Purple 360, Orange 359, Yellow 343]
Color     Class data   My data
Red       0.20         0.270
Orange    0.20         0.222
Yellow    0.19         0.159
Green     0.21         0.206
Purple    0.20         0.143
The above table shows that the class proportions for the different colors are close to the
proportions from my own bag of Skittles, except for the yellow, purple and red candies. For the
yellow and purple candies my bag showed a lower proportion, whereas for the red candies my bag
had a higher proportion.
The pie chart of the class data shows that the proportions of the candies of different colors
are evenly distributed within a narrow range of 0.19 to 0.21, which suggests that the
manufacturing process of filling the bags has good quality control with little variation. The
Pareto chart shows that green candies were the most numerous, followed by red and purple. Yellow
candies were the least numerous.
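The class proportions in the table can be reproduced from the color counts shown in the bar chart. The following is a minimal Python sketch, assuming those five counts are the class totals; the variable names are illustrative and not part of the original project.

```python
# Class totals by color, as shown in the bar chart.
class_counts = {"Red": 360, "Orange": 359, "Yellow": 343, "Green": 382, "Purple": 360}

total = sum(class_counts.values())  # all candies counted in the 30 bags

# Proportion of each color, rounded to two decimals as in the table.
proportions = {color: round(count / total, 2) for color, count in class_counts.items()}

# Pareto ordering: colors sorted from most to least frequent.
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

print(total)
print(proportions)
print(pareto_order)
```

The total printed here is the same 1804 that is used as the sample size for the confidence interval on the proportion of yellow candies.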
Organizing and Displaying Quantitative Data:
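[Histogram: frequency of the total number of candies per bag, class data. Horizontal axis: total number of candies (bins 50, 55, 60, 65, 70, More); vertical axis: frequency (bar labels shown: 15, 13, 2 and 0)]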
From the histogram and the box plot, we can say that the distribution is skewed to the left.
This observation can be confirmed by comparing the distance between the third quartile and the
median with the distance between the median and the first quartile, which is the larger of the two
in a left-skewed distribution. Moreover, the whisker on the left side is longer than the whisker
on the right side. Also, the median is higher than the mean, which is consistent with a
left-skewed distribution.
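The two numerical checks described above (mean versus median, and the two quartile distances) can be sketched in Python. The class data are not listed in this report, so the bag totals below are hypothetical and serve only to show the calculation; the function name `describe_skew` is likewise an assumption.

```python
import statistics

def describe_skew(counts):
    """Compare mean vs. median and the two quartile gaps for a list of bag totals."""
    mean = statistics.mean(counts)
    q1, median, q3 = statistics.quantiles(counts, n=4)  # default "exclusive" method
    lower_gap = median - q1   # distance from Q1 to the median
    upper_gap = q3 - median   # distance from the median to Q3
    return {
        "mean": mean,
        "median": median,
        "lower_gap": lower_gap,
        "upper_gap": upper_gap,
        # Both signs of left skew: mean below median, longer lower half of the box.
        "left_skew_hint": mean < median and lower_gap > upper_gap,
    }

# Hypothetical bag totals for illustration only (NOT the class data).
example = [52, 55, 58, 60, 61, 62, 62, 63, 63, 64]
summary = describe_skew(example)
print(summary)
```

These checks are only rules of thumb. With a small sample they can disagree with each other, so the histogram and box plot should still be inspected.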
The graphs reflect what I expected to see, because the sample size of 30 is relatively small
and can produce a skewed distribution. We may need a larger sample to get a clearer picture of the
shape of the distribution. Note that the central limit theorem applies to the distribution of
sample means: as the sample size increases, the distribution of the sample mean approaches a
bell-shaped (normal) distribution.
The total number of candies in my bag was 63, which is higher than the class mean of 60.1. It
is also higher than the second and third quartile values of the class data. However, it is below
the maximum value.
REFLECTION
Quantitative data are defined as data that consist of numerical measurements or counts. Some
examples of quantitative data are height, distance and weight. Categorical data consist of
attributes, labels or non-numerical categories. Some examples of categorical data are color,
gender and brand of product. Since quantitative data are numerical, we can perform meaningful
calculations on them. This is not possible for categorical data, which cannot be measured but can
be counted after the data are classified into categories.
For quantitative data it makes sense to calculate measures of center such as the mean, median
and mode. It is also possible to examine the spread of the data by calculating the standard
deviation, variance, range, quartiles and interquartile range. We can plot quantitative data with
histograms, box plots, dot plots and stem-and-leaf plots. These provide a visual assessment of
how the data are spread around the center.
For categorical data, the two usual ways to graph the data are pie charts and bar charts.
There is no meaningful mean, and no measure of spread such as the standard deviation or variance,
for categorical data, since the data are not numerical and such calculations are not feasible.
Categorical data consist of categories which can be counted and summarized in a frequency table.
If the number of categories is small, the data can be visually assessed with a pie chart or a bar
chart. If the number of categories is very large, it is preferable to represent them with a bar
chart, because a pie chart may not show the categories clearly.
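As an illustration of summarizing categorical data in a frequency table, here is a small Python sketch using `collections.Counter`. The list of colors is hypothetical and is not taken from any bag in the class sample.

```python
from collections import Counter

# Hypothetical colors from a small handful of candies (illustration only).
candies = ["Red", "Green", "Red", "Yellow", "Purple", "Green", "Red", "Orange"]

frequency = Counter(candies)      # count of candies in each category
n = sum(frequency.values())       # total number of candies

# Frequency table with relative frequencies, most common category first.
for color, count in frequency.most_common():
    print(f"{color:<8}{count:>3}{count / n:>8.3f}")
```

The counts and relative frequencies are the only meaningful summaries here; a mean or standard deviation of the color labels would have no interpretation.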
99% Confidence Interval Estimate for the Proportion
Sample proportion of yellow candies = 343/1804 = 0.19
99% confidence interval:
0.19 ± 2.575 × sqrt(0.19 × (1 - 0.19)/1804)
0.19 ± 0.0238
(0.166, 0.214)
The 99% confidence interval estimate for the true proportion of yellow candies is (0.166, 0.214).
Therefore, we are 99% confident that the true population proportion of yellow candies lies
between 0.166 and 0.214. In other words, if we repeated this sampling many times and built an
interval from each sample, about 99 out of every hundred intervals would contain the true
proportion and about one would not.
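The interval for the proportion of yellow candies can be checked with a short Python sketch. It assumes the normal-approximation interval used above and takes the critical value from the standard library's `NormalDist` rather than from a table; the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 343 yellow candies out of 1804 in the class sample.
low, high = proportion_ci(343, 1804, 0.99)
print(round(low, 3), round(high, 3))
```

Using the unrounded sample proportion and critical value gives the same interval to three decimal places as the hand calculation.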
The 95% confidence interval estimate for the true mean number of candies per bag is (58.954, 61.246).
Therefore, we are 95% confident that the true mean lies between 58.954 and 61.246. In other
words, if we repeated this sampling many times, about 95 out of every hundred intervals built this
way would contain the true mean and about 5 would not.
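The calculation behind this interval is not shown in the report. A plausible reconstruction, assuming a t interval built from the class summary statistics (n = 30, sample mean 60.1, sample standard deviation 3.07) and a table critical value of 2.045 for 29 degrees of freedom, is sketched below in Python.

```python
from math import sqrt

# Summary statistics reported for the class sample.
n = 30
x_bar = 60.1
s = 3.07

# Assumption: two-tailed critical value t(0.025, df = 29) = 2.045, from a t table.
t_crit = 2.045

margin = t_crit * s / sqrt(n)            # margin of error
low, high = x_bar - margin, x_bar + margin
print(round(low, 3), round(high, 3))
```

Under these assumptions the sketch reproduces the reported interval, which suggests that this is the method that was used.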
The 98% confidence interval estimate for the standard deviation is (2.348 < standard deviation <
4.379).
Therefore, we are 98% confident that the true standard deviation of the number of candies per
bag is contained inside the interval.
In other words, if we repeated this sampling many times, about 98 out of every hundred intervals
built this way would contain the true standard deviation and about two would not.
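The interval for the standard deviation can be reconstructed in the same way. The sketch below assumes the usual chi-square interval with n - 1 = 29 degrees of freedom and critical values read from a chi-square table; neither the method nor the critical values are stated in the report.

```python
from math import sqrt

n = 30
s = 3.07          # sample standard deviation of the class data
df = n - 1

# Assumption: chi-square critical values for df = 29 with 0.01 in each tail,
# taken from a chi-square table.
chi2_right = 49.588
chi2_left = 14.256

low = sqrt(df * s**2 / chi2_right)   # lower limit uses the right-tail value
high = sqrt(df * s**2 / chi2_left)   # upper limit uses the left-tail value
print(round(low, 3), round(high, 3))
```

Note that the interval is not symmetric around s = 3.07, because the chi-square distribution is not symmetric.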
Hypothesis Tests
In statistics, hypothesis testing is used for testing a claim about a property of a population. This
claim can be about a population mean, proportion, standard deviation or variance. For example, a
business that manufactures tires may claim that the average lifetime of its tires is 20,000
miles. To verify this claim, the techniques of hypothesis testing can be used, and then a
decision can be made to reject or fail to reject the hypothesis. Using the value of the test
statistic, we can make inferences at a chosen level of significance.
Alpha = 0.01
H0: μ = 55
H1: μ ≠ 55
n = 30, sample mean xbar = 60.1, sample standard deviation s = 3.07
t = (xbar - μ)/(s/sqrt(n)) = (60.1 - 55)/(3.07/sqrt(30)) = 9.098
P-value = 0.000 (to three decimal places)
Since P-value (0.000) < alpha (0.01),
we reject H0.
We have sufficient evidence to conclude that the mean number of Skittles candies per bag is
different from 55.
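The test statistic can be recomputed from the summary statistics with a few lines of Python. Because the standard library has no t distribution, this sketch compares the statistic with a table critical value instead of computing the P-value; the critical value 2.756 (two-tailed, alpha = 0.01, 29 degrees of freedom) is an assumption taken from a t table.

```python
from math import sqrt

n, x_bar, s = 30, 60.1, 3.07
mu_0 = 55          # mean claimed under H0
alpha = 0.01

# One-sample t statistic.
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Assumption: two-tailed critical value t(0.005, df = 29) = 2.756, from a t table.
t_crit = 2.756
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 2), reject_h0)
```

The statistic comes out at about 9.1. The last digit may differ slightly from the reported 9.098, presumably because the summary statistics used here are rounded. Either way it is far beyond the critical value, so the decision to reject H0 is the same.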
Reflection
Conditions for interval estimates and hypothesis tests for a population proportion:
In this case, the normal distribution is used as an approximation to the binomial distribution, since
the proportion is treated as a binomial proportion. The conditions to check before using this
approximation are:
np ≥ 5 and nq ≥ 5, where n is the sample size, p is the proportion and q = 1 - p.
Also, the sample should be a simple random sample.
For our sample, the sampling can be assumed to be random, since the Skittles bags were
purchased by each student randomly from different locations.
n = 1804 and p = 0.2, therefore np ≈ 361, which is > 5, and nq ≈ 1443, which is > 5. Therefore the
conditions were met for the estimation and testing.
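The check of the two conditions is simple arithmetic, sketched here in Python. The helper name `normal_approx_ok` and the default threshold of 5 are assumptions that follow the rule of thumb stated above.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return n*p, n*q and whether both meet the rule-of-thumb threshold."""
    q = 1 - p
    np_, nq_ = n * p, n * q
    return np_, nq_, (np_ >= threshold and nq_ >= threshold)

# Class sample: 1804 candies, proportion of about 0.2 for one color.
np_, nq_, ok = normal_approx_ok(1804, 0.2)
print(round(np_), round(nq_), ok)
```

Both products are far above 5, so the normal approximation is reasonable for this sample.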
Conditions for interval estimates and hypothesis tests for a population mean:
The conditions for this case are:
The sample must be a simple random sample. This requirement is met since we can assume that
each bag was randomly selected.
The value of the population standard deviation is unknown, so we use the t distribution for both
the interval estimate and the hypothesis test. The requirement for the t distribution is that the
population should be normally distributed or the sample size should be n > 30.
The sample size for our class was n = 30, which is right at this boundary. Therefore, we assume
that the population is approximately normally distributed for our tests.
Possible Errors:
There can be sampling, measurement, calculation or counting errors. Some students may have
counted incorrectly or included broken candies in the data. A Type I or Type II error can also
occur in hypothesis testing.
The sampling method can be improved by taking larger samples from larger Skittles bags,
which would increase the reliability and validity of our estimates and reduce the errors.
Reflection:
After completing the projects in this class, I have come to understand that statistical tools
are very valuable for collecting, organizing and analyzing information, thus enabling us to make
better decisions about any kind of data, whether at the nominal, ordinal, interval or ratio
level of measurement. If the data can be classified as numerical or categorical, as
in the current project, then statistics that describe the data can be obtained for useful comparison
between different units. For example, the mean and median can be calculated to indicate the
average or center of the numerical data. In addition, the range and standard deviations can be
This project has helped me to develop my problem-solving skills, which can be
applied to perform and analyze any kind of observational or experimental study in real-world
applications. I have learned that sample data should be collected in an appropriate way by a
process of random selection, and thus should be free from bias, in order to obtain
meaningful results. Once the data are collected, careful organization and graphical displays of
the data will help me to spot patterns and trends before I perform any calculations. For
example, simple charts such as pie charts, scatter plots, bar charts or Pareto charts can show me
patterns which can further guide me in using the most appropriate methods.
For the current project, the tools of descriptive statistics proved to be very helpful in presenting
the Skittles data in a suitable tabular and graphic form for easy and clear
comprehension of the data. I was able to determine interval estimates with different confidence
levels, which showed how reliable the estimates were, and I was able to use the tools of
inferential statistics to test claims in a scientific manner and to judge statistical significance. I
firmly believe that any kind of study or research which is done without the use of statistical tools
can become biased and full of errors and can give misleading results with incorrect conclusions. To
make sense of all of this information, I strongly believe that statistical tools and ways of thinking
are absolutely necessary. This project taught me that in the real world we should not
assume anything about any data or information which is given to us without doing a proper
assessment with the help of tools such as descriptive and inferential statistics. Most kinds of data are
subject to chance, randomness and variability, so they should be treated and analyzed by
scientific methods such as statistics which I think is the only tool to understand chance, randomness