
Math 1040 Statistics

Term Project
Blake Freeman

This project consists of several parts, numbered Part 2 through Part 5. Each part
consists of a group section and an individual section. The project was
based on bags of Skittles. Each person in the class bought a
bag of Skittles and counted the number of candies of each color in the
bag. We reported these counts to the instructor, who then compiled
each individual's data into a data set for the entire class. This project used
that data for everything from creating charts to computing
confidence intervals.

Term Project Part 2 Group

This sample is not a true random sample, because in a random sample the
members of the population that participate are chosen randomly. In this case,
we were not randomly selected; we were included because we were part of
a class. There is still a random element, because each individual
chose on their own which store to buy the Skittles from, which makes the
sample somewhat more representative. In addition, the larger the sample, the closer our
results should be to representing the population. The population in our study
would be bags of Skittles in general. We were impressed that the
larger the sample was, the more the proportions of the different colors
evened out. In the class sample, each color was very close to
a 20% proportion, while an individual bag may have
been skewed, with one color making up a noticeably smaller share than the others.

Term Project Part 2 Individual

Color     My Bag Count   Class Count
Red            16            1164
Orange          8            1117
Yellow         13            1189
Green          10            1087
Purple         13            1093
Total          60            5650

The graphs that we constructed for the group portion of this
assignment, along with the graph shown here and a little
bit of number crunching, give a pretty good picture of the data we
are analyzing. For the most part, nothing I
have seen really surprised me. I was certain before crunching
the numbers that each bag of Skittles would contain about the same
number of candies. I came to that conclusion because the weight of
each bag had to be 2.17 ounces. There are a couple of things that
did stand out to me. First, when you construct a relative frequency
graph from the class counts, each color
in the sample evens out to just around 20%. Second, my
sample was not as
uniform. In fact, the count of orange candies in my bag was
noticeably lower than the others, accounting for only 13% of the
total number. As for how this may affect the graphics or
summary statistics: when the sample is small, like my single bag of
Skittles, the graphics and statistics will be much more skewed
than those from a much larger sample like our class sample.
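
The relative frequencies discussed above can be reproduced from the Part 2 counts. The following is a minimal Python sketch; the function name `relative_frequencies` is made up for illustration, and the counts are the ones reported in the table.

```python
# Color counts as reported in the Part 2 table.
my_bag = {"Red": 16, "Orange": 8, "Yellow": 13, "Green": 10, "Purple": 13}
class_counts = {"Red": 1164, "Orange": 1117, "Yellow": 1189,
                "Green": 1087, "Purple": 1093}


def relative_frequencies(counts):
    """Return each category's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


my_rel = relative_frequencies(my_bag)
class_rel = relative_frequencies(class_counts)

# Print the two sets of shares side by side, one color per row.
for color in my_bag:
    print(f"{color:<7} my bag {my_rel[color]:6.1%}   class {class_rel[color]:6.1%}")
```

Every class share lands between roughly 19% and 21%, while the single bag ranges from about 13% (orange) to about 27% (red).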

Term Project Part 3 Group


Total Skittles in Each Bag: Summary Measurements

Mean # of candies per bag        60.1
Std. dev. of candies per bag      5.6
Min                                37
Q1                                 58
Median                             60
Q3                                 62
Max                                82

Term Project Part 3 Individual


When we put the class data on the total
number of candies per bag into a box plot, it was readily apparent that the middle
of the distribution was very tightly packed. In fact, the box itself (the interquartile
range, Q3 - Q1 = 62 - 58) spanned only 4 candies, even though the full range ran
from 37 to 82. This is not really surprising to me, because there are strict regulations
on the weight of these bags of candy to make sure they are consistent. I was
somewhat impressed to see just how consistent the counts were in such a
large sample. My particular bag of candy contained 60 pieces, which was
exactly the class median and within a tenth of a candy of the class mean of 60.1.
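
The spread figures can be checked directly from the five-number summary in the Part 3 group table. This is a small Python sketch; the 1.5 * IQR outlier fences are a common textbook convention assumed here, not something stated in the report.

```python
# Five-number summary from the Part 3 group table.
minimum, q1, median, q3, maximum = 37, 58, 60, 62, 82

data_range = maximum - minimum   # spread of the whole sample
iqr = q3 - q1                    # spread of the middle 50% (width of the box)

# Conventional 1.5 * IQR fences (an assumed rule, not taken from the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range = {data_range}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print("min is an outlier:", minimum < lower_fence)
print("max is an outlier:", maximum > upper_fence)
```

The 4-candy figure is the width of the box, while the whiskers and outliers cover a much wider span. Under the assumed fence rule, both the smallest and the largest bag would be flagged as outliers.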
Categorical variables, sometimes called qualitative or attribute variables, are
variables whose values fall into a countable number of categories or different
groups. For categories like ours, the groups cannot be placed in any natural logical order.
For example, in our class project the categorical variable is color, with the categories
red, orange, yellow, green, and purple. One could
order these by how they appear in a rainbow or something like that, but there
really is no definite logical way to arrange them. Quantitative variables are
variables whose values are numbers that can be ordered and measured. Often, you will collect
both types of data when exploring a single subject, because categorical variables are
often used to group or subset the data in graphs or analyses (minitab.com).
Another example of a categorical variable would be the
animal type of each cracker in a bag of animal crackers, whereas the number of crackers in
a bag would be quantitative. Categorical variables can be displayed in frequency
graphs such as bar graphs and pie charts. Quantitative variables can be shown in
histograms, box plots, and line graphs.

Term Project Part 4 Group


Group project part 4
Construct a 99% confidence interval estimate for the population proportion of yellow
candies.
n = 5650
Yellow = 1189
alpha = 1 - .99 = .01; alpha/2 = .005; critical value z (using calculator) = 2.576
p-hat = 1189/5650 = .21
Margin of error: 2.576 * sqrt(.21 * (1 - .21)/5650) = .0139587
Lower bound: .21 - .0139587 = .1960413
Upper bound: .21 + .0139587 = .2239587
We are 99% confident that the population proportion of yellow candies is
between .1960413 and .2239587.
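
The proportion interval can be reproduced with the Python standard library. This is a sketch, not the calculator procedure used by the group: it takes the z critical value from `statistics.NormalDist` and keeps p-hat unrounded, so its bounds differ slightly from a hand calculation that rounds p-hat to .21.

```python
from math import sqrt
from statistics import NormalDist

n = 5650           # total candies in the class sample
yellow = 1189      # yellow candies in the class sample
confidence = 0.99

p_hat = yellow / n
# Two-sided critical value: the upper alpha/2 point of the standard normal.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, margin = {margin:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```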
Construct a 95% confidence interval estimate for the population mean number of candies
per bag.
Using calculator: List 1, Frequency blank
x-bar = 60.10638298
s = sample standard deviation = 5.556103077
alpha = 1 - .95 = .05; alpha/2 = .025
n = 94
Degrees of freedom = n - 1 = 93
t(.975) using calculator = 1.9858
Lower bound = 60.10638298 - 1.9858 * (5.556103077/sqrt(94)) = 58.9684
Upper bound = 60.10638298 + 1.9858 * (5.556103077/sqrt(94)) = 61.2444
We are 95% confident that the population mean for the number of candies in each bag is
between 58.9684 and 61.2444.
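
The mean interval above can be checked the same way. In this sketch the t critical value is copied from the report as a constant, because the Python standard library has no t distribution; a statistics package could compute it instead.

```python
from math import sqrt

x_bar = 60.10638298   # sample mean number of candies per bag
s = 5.556103077       # sample standard deviation
n = 94                # number of bags in the class sample
t_crit = 1.9858       # t critical value for df = 93, as given in the report

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error = {margin:.4f}")
print(f"95% CI for the mean: ({lower:.4f}, {upper:.4f})")
```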
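Construct a 98% confidence interval estimate for the population standard deviation of the
number of candies per bag.
x-bar = 60.10638298
s = 5.556103077, so s^2 = 30.87028
alpha = 1 - .98 = .02; alpha/2 = .01
n = 94
df = 94 - 1 = 93
Chi-square critical values (table entries for df = 90, the closest table row to 93):
X^2(right) = 124.116, X^2(left) = 61.754
Lower bound = sqrt((93 * 30.87028)/124.116) = 4.8095
Upper bound = sqrt((93 * 30.87028)/61.754) = 6.8184
We are 98% confident that the population standard deviation of the number of candies
per bag is between 4.8095 and 6.8184.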
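
The standard deviation interval follows the same pattern. This is a sketch that uses the sample standard deviation reported for the mean interval (s = 5.556103077) and copies the two chi-square critical values from the report as constants. The bounds are sensitive to the exact digits entered for s, so a hand calculation can differ in the later decimals.

```python
from math import sqrt

s = 5.556103077        # sample standard deviation of candies per bag
n = 94
df = n - 1
chi2_right = 124.116   # right-tail critical value, as given in the report
chi2_left = 61.754     # left-tail critical value, as given in the report

# Dividing by the larger critical value gives the lower bound, and vice versa.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"s^2 = {s**2:.5f}")
print(f"98% CI for the standard deviation: ({lower:.4f}, {upper:.4f})")
```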

Term Project Part 4 Individual


A confidence interval is a range of values, computed from a sample of data,
that is likely to contain an unknown population value or parameter. The confidence
level describes what happens over recurring samples: if we repeated the sampling many
times and built a 95% interval each time, about 95% of those intervals would contain
the parameter. Confidence intervals
are used in statistics to estimate all kinds of things, from the likely life of a
light bulb to
the odds of winning or losing a bet. It is important to note that a confidence
interval only gives information about the parameter value itself, and not about the distribution
of individual values. Also, the larger the sample size, the narrower and more precise your interval will
be.
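
The idea of recurring samples can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true proportion of 0.20, the sample size of 500, and the 2000 repetitions are illustrative choices, not values from the class data.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1040)                 # fixed seed so the run is repeatable

true_p = 0.20                     # assumed "true" proportion (illustration only)
n = 500                           # assumed size of each simulated sample
trials = 2000                     # number of repeated samples
z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence

hits = 0
for _ in range(trials):
    # Draw one sample and build a 95% interval for the proportion.
    successes = sum(random.random() < true_p for _ in range(n))
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Count the interval as a hit if it contains the true proportion.
    if p_hat - margin <= true_p <= p_hat + margin:
        hits += 1

coverage = hits / trials
print(f"{coverage:.1%} of the simulated intervals contain the true proportion")
```

The share of intervals that capture the true value should come out close to the 95% confidence level, which is what the level is meant to describe.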

Term Project Part 5 Reflection


Statistics has been quite a challenging class this semester.
There have been a ton of topics to study, crammed into a short
summer semester. The speed at which this class went was truly
what made it challenging. I never realized just how much there was
to the statistics field. Who knew that you could make a term project
out of a single bag of Skittles? I understood the histograms and the
pie charts before I took this class but there has been so much more.
As I think through the things I have learned I really think the most
useful would be the probability forecasts taken from sample data.
This could be very useful in life in general. Being able to take sample
data and predict the outcome with different degrees of confidence is
extremely useful. The classwork was full of story problems that
forced me not only to know how to solve the problem, but also
how to arrange the data correctly from realistic scenarios. It was a
real eye-opener to be able to hear certain statistics reported on the
news or in news articles online and know what they are talking about
and what the statistics actually mean. Another thing that impressed
me is how people or organizations can manipulate graphs and
statistics to skew the truth about what they are trying to show in a
graph. The part about starting graphs at values other than zero to
skew the facts was a real eye-opener as well. I do think that the
information in this class could be used in the computer science field.
I don't know how much code I will be writing to compute different
statistical problems, but at least now I know I could if I had to.
