
Term Project Part 8

Andrea Ramos
-Part 2: Sampling
Bag 1 (row #10): red 13, orange 12, yellow 16, green 13, purple 6; total of 60 candies
Bag 2 (row #20): red 18, orange 13, yellow 7, green 7, purple 17; total of 62 candies
Bag 3 (row #17): red 10, orange 14, yellow 14, green 16, purple 7; total of 61 candies
Sample totals: red 41, orange 39, yellow 37, green 36, purple 30; total of 183 candies
My bag's total: 58 candies
-We selected the bags randomly by picking 3 numbers between 1 and 23.
-Cluster Sampling
We divided the population into clusters (one cluster per bag), randomly selected 3 of the bags, and used all of the candy data from the selected bags.
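A minimal Python sketch of the cluster-sampling step, assuming the bags are numbered 1 to 23 with one cluster per bag. `random.sample` draws 3 distinct bag numbers, and the sample totals are simply the sums over every candy in the chosen bags. The counts are the three bags recorded above (rows #10, #20, #17); the names `pick_clusters` and `cluster_totals` are illustrative only.

```python
import random

COLORS = ["red", "orange", "yellow", "green", "purple"]

# Color counts for the three bags that were actually selected (rows #10, #20, #17).
BAGS = {
    10: {"red": 13, "orange": 12, "yellow": 16, "green": 13, "purple": 6},
    20: {"red": 18, "orange": 13, "yellow": 7, "green": 7, "purple": 17},
    17: {"red": 10, "orange": 14, "yellow": 14, "green": 16, "purple": 7},
}


def pick_clusters(population_size: int, k: int) -> list[int]:
    """Randomly choose k distinct bag numbers from 1..population_size."""
    return random.sample(range(1, population_size + 1), k)


def cluster_totals(bags: dict[int, dict[str, int]]) -> dict[str, int]:
    """Cluster sampling keeps every candy in each chosen bag, so the counts are just added."""
    totals = {color: 0 for color in COLORS}
    for counts in bags.values():
        for color in COLORS:
            totals[color] += counts[color]
    return totals


if __name__ == "__main__":
    print("A fresh random draw of bag numbers:", pick_clusters(23, 3))
    totals = cluster_totals(BAGS)
    print(totals, "total:", sum(totals.values()))
```

A new random draw will usually pick different bags; the totals are computed only for the bags whose counts were recorded.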
One possible source of error is that a sample of three bags may be too small to represent the whole population of 23 bags. The sample could be improved by including more bags.
I do not think our sample is representative of the class, because it contains only 3 of the 23 bags, which is only about 13%.
-Part 3: Graphical Displays
Candy color is a categorical variable because each value is a name or label, not a number representing a count or measurement. The variable simply places each candy into one of a countable number of categories (groups).
It is not really appropriate to talk about the shape of the distribution for candy color. The colors have no natural order, so the bars could be listed in any order and the "shape" would change with it. In addition, the counts do not vary much between the colors.

[Figure: bar chart "Total Skittles of each color for 23 bags" (x-axis: Color of Skittles; y-axis: Number of Skittles): Red 291, Orange 301, Yellow 295, Green 243, Purple 250]

[Figure: bar chart "Total Amount of Skittle Colors for Sample", colors sorted from most to least (x-axis: Color of Skittles; y-axis: Number of Skittles): Orange 301, Yellow 295, Red 291, Purple 250, Green 243]

The number of candies per bag is a quantitative variable because its values are counts (numbers), not labels or names.
It is appropriate to discuss the shape of the distribution of the number of candies per bag because it is quantitative data. I would say the graphs do not have a definite shape, but the distribution is mostly left skewed. It looks almost like a normal distribution, but the peak (the mode, 61) lies to the right of the center, with a longer tail on the left.

[Figure: histogram "Total amount per bag" (x-axis: Total Amount of Skittles Per Bag, marked from 52.5 to 64.5 in steps of 2; y-axis: Frequency)]

Stem-and-Leaf plot for Total Number of Skittles Per Bag

Stem | Leaves
5 | 3
5 | 77778
6 | 0000011111111233
6 | 5
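A minimal Python sketch of how a split-stem plot like this one can be built, assuming the 23 bag totals are 53, 57 (four times), 58, 60 (five times), 61 (eight times), 62, 63 (twice), and 65. These values reproduce the n, mean, standard deviation, minimum, quartiles, and maximum in the Part 4 summary. Each stem is split into a low row (leaves 0 to 4) and a high row (leaves 5 to 9); the function name `split_stem_plot` is hypothetical.

```python
from collections import defaultdict

# Assumed bag totals (n = 23), consistent with the Part 4 summary statistics.
TOTALS = [53, 57, 57, 57, 57, 58] + [60] * 5 + [61] * 8 + [62, 63, 63, 65]


def split_stem_plot(values: list[int]) -> list[str]:
    """Return one text row per half-stem: leaves 0-4 on the low row, 5-9 on the high row."""
    rows = defaultdict(list)
    for v in sorted(values):
        key = (v // 10, 0 if v % 10 < 5 else 1)  # (stem, low/high half)
        rows[key].append(str(v % 10))
    first, last = min(rows), max(rows)
    lines = []
    stem, half = first
    while (stem, half) <= last:
        # Empty half-stems between the first and last rows are still printed.
        lines.append(f"{stem} | {''.join(rows.get((stem, half), []))}")
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines


if __name__ == "__main__":
    print("Stem | Leaves")
    print("\n".join(split_stem_plot(TOTALS)))
```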

-Part 4: Numerical Summaries


Summary for color of candy:
Color | Proportion
Red | 0.211
Orange | 0.218
Yellow | 0.214
Green | 0.176
Purple | 0.181
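A short Python check of these proportions, assuming the class color totals from the bar chart of color counts (red 291, orange 301, yellow 295, green 243, purple 250; 1380 candies in all). Each proportion is the color count divided by the grand total, rounded to three decimals; the name `color_proportions` is illustrative.

```python
# Class totals for the 23 bags, taken from the bar chart of color counts.
CLASS_COUNTS = {"red": 291, "orange": 301, "yellow": 295, "green": 243, "purple": 250}


def color_proportions(counts: dict[str, int], digits: int = 3) -> dict[str, float]:
    """Divide each color's count by the total number of candies and round."""
    total = sum(counts.values())
    return {color: round(n / total, digits) for color, n in counts.items()}


if __name__ == "__main__":
    print("total candies:", sum(CLASS_COUNTS.values()))
    for color, p in color_proportions(CLASS_COUNTS).items():
        print(f"{color:>7}: {p:.3f}")
```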

Summary for candies per bag for the whole class (Statdisk, column var6):
n | Mean | Std. dev. | Min | Q1 | Median | Q3 | Max
23 | 60 | 2.5584086 | 53 | 58 | 61 | 61 | 65

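Interquartile range: IQR = Q3 - Q1 = 61 - 58 = 3
Lower fence: Q1 - 1.5(IQR) = 58 - 1.5(3) = 53.5
Upper fence: Q3 + 1.5(IQR) = 61 + 1.5(3) = 65.5
Outliers: 53 is the only outlier in the data, since it is the only value below 53.5 and no value is above 65.5.
According to the data, my bag is not an outlier: its total of 58 Skittles lies between the fences of 53.5 and 65.5.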
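A minimal Python sketch of the Part 4 summary and the 1.5 x IQR outlier rule, assuming the 23 bag totals are those in the stem-and-leaf plot (53, 57 four times, 58, 60 five times, 61 eight times, 62, 63 twice, 65). Quartiles come from `statistics.quantiles` with `method="exclusive"`, which for these values gives the same Q1, median, and Q3 as the Statdisk output; Statdisk's own quartile rule may differ on other data sets. The helper names are hypothetical.

```python
import statistics

# Assumed bag totals (n = 23), consistent with the Statdisk summary.
TOTALS = [53, 57, 57, 57, 57, 58] + [60] * 5 + [61] * 8 + [62, 63, 63, 65]
MY_BAG = 58


def summarize(values: list[int]) -> dict[str, float]:
    """Mean, sample SD, and five-number summary (quartiles via the exclusive method)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, divides by n - 1
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Values below Q1 - k*IQR or above Q3 + k*IQR are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


if __name__ == "__main__":
    s = summarize(TOTALS)
    low, high = fences(s["q1"], s["q3"])
    print(s)
    print("fences:", low, high)
    print("outliers:", [v for v in TOTALS if v < low or v > high])
    print("my bag is an outlier:", MY_BAG < low or MY_BAG > high)
```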

-Part 5: Confidence Intervals


A confidence interval provides a range of values that is likely to contain the population parameter of interest.
99% confidence interval for the proportion of yellow candies:
p^ = 295/1380 = 0.2138
n = 1380
p^ - E < p < p^ + E
E = 2.575 * sqrt(0.2138 * (1 - 0.2138) / 1380) = 0.0284
0.2138 +/- 0.0284
Confidence interval: 18.5% < p < 24.2%
We are 99% confident that the true proportion of yellow candies is between 18.5% and 24.2%.
The proportion of yellow candies in the bag I purchased is a likely value: 12/58 (12 yellow candies out of 58 candies in the bag) is 0.207, or about 21%, which lies within the confidence interval.
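A minimal Python sketch of the normal-approximation interval for a proportion, assuming the same counts (295 yellow out of 1380). It computes the critical value with `statistics.NormalDist().inv_cdf` (about 2.5758) instead of the table value 2.575, which changes the margin only in the fifth decimal place. The function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    low, high = proportion_ci(295, 1380, 0.99)
    print(f"99% CI for the proportion of yellow: {low:.3f} to {high:.3f}")
    my_share = 12 / 58
    print(f"my bag: {my_share:.3f}, inside interval: {low < my_share < high}")
```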
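95% confidence interval for the true mean number of candies per bag
n = 23
x̄ = 60
s = 2.558
Because the population S.D. is unknown and the sample is small, the critical value comes from the t distribution with n - 1 = 22 degrees of freedom: t = 2.074.
E = 2.074 * (2.558 / sqrt(23)) = 1.106
x̄ - E < μ < x̄ + E
60 +/- 1.106
Confidence interval: 58.9 < μ < 61.1
We are 95% confident that the true mean number of candies per bag is between 58.9 and 61.1.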
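The total number of candies in my bag, 58, lies just below this interval. That does not make my bag unusual: the interval estimates the mean number of candies per bag, not the count in an individual bag, and 58 is well within the outlier fences found in Part 4.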
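A minimal Python sketch of a t interval for a mean, assuming the summary values above (x̄ = 60, s = 2.558, n = 23) and the critical value t = 2.074 for 95% confidence with 22 degrees of freedom, read from a standard t table (the Python standard library has no t distribution, so it is hard-coded). The names `mean_ci_t` and `T_CRIT_95_DF22` are hypothetical.

```python
from math import sqrt

# Critical value t_(alpha/2) for 95% confidence, df = n - 1 = 22, from a t table.
T_CRIT_95_DF22 = 2.074


def mean_ci_t(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float]:
    """t interval when the population SD is unknown: x_bar +/- t * s / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


if __name__ == "__main__":
    low, high = mean_ci_t(60, 2.558, 23, T_CRIT_95_DF22)
    print(f"95% CI for mean candies per bag: {low:.1f} to {high:.1f}")
    print("58 inside the interval:", low < 58 < high)
```

Using z = 1.96 instead would give a slightly narrower interval (about 58.95 to 61.05), which understates the uncertainty for a sample this small.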
98% confidence interval for the S.D. of the number of candies per bag
n = 23
s = 2.558
sqrt((n - 1) s^2 / χ²R) < σ < sqrt((n - 1) s^2 / χ²L)
With 22 degrees of freedom and 98% confidence (0.01 in each tail), χ²R = 40.289 and χ²L = 9.542:
sqrt((23 - 1)(2.558^2) / 40.289) < σ < sqrt((23 - 1)(2.558^2) / 9.542)
= (1.89, 3.88)
The standard deviation of the number of candies per bag is likely between about 1.9 and 3.9 candies, so most bags contain within a few candies of the mean. The process the manufacturing company uses to fill its 2.17 oz bags therefore appears reasonably consistent.
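A minimal Python sketch of the chi-square interval for a standard deviation, assuming s = 2.558, n = 23, and the 98% critical values for 22 degrees of freedom from a chi-square table (40.289 for the right tail, 9.542 for the left). Taking square roots at the end turns the interval for the variance into an interval for σ. The function and constant names are hypothetical.

```python
from math import sqrt

# Chi-square critical values for df = 22 and 98% confidence (0.01 in each tail),
# read from a chi-square table.
CHI2_RIGHT = 40.289
CHI2_LEFT = 9.542


def sd_ci(sd: float, n: int, chi2_right: float, chi2_left: float) -> tuple[float, float]:
    """sqrt((n-1)s^2 / chi2_R) < sigma < sqrt((n-1)s^2 / chi2_L)."""
    ss = (n - 1) * sd ** 2
    # Without the square roots these bounds would describe the variance, not the SD.
    return sqrt(ss / chi2_right), sqrt(ss / chi2_left)


if __name__ == "__main__":
    low, high = sd_ci(2.558, 23, CHI2_RIGHT, CHI2_LEFT)
    print(f"98% CI for sigma: {low:.2f} to {high:.2f} candies")
```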

-Part 6: Hypothesis Testing


The main purpose of a hypothesis test is to decide between two competing claims about a population parameter.
20% of all Skittles are red:
1. Original claim: p = 0.20
2. Opposite claim: p ≠ 0.20
3. Null hypothesis: p = 0.20
Alternative hypothesis: p ≠ 0.20
n (total number of Skittles) = 1380
x (total number of red Skittles) = 291
Significance level: 0.05

-Statdisk
Alternative hypothesis: p ≠ p(hyp)
Sample proportion: 0.2108696
Test statistic, z: 1.0095
Critical z: ±1.9600
P-value: 0.3128
0.3128 > 0.05
-Because the P-value is greater than the significance level, we fail to reject the null hypothesis (p = 0.20).
-There is not sufficient evidence to reject the claim that 20% of all Skittles candies are
red.
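A minimal Python sketch of the two-tailed one-proportion z test, assuming the same inputs (291 red out of 1380, claimed p = 0.20). The P-value uses `statistics.NormalDist().cdf`; the function name `one_prop_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-tailed z test for H0: p = p0, using the normal approximation to the binomial."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses the claimed p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    z, p = one_prop_z_test(291, 1380, 0.20)
    alpha = 0.05
    print(f"z = {z:.4f}, P-value = {p:.4f}")
    print("reject H0" if p <= alpha else "fail to reject H0")
```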
Mean number of candies in a bag of Skittles is more than 55:
1. Original claim: μ > 55
2. Opposite of claim: μ ≤ 55
3. Null hypothesis: μ = 55
Alternative hypothesis: μ > 55
Significance level: 0.01
n (sample size) = 23
Sample mean = 60
Sample S.D. = 2.558
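-Statdisk
Alternative hypothesis: μ > μ(hyp) = 55
Test statistic, t: 9.3742
Critical t: 2.5083
P-value: 0.0000
0.0000 < 0.01
-Because the P-value is less than the significance level, we reject the null hypothesis (μ = 55).
-There is sufficient evidence to support the claim that the mean number of candies in a bag of Skittles is more than 55.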
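A minimal Python sketch of the one-sample t statistic for this right-tailed test, assuming x̄ = 60, s = 2.558, n = 23, and the one-tailed critical value 2.508 for α = 0.01 and 22 degrees of freedom from a t table (hard-coded, since the standard library has no t distribution). The names are hypothetical.

```python
from math import sqrt

# One-tailed critical value t_0.01 for df = 22, from a t table.
T_CRIT_01_DF22 = 2.508


def t_statistic(mean: float, mu0: float, sd: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (sd / sqrt(n))


if __name__ == "__main__":
    t = t_statistic(60, 55, 2.558, 23)
    print(f"t = {t:.4f}")
    # Right-tailed test: reject H0 when t falls beyond the critical value.
    print("reject H0" if t > T_CRIT_01_DF22 else "fail to reject H0")
```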
List the requirements:
-for testing a claim about a proportion
1) The sample observations are a simple random sample
-yes: the data was randomly selected
2) The conditions for a binomial experiment are satisfied
1- randomly selected
-yes
2- each trial is independent
-the color of one candy does not affect the color of another
3- there are only two possible outcomes (success or failure)
-each candy is either red (success) or not red (failure)
3) The conditions np ≥ 5 and nq ≥ 5 are satisfied, so the binomial distribution of sample proportions can be approximated by a normal distribution
(1380)(0.2) = 276 ≥ 5 and (1380)(0.8) = 1104 ≥ 5

-for testing a claim about a population mean (without the population S.D.)
1) The sample is a simple random sample
-randomly selected
2) The value of the population S.D. is unknown
-it is unknown
3) Either or both of these conditions is satisfied: the population is normally distributed or n > 30
-n = 23 is not greater than 30, so we rely on the population being approximately normally distributed

-Part 7: Correlation and Regression


Can height be used to predict the number of candies that will be in a bag of Skittles you purchase?
Explanatory variable: Height
Response variable: Total number of Skittles per bag

r = -0.1921
critical value: +/- 0.396

Because |r| = |-0.1921| = 0.1921 is less than the critical value of 0.396, there is not a significant linear correlation between the two variables. This is what I expected, because there is no reason why height and the number of candies in a bag would be related.
Regression equation: ŷ = 69.507 - 0.144x
For a height of 63.5 inches: ŷ = 69.507 - 0.144(63.5) ≈ 60.4
It would not be appropriate to use the regression equation to predict the number of candies per bag, because the linear correlation coefficient r does not indicate a significant linear correlation; without one, the best predicted value is simply the mean number of candies per bag.
It would also not be appropriate to make a prediction for Yao Ming's height of 90 inches, because it is far outside the range of heights in the data.
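A minimal Python sketch of the decision rule above, assuming r = -0.1921, a critical value of 0.396, and the regression line ŷ = 69.507 - 0.144x with x in inches. The function names are hypothetical, and the sketch does not check whether x lies inside the range of the observed heights, since that range is not listed here.

```python
def predict_candies(height: float) -> float:
    """Regression line reported above: y_hat = 69.507 - 0.144 * height (inches)."""
    return 69.507 - 0.144 * height


def regression_is_useful(r: float, r_crit: float) -> bool:
    """Use the line for prediction only if |r| exceeds the critical value of r."""
    return abs(r) > r_crit


if __name__ == "__main__":
    print(f"y_hat at 63.5 in: {predict_candies(63.5):.1f}")
    print("regression useful:", regression_is_useful(-0.1921, 0.396))
```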

The overall goal of this assignment was to apply the data gathered by the students, each of whom purchased a 2.17 oz bag of Skittles and recorded the number of candies of each color and the total. We later applied this data to each of the topics we studied during the course: organizing and analyzing data, graphical displays, numerical summaries, confidence intervals, hypothesis testing, and correlation and regression.

Reflection:
Whenever we worked on a part of the term project, we were also working on the same topic in our homework. This helped me learn the topics in a different context, because I was able to apply them in different ways rather than only to the set problems from the homework. I also learned that when solving a statistics problem you can use different tools, such as specific equations or your calculator, either to check your answers or because one method is easier than another.
I think that Part 3, where we had to make different graphs and tables from the candy data, is and will be relevant in other courses as well as in future jobs. We are all taught the basics of making graphs and tables, but this project actually has you apply data that you have been collecting and calculating from the beginning, so you know exactly where the data comes from.
I still think that most people will never use many of the long equations that are taught and memorized in algebra, but many of the things we covered in the project could be applied in real life. For example, instead of counting the candies per bag, you could collect the same kind of data on the amount of money you spend each month, then compare, track, and record your spending in different forms, such as a table showing the highest and lowest amounts you spent in a month, the mean amount spent per month for the year, the total amount spent in the whole year, and so on.
