Term Project

Marisa Arias
Professor: Brenda Santistevan

Math 1040-004
This project allows us to apply the concepts that we learn over the duration of the course.
These include collecting samples, organizing and analyzing data, drawing conclusions, and
learning to use technology such as StatCrunch, a web-based statistical software program.
In the initial part of our project we select the data set Body Measurement, which will serve as
our population, and from it we select a categorical variable, gender (1-M, 0-F). Next, using
StatCrunch, we create a pie chart and a Pareto chart for the categorical variable. [The term
"we" is used throughout the paper to describe the project.]
[Figures: pie chart and Pareto chart of gender for the population]

Simple Random Sample
Using the same categorical variable (gender), we now use two sampling methods, simple random
and systematic, each with a sample size n > 35. For the simple random sample we used the TI-84
calculator to generate random numbers with the arguments (1, 507, 41), giving a sample size
n = 41. Using StatCrunch we created a new data sheet containing the population of size N = 507
for the categorical variable (gender). The calculator randomly selected 41 numbers from 1 to
507. The rows with these 41 numbers make up our new column, titled simple random sample,
which we used to generate the graphs.
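The selection step can also be sketched in Python. This is only an illustration, not the procedure we ran: the function name and the seed are hypothetical, and unlike the calculator's random integers, which can in principle repeat a value, this version draws row numbers without replacement.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted 1-based row numbers drawn without replacement."""
    rng = random.Random(seed)
    # random.sample never repeats a value, so each row is chosen at most once.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# N = 507 and n = 41 come from the project; the seed is an arbitrary choice.
rows = simple_random_sample(507, 41, seed=1)
print(rows)
```

The printed row numbers would then be looked up in the gender column to build the sample.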
[Figures: pie chart and Pareto chart of gender for the simple random sample]

Systematic Sampling
Next, we generate the systematic sample with sample size n = 39. We used the TI-84 calculator
to generate a random starting number, (1, 507, 1) = 169. Then we divide the population size 507
by the sample size 39, which equals 13. With this information, we start at number 169 (in our
categorical variable, gender) and choose every 13th number until we obtain our sample size.
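A minimal Python sketch of this selection follows. The text does not say what happens when counting passes row 507, so wrapping around to row 1 is an assumption here, and the function name is hypothetical.

```python
def systematic_sample(population_size, sample_size, start):
    """Return 1-based row numbers: start, start + k, ..., wrapping past the end."""
    step = population_size // sample_size  # k = 507 // 39 = 13
    # Assumption: after the last row, counting continues from row 1.
    return [((start - 1 + i * step) % population_size) + 1
            for i in range(sample_size)]

# N = 507, n = 39 and the random start 169 come from the project.
rows = systematic_sample(507, 39, start=169)
print(rows)
```

Because 507 = 39 × 13 exactly, the wrapped sequence never lands on the same row twice.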
[Figures: pie chart and Pareto chart of gender for the systematic sample]

Observations:
In the simple random sample, 0-Female is heavily represented compared to 1-Male. In fact,
0-Female is represented more than twice as often as 1-Male. In the systematic sample, 1-Male
and 0-Female are more evenly distributed, although in this case 1-Male is slightly more
represented than 0-Female (about 2.5 percentage points more 1-Male). Compared with the entire
population, the systematic sample is more representative, except that 1-M is slightly greater
than 0-F, whereas in the entire population this relationship is reversed.

In this part of the project we have selected the quantitative variable waist girth, which will
represent our population. From this population we will compute the summary statistics (mean,
standard deviation, and five-number summary). In addition, we will create a frequency histogram
and boxplot with the help of StatCrunch. Then, using two different sampling methods (simple
random and systematic sampling) on the same quantitative variable, we will compute the sample
statistics and create a frequency histogram and boxplot for each sample.

Summary statistics: Population
Column        n     Mean       Std. dev.   Median  Min   Max    Q1  Q3
Waist girth   507   76.979487  11.012688   75.8    57.9  113.2  68  84.5

As you can see, there are two outliers on the boxplot (represented by the two red dots). These
represent waist girths of 112.1 and 113.2. Outliers are identified in the following way:
Step 1. Compute the interquartile range, IQR = Q3 - Q1. In this case 84.5 - 68 = 16.5.
Step 2. Calculate 1.5 × IQR. In this case, 1.5 × 16.5 = 24.75.
Step 3. An outlier lies above Q3 = 84.5 by more than 24.75 or below Q1 = 68 by more than
24.75. This means that any outlier is greater than 109.25 (84.5 + 24.75) or less than 43.25
(68 - 24.75).

Across all N = 507 values there are two outliers, 112.1 and 113.2. There are no outliers on the
low end of the graph, since no one has a waist girth less than 43.25. The highest data value that
is not an outlier is 109.2, which is the greatest value in N = 507 once the outliers are excluded.
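Steps 1 to 3 can be checked with a short Python sketch. The function name is hypothetical, and only the three largest waist girths mentioned in the text are tested, since the full data set is not reproduced here.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return the (lower, upper) limits beyond which a value is an outlier."""
    iqr = q3 - q1              # Step 1: interquartile range
    margin = multiplier * iqr  # Step 2: 1.5 x IQR
    return q1 - margin, q3 + margin  # Step 3: limits below Q1 and above Q3

# Population quartiles from the summary statistics: Q1 = 68, Q3 = 84.5.
low, high = outlier_fences(68, 84.5)

# The three largest waist girths mentioned in the text.
flagged = [x for x in (109.2, 112.1, 113.2) if x < low or x > high]
print(low, high, flagged)
```

Only 112.1 and 113.2 fall above the upper limit, while 109.2 stays just inside it.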

Summary statistics: Simple Random Sample
Column                             n    Mean       Std. dev.  Median  Min   Max   Q1  Q3
Simple Random Sample: waist girth  41   75.319512  10.00048   73.6    59.5  97.4  68  80.7

Summary statistics: Systematic Sample
Column                          n    Mean       Std. dev.  Median  Min   Max    Q1  Q3
Systematic Sample: waist girth  39   76.874359  11.543162  75      60.7  112.1  68  81.5

Observation:
There are three frequency histograms (population, simple random sample, and systematic
sample). Each histogram shows a distribution that is skewed to the right (positive skew). The
data are not symmetric in the way a bell curve is. In a right-skewed distribution the mean is
usually greater than the median, as is the case in all three of these graphs.
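The simple random sample and the systematic sample were obtained using the TI-84 calculator.
Applying Steps 1 to 3 above to the summary statistics of the simple random sample yields no
outliers (upper limit 80.7 + 1.5 × 12.7 = 99.75, maximum 97.4). For the systematic sample the
upper limit is 81.5 + 1.5 × 13.5 = 101.75, so its maximum of 112.1 is an outlier.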
The boxplots also show the data distribution skewed to the right. The upper quarter (top 25%)
of the data looks similar in the three boxplots. It corresponds to the right whiskers, which are
longer than the left ones because the top quarter of the people have far larger waist girths.

In this part of the project we will create confidence intervals for estimating the population
proportion from each of the samples already obtained for the categorical variable, gender. Next,
we will find confidence intervals for estimating the population mean and the population
standard deviation from each of the previously obtained quantitative samples (in this case waist
girth). We will carry out these computations with the help of the TI-84 calculator and Tables
A-2, A-3 and A-4.

Categorical variable: gender (male and female)
We choose the female value.
Population proportion: $p = 0.5128$
Sample 1: Simple Random Sample
$n = 41$, $x = 28$, $\hat{p} = 0.6829$, $\alpha = 0.05$, $\hat{q} = 1 - 0.6829 = 0.3171$, $z_{\alpha/2} = z_{0.025} = 1.96$
$E = z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n}$
$E = 1.96 \sqrt{(0.6829)(0.3171)/41}$
$E = 0.142$
$\hat{p} - E < p < \hat{p} + E$
$0.6829 - 0.142 < p < 0.6829 + 0.142$
$0.5409 < p < 0.8249$
Sample 2: Systematic Sampling
$n = 39$, $x = 19$, $\hat{p} = 0.4872$, $\alpha = 0.05$, $\hat{q} = 1 - 0.4872 = 0.5128$, $z_{\alpha/2} = z_{0.025} = 1.96$
$E = z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n}$
$E = 1.96 \sqrt{(0.4872)(0.5128)/39}$
$E = 0.157$
$\hat{p} - E < p < \hat{p} + E$
$0.4872 - 0.157 < p < 0.4872 + 0.157$
$0.3302 < p < 0.6442$
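The two proportion intervals can be reproduced with a short Python sketch. The function name is hypothetical. The code keeps full precision throughout, whereas the hand computation rounds p-hat and E first, so the printed limits may differ from the ones above in the last decimal place.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Confidence interval for a proportion: p_hat -/+ z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Counts of females: x = 28 of n = 41 (simple random), x = 19 of n = 39 (systematic).
srs_low, srs_high = proportion_ci(28, 41)
sys_low, sys_high = proportion_ci(19, 39)
print(f"Simple random: {srs_low:.4f} < p < {srs_high:.4f}")
print(f"Systematic:    {sys_low:.4f} < p < {sys_high:.4f}")
```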
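Quantitative variable: waist girth
$\mu = 76.98$
$\sigma = 11.01$
Sample 1: Simple Random Sample
$n = 41$, $\bar{x} = 75.32$, $s = 10$, $\alpha = 0.05$, $t_{\alpha/2} = 2.021$, $df = 40$
$E = t_{\alpha/2} \cdot s/\sqrt{n}$
$E = 2.021 \cdot 10/\sqrt{41}$
$E = 3.16$
$\bar{x} - E < \mu < \bar{x} + E$
$75.32 - 3.16 < \mu < 75.32 + 3.16$
$72.16 < \mu < 78.48$
Sample 2: Systematic Sampling
$n = 39$, $\bar{x} = 76.87$, $s = 11.54$, $\alpha = 0.05$, $t_{\alpha/2} = 2.024$, $df = 38$
$E = t_{\alpha/2} \cdot s/\sqrt{n}$
$E = 2.024 \cdot 11.54/\sqrt{39}$
$E = 3.74$
$\bar{x} - E < \mu < \bar{x} + E$
$76.87 - 3.74 < \mu < 76.87 + 3.74$
$73.13 < \mu < 80.61$
$df = n - 1$ is the number of degrees of freedom (find $t_{\alpha/2}$ using Table A-3).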
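As a check on the arithmetic, the same intervals can be sketched in Python. The function name is hypothetical, and the critical values 2.021 and 2.024 are typed in from the computations above rather than computed.

```python
import math

def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval for a mean: x_bar -/+ t * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Sample statistics and t critical values as given in the text.
srs_low, srs_high = mean_ci(75.32, 10, 41, t_crit=2.021)
sys_low, sys_high = mean_ci(76.87, 11.54, 39, t_crit=2.024)
print(f"Simple random: {srs_low:.2f} < mu < {srs_high:.2f}")
print(f"Systematic:    {sys_low:.2f} < mu < {sys_high:.2f}")
```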
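Confidence interval for population standard deviation
Sample 1: Simple Random Sample
$n = 41$, $s = 10$, $\alpha = 0.05$, $df = 40$
$8.21 < \sigma < 12.80$
Sample 2: Systematic Sampling
$n = 39$, $s = 11.54$, $\alpha = 0.05$, $df = 38$
$9.23 < \sigma < 14.39$
(For a 95% confidence level, we divide $\alpha = 0.05$ equally between the two tails of the
chi-square distribution, and we refer to the values $1 - 0.025 = 0.975$ and $0.025$ across the
top row of Table A-4.)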
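The worked computation for these intervals is not shown, so the following Python sketch is a reconstruction under stated assumptions. It uses the standard interval sqrt((n - 1)s² / χ²_R) < σ < sqrt((n - 1)s² / χ²_L) and assumes the chi-square critical values 24.433 and 59.342 from the df = 40 row, applied to both samples (the closest row for df = 38). Under these assumptions the sketch reproduces the intervals reported for σ in this paper. The function and constant names are hypothetical.

```python
import math

# Assumed chi-square critical values for df = 40 (areas 0.975 and 0.025 to the right).
CHI2_LEFT = 24.433
CHI2_RIGHT = 59.342

def sigma_ci(s, n, chi2_left=CHI2_LEFT, chi2_right=CHI2_RIGHT):
    """Confidence interval for a standard deviation from a chi-square table."""
    df = n - 1
    low = math.sqrt(df * s ** 2 / chi2_right)
    high = math.sqrt(df * s ** 2 / chi2_left)
    return low, high

srs_low, srs_high = sigma_ci(10, 41)
sys_low, sys_high = sigma_ci(11.54, 39)
print(f"Simple random: {srs_low:.2f} < sigma < {srs_high:.2f}")
print(f"Systematic:    {sys_low:.2f} < sigma < {sys_high:.2f}")
```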
Observations:
Categorical Variable: Gender
For our categorical variable, gender, the population proportion, $p$, is equal to 51.28%. In
other words, 51.28% of the population is female. In the first sample, using the simple random
sample method, we used a 95% confidence level. In this case our confidence interval estimate of
the population proportion is $0.5409 < p < 0.8249$. Because $p = 0.5128$ lies outside this
interval, this sample's estimate does not reflect the true population. With a 95% confidence
level, about 5% of such intervals will fail to contain the true population proportion, as is
the case here.
In the second sample we used the systematic sampling method. We are 95% confident that the
interval $0.3302 < p < 0.6442$ actually does contain the true value of the population
proportion, $p$, and here it does. That is, if we were to take many different samples and
construct the corresponding confidence intervals, about 95% of them would contain the value of
the population proportion $p$.
Quantitative Variable: Waist Girth
Using the quantitative variable waist girth, we created confidence intervals for the population
mean from each of the quantitative samples. For the simple random sample, we are 95% confident
that the interval $72.16 < \mu < 78.48$ actually does contain the true value of the population
mean, $\mu$.
In the second sample we used the systematic sampling method. We are 95% confident that the
interval $73.13 < \mu < 80.61$ actually does contain the true value of the population mean. In
this case $\mu = 76.98$, which lies inside both intervals.
The quantitative variable waist girth has a population standard deviation of $\sigma = 11.01$.
Based on the computations from each of the quantitative samples, we have 95% confidence that
the intervals $8.21 < \sigma < 12.80$ (simple random sample) and $9.23 < \sigma < 14.39$
(systematic sample) contain the true value $\sigma = 11.01$.

Hypothesis tests:
Based on the sample data previously obtained, we are going to carry out hypothesis tests for
the population proportion, for one value (female) of our categorical variable (gender). Next,
for each of our samples from the quantitative (waist girth) data, we will complete a hypothesis
test for the population mean. We will also show the computation of the test statistic and
critical value. These computations are obtained using the TI-84 calculator and Tables A-2 and
A-3. We will use the 0.05 level of significance.
Hypothesis test for the population proportion:
Sample 1:

We reject the null hypothesis because the test statistic is in the critical region. We conclude
that there is sufficient evidence to warrant rejection of the claim that 51.28% of our
population is female.

Sample 2:

We fail to reject the null hypothesis because the test statistic is not in the critical region.
We conclude that there is not sufficient evidence to warrant rejection of the claim that 51.28%
of our population is female.
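The test-statistic computations for the two proportion tests are not reproduced in the text. A minimal Python sketch is given here, assuming a two-tailed test of H0: p = 0.5128 with the critical values ±1.96 for the 0.05 significance level. The function and variable names are hypothetical.

```python
import math

P0 = 0.5128    # claimed population proportion of females
Z_CRIT = 1.96  # two-tailed critical value at the 0.05 level (assumed test form)

def proportion_z(successes, n, p0=P0):
    """Test statistic z = (p_hat - p0) / sqrt(p0 * q0 / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z_srs = proportion_z(28, 41)  # simple random sample
z_sys = proportion_z(19, 39)  # systematic sample
for label, z in (("Sample 1", z_srs), ("Sample 2", z_sys)):
    decision = "reject H0" if abs(z) > Z_CRIT else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, {decision}")
```

Under these assumptions the sketch agrees with both conclusions above: Sample 1 falls in the critical region and Sample 2 does not.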
Hypothesis test for the population mean
Sample 1:

We fail to reject the null hypothesis because the test statistic is not in the critical region.
We conclude that there is not sufficient evidence to warrant rejection of the claim that the
population mean waist girth is 76.98.

Sample 2:

We fail to reject the null hypothesis because the test statistic is not in the critical region.
We conclude that there is not sufficient evidence to warrant rejection of the claim that the
population mean waist girth is 76.98.
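The test statistics for the two mean tests can be sketched the same way. This assumes a two-tailed t test of H0: μ = 76.98, with the Table A-3 critical values 2.021 (df = 40) and 2.024 (df = 38) used earlier for the confidence intervals. The function name is hypothetical.

```python
import math

MU0 = 76.98  # claimed population mean waist girth

def one_sample_t(x_bar, s, n, mu0=MU0):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

t_srs = one_sample_t(75.32, 10, 41)     # simple random sample, df = 40
t_sys = one_sample_t(76.87, 11.54, 39)  # systematic sample, df = 38
for label, t, t_crit in (("Sample 1", t_srs, 2.021), ("Sample 2", t_sys, 2.024)):
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"{label}: t = {t:.3f}, {decision}")
```

Both statistics lie well inside the critical values, which matches the two conclusions above.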
Observation:
A Type I error is the rejection of the null hypothesis when it is actually true. In Sample 1
(population proportion) we rejected the null hypothesis, even though 51.28% is the true
population proportion. This is a Type I error: the sample happened to be unusual and was not
representative of the population.
My samples meet the sample-size condition for the hypothesis tests of the mean waist girth
because both sample sizes are greater than 30. Although my population is not normally
distributed and is skewed right, I can perform this test when the sample size is greater than 30.
For the tests of the proportion of females, np and nq are at least 5 for both samples. However,
my sample sizes of 39 and 41 are about 8% of my total population of 507, which is more than the
5% guideline, so the independence requirement is only approximately met.

Summary Reflection
This project has allowed me to be more critical of the studies that I read. As a nursing major,
I have the opportunity to examine many medical journals. Statistics allows me to understand the
math behind the data. Confidence intervals and p-values are just two of the skills used in this
project that I can apply in my school career.
For instance, in the medical journal The Lancet there is a recent article on the mortality of
patients with ruptured abdominal aortic aneurysm (rAAA). Without an understanding of confidence
intervals or p-values the data would be difficult to interpret. The article reports that
hospital mortality was lower in the U.S. than in England (53.05% [95% CI 51.26-54.85] vs
65.90%; p<0.0001).
More broadly, concepts that I learned in this class (and this project), like margin of error,
are always referenced in political surveys. A quick search for statistics in the online app TED
(Technology, Entertainment and Design) turns up talks on how juries are fooled by statistics,
how states are just beginning to use statistics in the prison system, and a mathematician who
argues that statistics is a far more important skill than calculus for university students.
Whether it is related to the politicians that I vote for, the medical procedures that are given
to patients, or the social and environmental policies that affect all of us, a thorough
understanding of statistics will enable me to make more informed decisions.
