
One division of the field of statistics that quickly comes to mind is the differentiation between descriptive and inferential statistics. There are other ways that we can separate out the discipline, and one of these is to classify statistical methods as either parametric or nonparametric. We will find out what the difference is between parametric and nonparametric methods by comparing different instances of these types of methods.

Parametric Methods

Methods are classified on the basis of what we know about the population we are studying. Parametric methods are typically the first methods studied in an introductory statistics course. The basic idea is that there is a set of fixed parameters that determine a probability model.

Parametric methods are often those for which we know that the population is approximately normal, or we can approximate using a normal distribution after we invoke the central limit theorem. There are two parameters for a normal distribution: the mean and the standard deviation.

Ultimately the classification of a method as parametric depends upon the assumptions that are made about a population. A few parametric methods include:

- Confidence interval for a population mean, with known standard deviation.
- Confidence interval for a population mean, with unknown standard deviation.
- Confidence interval for a population variance.
- Confidence interval for the difference of two means, with unknown standard deviation.

Nonparametric Methods

To contrast with parametric methods, we will define nonparametric methods. These are statistical techniques for which we do not have to make any assumptions about parameters for the population we are studying. Indeed, the methods do not depend on the distribution of the population of interest. The set of parameters is no longer fixed, and neither is the distribution that we use. It is for this reason that nonparametric methods are also referred to as distribution-free methods.

Nonparametric methods are growing in popularity and influence for a number of reasons. The main reason is that we are not constrained as much as when we use a parametric method. We do not need to make as many assumptions about the population that we are working with as we do with a parametric method. Many of these nonparametric methods are easy to apply and to understand.

A few nonparametric methods include:

- Sign test for a population mean
- Bootstrapping techniques
- U test for two independent means
- Spearman correlation test

Comparison

There are multiple ways to use statistics to find a confidence interval about a mean. A parametric method would involve the calculation of a margin of error with a formula and the estimation of the population mean with a sample mean. A nonparametric method to calculate a confidence interval for a mean would involve the use of bootstrapping.

Why do we need both parametric and nonparametric methods for this type of problem? Many times parametric methods are more efficient than the corresponding nonparametric methods. Although this difference in efficiency is typically not much of an issue, there are instances where we do need to consider which method is more efficient.
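To make the comparison concrete, here is a minimal Python sketch (not from the original article; the sample data are invented for illustration) that computes both a parametric t-based interval and a nonparametric bootstrap percentile interval for the same sample:

```python
import math
import random
import statistics

random.seed(1)
# Invented sample for illustration: 50 values from a normal population.
sample = [random.gauss(100, 15) for _ in range(50)]
n = len(sample)
sample_mean = statistics.mean(sample)

# Parametric: mean ± t* s/√n (t* ≈ 2.010 for 95% confidence, 49 df).
t_star = 2.010
margin = t_star * statistics.stdev(sample) / math.sqrt(n)
print(f"Parametric 95% CI: {sample_mean - margin:.2f} to {sample_mean + margin:.2f}")

# Nonparametric: percentile interval from 10,000 bootstrap resamples,
# taking the 2.5th and 97.5th percentiles of the resampled means.
boot_means = sorted(
    statistics.mean(random.choices(sample, k=n)) for _ in range(10_000)
)
print(f"Bootstrap 95% CI:  {boot_means[250]:.2f} to {boot_means[9750]:.2f}")
```

On well-behaved data like this, the two intervals come out nearly identical; the bootstrap earns its keep when the parametric assumptions are doubtful.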

A normal distribution is more commonly known as a bell curve. This type of curve shows up throughout statistics and the real world.

For example, after I give a test in any of my classes, one thing that I like to do is to make a graph of all the scores. I typically write down 10-point ranges such as 60-69, 70-79, and 80-89, then put a tally mark for each test score in that range. Almost every time I do this, a familiar shape emerges.

A few students do very well and a few do very poorly. A bunch of scores end up clumped around the mean score. Different tests may result in different means and standard deviations, but the shape of the graph is nearly always the same. This shape is commonly called the bell curve.

Why call it a bell curve? The bell curve gets its name quite simply because its shape resembles that of a bell. These curves appear throughout the study of statistics, and their importance cannot be overemphasized.

What Is a Bell Curve?

To be technical, the kinds of bell curves that we care about the most in statistics are actually called normal probability distributions. For what follows we'll just assume the bell curves we're talking about are normal probability distributions. Despite the name "bell curve," these curves are not defined by their shape. Instead, an intimidating-looking formula is used as the formal definition for bell curves.
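For reference, that formal definition is the normal probability density function, written in terms of the mean μ and standard deviation σ:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))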

But we really don't need to worry too much about the formula. The only two numbers that we care about in it are the mean and standard deviation. The bell curve for a given set of data has its center located at the mean. This is where the highest point of the curve, or "top of the bell," is located. A data set's standard deviation determines how spread out our bell curve is. The larger the standard deviation, the more spread out the curve.

There are several features of bell curves that are important and distinguish them from other curves in statistics:

- A bell curve has one mode, which coincides with the mean and median. This is the center of the curve where it is at its highest.
- A bell curve is symmetric. If it were folded along a vertical line at the mean, both halves would match perfectly because they are mirror images of each other.
- A bell curve follows the 68-95-99.7 rule, which provides a convenient way to carry out estimated calculations:
  - Approximately 68% of all of the data lies within one standard deviation of the mean.
  - Approximately 95% of all of the data lies within two standard deviations of the mean.
  - Approximately 99.7% of all of the data lies within three standard deviations of the mean.

An Example

If we know that a bell curve models our data, we can use the above features of the bell curve to say quite a bit. Going back to the test example, suppose we have 100 students who took a statistics test with a mean score of 70 and standard deviation of 10.

The standard deviation is 10. Subtract and add 10 to the mean. This gives us 60 and 80. By the 68-95-99.7 rule, we would expect about 68% of 100, or 68 students, to score between 60 and 80 on the test.

Two times the standard deviation is 20. If we subtract and add 20 to the mean, we have 50 and 90. We would expect about 95% of 100, or 95 students, to score between 50 and 90 on the test.
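Here is a quick Python sketch that redoes this arithmetic (only the example's numbers, nothing more):

```python
# The 68-95-99.7 rule applied to the test example: mean 70, standard
# deviation 10, for a class of 100 students.
mean, sd, n_students = 70, 10, 100

for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = mean - k * sd, mean + k * sd
    print(f"About {pct / 100 * n_students:.0f} students score between {low} and {high}")
```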

The central limit theorem is a result from probability theory. This theorem shows up in a number of places in the field of statistics. Although the central limit theorem can seem abstract and devoid of any application, it is actually quite important to the practice of statistics.

So what exactly is the importance of the central limit theorem? It all has to do with the distribution of our population. As we will see, this theorem allows us to simplify problems in statistics by letting us work with a distribution that is approximately normal.

The statement of the central limit theorem can seem quite technical but can be understood if we think through the following steps. We begin with a simple random sample of n individuals from a population of interest. From this sample, we can easily form a sample mean that corresponds to the mean of the measurement we are curious about in our population.

A sampling distribution for the sample mean is produced by repeatedly selecting simple random samples from the same population and of the same size, and then computing the sample mean for each of these samples. These samples are to be thought of as being independent of one another.

The central limit theorem concerns the sampling distribution of the sample means. We may ask about the overall shape of the sampling distribution. The central limit theorem says that this sampling distribution is approximately normal, commonly known as a bell curve. This approximation improves as we increase the size of the simple random samples that are used to produce the sampling distribution.

There is a very surprising feature of the central limit theorem. The astonishing fact is that this theorem says that a normal distribution arises regardless of the initial distribution. Even if our population has a skewed distribution, as occurs when we examine things such as incomes or people's weights, the sampling distribution for a sufficiently large sample size will be approximately normal.

The unexpected appearance of a normal distribution from a population distribution that is skewed (even quite heavily skewed) has some very important applications in statistical practice. Many practices in statistics, such as those involving hypothesis testing or confidence intervals, make some assumptions concerning the population that the data was obtained from. One assumption that is initially made in a statistics course is that the populations that we work with are normally distributed.

The assumption that data is from a normal distribution simplifies matters but seems a little unrealistic. Just a little work with some real-world data shows that outliers, skewness, multiple peaks and asymmetry show up quite routinely. The use of an appropriate sample size and the central limit theorem help us to get around the problem of data from populations that are not normal.

Thus, even though we might not know the shape of the distribution where our data comes from, the central limit theorem says that we can treat the sampling distribution as if it were normal. Of course, in order for the conclusions of the theorem to hold, we do need a sample size that is large enough. Exploratory data analysis can help us to determine how large a sample is necessary for a given situation.
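A short simulation makes this concrete. The sketch below (an illustration, not part of the original text) draws repeated samples from a heavily skewed exponential population and shows that the sample means behave as the theorem predicts:

```python
import random
import statistics

random.seed(0)
n = 40  # size of each simple random sample

# The population is exponential: heavily skewed, mean 1, standard deviation 1.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(5_000)
]

# The sampling distribution of the mean is approximately normal, centered
# at the population mean with standard deviation sigma / sqrt(n).
print(statistics.mean(sample_means))   # close to 1.0
print(statistics.stdev(sample_means))  # close to 1 / sqrt(40) ≈ 0.158
```

A histogram of sample_means would show the familiar symmetric bell shape, even though the population itself is strongly skewed.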

Inferential statistics gets its name from what happens in this branch of statistics. Rather than simply describe a set of data, inferential statistics seeks to infer something about a population on the basis of a statistical sample. One specific goal in inferential statistics involves the determination of the value of an unknown population parameter. The range of values that we use to estimate this parameter is called a confidence interval.

A confidence interval consists of two parts. The first part is the estimate of the population parameter. We obtain this estimate by using a simple random sample. From this sample, we calculate the statistic that corresponds to the parameter that we wish to estimate. For example, if we were interested in the mean height of all first-grade students in the United States, we would use a simple random sample of U.S. first graders, measure all of them and then compute the mean height of our sample.

The second part of a confidence interval is the margin of error. This is necessary because our estimate alone may be different from the true value of the population parameter. In order to allow for other potential values of the parameter, we need to produce a range of numbers. The margin of error does this. The estimate is in the center of the interval, and then we subtract and add the margin of error from this estimate to obtain a range of values for the parameter.

Confidence Level

Attached to every confidence interval is a level of confidence. This is a probability or percent that indicates how much certainty should be attributed to our confidence interval. If all other aspects of a situation are identical, the higher the confidence level, the wider the confidence interval.

This level of confidence can lead to some confusion. It is not a statement about the sampling procedure or population. Instead, it gives an indication of the success of the process of constructing a confidence interval. For example, confidence intervals with a confidence level of 80% will, in the long run, miss the true population parameter one out of every five times.

Any number from zero to one could, in theory, be used for a confidence level. In practice, 90%, 95% and 99% are all common confidence levels.
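The long-run interpretation can be checked with a simulation. This sketch (illustrative only; the population values are invented) builds many 80% intervals from a known population and counts how often they miss:

```python
import math
import random
import statistics

random.seed(2)
TRUE_MEAN, SIGMA, N = 50, 5, 25
Z_80 = 1.282  # z* critical value for an 80% confidence level

trials, misses = 10_000, 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    margin = Z_80 * SIGMA / math.sqrt(N)
    m = statistics.mean(sample)
    if not (m - margin <= TRUE_MEAN <= m + margin):
        misses += 1

print(misses / trials)  # roughly 0.20: about one miss in every five intervals
```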

Margin of Error

The margin of error of a confidence interval is determined by a couple of factors. We can see this by examining the formula for the margin of error. A margin of error is of the form:

Margin of Error = (Statistic for the Confidence Level) × (Standard Deviation or Standard Error)

The statistic for the confidence level depends upon what probability distribution is being used and what level of confidence we have chosen. For example, if C is our confidence level and we are working with a normal distribution, then C is the area under the curve between -z* and z*. This number z* is the number in our margin of error formula.

The other term necessary in our margin of error formula is the standard deviation or standard error. The standard deviation of the distribution that we are working with is preferred here. However, parameters from the population are typically unknown, so this number is not usually available when forming confidence intervals in practice.

To deal with this uncertainty in knowing the standard deviation, we instead use the standard error. The standard error that corresponds to a standard deviation is an estimate of this standard deviation. What makes the standard error so powerful is that it is calculated from the same simple random sample that is used to calculate our estimate. No extra information is necessary, as the sample does all of the estimation for us.
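As a small sketch (the sample values here are hypothetical), the margin of error is just the critical value multiplied by the standard error computed from the sample itself:

```python
import math
import statistics

# Hypothetical sample; in practice this is your simple random sample.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
n = len(sample)

z_star = 1.960  # statistic for a 95% confidence level (normal distribution)
standard_error = statistics.stdev(sample) / math.sqrt(n)  # estimates σ/√n

margin_of_error = z_star * standard_error
print(margin_of_error)
```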

There are a variety of different situations that call for confidence intervals. Although these situations differ, all of these confidence intervals are united by the same overall format. Some common confidence intervals are those for a population mean, population variance, population proportion, the difference of two population means and the difference of two population proportions.

In each case the goal is the estimation of an unknown population parameter. You start with a statistical sample, and from this, you can determine a range of values for the parameter. This range of values is called a confidence interval.

Confidence Intervals

Confidence intervals are all similar to one another in a few ways. First, many two-sided confidence intervals have the same form:

Estimate ± Margin of Error

Second, the steps for calculating confidence intervals are very similar, regardless of the type of confidence interval you are trying to find. The specific type of confidence interval that will be examined below is a two-sided confidence interval for a population mean when you know the population standard deviation. Also, assume that you are working with a population that is normally distributed.

Below is a process to find the desired confidence interval. Although all of the steps are important, the first one is particularly so:

1. Check Conditions: Make sure that the conditions for your confidence interval have been met. Assume that you know the value of the population standard deviation, denoted by the Greek letter sigma σ. Also, assume a normal distribution.
2. Calculate Estimate: Estimate the population parameter, in this case the population mean, by use of a statistic, which in this problem is the sample mean. This involves forming a simple random sample from the population. Sometimes, you can suppose that your sample is a simple random sample, even if it does not meet the strict definition.
3. Critical Value: Find the critical value z* that corresponds with your confidence level. These values are found by consulting a table of z-scores or by using software. You can use a z-score table because you know the value of the population standard deviation, and you assume that the population is normally distributed. Common critical values are 1.645 for a 90-percent confidence level, 1.960 for a 95-percent confidence level, and 2.576 for a 99-percent confidence level.
4. Margin of Error: Calculate the margin of error z* σ/√n, where n is the size of the simple random sample that you formed.
5. Conclude: Put together the estimate and margin of error. This can be expressed as either Estimate ± Margin of Error or as Estimate - Margin of Error to Estimate + Margin of Error. Be sure to clearly state the level of confidence that is attached to your confidence interval.

Example

To see how you can construct a confidence interval, work through an example. Suppose you know that the IQ scores of all incoming college freshmen are normally distributed with standard deviation of 15. You have a simple random sample of 100 freshmen, and the mean IQ score for this sample is 120. Find a 90-percent confidence interval for the mean IQ score for the entire population of incoming college freshmen.

1. Check Conditions: The conditions have been met, since you have been told that the population standard deviation is 15 and that you are dealing with a normal distribution.
2. Calculate Estimate: You have a simple random sample of size 100. The mean IQ for this sample is 120, so this is your estimate.
3. Critical Value: The critical value for a confidence level of 90 percent is given by z* = 1.645.
4. Margin of Error: Use the margin of error formula and obtain a margin of error of z* σ/√n = (1.645)(15)/√100 = 2.4675.
5. Conclude: Conclude by putting everything together. A 90-percent confidence interval for the population's mean IQ score is 120 ± 2.4675. Alternatively, you could state this confidence interval as 117.5325 to 122.4675.
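For readers who prefer to check the arithmetic in code, here is a minimal Python sketch of the same calculation:

```python
import math

# Values from the example: σ = 15, n = 100, sample mean = 120.
sigma, n, sample_mean = 15, 100, 120
z_star = 1.645  # critical value for a 90-percent confidence level

margin = z_star * sigma / math.sqrt(n)  # (1.645)(15)/10 = 2.4675
print(f"{sample_mean - margin} to {sample_mean + margin}")  # 117.5325 to 122.4675
```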

Practical Considerations

Confidence intervals of the above type are not very realistic. It is very rare to know the population standard deviation but not know the population mean. There are ways that this unrealistic assumption can be removed.

While you have assumed a normal distribution, this assumption does not need to hold. Nice samples, which exhibit no strong skewness or outliers, along with a large enough sample size, allow you to invoke the central limit theorem. As a result, you are justified in using a table of z-scores, even for populations that are not normally distributed.

Inferential statistics concerns the process of beginning with a statistical sample and then arriving at the value of a population parameter that is unknown. The unknown value is not determined directly. Rather, we end up with an estimate that falls into a range of values. This range is known in mathematical terms as an interval of real numbers and is specifically referred to as a confidence interval.

Confidence intervals are all similar to one another in a few ways. Two-sided confidence intervals all have the same form:

Estimate ± Margin of Error

Second, the steps used to calculate confidence intervals are very similar across different types of confidence intervals. We will examine how to determine a two-sided confidence interval for a population mean when the population standard deviation is unknown. An underlying assumption is that we are sampling from a normally distributed population.

We will work through a list of steps required to find our desired confidence interval. Although all of the steps are important, the first one is particularly so:

1. Check Conditions: Make sure that the conditions for our confidence interval have been met. We assume that the value of the population standard deviation, denoted by the Greek letter sigma σ, is unknown and that we are working with a normal distribution. We can relax the assumption that we have a normal distribution as long as our sample is large enough and has no outliers or extreme skewness.
2. Calculate Estimate: Estimate the population parameter, in this case the population mean, by use of a statistic, in this case the sample mean. This involves forming a simple random sample from our population. Sometimes we can suppose that our sample is a simple random sample, even if it does not meet the strict definition.
3. Critical Value: Find the critical value t* that corresponds with our confidence level. These values are found by consulting a table of t-scores or by using software. If we use a table, we will need to know the number of degrees of freedom. The number of degrees of freedom is one less than the number of individuals in our sample.
4. Margin of Error: Calculate the margin of error t* s/√n, where n is the size of the simple random sample that we formed and s is the sample standard deviation, which we obtain from our statistical sample.
5. Conclude: Conclude by putting together the estimate and margin of error. This can be expressed as either Estimate ± Margin of Error or as Estimate - Margin of Error to Estimate + Margin of Error. In the statement of our confidence interval it is important to indicate the level of confidence. This is just as much a part of our confidence interval as the numbers for the estimate and margin of error.

Example

To see how we can construct a confidence interval, we will work through an example. Suppose we know that the heights of a specific species of pea plants are normally distributed. A simple random sample of 30 pea plants has a mean height of 12 inches with a sample standard deviation of 2 inches. What is a 90% confidence interval for the mean height for the entire population of pea plants?

1. Check Conditions: The conditions have been met, since the population standard deviation is unknown and we are dealing with a normal distribution.
2. Calculate Estimate: We have a simple random sample of 30 pea plants. The mean height for this sample is 12 inches, so this is our estimate.
3. Critical Value: Our sample has a size of 30, and so there are 29 degrees of freedom. The critical value for a confidence level of 90% is given by t* = 1.699.
4. Margin of Error: Use the margin of error formula and obtain a margin of error of t* s/√n = (1.699)(2)/√30 = 0.620.
5. Conclude: Conclude by putting everything together. A 90% confidence interval for the population's mean height is 12 ± 0.62 inches. Alternatively, we could state this confidence interval as 11.38 inches to 12.62 inches.
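The same calculation in Python, using scipy (assumed available) only to look up the t critical value:

```python
import math
from scipy import stats  # assumed available; used only for the t critical value

# Values from the example: n = 30, sample mean = 12, s = 2.
n, sample_mean, s = 30, 12, 2
t_star = stats.t.ppf(0.95, df=n - 1)  # ≈ 1.699 for 90% confidence, 29 df

margin = t_star * s / math.sqrt(n)  # ≈ 0.620
print(f"{sample_mean - margin:.2f} to {sample_mean + margin:.2f}")  # 11.38 to 12.62
```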

Practical Considerations

Confidence intervals of the above type are more realistic than other types that can be encountered in a statistics course. It is very rare to know the population standard deviation but not know the population mean. Here, we assume that we do not know either of these population parameters.

Bootstrapping is a statistical technique that falls under the broader heading of resampling. This technique involves a relatively simple procedure, but one repeated so many times that it is heavily dependent upon computer calculations. Bootstrapping provides a method other than standard confidence intervals to estimate a population parameter. Bootstrapping very much seems to work like magic. Read on to see how it obtains its interesting name.

An Explanation of Bootstrapping

One goal of inferential statistics is to determine the value of a parameter of a population. It is typically too expensive or even impossible to measure this directly, so we use statistical sampling. We sample a population, measure a statistic of this sample, and then use this statistic to say something about the corresponding parameter of the population.

For example, in a chocolate factory, we might want to guarantee that candy bars have a particular mean weight. It's not feasible to weigh every candy bar that is produced, so we use sampling techniques to randomly choose 100 candy bars. We calculate the mean of these 100 candy bars and say that the population mean falls within a margin of error of the mean of our sample.

Suppose that a few months later we want to know with greater accuracy, or less of a margin of error, what the mean candy bar weight was on the day that we sampled the production line. We cannot use today's candy bars, as too many variables have entered the picture (different batches of milk, sugar and cocoa beans, different atmospheric conditions, different employees on the line, etc.). All that we have from the day that we are curious about are the 100 weights. Without a time machine back to that day, it would seem that the initial margin of error is the best that we can hope for.

Fortunately, we can use a bootstrapping technique: we randomly sample with replacement from the 100 known weights. We then call this a bootstrap sample. Since we allow for replacement, this bootstrap sample is most likely not identical to our initial sample. Some data points may be duplicated, and other data points from the initial 100 may be omitted in a bootstrap sample. With the help of a computer, thousands of bootstrap samples can be constructed in a relatively short time.

An Example

As mentioned, to truly use bootstrap techniques we need to use a computer. The following numerical example will help to demonstrate how the process works. If we begin with the sample 2, 4, 5, 6, 6, then all of the following are possible bootstrap samples:

- 2, 5, 5, 6, 6
- 4, 5, 6, 6, 6
- 2, 2, 4, 5, 5
- 2, 2, 2, 4, 6
- 2, 2, 2, 2, 2
- 4, 6, 6, 6, 6
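Here is a minimal sketch of how a computer generates such samples, drawing with replacement from the original five values:

```python
import random
import statistics

random.seed(3)
original = [2, 4, 5, 6, 6]

# Each bootstrap sample draws len(original) values *with replacement*.
for _ in range(3):
    print(random.choices(original, k=len(original)))

# Repeating this thousands of times and recording a statistic (here the
# mean) builds the bootstrap distribution used for estimation.
boot_means = [
    statistics.mean(random.choices(original, k=len(original)))
    for _ in range(10_000)
]
print(statistics.mean(boot_means))  # close to the original sample mean of 4.6
```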

Bootstrap techniques are relatively new to the field of statistics. The first use was published in a 1979 paper by Bradley Efron. As computing power has increased and become less expensive, bootstrap techniques have become more widespread.

The name "bootstrapping" comes from the phrase "to lift oneself up by one's bootstraps." This refers to something that is preposterous or impossible. Try as hard as you can, you cannot lift yourself into the air by tugging at pieces of leather on your boots.

However, the use of bootstrapping does feel like you are doing the impossible. Although it does not seem like you would be able to improve upon the estimate of a population statistic by reusing the same sample over and over again, bootstrapping can, in fact, do this.
