Data Analysis Notes 2 PDF

CE 006 - ENGINEERING DATA ANALYSIS 2ND SEMESTER (AY 2019-2020)
DRAWING CONCLUSIONS FROM STATISTICAL STUDIES

DATA COLLECTION
- A primary goal of statistical studies is to collect data that can
then be used to make informed decisions. It should come as
no surprise that the ability to make good decisions depends on
the quality of the information available.
- It is important to keep in mind the questions we hope to
answer on the basis of the resulting data.
- The type of conclusion that can be drawn from a statistical
study depends on how the study was conducted.
TYPES OF STATISTICAL STUDIES
- In an experiment the researcher controls who is in which
group, whereas this is not the case in an observational study.
- A well-designed experiment can result in data that provide
evidence for a cause and effect relationship.
- In an observational study, it is impossible to draw clear cause
and effect conclusions because we cannot rule out the
possibility that the observed effect is due to some variable
other than the explanatory variable being studied. Such
variables are called confounding variables.
- CONFOUNDING VARIABLE: A variable that is related to both
how the experimental groups were formed and the response
variable of interest.
- EXAMPLES OF CONFOUNDING VARIABLE IN STUDY:
SAMPLING
SAMPLE VS CENSUS (ENTIRE POPULATION)
- A sample involves looking only at some items selected from the
population, but a census is an examination of all items in a
defined population. (The accuracy of a census can be illusory.)
- We denote the population size by N, and the sample size by n.
ENGR. L.V. YNGENTE 1

PARAMETER
- Any measurement that describes an entire population. Usually
the parameter value is unknown since we rarely can observe
the entire population.
STATISTIC
- Any measurement computed from a sample. Usually, the
statistic is regarded as an estimate of a population parameter.
TARGET POPULATION
- The population in which we are interested.
- EXAMPLE: Suppose we wish to estimate the proportion of
potential consumers who would purchase a car. Is the target
population all drivers? Only drivers with incomes over
PhP35,000?
SAMPLING FRAME
- The group from which we take the sample.
- EXAMPLES:
• Names and addresses of all registered voters in Quezon City
• Names and addresses of all registered Civil Engineers in the
Philippines
FINITE POPULATION
- A population is finite if it has a definite size, N, even if its size
is unknown.
- EXAMPLES:
• Number of cars in the TIP-QC parking lot
• Number of Civil Engineering students enrolled at TIP-QC
INFINITE POPULATION
- A population is treated as infinite if it is of arbitrarily large size.
- EXAMPLES:
• Number of vehicles along EDSA
• Number of cement nails produced by an assembly line
- When a population is known to be very large relative to the
sample, a statistician may treat the population as infinite
(when N is at least 20 times n, i.e. when N/n ≥ 20).
SAMPLING METHODS
1. PROBABILITY SAMPLING
- Items are chosen by randomization or chance procedure.
2. NON-PROBABILITY SAMPLING
- Less scientific but is sometimes used for expediency.
SIMPLE RANDOM SAMPLING
- Every item in the population of N items has the same chance
of being chosen in the sample of n items.
- There must be no bias (i.e. no systematic tendency to over- or
underestimate whatever parameter we are trying to measure);
all values must be equiprobable (i.e. equally likely to occur).
- EXAMPLES:
• Write each of the N data values on a poker chip, and then
draw n chips from a bowl after stirring it thoroughly.
• Write each of the N data values in a numbered list, and
then choose n integers at random between 1 and N.
- Some ways to get 10 random integers between 1 and 875.
SYSTEMATIC SAMPLING
- Choose every kth item from the list, starting from a randomly
chosen entry among the first k items on the list.
- An attraction of systematic sampling is that it can be used with
unlistable or infinite populations, such as production processes
(e.g. testing every 100th reinforcing steel bar).
- It is also suitable for linearly organized physical populations
(e.g. pulling every tenth student file from alphabetized filing
drawers).
- A systematic sample of n items from a population of N items
requires that the periodicity k be approximately N/n (e.g. to
choose 25 companies from a list of 501 companies, we choose
every twentieth company in the list, k = 501/25 ≈ 20).
- EXAMPLE: Sampling every fourth item, starting from item 2,
results in a sample of n = 20 items from a list of N = 78 items.
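The numbered-list method of simple random sampling and the every-kth-item rule of systematic sampling can both be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simple_random_sample` and `systematic_sample`, the optional `start` and `seed` arguments, and the choice to round N/n to the nearest integer are not from the notes.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct item numbers from 1..N, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))


def systematic_sample(N, n, start=None, seed=None):
    """Pick every kth item number from 1..N, with k approximately N/n.

    If start is not given, it is chosen at random among the first k items.
    """
    k = max(1, round(N / n))  # periodicity (assumption: nearest integer)
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be one of the first k items")
    return list(range(start, N + 1, k))


if __name__ == "__main__":
    # 10 random integers between 1 and 875
    print(simple_random_sample(875, 10))
    # every fourth item starting from item 2, from a list of 78 items
    print(systematic_sample(78, 20, start=2))
```

Calling `systematic_sample(78, 20, start=2)` gives items 2, 6, 10, ..., 78, which is the 20-item example above. Because k is rounded, the number of items returned can differ slightly from n; with N = 501 and n = 25, for instance, k = 20 and the sample has 25 or 26 items depending on the random start.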

STRATIFIED SAMPLING
- When the population can be divided into relatively
homogeneous subgroups, the subgroups are called strata, and each
individual subgroup is called a stratum (the singular of strata).
- Entails selecting a separate simple random sample from each
stratum.
- It often allows us to make more accurate inferences about a
population than does simple random sampling.
- It is easier to produce relatively accurate estimates of
characteristics of a homogeneous group than of a
heterogeneous group.
CLUSTER SAMPLING
- Involves dividing the population of interest into non-overlapping
subgroups, called clusters.
- Clusters are then selected at random, and then all individuals
in the selected clusters are included in the sample.
- The ideal situation for cluster sampling is when each cluster
mirrors the characteristics of the population.
JUDGEMENT SAMPLING
- A non-probability sampling method that relies on the expertise
of the sampler to choose items that are representative of the
population.
- QUOTA SAMPLING is a special kind of judgement sampling, in
which the interviewer chooses a certain number of people in
each category (e.g. male/female).
CONVENIENCE SAMPLING
- The idea is to grab whatever sample is handy.
- Its sole virtue is that it is quick.
SOURCES OF ERROR
1. NONRESPONSE BIAS
- Occurs when responses are not obtained from all
individuals selected for inclusion in the sample.
- Can distort results if those who respond differ in
important ways from those who do not respond.
2. SELECTION BIAS
- Also called undercoverage.
- Introduced when the way the sample is selected
systematically excludes some part of the population of
interest.
- If those who are excluded from the sampling process
differ in some systematic way from those who are
included, the sample is virtually guaranteed to be
unrepresentative of the population.
- If this difference between the included and the
excluded occurs on a variable that is important to the
study, conclusions based on the sample data may not be
valid for the population of interest.
3. RESPONSE ERROR
- Occurs when respondents deliberately give false
information.
4. COVERAGE ERROR
- Occurs when some important segment of the target
population is systematically missed.
5. MEASUREMENT ERROR
- Occurs when the method of observation tends to produce values
that systematically differ from the true value in some way.
- Results when survey questions do not accurately reveal
the construct being assessed.
6. INTERVIEW ERROR
- Occurs when the interviewer's facial expressions, tone of
voice, or appearance influence the responses.
7. SAMPLING ERROR
- Uncontrollable random error that is inherent in any
survey/study.
- Even using a probability sampling method, it is possible
that the sample will contain unusual responses.
- Cannot be prevented and is generally undetectable.
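Returning to the stratified and cluster designs described in this section, the difference between them can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the `stratum_of` and `cluster_of` callbacks, and the choice of an equal sample size per stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, n_per_stratum, seed=None):
    """Take a separate simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Assumption: same sample size in every stratum, capped at its size.
        size = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(items, cluster_of, n_clusters, seed=None):
    """Select whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for label in chosen for item in clusters[label]]


# Hypothetical population: 100 students tagged with a year level and a section.
students = [
    (i, "Year %d" % (i % 4 + 1), "Section %s" % "ABCDE"[i % 5])
    for i in range(1, 101)
]
by_year = stratified_sample(students, lambda s: s[1], 3, seed=0)
by_section = cluster_sample(students, lambda s: s[2], 2, seed=0)
```

Here `by_year` contains three students from every year level, while `by_section` contains every student from two randomly chosen sections and none from the others.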

SIMPLE COMPARATIVE EXPERIMENTS
THE PURPOSE OF THE EXPERIMENT:
PRINCIPLES OF EXPERIMENTAL DESIGN
- A well-designed experiment copes with the potential effects of
extraneous variables by using random assignment to
experimental conditions and sometimes by incorporating direct
control and/or blocking into the design of the experiment.
- The effects of extraneous variables can be filtered out by
blocking. Extraneous variables that are addressed through
blocking are called blocking variables.
- If an extraneous variable is confounded with the explanatory
variables (which define the treatments), it is not possible to
draw an unambiguous conclusion about the effect of the
treatment on the response. Both direct control and blocking
are effective in ensuring that the controlled variables and
blocking variables are not confounded with the variables that
define treatments.
- Random assignment ensures that our experiment does not
systematically favor one experimental condition over any other
and attempts to create experimental groups that are as much
alike as possible.
- Random assignment (either of subjects to treatments or of
treatments to trials) is a critical component of a good
experiment.
- Random assignment can be effective only if the number of
subjects or observations in each experimental condition
(treatment) is large enough for each experimental group to
reliably reflect variability in the population.
- Replication is the design strategy of making multiple
observations for each experimental condition.
GOAL OF EXPERIMENTAL DESIGN
To provide a method of data collection that:
1. Minimizes the effect of extraneous sources of variability in the
response so that any differences in response for various
experimental conditions can be more easily assessed.
2. Creates experimental groups that are similar with respect to
extraneous variables that cannot be controlled either directly
or through blocking.
SOME COMMON EXPERIMENTAL DESIGNS
- Simple diagrams are sometimes used to highlight important
features of some common experimental designs. The structure
of an experiment that is based on random assignment of
experimental units (the units to which treatments are assigned,
usually subjects or trials) to one of two treatments is displayed
in the accompanying diagram.
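Random assignment of experimental units to treatments, as described in this section, can be sketched in Python. The function name `randomly_assign`, the treatment labels "A" and "B", and the `seed` argument are assumptions for illustration only; the sketch simply shuffles the units and deals them out in turn so that group sizes differ by at most one.

```python
import random


def randomly_assign(units, treatments=("A", "B"), seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical example: 20 numbered subjects split between two treatments.
groups = randomly_assign(range(1, 21), seed=3)
```

With 20 subjects, each treatment receives 10 of them, so every treatment is replicated across several experimental units and chance alone decides who lands in which group.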

USE OF CONTROL GROUP
- A control group is an experimental group that does not receive
treatment, or sometimes, the group that receives the current
standard treatment.
- The use of a control group allows the experimenter to assess
how the response variable behaves when treatment is not
used.
- This provides the baseline against which the treatment groups
can be compared to determine whether the treatment had an
effect.
- Not all experiments require the use of a control group. For
example, many experiments are designed to compare two or
more conditions.
- An experiment with two treatments and no control group might
allow us to determine whether there is a difference between
the two treatments and even to assess the magnitude of the
difference if one exists, but it would not allow us to assess the
individual effect of either treatment.
- For example, without a control group, we might be able to say
that there is no difference in the increase in mileage for two
different gasoline additives, but we would not be able to tell if
this was because both additives increased gas mileage by a
similar amount or because neither additive had any effect on
gas mileage.
USE OF A PLACEBO
- In experiments that use human subjects, use of a control
group may not be enough to determine whether a treatment
really does have an effect. People sometimes respond merely
to the power of suggestion.
SINGLE-BLIND & DOUBLE-BLIND EXPERIMENTS
EXPERIMENTAL UNITS & REPLICATION
- An experimental unit is the smallest unit to which a treatment is
applied.
- In the language of experimental design, treatments are
assigned at random to experimental units, and replication
means that each treatment is applied to more than one
experimental unit.
- Replication is necessary for random assignment to be an
effective way to create similar experimental groups and to get
a sense of the variability in the values of the response for
individuals who receive the same treatment.
- Do not confuse replication in an experimental design with
replicating an experiment. Replicating an experiment means
conducting a new experiment using the same experimental
design as a previous experiment. This is a way of confirming
conclusions based on a previous experiment, but it does not
eliminate the need for replication in each of the individual
experiments themselves.
USING VOLUNTEERS AS SUBJECTS IN AN EXPERIMENT
- Although the use of volunteers in a study that involves
collecting data through sampling is never a good idea, it is a
common practice to use volunteers as subjects in an
experiment.
- Even though the use of volunteers limits the researcher's ability
to generalize to a larger population, random assignment of the
volunteers to treatments should result in comparable groups,
and so treatment effects can still be assessed.
SOME ISSUES THAT NEED TO BE ADDRESSED FOR
OBSERVATIONAL STUDIES:
1. What is the population of interest? What is the sampled
population? Are these two populations the same? If the
sampled population is only a subset of the population of
interest, undercoverage limits our ability to generalize to the
population of interest. Overcoverage results when the
sampled population is actually larger than the population of
interest.
2. How were the individuals or objects in the sample actually
selected? A description of the sampling method helps the
reader to make judgments about whether the sample can
reasonably be viewed as representative of the population of
interest.
3. What are potential sources of bias, and is it likely that any of
these will have a substantial effect on the observed results?
When describing an observational study, you should
acknowledge that you are aware of potential sources of bias
and explain any steps that were taken to minimize their effect.
SOME ISSUES THAT NEED TO BE ADDRESSED FOR
EXPERIMENTS:
1. What is the role of random assignment? All good experiments
use random assignment as a means of coping with the
effects of potentially confounding variables that cannot easily
be directly controlled. When describing an experimental
design, you should be clear about how random assignment
(subjects to treatments, treatments to subjects, or treatments
to trials) was incorporated into the design.
2. Were any extraneous variables directly controlled by holding
them at fixed values throughout the experiment? If so, which
ones and at which values?
3. Was blocking used? If so, how were the blocks created? If an
experiment uses blocking to create groups of homogeneous
experimental units, you should describe the criteria used to
create the blocks and their rationale.
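To make the blocking question above concrete, here is a minimal Python sketch of random assignment carried out separately within each block. Everything named here is hypothetical: the function `blocked_assignment`, the `block_of` callback, the treatment labels, and the batch-based example data are assumptions for illustration, not a design taken from the notes.

```python
import random
from collections import defaultdict


def blocked_assignment(units, block_of, treatments=("A", "B"), seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label]
        rng.shuffle(members)  # randomize order inside the block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical example: 12 specimens, blocked by the batch they came from.
specimens = [(i, "batch %d" % (i % 3)) for i in range(12)]
plan = blocked_assignment(specimens, lambda s: s[1], seed=5)
```

Because the shuffle happens inside each block, every batch contributes the same number of specimens to each treatment, so the blocking variable cannot be confounded with the treatments.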
