
INTRODUCTION TO STATISTICS

STATISTICS
 It is the science that deals with methods for the collection, organization, presentation,
analysis and interpretation of data.
Origin and Development of Statistics
 As early as 3500 B.C., statistics was used in Egypt to record the number of sheep
or cattle owned and the number of people living in a particular city.
 In 3800 B.C., the Babylonian government used statistics to measure the number of men under
the king’s rule and the extent of the territory that he occupied.
 In 700 B.C., the Romans used statistics by conducting registrations to record the
population for the purpose of collecting taxes.
IMPORTANT PERSONS IN STATISTICS
John Graunt (1620 - 1674)
 He studied the “bills of mortality”, which included information about the numbers and causes of
deaths in the city of London.
Abraham de Moivre (1667 - 1754)
 He discovered the equation of the normal distribution (published in 1733).
Adolphe Quetelet (1796 - 1874)
 He is known as the “Father of Modern Statistics”.
Karl Pearson (1857 - 1936)
 He developed the statistical theory of regression and correlation.
Ronald Fisher (1890 - 1962)
 He developed the analysis of variance (the F-test is named after him), used in inferential statistics and in experimental design.
George Gallup (1901 - 1984)
 He was instrumental in making statistical polling a common tool in political campaigns.

THE USES OF STATISTICS


Education
 Assess students’ performance and correlate factors affecting the teaching and learning process
to improve the quality of education.

Psychology
 Determine attitudinal patterns and the causes and effects of misbehavior.
Business and Economics
 Validate or test a claim or inference about a group of people, objects or a series of events.
Medicine
 Collect information about patients and diseases to make decisions about the use of new
drugs and treatments.
Meteorology
 Find patterns in the weather and make predictions about what future weather will be like.

IMPORTANCE OF THE STUDY OF STATISTICS


• It makes a topic or presentation easier to understand.
• It makes gathered data easier to compile, especially in surveys.
• Ideas can be analyzed easily and conclusions can be drawn from studies promptly.
• Research questions can be defined and answered more easily.
TWO FIELDS OF STATISTICS
1. Descriptive Statistics
 It is concerned with methods of collecting, organizing, and presenting data appropriately and
creatively to describe or assess group characteristics (summarizing the data).
2. Inferential Statistics
 It is concerned with inferring or drawing conclusions, such as decisions, predictions, or
generalizations, about a population based on the data from a sample.
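
The contrast between the two fields can be sketched in a few lines of Python. This is only an illustration: the scores are made-up data, and the interval uses the normal value 1.96 as a rough approximation (an assumption, not a method taken from these notes).

```python
import math
import statistics

# Hypothetical exam scores from a sample of 10 students (made-up data).
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: generalize from the sample to the population.
# Rough 95% interval for the population mean, using 1.96 as an
# approximation (a t critical value would be more exact for n = 10).
standard_error = stdev / math.sqrt(len(scores))
interval = (mean - 1.96 * standard_error, mean + 1.96 * standard_error)

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
print(f"approximate 95% interval for the population mean: {interval}")
```

The first three values only describe the ten scores at hand; the interval is a statement about the larger group the sample was drawn from.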

IMPORTANT TERMS IN STATISTICS


Population
 It is the complete collection of elements (scores, people, animals, …) to be studied.
Census
 It is a collection of data from every element in a population.

Sample
 It is a collection of data from a subset of the elements drawn from a population.

TYPES OF SAMPLES
Random Sample
 It is the most commonly used sampling technique. It is a procedure in which every element
of the population is given an equal chance of being selected as a member of the sample.
Convenience Sample
 It is a sample chosen because it is easy for the researcher to obtain.
Cluster Sample
 It is a sample that consists of the items in a group, such as a neighborhood or a household; the
groups may be chosen at random.

DATA COLLECTION
 Data can be defined as the values of a variable (e.g. numbers, images, words, figures, facts or
ideas).
 It is the lowest unit of information from which other measurements and analyses can be made.
 Data is one of the most important and vital aspects of any research study.
Factors to be Considered Before Collection of Data
 Object and scope of the enquiry.
 Sources of information.
 Quantitative expression.
 Techniques of data collection.
 Unit of collection.
METHODS OF PRIMARY DATA COLLECTION
1. Questionnaire Method
2. Interview Method
3. Focus Group Discussion (FGD)
4. Participatory Rural Appraisal/Assessment (PRA)
5. Rapid Rural Appraisal/Assessment (RRA)
6. Observation Method
7. Survey Method
8. Case Study Method
9. Diaries Method
10. Principal Component Analysis (PCA)
11. Activity Sampling Technique
12. Memo Motion Study
13. Process Analysis
14. Link Analysis
15. Time and Motion Study
16. Experimental Method
17. Statistical Method

SOURCES OF DATA
1. External sources
 Primary data
 Secondary data
2. Internal sources

INTERNAL SOURCES OF DATA


 Many institutions and departments keep information about their regular functions for
their own internal purposes.
 When such information is used in a survey, it is called an internal source of data.
 E.g. social welfare societies.
EXTERNAL SOURCES OF DATA
 When information is collected from outside agencies, it is called an external source of data.
 Such data are either primary or secondary.
 This type of information can be collected by a census or by sampling, through conducting a
survey.

Primary Data
 Data that has been collected from first-hand experience is known as primary data. It is
more reliable and authentic, and it has not been published anywhere.
 Primary data has not been changed or altered by anyone else; therefore, its validity is
generally greater than that of secondary data.
Methods of Collecting Primary Data
 Direct Personal Investigation (i.e. interview method)
 Investigation through observation
 Investigation through mailed questionnaire
 Investigation through local reporters’ questionnaire
 Indirect oral investigation (i.e. through enumerators)

Secondary Data
 Secondary data are those that have already been collected by others.
 These are usually found in journals, periodicals, research publications, official records, etc.
 Secondary data may be available in published or unpublished form. When it is not
possible to collect the data by a primary method, the investigator turns to a secondary method.
 These data were collected for some purpose other than the problem at hand.
Methods of Collecting Secondary Data
1. Published Sources
 International
 Government
 Municipal corporation
 Institutional/ commercial
2. Unpublished sources
SECONDARY DATA
MERITS                            | DEMERITS
Quick and cheap source of data    | May not fulfill our specific research needs
Wider geographical area           | Poor accuracy
Longer orientation period         | Data are not up to date
Leads to finding primary data     | Poor accessibility in some cases
DIFFERENCE BETWEEN PRIMARY AND SECONDARY DATA
Primary Data                      | Secondary Data
Real-time data                    | Past data
Sure about sources of data        | Not sure about sources of data
Helps to give results/findings    | Helps to refine the problem
Costly and time-consuming process | Cheap and not time-consuming
Avoids bias in response data      | Cannot know whether the data are biased
More flexible                     | Less flexible

TYPES OF STUDIES
Agenda
• Dependent Variable
• Independent Variable
• Intervening/Mediating Variable
• Organismic Variable
• Control/Constant Variable
• Interval variable
• Ratio variable
• Nominal/Categorical variable
• Ordinal variable
• Dummy variables
• Preference variable
• Multiple response variable
• Extraneous Variable
VARIABLE
• Any characteristic which is subject to change and can have more than one value such as
age, intelligence, motivation, gender, etc.
Dependent Variable
• Variable affected by the independent variable
• It responds to the independent variable.
Independent Variable
• Variable that is presumed to influence another variable
• It is the presumed cause, whereas the dependent variable is the presumed effect.
Example 1
You are interested in “How does stress affect the mental state of human beings?”
Independent variable: Stress
Dependent variable: Mental state of human beings
You can directly manipulate stress levels in your human subjects and measure how those
stress levels change mental state.

Example 2
Promotion affects employees’ motivation.
Independent variable: Promotion
Dependent variable: Employees’ motivation

Other Names for Dependent and Independent Variables

Dependent Variable | Independent Variable
Explained          | Explanatory
Predictand         | Predictor
Regressand         | Regressor
Response           | Stimulus
Outcome            | Covariate
Controlled         | Control
Intervening/Mediating Variable
 It is a variable whose existence is inferred but which cannot be measured directly.
Example 1
Determining the effect of video clips on the learning ability of M.Phil. students.
The association between video clips and learning ability needs to be explained.
Other variables intervene, such as anxiety, fatigue, motivation, improper diet, etc.
An intervening variable is caused by the independent variable and is itself a cause of the
dependent variable.

Example 2
Higher education typically leads to higher income.
Higher education: independent variable
Higher income: dependent variable
Better occupation: intervening variable
It is causally affected by education and itself affects income.

Organismic Variable
 Any characteristic of the research participant/individual under study that can be used for
classification
 Such as personal characteristics of gender, height, weight, age, etc. in behavioral sciences.
Control/Constant Variable
 It is a variable that is NOT allowed to change unpredictably during an experiment.
 As such variables are ideally expected to remain the same, they are also called constant variables.
Example
An example of a constant variable is the voltage from a power supply.

If you are examining “How electricity affects experimental subjects”, you should keep the
voltage constant; otherwise, the energy supplied will change as the voltage changes.

Interval Variable
 Interval variables have a numerical value.
 These have order and equal intervals.
 They allow us not only to rank the items that are measured but also to quantify and
compare the magnitudes of the differences between them.
Example
Suppose you have a variable such as monthly income that is measured in rupees, and we have
three people who make
• Rs. 10,000
• Rs. 15,000 and
• Rs. 20,000
The second person makes Rs. 5,000 more than the first and Rs. 5,000 less than the third, so the
size of the interval between adjacent values is the same.
Ratio Variable
 A ratio variable is similar to an interval variable with one difference: it has a true zero
point, so the ratio between two values makes sense.
Example
• Let’s say respondents were being surveyed about their stress levels on a scale of 0-10.
• A respondent with a stress level of 10 should have twice the stress experienced by a
respondent who selected a stress level of 5.

Age, height, and weight are also good examples of ratio variables. Someone who is 6'0" tall is
twice as tall as someone who is 3'0" tall.

Nominal/Categorical Variable
• They can be measured only in terms of whether the individual items belong to certain
distinct categories.
• We cannot quantify or even rank/order the categories.
• Nominal data has no order.
• One cannot perform arithmetic (+, -, /, *) or ordering comparisons (>, <) on nominal
data; one can only check whether two items fall in the same category.

Example
Gender (dichotomous variable):
1. Male
2. Female
Marital Status:
1. Unmarried
2. Married
3. Divorcee
4. Widower

Ordinal Variable
 An ordinal variable is like a nominal variable, but its different states are ordered in a meaningful
sequence.
 Ordinal data has order, but the intervals between scale points may be uneven.
 Because of the lack of equal distances, arithmetic operations are not meaningful, but ordering
comparisons (>, <, =) can be performed on ordinal data.
 A typical example of an ordinal variable is the socio-economic status of families.
 We know 'upper middle' is higher than 'middle' but we cannot say 'how much higher'.
Example
A questionnaire on the time involvement of scientists in the 'perception and identification of
research problems'.
The respondents were asked to indicate their involvement by selecting one of the following
codes:
1 = Very low or nil
2 = Low
3 = Medium
4 = Great
5 = Very great
Here, the variable 'Time Involvement' is an ordinal variable with 5 states.

Dummy Variable
 A qualitative variable can be transformed into one or more quantitative variables, called dummy
variables, each of which is typically coded 0 or 1.
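
A minimal sketch of this transformation, assuming the marital status categories from the nominal variable example and 0/1 coding; the responses and the helper name `to_dummies` are hypothetical, introduced only for illustration.

```python
# Hypothetical survey responses for a qualitative variable (marital status).
responses = ["Married", "Unmarried", "Divorcee", "Married", "Widower"]

# One 0/1 dummy variable per category, in alphabetical order.
categories = sorted(set(responses))


def to_dummies(value, categories):
    """Return a dict mapping each category to 1 if it matches value, else 0."""
    return {category: int(value == category) for category in categories}


dummy_rows = [to_dummies(value, categories) for value in responses]

for value, row in zip(responses, dummy_rows):
    print(value, row)
```

Each response turns into a row with exactly one 1. In regression work one category is often dropped as a baseline; that step is omitted here.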
Preference Variable
 Preference variables are specific discrete variables whose values are either in a decreasing
or increasing order.

For example,
In a survey, a respondent may be asked to indicate the importance of the following FIVE sources
of information in his research and development work, by using the code [1] for the most important
source and [5] for the least important source:
1. Literature published in the country
2. Literature published abroad
3. Scientific abstracts
4. Unpublished reports, material, etc.
5. Discussions with colleagues within the research unit
Multiple Response Variable
 Multiple response variables are those that can assume more than one value at the same time.
Example
A typical example is a survey questionnaire about the use of computers in research.
The respondents were asked to indicate the purpose(s) for which they use computers in their
research work. The respondents could select more than one category.
1. Statistical analysis
2. Lab automation/ process control
3. Data base management, storage and retrieval
4. Modeling and simulation
5. Scientific and engineering calculations
6. Computer aided design (CAD)
Extraneous Variable
 Extraneous variables are undesirable variables that influence the relationship between the
variables an experimenter is examining.
Example
An educational psychologist has developed a new learning strategy and is interested in
examining the effectiveness of this strategy.
The experimenter randomly assigns students to two groups. All of the students study text
materials on a biology topic for thirty minutes. One group uses the new strategy and the other
uses a strategy of their choice.
Then all students complete a test on the materials.
Extraneous variable: prior knowledge of the biology topic

Random Variable
Statistics and Probability
Competencies:
 Illustrate a random variable (discrete and continuous);
 Illustrate a variable (dependent and independent);
 Distinguish between discrete and continuous random variables and between dependent
and independent variables; and
 Find the possible values of a random variable.

Variable – a characteristic of an object or person under investigation.


 Examples: height, weight, number of students, behavior, etc.
Discrete variables
– can be obtained by counting
 Examples: number of births, deaths, marriages, or students in any given period of
time
Continuous variables
– obtained by measurement
– can assume any value within a range
 Examples: height, weight, age, time, temperature, volume, area
Practice Activity
Determine if the following examples are discrete or continuous variables.
DISCRETE
 The number of boys in a randomly selected three-child family.
 The number of no-shows for every 100 reservations made with a commercial airline.
 The number of vehicles owned by a randomly selected household.
CONTINUOUS
 The temperature of a cup of coffee served at a restaurant.
 The average amount spent on electricity each July by a randomly selected household in a
certain state.

Dependent variable – the variable whose values are predicted


Independent variable – the variable used to predict those values

DEPENDENT VS. INDEPENDENT


• The independent variable is the one the experimenter controls. The dependent variable
is the variable that changes in response to the independent variable.
• The two variables may be related by cause and effect. If the independent variable changes,
then the dependent variable is affected.

Example 1:
Factors Affecting Academic Performance of Grade 11 Students
Independent variable: the factors, which may include I.Q., study habits, etc.
Dependent variable: academic performance of Grade 11 students

Example 2:
Mr. S set up an experiment to see how the mass of a ball affects the distance it rolls off a ramp.
Independent variable: mass of the ball
Dependent variable: distance it rolls off the ramp

Example 3:
Eating breakfast in the morning increases the ability to learn in school.
Independent variable: eating breakfast
Dependent variable: ability to learn

Finding the Possible Values of a Random Variable


FORMULA:
$\sum_{i=1}^{n} X_i$
• means the sum of $X_1, X_2, \ldots, X_n$.
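
As an illustration of finding the possible values of a random variable, the sketch below takes the practice-activity case of the number of boys in a three-child family. The names `sample_space`, `x_by_outcome` and `possible_values` are hypothetical, chosen for this example; the last step applies the summation notation to the values of X over all outcomes.

```python
from itertools import product

# Sample space: every birth-order outcome for three children (B = boy, G = girl).
sample_space = ["".join(outcome) for outcome in product("BG", repeat=3)]

# The random variable X maps each outcome to the number of boys in it.
x_by_outcome = {outcome: outcome.count("B") for outcome in sample_space}

# Possible values of X: the distinct counts that can occur.
possible_values = sorted(set(x_by_outcome.values()))

# Summation notation: add X_i over all n = 8 outcomes.
total = sum(x_by_outcome.values())

print(sample_space)
print(possible_values)
print(total)
```

Listing the sample space first and then applying the rule for X to each outcome is a simple way to make sure no possible value is missed.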
GRAPHICAL REPRESENTATION AND METHODS
VARIABLES
• A variable is a characteristic or condition that can change or take on different values.
• Most research begins with a general question about the relationship between two variables
for a specific group of individuals.
Types of Variables
Variables can be classified as
• Discrete variables (such as class size) consist of indivisible categories.
• Continuous variables (such as time or weight) are infinitely divisible into whatever units
a researcher may choose. For example, time can be measured to the nearest minute, second,
half-second, etc.
Data
 Statistical data are usually obtained by counting or measuring items. Most data can be put
into the following categories:
• Qualitative - data are measurements that each fall into one of several categories (hair
color, ethnic group and other attributes of the population).
• Quantitative - data are observations that are measured on a numerical scale (distance
traveled to college, number of children in a family, etc.).

GRAPHICAL REPRESENTATION
• The visual display of statistical data in the form of points, lines, areas and other
geometrical forms and symbols is, in the most general terms, known as graphical
representation.

BAR GRAPH
• A bar graph is a chart that uses either horizontal or vertical bars to show comparisons
among categories.

Example: a simple bar graph displaying the profit of a bank over 5 years (figure not reproduced).


TYPES OF BAR GRAPHS
o Single (vertical)
o Grouped
o Stacked
o Horizontal

LINE DIAGRAM
 A graph that shows information that is connected in some way (such as change over time).
 A line graph represents data as points (dots) that are joined so that they form a
line on the graph.

PIE DIAGRAM
 A pie diagram is a circular diagram in which the whole circle represents a total and the
components of the total are represented by sectors of the circle.
 A pie diagram is also called a sector diagram.
 Example (Pie Chart)
The chart (not reproduced here) shows the percentage usage of different browsers in Europe. In this
chart, 37.9% of people in Europe use Firefox, 15.5% use Chrome, and so on.
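
Since the whole circle (360 degrees) represents the total, each sector's central angle is its percentage share of 360. A small sketch using only the two shares quoted in the example; the function name `sector_angle` is hypothetical.

```python
# Shares quoted in the example; the remaining browsers are not listed in the text.
shares_percent = {"Firefox": 37.9, "Chrome": 15.5}


def sector_angle(percent):
    """Central angle in degrees for a sector covering `percent` of the total."""
    return percent / 100 * 360


angles = {name: sector_angle(share) for name, share in shares_percent.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.2f} degrees")
```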

PICTOGRAM
 A pictogram is a popular device for portraying the statistical data by means of pictures or
small symbols.
HISTOGRAM
 A histogram consists of a set of adjacent rectangles whose bases are marked off by class
boundaries (not class limits) on the X-axis and whose heights are proportional to the
frequencies associated with the respective classes.
 Example (Histogram): figure not reproduced.
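
The heights of the rectangles come from counting how many observations fall between each pair of class boundaries. The sketch below does that count on made-up data with hypothetical boundaries and prints a rough text histogram; it assumes each class includes its lower boundary and excludes its upper one.

```python
# Hypothetical measurements (made-up data).
values = [12.1, 14.8, 15.0, 17.3, 18.9, 21.4, 22.2, 22.8, 25.6, 29.9]

# Class boundaries: each class covers lower <= value < upper.
boundaries = [10, 15, 20, 25, 30]


def class_frequencies(values, boundaries):
    """Count how many values fall into each class defined by the boundaries."""
    counts = [0] * (len(boundaries) - 1)
    for value in values:
        for i in range(len(counts)):
            if boundaries[i] <= value < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = class_frequencies(values, boundaries)

# Text histogram: one row per class, bar length proportional to frequency.
for i, count in enumerate(frequencies):
    print(f"{boundaries[i]:>2}-{boundaries[i + 1]:<2} | {'#' * count}")
```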
SAMPLING DESIGN AND SAMPLING DISTRIBUTIONS
Target Population
 The target population is the collection of elements or objects that possess the information
sought by the researcher and about which inferences are to be made.

TERMINOLOGY
ELEMENT
 is the object about which or from which the information is desired, e.g., the respondent
SAMPLING UNIT
 is an element, or a unit containing the element, that is available for selection at some stage
of the sampling process
EXTENT
 refers to the geographical boundaries
TIME
 is the time period under consideration

Important Qualitative Factors That Determine The Sample Size


– The importance of the decision
– The nature of the research
– The number of variables
– The nature of the analysis
– Sample sizes used in similar studies
– Incidence rates
– Completion rates
– Resource constraints
The Sampling Frame
 Define the target population
 Select a sampling frame
 Determine if a probability or non-probability sampling method will be chosen
 Plan procedure for selecting sampling units
 Determine sample size
 Select actual sampling units
 Conduct fieldwork

STATISTICAL ERRORS
– A statistical error is the difference between the value of a sample statistic of interest and the value of the
corresponding population parameter.
Types of Errors
1. Random Sampling Error
 The difference between the sample result and the result of a census conducted using
identical procedures.
 These errors are due to chance fluctuations.
2. Systematic Error
 Systematic (nonsampling) errors result from nonsampling factors, primarily the
nature of a study’s design and the correctness of execution.
 These are not due to chance fluctuations.
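
Random sampling error can be seen in a small simulation. This is a sketch under stated assumptions: the population is made-up data, the seed is fixed only so that a run is repeatable, and `sampling_error` is a hypothetical helper that returns the sample mean minus the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 monthly incomes (made-up data).
population = [random.gauss(20000, 5000) for _ in range(1000)]
parameter = statistics.mean(population)  # the population parameter


def sampling_error(sample_size):
    """Sample statistic minus population parameter for one random sample."""
    sample = random.sample(population, sample_size)
    return statistics.mean(sample) - parameter


# Average size of the error over 200 repeated samples of each size.
errors_small = [abs(sampling_error(10)) for _ in range(200)]
errors_large = [abs(sampling_error(400)) for _ in range(200)]

print("average error, n = 10: ", statistics.mean(errors_small))
print("average error, n = 400:", statistics.mean(errors_large))
print("error of a census:     ", sampling_error(len(population)))
```

The error changes from sample to sample purely by chance, tends to shrink as the sample grows, and disappears when every element is observed (a census). Systematic error would not shrink this way, because it comes from the study design rather than from chance.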
Classification of Sampling Techniques
Sampling Techniques
Non probability Sampling Techniques
– Convenience Sampling
– Judgmental Sampling
– Quota Sampling
– Snowball Sampling

Probability Sampling Techniques


– Simple Random Sampling
– Systematic Sampling
– Stratified Sampling
– Cluster Sampling
– Other Sampling Techniques
Types of Non probability sampling
1. Convenience Sampling
– The sampling procedure of obtaining those people or units that are most
conveniently available.
– Best used for exploratory research.
2. Quota Sampling
– A nonprobability sampling procedure that ensures that various subgroups
of a population will be represented on pertinent characteristics to the exact
extent that the investigator desires.

POSSIBLE SOURCES OF BIAS


– haphazard selection of subjects
ADVANTAGES
– Speed of data collection
– Lower costs
– Convenience
3. Judgment Sampling
– A nonprobability sampling technique in which an experienced individual
selects the sample based on personal judgment about some appropriate
characteristics of the sample member
4. Snowball Sampling
– A sampling procedure in which initial respondents are selected by probability
methods and additional respondents are obtained from information provided
by the initial respondents.
– It uses referrals for selecting respondents
ADVANTAGES
– Reduced sample size
– Reduced cost
Probability Sampling
• The sampling techniques where selection procedure is based on chance are called
probability sampling techniques.
Types of Probability Sampling
1. Simple Random Sampling
• The sampling procedure that ensures each element in the population will have an
equal chance of being included in the sample is called simple random sampling.

2. Systematic Sampling
• A sampling procedure in which a starting point is selected by a random process and
then every nth element on the list is selected.

3. Stratified Sampling
• A probability sampling procedure in which simple random subsamples that are
more or less equal on some characteristic are drawn from within each stratum of
population.

4. Proportional versus Disproportional Sampling


• Proportional
– A stratified sample in which the number of sampling units drawn
from each stratum is in proportion to the population size of that
stratum.
• Disproportional
– A stratified sample in which the sample size for each stratum is
allocated according to analytical considerations

5. Cluster Sampling
• An economically efficient sampling technique in which the primary sampling unit
is not the individual element in the population but a cluster of elements; clusters are
selected randomly.

6. Multistage area sampling


• Sampling that involves using a combination of two or more probability sampling
techniques
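
Three of the procedures above can be sketched with Python's standard `random` module. Everything here is illustrative: the sampling frame, the urban/rural strata and the function names are hypothetical, and the stratified version uses simple rounding for proportional allocation, which happens to add up to the requested sample size for these numbers but need not in general.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: 100 element IDs, each assigned to a stratum.
frame = list(range(1, 101))
stratum_of = {i: ("urban" if i <= 60 else "rural") for i in frame}


def simple_random_sample(frame, n):
    """Every element has an equal chance of being included."""
    return random.sample(frame, n)


def systematic_sample(frame, n):
    """Random starting point, then every k-th element on the list."""
    step = len(frame) // n
    start = random.randrange(step)
    return frame[start::step][:n]


def proportional_stratified_sample(frame, stratum_of, n):
    """Simple random subsample from each stratum, sized in proportion to it."""
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of[element], []).append(element)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, share))
    return sample


srs = simple_random_sample(frame, 10)
systematic = systematic_sample(frame, 10)
stratified = proportional_stratified_sample(frame, stratum_of, 10)

print(srs)
print(systematic)
print(stratified)
```

With 60 urban and 40 rural elements, a proportional sample of 10 contains 6 urban and 4 rural elements, whereas a simple random sample of the same size carries no such guarantee.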

Selecting an Appropriate Sample Design


A researcher who must decide on the most appropriate sample design for a specific project
will identify a number of sampling criteria and evaluate the relative importance of each criterion
before selecting a sampling design.
Sampling Criterion
Degree of Accuracy
– Depends on the researcher’s tolerance for errors in sampling and
requirements of the project
Resources
– Depends on the researcher’s financial and human resource
constraints
Time
– Depends on the deadline of the project completion
Advance Knowledge of the Population
– Depends on the availability of details of population characteristics
National vs Local
– Depends on the geographic proximity of the population elements

NONPROBABILITY SAMPLES

Convenience
 Description: The researcher uses the most convenient sample or the most economical sample units.
 Cost and degree of use: Very low cost; extensively used.
 Advantages: No need for a list of the population.
 Disadvantages: Unrepresentative samples are likely; random sampling error estimates cannot be
made; projecting data beyond the sample is relatively risky.

Judgment
 Description: An expert or experienced researcher selects the sample to fulfill a purpose, such as
ensuring that all members have a certain characteristic.
 Cost and degree of use: Moderate cost; average use.
 Advantages: Useful for certain types of forecasting; the sample is guaranteed to meet a specific
objective.
 Disadvantages: Bias due to the expert’s beliefs may make the sample unrepresentative; projecting
data beyond the sample is risky.

Quota
 Description: The researcher classifies the population by pertinent properties, determines the
desired proportion to sample from each class, and fixes quotas for each interviewer.
 Cost and degree of use: Moderate cost; very extensively used.
 Advantages: Introduces some stratification of the population; requires no list of the population.
 Disadvantages: Introduces bias in the researcher’s classification of subjects; nonrandom selection
within classes means error from the population cannot be estimated; projecting data beyond the
sample is risky.

Snowball
 Description: Initial respondents are selected by probability samples; additional respondents are
obtained by referral from the initial respondents.
 Cost and degree of use: Low cost; used in special situations.
 Advantages: Useful in locating members of rare populations.
 Disadvantages: High bias because sample units are not independent; projecting data beyond the
sample is risky.

PROBABILITY SAMPLES

Simple Random
 Description: The researcher assigns each member of the sampling frame a number, then selects
sample units by a random method.
 Cost and degree of use: High cost; moderately used in practice (most common in random digit
dialing and with computerized sampling frames).
 Advantages: Only minimal advance knowledge of the population is needed; it is easy to analyze
the data and compute error.
 Disadvantages: Requires a sampling frame to work from; does not use knowledge of the population
that the researcher may have; larger errors for the same sample size; respondents may be widely
dispersed, hence cost may be higher.

Systematic
 Description: The researcher uses the natural ordering or the order of the sampling frame, selects
an arbitrary starting point, then selects items at a preselected interval.
 Cost and degree of use: Moderate cost; moderately used.
 Advantages: Simple to draw; easy to check.
 Disadvantages: If the sampling interval is related to a periodic ordering of the population, it may
introduce increased variability.

Stratified
 Description: The researcher divides the population into groups and randomly selects subsamples
from each group. Variations include proportional, disproportional and optimal allocation of
subsample sizes.
 Cost and degree of use: High cost; moderately used.
 Advantages: Ensures representation of all groups in the sample; characteristics of each stratum
can be estimated and comparisons made; reduces variability for the same sample size.
 Disadvantages: Requires accurate information on the proportion in each stratum; if stratified
lists are not already available, they can be costly to prepare.

Cluster
 Description: The researcher selects sampling units at random, then does a complete observation
of all units or draws a probability sample in the group.
 Cost and degree of use: Low cost; frequently used.
 Advantages: If clusters are geographically defined, yields the lowest field cost; requires listing of
all clusters, but of individuals only within clusters; can estimate characteristics of clusters as
well as of the population.
 Disadvantages: Larger error for comparable size than with other probability samples; the
researcher must be able to assign population members to a unique cluster, or else duplication or
omission of individuals will result.

Multistage
 Description: Progressively smaller areas are selected in each stage by some combination of the
first four techniques.
 Cost and degree of use: High cost; frequently used, especially in nationwide surveys.
 Advantages: Depends on the techniques combined.
 Disadvantages: Depends on the techniques combined.


Internet Sampling
Advantages
• Allow researchers to reach a large sample rapidly
• Sample size requirements can be met quickly
• Easier to carry out
• Less costly
Disadvantages
• Lack of computer ownership and internet access
• Unrepresentative of all target populations

Web Site Visitors


• Volunteer respondents
• Unrestricted/convenience samples
• Arrive haphazardly
• Random selection of sample units is a better option
• Done through Pop-up ads
• Problem of over representing the frequent visitors to the site
• Can be controlled by several techniques like cookies, prescreening etc
• Valuable if the target population is defined as visitors to a particular Web site
Panel Samples
• Drawing a probability sample from an established consumer panel or other pre-recruited
membership panel
• Yields a high response rate
• Easier to select the panelists based on the data of their previously answered questionnaires
• Panelists are compensated for their time with a sweepstakes, a small cash incentive, or
redeemable points, etc
• Allows the company to draw simple random samples, stratified samples, and quota samples
Recruited Ad Hoc Samples
• A sampling frame of e-mail addresses on an ad hoc basis
• Can be done online or offline
• Can be compiled from many sources, including customer/client lists, advertising banners
on pop-up windows that recruit survey participants, online sweepstakes, and registration
forms
• Respondents may be contacted by “snail mail” or by telephone to ask for their e-mail
addresses and obtain permission for an Internet survey
• Offline techniques used are random-digit dialing and short telephone screening interviews
Opt-in Lists
• To opt in is to give permission to receive selected e-mail, such as questionnaires, from a company with
an Internet presence
• E-mail is sent only to authorized recipients
• Each individual has to confirm and reconfirm their consent to participate in the survey
• Unsolicited survey request is treated as spam
• High response rate cannot be expected from the individuals who have not agreed to be
surveyed
• It can lead to complaints to the Internet Service Providers and the survey site may be shut
down
AN OVERVIEW OF DESIGN OF EXPERIMENTS
EXPERIMENT
• An experiment involves manipulating one or more independent variables and observing the
effect on some outcome (dependent variable). Experiments can be done in the
field or in a laboratory.

A QUICK HISTORY OF DESIGN OF EXPERIMENTS


 The agricultural origins, 1918 – 1940s
• R. A. Fisher & his co-workers
• Profound impact on agricultural science
• Factorial designs, ANOVA
 The first industrial era, 1951 – late 1970s
• Box & Wilson, response surfaces
• Applications in the chemical & process industries

 The second industrial era, late 1970s – 1990


• Quality improvement initiatives in many companies
• TQM was an important idea and became a management goal
• Taguchi and robust parameter design, process robustness
 The modern era, beginning 1990
• Six Sigma, Lean Six Sigma
• Clinical trials, mathematical biology
• Algorithm design and analysis
• Networking, group testing, and cryptography
WHY WE USE EXPERIMENTAL DESIGNS
"All experiments are designed experiments, it is just that some are poorly designed and
some are well- designed."
 Experimental designs are used so that the treatments may be assigned in an organized
manner to allow valid statistical analysis to be carried out on the resulting data.
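
One organized way of assigning treatments is complete randomization: shuffle the experimental units and deal them out evenly. The sketch below assumes 12 hypothetical units and three hypothetical treatment labels; it illustrates the idea and is not a procedure prescribed by these notes.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical experimental units (e.g. plots or participants).
units = [f"unit_{i}" for i in range(1, 13)]
treatments = ["control", "treatment_A", "treatment_B"]


def randomize(units, treatments):
    """Shuffle the units, then deal them out so that every treatment
    receives the same number of units (assumes an even split is possible)."""
    shuffled = units[:]
    random.shuffle(shuffled)
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


assignment = randomize(units, treatments)

for unit in units:
    print(unit, "->", assignment[unit])
```

Because chance alone decides which unit gets which treatment, inherent differences between units are spread across the groups rather than lined up with one treatment.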

WHAT IS DESIGN OF EXPERIMENTS


• It is a logical planning (or construction) of the experiment having a complete
sequence of steps taken ahead of time to ensure that the appropriate data will be
obtained in a way which permits an objective analysis of a particular problem
leading to valid and precise inference in most economic and useful forms.

SUBJECT MATTER OF DESIGN OF EXPERIMENTS


It includes:
• Planning of the experiment
• Obtaining data from it
• Making statistical analysis of the data obtained.

HOW DESIGN OF EXPERIMENT CONTRIBUTES


• Reduce time to design/develop new products & processes
• Improve performance of existing processes
• Improve reliability and performance of products
• Achieve product & process robustness
• Perform evaluation of materials, design alternatives, setting component & system
tolerances, etc.

TERMINOLOGY
Control Group - A group assigned to the experiment, but not for the purpose of being exposed
to the treatment. Performance of this group serves as a baseline.
Treatment Group - The Group in an experiment which receives the specified treatment.
Factor - This term is used when an experiment involves more than one variable. These
variables are often identified as factor.
Level - Refers to the degree or intensity of a factor.
Randomness -refers to the property of completely chance events that are not predictable.
Replication - The repetition of the treatment under consideration.
Blocks - Refers to the categories of subjects within a treatment group.

EXPERIMENTAL ERROR
o Experimental error is the variation in the responses among experimental units which are
assigned the same treatment and are observed under the same experimental conditions. It is
measured by SSE (or MSE). Ideally, we would like experimental error to be zero.
This is impossible for at least one of the following reasons:
• There are inherent differences in the experimental units before they receive treatments.
• There is variation in the devices that record the measurements.
• There is variation in applying or setting the treatments.
• There are extraneous factors other than the treatments which affect the response.
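As a rough illustration of how SSE and MSE capture this variation, the sketch below pools the squared deviations of each response from its own treatment mean. The treatment names and response values are hypothetical (they are not from the text), and the error degrees of freedom are taken as N - t, as for a one-way layout.

```python
# Hypothetical responses grouped by treatment (illustrative values only).
responses = {
    "T1": [10.0, 12.0, 11.0],
    "T2": [15.0, 14.0, 16.0],
}


def error_sum_of_squares(groups):
    """Sum of squared deviations of each response from its own treatment mean."""
    sse = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        sse += sum((y - mean) ** 2 for y in values)
    return sse


sse = error_sum_of_squares(responses)
n_total = sum(len(values) for values in responses.values())
error_df = n_total - len(responses)  # N - t for a one-way layout
mse = sse / error_df

print("SSE =", sse, "error df =", error_df, "MSE =", mse)
```

If every unit under the same treatment gave exactly the same response, SSE would be zero, which is the ideal case described above.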

ANALYSIS OF VARIANCE (ANOVA)


• This statistical technique was first developed by R. A. Fisher and was used extensively in
agricultural experiments.
• It is mainly employed to compare the means of 3 or more samples, taking into account the
variation within each sample.
• ANOVA is the method to estimate the contribution made by each factor to the total
variation.

THE STEPS IN DESIGNING AN EXPERIMENT


Step 1: Identify the problem or claim to be studied. The statement of the problem needs to be
as specific as possible. As your text says, it must "identify the response variable and the
population to be studied".
Step 2: Determine the factors affecting the response variable. This is best done by an expert
in the field, but we'll be able to do this for most examples we'll be looking at.
Step 3: Determine the number of experimental units. In general, more experimental units are
better. Unfortunately, time and money will always be limiting factors, so we have to decide
on an appropriate number.
Step 4: Determine the level(s) of each factor. We split factors up into three categories:
o Control: If possible, we try to fix the level of factors that we're not interested
in.
o Manipulate: This is the treatment - we manipulate the levels of the variable that
we think will affect the response variable.
o Randomize: Often, there are factors we just can't control. To mitigate their
effect on the data, we randomize the groups. By randomly assigning
experimental units, these factors should be equally spread among all groups.
Step 5: Conduct the experiment.
Step 6: Test the claim.
Step 7: Interpret the results.
BASIC PRINCIPLE OF DESIGN OF EXPERIMENTS
• Randomization
• Replication
• Local Control (Blocking)

Complete and Incomplete Block Designs

SOME EXPERIMENTAL DESIGNS


• Completely Randomized Design (CRD)
• Randomized Block Design (RBD)
• Latin Square Design (LSD)
• Factorial Designs
• Balanced Incomplete Block Design (BIBD)
• Nested Balanced Incomplete Block designs (NBIBD)
• Balanced Incomplete Block Design with Nested Rows and Columns

COMPLETE DESIGNS
COMPLETELY RANDOMIZED DESIGN (CRD)
• The COMPLETELY RANDOMIZED DESIGN is the simplest design: the treatments are
assigned to the experimental units completely at random, so that every
experimental unit has an equal probability of receiving any given treatment.
• For CRD, any difference among experimental units receiving the same treatment is
considered as experimental error.
CHARACTERISTICS OF THE CRD
• CRD is the simplest design to use.
• CRD is appropriate only for experiments with homogeneous experimental units, such as
laboratory experiments, where environmental effects are relatively easy to control.
• The CRD is best suited for experiments with a small number of treatments.
• For field experiments, where there is generally large variation among experimental plots
in such environmental factors as soil, the CRD is rarely used.
• Every experimental unit has the same probability of receiving any treatment
• Treatments are assigned to experimental units completely at random
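The random assignment described above can be sketched as follows. This is only a minimal illustration: the function name `assign_crd`, the unit labels, and the choice of equal replication per treatment are assumptions made for the example, not part of the text.

```python
import random


def assign_crd(units, treatments, seed=None):
    """Assign treatments to units completely at random, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    replicates = len(units) // len(treatments)
    # One label per unit: each treatment repeated `replicates` times.
    labels = [t for t in treatments for _ in range(replicates)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # every ordering of the labels is equally likely
    return dict(zip(units, labels))


# Hypothetical experiment: 12 units, 3 treatments, 4 replicates each.
units = [f"unit{i}" for i in range(1, 13)]
layout = assign_crd(units, ["A", "B", "C"], seed=1)
for unit, treatment in layout.items():
    print(unit, treatment)
```

Shuffling the full list of treatment labels, rather than drawing a treatment independently for each unit, keeps the number of replicates per treatment fixed while leaving the assignment random.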
EXAMPLE OF CRD
• To determine whether there is a significant difference in the durability of 3 makes of
computers, samples of size 5 are selected from each make and the frequency of repair
during the first year is observed. The results are as follows:

Makes
A    B    C
5    8    7
6    10   3
8    11   5
9    12   4
7    4    1

VARIOUS STEPS TO BE FOLLOWED


• Write the hypotheses to be tested.
• Calculate the Correction Factor.
• Calculate the Total SS
• Calculate the Treatment SS
• Calculate the Error SS
• Complete the ANOVA table
• Look up Table F-values.
• Make conclusions.

HYPOTHESIS
H0: The three makes of computers do not differ significantly in durability.
H1: At least one of the makes of computers differs significantly in durability.
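The computational steps listed above can be traced on the computer-makes data with a short sketch. It uses only the observations given in the example; the variable names are illustrative, and the 5% significance level mentioned afterwards is an assumption, since the text does not state one.

```python
# Frequency of repair in the first year, by make (data from the example).
data = {
    "A": [5, 6, 8, 9, 7],
    "B": [8, 10, 11, 12, 4],
    "C": [7, 3, 5, 4, 1],
}

n_total = sum(len(values) for values in data.values())      # N
grand_total = sum(sum(values) for values in data.values())  # G

# Correction factor: G^2 / N
correction_factor = grand_total ** 2 / n_total

# Total SS: sum of squared observations minus the correction factor
total_ss = sum(y ** 2 for values in data.values() for y in values) - correction_factor

# Treatment SS: sum of (treatment total^2 / replicates) minus the correction factor
treatment_ss = sum(sum(values) ** 2 / len(values) for values in data.values()) - correction_factor

# Error SS by subtraction
error_ss = total_ss - treatment_ss

treatment_df = len(data) - 1
error_df = n_total - len(data)
treatment_ms = treatment_ss / treatment_df
error_ms = error_ss / error_df
f_ratio = treatment_ms / error_ms

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'Makes':<10}{treatment_df:>4}{treatment_ss:>10.3f}{treatment_ms:>10.3f}{f_ratio:>8.3f}")
print(f"{'Error':<10}{error_df:>4}{error_ss:>10.3f}{error_ms:>10.3f}")
print(f"{'Total':<10}{n_total - 1:>4}{total_ss:>10.3f}")
```

For these data the F ratio is about 5.43 on (2, 12) degrees of freedom. Assuming a 5% significance level, the tabulated value of F(2, 12) is about 3.89, so H0 would be rejected: at least one make differs in durability.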

ADVANTAGES
• Very flexible design (i.e. number of treatments and replicates is only limited by the
available number of experimental units).
• Statistical analysis is simple compared to other designs.
• Loss of information due to missing data is small compared to other designs due to
the larger number of degrees of freedom for the error source of variation.
• Provides maximum number of degrees of freedom.
DISADVANTAGES
• If experimental units are not homogeneous and you fail to minimize this variation
using blocking, there may be a loss of precision.
• Usually the least efficient design unless experimental units are homogeneous.
• Not suited for a large number of treatments.
RANDOMISED BLOCK DESIGN (RBD)
• Any experimental design in which the randomization of treatments is restricted to groups
of experimental units within a predefined block of units assumed to be internally
homogeneous is called a randomized block design.
• The design divides the experimental units into n homogeneous groups of equal or unequal
sizes.
• These homogeneous groups are called blocks.
• The treatments are then randomly assigned to the experimental units in each block - one
treatment to a unit in each block.
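A minimal sketch of this restricted randomization is given below. It assumes complete blocks, that is, each block has exactly as many units as there are treatments; the function name `assign_rbd` and the block and unit labels are hypothetical.

```python
import random


def assign_rbd(blocks, treatments, seed=None):
    """Randomize treatments separately within each block (one unit per treatment)."""
    rng = random.Random(seed)
    layout = {}
    for block_name, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"{block_name}: block size must equal the number of treatments")
        order = list(treatments)
        rng.shuffle(order)  # a fresh, independent shuffle for every block
        layout[block_name] = dict(zip(units, order))
    return layout


# Hypothetical experiment: 3 blocks of 3 homogeneous units, 3 treatments.
blocks = {
    "block1": ["u1", "u2", "u3"],
    "block2": ["u4", "u5", "u6"],
    "block3": ["u7", "u8", "u9"],
}
layout = assign_rbd(blocks, ["A", "B", "C"], seed=7)
for block_name, assignment in layout.items():
    print(block_name, assignment)
```

Unlike a completely randomized design, the shuffle never moves a treatment across block boundaries, so every treatment appears exactly once in every block.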

CHARACTERISTICS OF RBD
• A randomized block experiment is assumed to be a two-factor experiment; the factors are
blocks and treatments.
• The blocks of experimental units are uniform.
• There is one observation per cell. It is assumed that there is no interaction between blocks
and treatments.
• The degrees of freedom for the interaction is used to estimate error.
• Treatments are randomly assigned to the experimental units within each block.
ADVANTAGES
• Complete flexibility: the design can have any number of treatments and blocks.
• Provides more accurate results than the completely randomized design due to
grouping.
• Relatively easy statistical analysis even with missing data.
• Some treatments may be replicated more times than others.
• Whole treatments or entire replicates may be deleted from the analysis.
DISADVANTAGES
• Not suitable for large numbers of treatments because blocks become too large, and
there is possibility of heterogeneity among the experimental units of the blocks
• Interactions between block and treatment effects increase error.
• Serious problem with the analysis if a block factor by treatment interaction effect
actually exists and no replication within blocks has been included. (solution: use
replication within blocks when possible).
LATIN SQUARE DESIGN (LSD)
• A Latin square is a square array of objects (letters A, B, C, …) such that each object
appears once and only once in each row and each column.
• Example - 4 x 4 Latin Square.
A B C D
B C D A
C D A B
D A B C
• The Latin Square Design is for a situation in which there are two extraneous sources
of variation. If the rows and columns of a square are thought of as levels of the two
extraneous variables, then in a Latin square each treatment appears exactly once in each
row and column.
• With the Latin Square design we are able to control variation in two directions.
CHARACTERISTICS OF LSD
• In LSD we have three factors: Treatments, Rows and Columns
• The number of treatments = the number of rows = the number of columns = t (say).
• The row-column treatments are represented by cells in a t x t array.
• The treatments are assigned to row-column combinations using a Latin-square
arrangement, that is, each row contains every treatment and each column contains
every treatment.
• Every treatment occurs once in each row and column.
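The defining property (every treatment exactly once in each row and each column) can be checked mechanically. The sketch below builds a t x t square by cyclic shifting, which for four treatments gives the arrangement ABCD, BCDA, CDAB, DABC; the function names are illustrative, and cyclic shifting is only one of many ways to construct a Latin square.

```python
def cyclic_latin_square(treatments):
    """Build a t x t Latin square by shifting the treatment list one step per row."""
    t = len(treatments)
    return [[treatments[(row + col) % t] for col in range(t)] for row in range(t)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t:
        return False
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(t)} == symbols for c in range(t))
    return rows_ok and cols_ok


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print("Latin square:", is_latin_square(square))
```

In practice the rows, columns, and treatment labels of such a square are usually permuted at random before use, so that the systematic cyclic pattern is not carried into the experiment.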
HYPOTHESIS
H0A: There is no significant difference between the burners.
H1A: At least one of the burners is significantly different.
H0B: There is no significant difference between the days.
H1B: At least one of the days is significantly different.
H0C: There is no significant difference between the engines.
H1C: At least one of the engines is significantly different.

ADVANTAGES
 We can control variation in two directions, which makes the LSD more efficient than
the CRD and RBD.
 Being a 3-way design, it is economical compared with the corresponding complete
3-way design: instead of r^3 experimental units, only r^2 experimental units are
sufficient.
 The analysis remains relatively simple even with missing data.
DISADVANTAGES
 The number of treatments is limited to the number of replicates, which seldom
exceeds 10.
 If we have fewer than 5 treatments, the df for controlling random variation is
relatively large and the df for error is small.
 The number of treatments must equal the number of replicates.
 The experimental error is likely to increase with the size of the square.
 Evaluation of interactions between rows and columns, rows and treatments &
columns and treatments is not possible separately.
FACTORIAL EXPERIMENT
• Factorial designs include two or more factors, each having more than one level or
treatment. Participants typically are randomized to a combination that includes one
treatment or level from each factor.
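As a small illustration, the treatment combinations of a factorial design are the Cartesian product of the factor levels. The factor names, levels, and unit labels below are hypothetical, and two units per combination is an arbitrary choice for the sketch.

```python
from itertools import product
import random

# Hypothetical factors and their levels (not from the text).
factors = {
    "fertilizer": ["none", "low", "high"],
    "irrigation": ["weekly", "daily"],
}

# Every treatment combination takes exactly one level from each factor.
combinations = list(product(*factors.values()))  # 3 x 2 = 6 combinations

# Randomize hypothetical units to combinations, two units per combination.
units = [f"unit{i}" for i in range(1, 2 * len(combinations) + 1)]
labels = combinations * 2
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
rng.shuffle(labels)
assignment = dict(zip(units, labels))

for unit, combo in assignment.items():
    print(unit, dict(zip(factors, combo)))
```

The number of combinations grows as the product of the numbers of levels, which is why factorial experiments with many factors quickly need many experimental units.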
NESTED DESIGNS
• In certain multifactor experiments, the levels of one factor are similar but not identical for
different levels of another factor (each level is unique to a particular level of the other
factor). This is called a hierarchical or nested design.
