





(a) Definition of Research and Related Concepts

Research is the formal, systematic and intensive process of carrying on a
scientific method of analysis.


Research is the systematic and objective analysis and recording of controlled

observations that may lead to the development of generalizations, principles, or
theories, resulting in prediction and possibly ultimate control of events.

Research is a more systematic activity that is directed toward discovery and the
development of an organized body of knowledge.

The characteristics of research are:

1. Research is directed toward the solution of a problem. The ultimate goal

is to discover cause-and-effect relationships between variables.

2. Research emphasizes the development of generalizations, principles, or

theories that will be helpful in predicting future occurrences.

3. Research is based upon observable experience or empirical evidence.

4. Research demands accurate observation and description. Researchers use

quantitative measuring devices and qualitative or non-quantitative
descriptions of their observations.

5. Research involves gathering new data from primary or firsthand sources or

using existing data for a new purpose.

6. Although research activity may at times be somewhat random and

unsystematic, it is more often characterized by carefully designed
procedures that apply rigorous analysis.

7. Research requires expertise. The researcher knows what is already known
about the problem and how others have investigated it. He has searched
the related literature carefully and is thoroughly grounded in the terminology,
concepts and technical skills necessary to understand and analyze the data.

8. Research strives to be objective and logical, applying every possible test to
validate the procedures employed, the data collected, and the conclusions
reached.

The researcher attempts to eliminate personal bias and emotion. Emphasis
is on testing rather than on proving the hypothesis.

9. Research involves the quest for answers to unsolved problems.

Pushing back the frontiers of ignorance is its goal, and originality is
frequently the quality of a good research project.

However, previous important studies are deliberately repeated, using
identical or similar procedures, with different subjects, different settings,
and at a different time.
This process is replication, a fusion of the words repetition and duplication.

10. Replication is always desirable to confirm or to raise questions about the

conclusions of a previous study.

11. Research is carefully recorded and reported.

Each important term is defined, limiting factors are recognized,
procedures are described in detail, references are carefully documented,
results are objectively recorded, and conclusions are presented with
scholarly caution and restraint.

12. Research sometimes requires courage because it sometimes arouses

violent criticism.

The terms research and scientific method are sometimes used

synonymously. Scientific method in problem solving may be an informal
application of problem identification, hypothesis formulation, observation,
analysis and conclusion.

(b) Purposes of Research

1. Assessment or Reporting

Assessment is a fact-finding activity that describes conditions that exist at

a particular time.

No hypotheses are proposed or tested, no variable relationships are
examined, and no recommendations for action are suggested.
The result is a reporting study, which is merely an inquiry to provide an
account or summation of some data, perhaps the generation of some statistics.

2. Evaluation

Evaluation is concerned with the application of its findings: the effectiveness,
utility, or desirability of a product, process or program in terms of carefully
defined and agreed-upon objectives or values.
It may involve recommendations for action.

3. Descriptive

Descriptive research is concerned with all of the following: hypothesis
formulation and testing, the analysis of the relationships between
non-manipulated variables, and the development of generalizations.
Unlike the experimental method, in which variables are deliberately
arranged and manipulated through the intervention of the researcher, in
descriptive research variables that exist or have already occurred are
selected and observed.
This process is called ex post facto, explanatory observational or
causal-comparative research.

Both descriptive and experimental methods employ careful sampling

procedures so that generalizations may be extended to other individuals,
groups, times or settings.

4. Explanatory

This research is grounded in theory and theory is created to answer why

and how questions.
An explanatory study goes beyond description and attempts to explain the
reasons for the phenomenon that the descriptive study only observed.

5. Predictive

If we can provide a plausible explanation for an event after it has occurred,

it is desirable to be able to predict when and in what situations the event
will occur.
A predictive study is just as rooted in theory as explanation.
This type of study often calls for a high order of inference making.

In business research, prediction is found in studies conducted to evaluate

specific courses of action or to forecast current and future values.

We would like to be able to control a phenomenon once we can explain
and predict it.

Being able to replicate a scenario and dictate a particular outcome is the

objective of control, a logical outcome of prediction.

(c) Types of Research

1. Fundamental or pure or Basic Research

This is usually carried on in a laboratory situation, sometimes with animals as

subjects. In the business arena this might involve a researcher for an
advertising agency who is studying the results of the use of coupons versus
rebates as demand stimulation tactics, but not in a specific instance or in
relation to a specific client's product.

Thus, both applied and pure research are problem-solving based, but applied
research is directed much more to making immediate managerial decisions.

Hence, fundamental or pure or basic research is research in its more formal
aspects: a rigorous, structured type of analysis.
It focuses on the development of theories by the discovery of broad
generalizations or principles.
It employs careful sampling procedures to extend the findings beyond the
group or situation studied.

2. Applied Research

This has most of the characteristics of fundamental research, including the use
of sampling techniques and the subsequent inferences about the target
population.
Its purpose is improving a product or a process – testing theoretical concepts
in actual problem situations.
It has a practical problem-solving emphasis, although the problem solving is
not always generated by a negative circumstance.

Applied research is often used to evaluate opportunities.

The problem-solving nature of applied research means it is conducted to
reveal answers to specific questions related to action, performance or policy
needs.

3. Research and Development

In many firms, Research and Development (R&D) is an important aspect of
the firm's activities.
Although much of this is basic research not connected with any immediate
need or usefulness, some of it is for discovering new products, product and
process improvement, new technologies, new technology applications, etc.

It mainly focuses on inventions and improvements.

(d) The Research Process

The research task is usually depicted as a sequential process involving several

clearly defined steps.
In practice, recycling, circumventing and skipping occur.
But the idea of a sequence is useful for developing a project and for keeping
the project orderly as it unfolds.

1. Management Dilemma

The research process begins when a management dilemma triggers the need
for a decision.
Defining the management question is the critical activity in the sequence.
The management dilemma is a symptom of an actual problem.

2. Management Question

The manager must move from the management dilemma to the management
question to proceed with the research process.
The management question restates the dilemma in question form.

Management questions can be the choice of purposes or objectives, the

generation and evaluation of solutions, the troubleshooting or control
situation, etc.

A poorly defined management question will misdirect research efforts, since
it sets the research task.

3. Exploration

To move forward in the management-research question hierarchy (to define

the research question), some exploratory information on the problem needs to
be collected.

An exploration typically begins with a search of published data, experts on the
topic, and other sources of relevant information like an experience survey,
focus groups, etc.

4. Research Question

Once the researcher has a clear statement of the management question, she
and the manager must translate it into a research question.
It has to be defined.
The research question is a fact-oriented, information-gathering question.

It is the hypothesis that best states the objective of the research study.
There may be more than one question, or just one.
More exploration may yield more information that can be used to refine the
research question.
This is fine-tuning the research question.
Investigative questions are questions the researcher must answer to
satisfactorily arrive at a conclusion about the research question.
To formulate them, the researcher takes a general research question and
breaks it into more specific questions about which to gather data.

Investigative questions should be included in the research proposal, for they

guide the development of the research design.
They are the foundation for creating the research data collection instrument.
Measurement questions should be devised and tailored to parallel the
investigative questions.
They are the questions that the respondents answer in a survey; hence they
appear on the questionnaire.

5. Research Design

This is the blueprint for fulfilling objectives and answering questions.

Selecting a design may be complicated by the availability of a large variety of
methods, techniques, procedures, protocols and sampling plans.
One may decide on a secondary data study, case study, survey, experiment, or
simulation.

The design strategy should indicate the type, purpose, time frame, scope and
environment of the research.
This will determine the data collection design in terms of the investigative
questions, measurement questions and the instrument design; and also the
sampling design in terms of sample unit selection, sample type selection and
the sample draw.

e. Sampling Design

Another step in planning the design is to identify the target population and
select the sample if a census is not desired.
The researcher must determine how many people to interview and who they
will be; what and how many events to observe or how many records to inspect
and which ones.
A sample is a part of the target population, carefully selected to represent that
population.
When researchers undertake sampling studies, they are interested in
estimating one or more population values and/or testing one or more statistical
hypotheses.
The sampling process must then give every person within the target
population a known nonzero chance of selection if probability sampling is
used.
If there is no feasible alternative, a nonprobability approach may be used.
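The known, nonzero chance of selection under simple random sampling can be sketched as below (a hypothetical illustration; the population and sample sizes are made up):

```python
import random

# Hypothetical target population of 1,000 employee IDs.
population = list(range(1, 1001))

random.seed(7)  # fixed seed so the draw is reproducible
n = 50
sample = random.sample(population, n)  # draws n distinct elements at random

# Under simple random sampling, every element's chance of
# inclusion is known and nonzero: n / N.
inclusion_probability = n / len(population)
print(inclusion_probability)  # 0.05
print(len(sample))  # 50
```

A nonprobability approach (e.g. convenience sampling) would give some elements an unknown, possibly zero, chance of selection, which is why probability sampling is preferred when feasible.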

f. Resource Allocation and Budgets

Data collection requires substantial resources, but it is not the most costly
activity in the budget, as it only accounts for a third of the total research
budget.
The geographic scope and the number of observations required do affect the
cost, but much of the cost is relatively independent of the size of the data-
gathering effort.
A guide might be that project planning, data gathering and analysis,
interpretation and reporting each share about equally in the budget.

Without budgetary approval, many research efforts are terminated for lack of
resources.

Three types of budgets in organizations where research is purchased and cost

containment is crucial are: rule-of-thumb budgeting (taking a fixed
percentage of some criterion), departmental or functional area budgeting
(a portion of total expenditures), and task budgeting (selecting specific research
projects to support on an ad hoc basis).

g. The Research Proposal

This is an activity that incorporates decisions made during early research-

project planning phases of the study including the management-research
question hierarchy and exploration.
The proposal thus incorporates the choices the investigator makes in the
preliminary steps.

A written proposal is often required when a study is being suggested, showing
the project's purpose and proposed methods of investigation.

Time, budgets, and other responsibilities and obligations are often spelled out.

Substantial background detail and elaboration of proposed techniques may be

included, if required.

Business research proposals normally range from 1 to 10 pages. But a

research proposal may also be oral.

Proposal Content: Every proposal should include two basic sections – a
statement of the research question and a brief description of the proposed
research methodology.

In a brief memo-type proposal, the research question may be incorporated into

a paragraph that also sets out the management dilemma and management
question and categories of investigative questions.

A second section includes a statement of what will be done – the bare bones
of the research design.
Often research proposals are much more detailed and describe specific
measurement devices that will be used, time and cost budgets, sampling plans,
and other details.

h. Pilot Testing

The data-gathering phase of the research process typically begins with pilot
testing.

A pilot test is conducted to detect weaknesses in design and instrumentation

and provide proxy data for selection of a probability sample.
It should therefore draw subjects from the target population and simulate the
procedures and protocols that have been designated for data collection.

If the study is a survey to be executed by mail, the pilot questionnaire should

be mailed.

The size of the pilot group may range from 25 to 100 subjects, depending on
the method to be tested, but the respondents do not have to be statistically
selected.

i. Data Collection

The gathering of data may range from a simple observation at one location to
a grandiose survey of multinational corporations at sites in different parts of
the world.

The method selected will largely determine how the data are collected.
Questionnaires, standardized tests, observational forms, laboratory notes, and
instrument calibration logs are among the devices used to record raw data.

Data are defined as the facts presented to the researcher from the study's
environment.
Secondary data have had at least one level of interpretation inserted between
the event and its recording.
Primary data are sought for their proximity to the truth and control over error.
Data are edited to ensure consistency across respondents and to locate
omissions.
In the case of survey methods, editing reduces errors in the recording,
improves legibility, and clarifies unclear and inappropriate responses.
Edited data are then put into a form that makes analysis possible.
Because it is impractical to place raw data into a report, alphanumeric codes
are used to reduce the responses to a more manageable system for storage and
future processing.
The codes follow various decision rules that the researcher has devised to
assist with sorting, tabulating and analyzing.
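The use of alphanumeric codes and decision rules can be sketched as below (the survey responses and code values are hypothetical):

```python
from collections import Counter

# Hypothetical codebook: a decision rule mapping each survey
# response to a numeric code for storage and tabulation.
codebook = {"Yes": 1, "No": 2, "Don't know": 9}

raw_responses = ["Yes", "No", "Yes", "Don't know", "No"]

# Apply the decision rule to reduce raw responses to codes.
coded = [codebook[r] for r in raw_responses]
print(coded)  # [1, 2, 1, 9, 2]

# Tabulate: count occurrences of each code.
tally = Counter(coded)
print(tally[1])  # 2, i.e. two "Yes" responses
```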

j. Analysis and Interpretation

Raw data are rarely useful in management decision making.

Managers need information.
Researchers generate information by analyzing data after its collection.
Data analysis usually involves reducing accumulated data to a manageable
size, developing summaries, looking for patterns, and applying statistical
techniques.

Scaled responses on questionnaires and experimental instruments often

require the analyst to derive various functions, and relationships among
variables are frequently explored after that.
Further, researchers must interpret these findings in light of the research
question or determine if the results are consistent with their hypotheses and
theories.

k. Reporting the Results

Finally, it is necessary to prepare a report and transmit the findings and

recommendations to the manager for the intended purpose of decision making.

The style and organization of the report will differ according to the target
audience, the occasion, and the purpose of the research.
The results of applied research may be communicated in a conference call, a
letter, a written report, or an oral presentation, or sometimes all of them.

Reports should be developed from the client's perspective.

Thus, the researcher must accurately assess the manager's needs throughout
the research process and incorporate this understanding into the final product,
the research report.
At a minimum, a research report should contain these sections:

An executive summary consisting of a synopsis of the problem, findings, and
recommendations.

An overview of the research: the problem's background, literature summary,
methods and procedures, conclusions.
A section on implementation strategies for the recommendations.
A technical appendix with all the materials necessary to replicate the project.


Formulation of problems involves locating a problem within a firm or industry

and defining the problem specifically.

a) Research Problem Sources, Formulation, Magnitude and significance.

The following devices and techniques of managerial control, when used as
agencies of information, can become effective ways of finding problems.

i) Study of records and reports of the firm will enable the manager to learn many
of the facts concerning the operating status of the organization.
ii) Careful observation of conditions in the firm can bring to light
unsatisfactory situations.
iii) Purposeful conversation with other qualified persons in the firm can
uncover potential problems.
iv) Careful observation and study of the procedures and techniques of the
most efficient and successful firms in the industry.
v) Reading of pertinent published materials in the field in which a business
operates.
vi) Use of checklists in evaluating the operations of a firm.
vii) Brainstorming-intensified discussion within a group of interested persons
encourages thinking and development of new ideas about a problem.

Research Problem Formulation:

A problem clearly and accurately stated is a problem that is often well on its way
to being solved.
Before research or fact finding can successfully start, the researcher must know
what the problem is and why a solution is wanted.
The what of a problem is answered by an accurate definition of the situation.
The why can be established by the determination of the uses to which the findings
will be or can be put.
A complete definition of a problem must include both the what and the why.


In order to take a general topic or problem and refine it into a researchable

problem, the researcher needs to define certain components of the problem – the
population of interest, the situation, what part of the issue is to be addressed in
the first (or next) study etc.

Only by narrowing the focus (e.g. Population, situation, measurements, etc) can a
researchable problem be derived.
Once the scope of the topic or problem has been narrowed to make it a potentially
researchable problem, we can then determine its importance and feasibility.


It is important that the researcher point out how the solution to the problem or the
answer to the question can influence theory or practice.

He must demonstrate why it is worth the time, effort and expense required to
carry out the proposed research.
Careful formulation and presentation of the implications or possible applications
of knowledge helps to give the project an urgency, justifying its worth.

b) Hypotheses, Assumptions, Limitations and significance of the study

Forming Hypotheses:

After the problem has been precisely defined, the next step is to begin the
process of setting up possible reasons for the difficulty.

These hypotheses, or guesses as to the causes or solutions, must be carefully
made because they will determine what facts will be sought and what research
procedures will be used.
Intelligent insight and sound judgement are most important in the
establishment of reasonable hypotheses.

The more the researcher knows about the situation, the more likely it is that
the hypotheses will be correct.
The process of setting up hypotheses is facilitated by writing down a list of
possible causes or solutions.
A process of eliminating the least likely ones then follows until a few logical,
intelligent hypotheses which constitute reasonable possibilities remain.

Statistical Hypotheses: Many hypotheses are qualitative in nature, since they

do not lend themselves to numerical measurement (e.g. attitudes of employees
may not be measurable quantitatively).
Rating charts are in common use by government and large corporations for
supervisors to rate employees on attitude, performance, promptness, and
other characteristics by letter grades or simple written sentences.
But there are also aptitude tests that are given to employees, which are
quantitatively measurable.
Nevertheless, many hypotheses in business research are qualitative and their
solutions may be made only as a result of value judgements concerning
courses of action to be taken or decisions to be made.

Statistical hypotheses are quantitative in that they are numerically

measurable. Comparative tests are used as a basis of determining definitively
if the hypothesis should be accepted or rejected.

A test of significance is a statistical test in which the difference between two
groups can be accepted or rejected at a given level of significance.

The statistical hypothesis is normally stated as a null hypothesis.

For example, the hypothesis may be stated: the difference between the two
means is zero.
This hypothesis is capable of definitive testing because it is a negative or null
proposition.

Another point concerning the statistical hypothesis concerns the acceptance or

rejection of the hypothesis.
Acceptance or rejection are used in preference to proof or disproof.
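As a sketch of how such a null hypothesis might be tested, the following computes a pooled two-sample t statistic by hand (the data and the critical value shown are hypothetical illustrations; in practice a statistical library or table would be consulted):

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical scores for two employee groups.
group_a = [10, 12, 11, 13, 14]
group_b = [8, 9, 7, 10, 11]

# Null hypothesis: the difference between the two means is zero.
na, nb = len(group_a), len(group_b)
pooled_var = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
se = sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
t = (mean(group_a) - mean(group_b)) / se
print(round(t, 2))  # 3.0

# The critical t for 8 degrees of freedom at the 5% level (two-tailed)
# is about 2.306, so here the null hypothesis would be rejected.
reject = abs(t) > 2.306
print(reject)  # True
```

Note that the outcome is "reject" rather than "disprove": the test only shows the observed difference is unlikely under the null hypothesis at the chosen significance level.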

A proposition is a statement about concepts that may be judged as true or

false if it refers to observable phenomena.
When a proposition is formulated for empirical testing, it is called a
hypothesis.

Hypotheses are statements in which we assign variables to cases.

A case is the entity or thing the hypothesis talks about.
The variable is the characteristic, trait or attribute that, in the hypothesis, is
imputed to the case.

If a hypothesis is based on more than one case, then it is a generalization.
Descriptive hypotheses are propositions that typically state the existence, size,
form or distribution of some variable.
Researchers often use a research question rather than a descriptive hypothesis.

Relational hypotheses are statements that describe a relationship between two

variables with respect to some case.
These hypotheses indicate either a correlational relationship (an unspecified
relationship) or an explanatory or causal relationship (a predictable
relationship).

Correlational relationships state merely that the variables occur together in

some specified manner without implying that one causes the other.
Such weak claims are often made when we believe there are more basic
causal forces that affect both variables or when we have not developed
enough evidence to claim a stronger linkage.

With explanatory (causal) hypotheses, there is an implication that the

existence of, or a change in, one variable causes or leads to an effect on the
other variable.
The causal variable is typically called the independent variable (IV) and the
other the dependent variable (DV).
But the IV need not be the sole reason for the existence of, or change in, the
DV.
In proposing or interpreting causal hypotheses, the researcher must consider
the direction of influence.
In many cases, the direction is obvious from the nature of the variables.

The Role of the Hypothesis:

In research, a hypothesis serves several important functions.

The most important is that it guides the direction of the study.
A frequent problem in research is the proliferation of interesting information.
The second function is that it limits what shall be studied and what shall not.
It identifies facts that are relevant and those that are not, thus suggesting
which form of research design is likely to be most appropriate.
Finally, a hypothesis provides a framework for organizing the conclusions
that result.


Assumptions: These are statements of what the researcher believes to be facts
but cannot verify.

Limitations: These are those conditions beyond the control of the researcher
that may place restrictions on the conclusions of the study and their application
to other situations.
Significance of the study – importance/benefits of the study. This allows one to

describe explicit benefits that will accrue from the study.
The importance of doing the study now should be emphasized.
Usually, this section is not more than a few paragraphs.
This section also requires one to understand what is most troubling to one's
sponsor.

c. Research Questions

This module addresses the purpose of the investigation.

It is here that one lays out exactly what is being planned by the proposed
research.
In a descriptive study, the objectives can be stated as the research question.
These questions flow naturally from the problem statement, giving specific,
concrete and achievable goals.
They should be set off from the flow of the text so they can be found easily.


a. Purpose, Scope and Sources

- A preliminary activity to a research project may be a search of the existing
literature regarding the subject matter.
This is the published information on the topic.

- At times, the search of the literature will reveal that the subject matter has
already been adequately investigated.

- Sometimes the search indicates that a primary research project must be

undertaken since the basic numerical data on which to construct the
project have never been gathered.

- Occasionally facts uncovered in the literature will have the effect of

changing the nature or direction of the planned research.

This section examines recent (or historically significant) research studies,

company data, or industry reports that act as a basis for the proposed study.

b. Literature Research, Locating, reading and selecting the Literature

- A search of the literature naturally begins in a library.

- The sources of data may be classified as primary and secondary.

- A primary source is defined as a publication that contains the first material
on the subject that is put into print.

- Later publications on the subject in books, magazines, periodicals,
government publications and newspapers constitute secondary sources.

- A primary source is likely to contain more complete and accurate data
than may be found in a secondary source.

- The reason for this is that the primary source contains all the original data
in unaltered form.

- The secondary source may contain only part of the original data and what
is included may have been selected to convey a special meaning.

- It is also possible that the writer of the secondary source material may
have misinterpreted the data in the primary source.

- Researchers, therefore, should strive to get data or information from

primary sources instead of secondary whenever it is feasible to do so.

- Thus, the sources of data can also be classified as published information,

published records and originally collected data.

- Unpublished records, such as accounting, personnel, and sales records of a
firm, as well as unpublished data from trade associations and
governments, may be used.

c. Writing and Literature Review

- Begin your discussion of the related literature and relevant secondary data
from a comprehensive perspective.

- Then move to more specific studies that are associated with your problem.

- If the problem has a historical background, begin with the earliest
studies.
- Do a brief review of the information, not a comprehensive report.

- Always refer to the original source.

- Emphasize the important results and conclusions of other studies, the

relevant data and trends from previous research, and particular methods or
designs that could be duplicated or should be avoided.

- Discuss how the literature applies to the study you are proposing; show the
weaknesses or faults in the design, discussing how you would avoid
similar problems.

- The literature review may also explain the need for the proposed work to
appraise the shortcomings and informational gaps in secondary data sources.

- Examine the accuracy of secondary sources, the credibility of these

sources, and the appropriateness of earlier studies.

- This exercise often reveals a choice of using a secondary data approach

instead of collecting primary data.

- A comprehensive review lays the basis for this choice.

- Close the literature review section by summarizing the important aspects

of the literature and interpreting them in terms of your problem.

- Refine the problem as necessary in light of your findings.


a. Population

A population is the total collection of elements about which we wish to

make some inferences.
All office workers in the firm compose a population of interest.
All files in the organization define a population of interest.

A census is a count of all the elements in a population.

A population element is the subject on which the measurement is being
taken.
It is the unit of study.

A population is any group of individuals that have one or more

characteristics in common that are of interest to the researcher.
The population may be all the individuals of a particular type, or a more
restricted part of that group.

b. Sample: Rationale and Relationship to Population

A sample is a small proportion of a population selected for observation

and analysis.
By observing the characteristics of the sample, one can make certain
inferences about the characteristics of the population from which it is
drawn.
Samples are not selected haphazardly; they are chosen in a systematically
random way, so that chance or the operation of probability can be utilized.
The economic advantages of taking a sample rather than a census are
considerable.
It is costly to take a census.
The quality of the study is often better with sampling than with a census.
This is due to better interviewing (testing), more thorough investigation of
missing, wrong, or suspicious information, better supervision and better
processing. Sampling also provides much quicker results than does a
census.
Some situations require sampling, e.g. destructive testing of materials.
Sampling is also the only process possible if the population is infinite.
When the population is small and variable, any sample drawn may not be
representative of the population from which it is drawn.
Then the resulting values we calculate from the sample are incorrect as
estimates of the population values.

c. Sample size and Sampling Errors and Bias

Sample Size

One false belief is that a sample must be large or it is not representative.

There is usually a trade-off between the desirability of a large sample and
the feasibility of a small one.
The ideal sample is large enough to serve as an adequate representation of
the population about which the researcher wishes to generalize and small
enough to be selected economically – in terms of subject availability,
expense in both time and money, and complexity of data analysis.
There is no fixed number or percentage of subjects that determines the size
of an adequate sample.
It may depend upon the nature of the population of interest or the data to
be gathered and analyzed.
The absolute size of a sample is much more important than its size
compared with the population.
How large a sample should be is a function of the variation in the
population parameters under study and the estimating precision needed by
the researcher.
It is often stated that samples of 30 or more are to be considered large
samples and those with fewer than 30, small samples.

The basic formula for calculating sample size in probability sampling
assumes an infinite population.
The most important factor in determining the size of a sample needed for
estimating a population parameter is the size of the population variance.
The greater the dispersion or variance in the population, the larger the
sample must be to provide estimation precision.
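The link between population dispersion and required sample size can be sketched in Python, using the standard formula n = (z·σ/E)² for estimating a mean. The σ values, margin of error, and 95 percent z below are hypothetical choices for illustration, not figures from the text:

```python
import math

def required_sample_size(sigma, margin_of_error, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= margin_of_error."""
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Greater dispersion in the population demands a larger sample
# for the same estimation precision.
n_low_var = required_sample_size(sigma=5.0, margin_of_error=1.0)    # 97
n_high_var = required_sample_size(sigma=15.0, margin_of_error=1.0)  # 865
```

Tripling the population standard deviation multiplies the required sample size by nine, which is the point made above about variance and precision.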

Sampling Errors and Bias

The process of sampling makes it possible to draw valid inferences or

generalizations on the basis of careful observation of variables within a
relatively small proportion of the population.

A measured value based upon sample data is a statistic.

A population value inferred from a statistic is a parameter.
The ultimate test of a sample design is how well it represents the
characteristics of the population it purports to represent.

In measurement terms, the sample must be valid.

Validity of a sample depends on two considerations.

• Accuracy – the degree to which bias is absent from the sample.

When the sample is drawn properly, some sample elements underestimate
the parameters and others overestimate them.
Variations in these values counteract each other; this counteraction results
in a sample value that is generally close to the population value.

For these offsetting effects to occur, there must be enough members in the
sample, and they must have been carefully drawn.
An accurate (unbiased) sample is one in which the underestimators and
the overestimators are balanced among the members of the sample.

There is no systematic variance with an accurate sample.

Systematic variance is “the variation in measures due to some known or
unknown influences that cause the scores to lean in one direction more
than another.”

• Precision – a second criterion of a good sample design is precision of
estimate.
No sample will fully represent its population in all respects.
The numerical descriptors that describe samples may be expected to
differ from those that describe populations because of random
fluctuations inherent in the sampling process.

This is called sampling error and reflects the influences of chance in
drawing the sample members.

Sampling error is what is left after all known sources of systematic

variance have been accounted for.
In theory, sampling error consists of random fluctuations only, although
some unknown systematic variance may be included when too many or
too few sample elements possess a particular characteristic.
Precision is measured by the standard error of estimate, a type of standard
deviation measurement.
The smaller the standard error of estimate, the higher is the precision of
the sample.
The ideal sample design produces a small standard error of estimate.
So more important than size is the care with which the sample is selected.
The ideal method is random selection, letting chance or the laws of
probability determine which members of the population are to be selected.

When random sampling is employed, whether the sample is large or small,

the errors of sampling may be estimated, giving researchers an idea of the
confidence that they may place in their findings.

Basically, there are 2 types of sampling errors:

(i) The first type is due to chance factors.

It might happen that in a particular sample one element, and not
another, has been included.
(ii) Second type is due to bias in sample selection arising primarily
from faulty techniques.

• Confidence Level

Suppose that we frequently want to know what percentage of the
population agrees with certain statements.
On each of these occasions we might put such a statement to a sample,
compute the percentage that agree, and take this result as an estimate of
the proportion of the population who agree.
We can devise a number of sampling plans that carry the insurance that
our estimates will differ from the corresponding true population figures
by more than, say, 5 percent on no more than, say, 10 percent of these
occasions.
The estimates will then be correct within 5 percentage points (the margin
of error, or limit of accuracy) 90 percent of the time (the probability, or
confidence level).

Sample size summary:
Subject availability and cost factors are legitimate considerations in
determining appropriate sample size.
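The earlier example (estimates correct within 5 percentage points, 90 percent of the time) can be turned into a required sample size for a proportion. A minimal sketch, assuming the conservative p = 0.5 and taking z ≈ 1.645 for 90 percent confidence:

```python
import math

def proportion_sample_size(margin, z, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, rounded up.

    p = 0.5 is the most conservative choice: it maximizes p * (1 - p).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Within 5 percentage points, 90 percent of the time
n = proportion_sample_size(margin=0.05, z=1.645)  # 271
```

Tightening the confidence level to 95 percent (z ≈ 1.96) raises the requirement to 385 for the same 5-point margin.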

d. Sampling Techniques

A variety of sampling techniques is available.

The one selected depends on the requirements of the project, its
objectives, and funds available.

Representation: The members of a sample are selected either on a
probability basis or by other means.
Probability sampling is based on the concept of random selection – a
controlled procedure that assures that each population element is given a
known nonzero chance of selection.
Non probability sampling is non-random and subjective.
Each member does not have a known non-zero chance of being included.

Only probability samples provide estimates of precision.

Types of Sampling Designs

                                 Representation Basis
Element Selection     Probability              Non-probability
Unrestricted          Simple random            Convenience
Restricted            Complex random:          Purposive:
                        Systematic               Judgement
                        Cluster                  Quota
                        Stratified             Snowball

Element Selection: When each sample element is drawn individually from

the population at large; it is an unrestricted sample.

Restricted sampling covers all other forms of sampling.

Probability Sampling: The unrestricted, simple random sample is the
simplest form of probability sampling.
Since all probability samples must provide a known nonzero chance of
selection for each population element, the simple random sample is
considered a special case in which each population element has a known
and equal chance of selection.
Probability sampling is also known as random sampling.

Steps in sampling design

The following questions are normally asked when it comes to securing a
sample:
What is the relevant population?
What are the parameters of interest?
What is the sampling frame?
What is the type of sample?
What size sample is needed?
How much will it cost?

A sampling frame is a list of all units from which a sample is drawn.

This has to be adequate so as to avoid discarding parts of the target
population.

Sampling concepts

Random samples can be selected by using a sampling procedure from a

statistical software program, a random number generator, or a table of
random numbers.
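Drawing a random sample with a software random number generator can be sketched as follows; the population frame of 500 names is hypothetical:

```python
import random

# Hypothetical sampling frame of 500 employees
population = [f"employee_{i:03d}" for i in range(1, 501)]

rng = random.Random(42)              # fixed seed so the draw is reproducible
sample = rng.sample(population, 30)  # simple random sample, without replacement
```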
A sample mean is a point estimate and the best predictor of the unknown
population mean, μ (the arithmetic average of the population).

The mean scores form their own distribution, a distribution of sample
means; none are perfect duplications because no sample perfectly
replicates its population.
We can estimate the interval in which the true μ will fall by using any of
the sample means.
This is accomplished by using a formula that computes the standard error
of the mean.
The standard error of the mean measures the standard deviation of the
distribution of sample means.
The standard error of the mean varies directly with the standard deviation
of the population from which it is drawn.
The sample standard deviation is used as an unbiased estimator of the
population standard deviation.
The standard error creates the interval range that brackets the point
estimate.

Thus μ is predicted to be the sample mean ± the standard error.

This range may be visualized on a continuum:

x̄ - standard error .......... x̄ .......... x̄ + standard error

Further, because standard errors have characteristics like other standard

scores, we have 68 percent confidence in this estimate.

That is, one standard error encompasses ± 1 Z or 68 percent of the area

under the normal curve.

Recall that the area under the curve also represents the confidence
estimates that we make about our results.
The combination of the interval range and the degree of confidence creates
the confidence interval.
Since the standard error is calculated using the formula

    σx̄ = σ / √n

where
    σx̄ = standard error of the mean (the standard deviation of all possible x̄)
    σ  = population standard deviation
    n  = sample size

But since the sample standard deviation is used as an unbiased estimator
of the population standard deviation, the formula becomes

    σx̄ = S / √n

where S = standard deviation of the sample. The sample size, n, can then be
easily computed as

    n = (S / σx̄)²
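These formulas can be verified with a short calculation; the ten sample scores below are hypothetical:

```python
import math

scores = [72, 68, 75, 80, 66, 74, 71, 78, 69, 77]  # hypothetical sample
n = len(scores)
mean = sum(scores) / n

# Sample standard deviation S, with the n - 1 denominator
# (the unbiased estimator of the population standard deviation)
S = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

se = S / math.sqrt(n)              # standard error of the mean: S / sqrt(n)
interval = (mean - se, mean + se)  # the ~68 percent interval (mean +/- 1 SE)
```

A smaller standard error gives a narrower interval, which is what the text means by higher precision.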

Complex Probability sampling

Simple random sampling is often impractical due to the following:

It requires a population list that is often not available.

The design may also be wasteful because it fails to use all the information
about a population.
It may be expensive in time and money.

These problems have led to the development of alternative designs that are
superior to the simple random design in statistical and/or economic terms.
A more efficient sample in a statistical sense is one that provides a given
precision (standard error of the mean) with a smaller sample size.
Four alternative probability sampling approaches are: systematic,
stratified, cluster, and double sampling.

Systematic sampling: This is a versatile form of probability sampling.

In this approach, every kth element in the population is sampled, beginning
with a random start of an element in the range of 1 to k.

The major advantage of systematic sampling is its simplicity and
flexibility.
While systematic sampling has some theoretical problems, from a
practical point it is usually treated as a simple random sample.
It is statistically more efficient than a simple random sample when similar
population elements are grouped on the lists.

A concern with systematic sampling is the possible periodicity in the

population that parallels the sampling ratio.
For example, in sampling days of the week, a 1 in 7 ratio would give
biased results.
Another difficulty may arise when there is a monotonic trend in the
population elements.
That is, the population list varies from the smallest to the largest element
or vice versa.
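Systematic selection can be sketched in a few lines; the frame of 100 elements and the sampling interval k = 10 are hypothetical:

```python
import random

def systematic_sample(frame, k, rng=None):
    """Take every kth element after a random start within the first k."""
    rng = rng or random.Random()
    start = rng.randrange(k)  # random start index in 0 .. k-1
    return frame[start::k]

frame = list(range(1, 101))  # hypothetical ordered frame of 100 elements
sample = systematic_sample(frame, k=10, rng=random.Random(7))
```

If the frame has periodicity that parallels k (the days-of-the-week example), every draw lands on the same phase of the cycle, which is exactly the bias warned about above.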

Stratified Sampling: Most populations can be segregated into several

mutually exclusive subpopulations or strata.
The process by which the sample is constrained to include elements from
each of the segments is called stratified random sampling.
For example, university students can be divided by their class level,
school, gender, etc.
After a population is divided into the appropriate strata, a simple random
sample can be taken within each stratum.
The sampling results can then be weighed and combined into appropriate
population estimates.
Three reasons for choosing a stratified random sample:

• To increase a sample's statistical efficiency
• To provide adequate data for analyzing the various subpopulations
• To enable different research methods and procedures to be used in
different strata.

The ideal stratification would be based on the primary variable under
study.
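Stratified selection with proportional allocation can be sketched as follows; the two student strata and their sizes are hypothetical:

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Proportional allocation: each stratum contributes
    total_n * (stratum size / population size) elements."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(total_n * len(members) / pop_size)  # allocation for this stratum
        sample[name] = rng.sample(members, n_h)         # simple random sample within it
    return sample

# Hypothetical student population divided by class level
strata = {"freshman": list(range(400)), "senior": list(range(400, 500))}
result = stratified_sample(strata, total_n=50)
```

The per-stratum results can then be weighted and combined into population estimates, as described above.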

Cluster Sampling: In a simple random sample, each population element

is selected individually.
The population can also be divided into groups of elements with some
groups randomly selected for study.
This is cluster sampling.
The differences between stratified and cluster sampling are:

Stratified sampling                        Cluster sampling

We divide the population into a few        We divide the population into many
subgroups, each with many elements         subgroups, each with a few elements
in it. The subgroups are selected          in it. The subgroups are selected
according to some criterion that is        according to some criterion of ease
related to the variables under study.      or availability in data collection.

We try to secure homogeneity within        We try to secure heterogeneity within
subgroups and heterogeneity between        subgroups and homogeneity between
subgroups.                                 subgroups, but we usually get the
                                           reverse.

We randomly choose elements from           We randomly choose a number of the
within each subgroup.                      subgroups, which we then typically
                                           study in toto.

When done properly, cluster sampling also provides an unbiased estimate

of population parameters.
Statistical efficiency for cluster sample is usually lower than for simple
random samples chiefly because clusters are usually homogeneous.
But economic efficiency is often great enough to overcome this weakness.
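The contrast with stratified sampling can be seen in code: a few subgroups are chosen at random and every element inside them is studied. The 20 blocks of 15 households are hypothetical:

```python
import random

# Hypothetical population organized into 20 clusters (city blocks) of 15 households
clusters = {b: [f"block{b}_house{h}" for h in range(15)] for b in range(20)}

rng = random.Random(3)
chosen_blocks = rng.sample(sorted(clusters), 4)  # randomly pick 4 whole clusters
# ... then study every element of each chosen cluster ("in toto")
sample = [house for b in chosen_blocks for house in clusters[b]]
```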

Double sampling:

It may be more convenient or economical to collect some information by
sample and then use this information to select a subsample for further
study.
This procedure is called double sampling, sequential sampling, or
multiphase sampling.
It is usually found with stratified and/or cluster designs.

For instance, one can use a telephone survey or another inexpensive
survey method to discover who would be interested in something and the
degree of their interest.
One might then stratify the interested respondents by degree of interest
and subsample among them for intensive interviewing on specific topics.

Non Probability Sampling

These sampling techniques are not based on a theoretical framework and

do not operate from statistical theory.
Consequently, they produce selection bias and non representative samples.

In probability sampling, researchers use a random selection of elements to

reduce or eliminate sampling bias.
This causes us to have substantial confidence that the sample is
representative of the population from which it is drawn.
We can also estimate an interval range within which the population
parameter is expected to fall.
We can thus reduce the chance for sampling error and estimate the range
of probable sampling error present.
With a subjective approach like non-probability sampling, the probability
of selecting population elements is unknown.
There are a variety of ways to choose persons or cases to include in the
sample. When this occurs, there is greater opportunity for bias to enter the
sample selection procedure and to distort the findings of the study.

Some of the reasons for using non probability sampling procedures are:
They satisfactorily meet the sampling objectives.
If there is no need to generalize to a population parameter, then there is
much less concern about whether the sample fully reflects the population.
Non-probability sampling is cheap in terms of cost and time, but random
sampling calls for more planning and is hence more expensive. Carefully
controlled non-probability sampling often seems to give acceptable results.

Even carefully stated random sampling procedures may be subject to

careless application by the people involved.
Thus, the ideal probability sampling may be only partially achieved
because of the human element.
It is also possible that non-probability sampling may be the only feasible
alternative: the total population may not be available for study in certain
cases.
In another sense, those who respond and are included in a sample may be
self-selecting.
In mail surveys, those who respond may not represent a true cross section
of those who receive the questionnaire.

Non Probability Sampling Methods:

(a) Convenience

Non-probability samples that are unrestricted are called convenience
samples.
They are the least reliable design but normally the cheapest and easiest
to conduct.
Researchers or field workers have the freedom to choose whomever
they find, thus the name convenience.
Examples include informal pools of friends and neighbors or people
responding to a newspaper's invitation for readers to state their
positions on some public issue.
Often you take a convenience sample to test ideas or even to gain ideas
about a subject of interest.
In the early stages of exploratory research, when you are seeking
guidance, you might use this approach.

(b) Purposive Sampling

A non probability sampling that conforms to certain criteria is called

purposive sampling.
There are two major types – judgement sampling and quota sampling.

Judgement Sampling: This occurs when a researcher selects sample

members to conform to some criterion (only those who have directly
experienced the condition).

When used in the early stages of an exploratory study, a judgement

sample is appropriate.
When one wishes to select a biased group for screening purposes, this
sampling method is also a good choice.
Companies often try out new product ideas on their employees.
The rationale is that one would expect the firm's employees to be more
favourably disposed toward a new product idea than the public.
If the product does not pass this group, it does not have prospects for
success in the general market.

Quota Sampling: We use this second type of purposive sampling to improve
representativeness.
The rationale is that certain relevant characteristics describe the
dimensions of the population.

If a sample has the same distribution on these characteristics, then it is
likely representative of the population regarding other variables on
which we have no control.

In most quota samples, researchers specify more than one control
dimension.
Each should meet two tests: it should have a distribution in the population
that we can estimate, and it should be pertinent to the topic studied.

Precision control is the type of control when all combinations of the

dimensions (factors) are considered.
This gives greater assurance that a sample will be representative of the
population but it is too costly and difficult to carry out with more than
three variables (factors).

When we wish to use more than three control dimensions, we should
depend on frequency control.
With this form of control, the overall percentage of those with each
characteristic in the sample should match the percentage holding the
same characteristic in the population.
Quota sampling has several weaknesses.
First, the idea that quotas on some variables assume a
representativeness on others is argument by analogy.
It gives no assurance that the sample is representative on the variables
being studied.

Second, the data used to provide controls may also be dated or inaccurate.
Third, there is also a practical limit on the number of simultaneous
controls that can be applied to ensure precision.

Finally, the choice of subjects is left to field workers to make on a
judgemental basis.
They may choose only friendly looking people, people who are
convenient to them, etc.

Despite the problems with quota sampling, it is widely used by opinion

pollsters and marketing and other researchers.
Where predictive validity has been checked (e.g. in election polls),
quota sampling has been generally satisfactory.

c) Snowball (Network sampling)

This design has found a niche in recent years in applications where
respondents are difficult to identify and are best located through referral
networks.
In the initial stage of snowball sampling, individuals are discovered and
may or may not be selected through probability methods.

This group is then used to locate others who possess similar

characteristics and who, in turn, identify others.
The “snowball” gathers subjects as it rolls along.
Various techniques are available for selecting a non-probability snowball
sample with provisions for error identification and statistical testing.

Variations on snowball sampling have been used to study community

relations, insider trading and other applications where respondents are
difficult to identify and contact.
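The wave-by-wave referral process can be sketched with a toy network; the names and referral links below are entirely hypothetical:

```python
# Hypothetical referral network: each respondent names others with the trait of interest
referrals = {
    "seed_a": ["r1", "r2"],
    "seed_b": ["r2", "r3"],
    "r1": ["r4"],
    "r2": [],
    "r3": ["r1", "r5"],
    "r4": [],
    "r5": [],
}

def snowball(seeds, network, waves):
    """Grow the sample one referral wave at a time."""
    sample = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # everyone named by the current wave who is not already sampled
        frontier = {r for person in frontier for r in network.get(person, [])} - sample
        sample |= frontier
    return sample

wave1 = snowball(["seed_a", "seed_b"], referrals, waves=1)
wave2 = snowball(["seed_a", "seed_b"], referrals, waves=2)
```

The "snowball" gathers subjects as it rolls: each additional wave can only add people reachable through earlier respondents, which is also why the design is prone to selection bias.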


a) Experiments

An experiment allows the researcher to alter systematically the variables of
interest and observe what changes follow.

Experimentation is a widely accepted research process in the physical and

natural sciences as they are applied in business.
Its usefulness has often been regarded as quite limited, however, in the study
of the social or human aspects of a firm.
But today various problems that executives must face in such matters as
personnel relations, production control, plant layout, marketing and public
relations can be at least partially solved by experimentation.

Experiments are studies involving intervention by the researcher beyond that

required for measurement.
The usual intervention is to manipulate some variable in a setting and observe
how it affects the subjects being studied.
The researcher manipulates the independent or explanatory variable and then
observes whether the hypothesized dependent variable is affected by the
manipulation.
An experiment consists of the testing of a hypothesis under controlled
conditions, with the procedures defined by an experimental design, yielding
valid results.
The controlled conditions usually refer to the control of the procedures – the
scientific method is mandated.

Advantages of experiments:

The researcher's ability to manipulate the independent variable increases the
probability that changes in the dependent variable are a function of that
manipulation.
Also, a control group serves as a comparison to assess the existence and
potency of the manipulation.
Contamination from extraneous variables can be controlled more effectively
than in other designs.
The element of human error is reduced to the minimum. No other method can
equal experimentation in objectivity.

Control of the conditions being tested can be exercised more completely than
in any other method of research.
Experimentation is often less time consuming than other techniques.
The convenience and cost of experimentation are superior to other methods.

Replication – repeating an experiment with different subject groups and

conditions until results are definitely determined – leads to the discovery of an
average effect of the independent variable across people, situations and times.
Researchers can use naturally occurring events and field experiments to
reduce subjects' perceptions of the researcher as a source of intervention or
deviation in their daily lives.

Disadvantages of Experiments:

The artificiality of the laboratory setting

Generalization from non-probability samples can pose problems despite
random assignment.

Despite the low costs of some experiments, many applications of
experimentation far outrun the budgets for other primary data collection
methods.
Experimentation is most effectively targeted at problems of the present or
immediate future.

For some research problems, it is not possible to set up and control the
conditions to be tested for the following reasons:
Experimentation is often not possible because groups of persons or individuals
cannot be manipulated, controlled and made to react in conformity with
experimental test requirements (ethical considerations).

Some experiments require very costly equipment that must be operated by

scarce and high-salaried experts.
Some necessary equipment may be relatively immobile because of large size
and/or scarcity of fuel sources.

Experimentation is of limited use in determining opinions of persons, their
motives, reasons, and possible future opinions and actions.

Conducting an Experiment:

In a well-executed experiment, researchers must complete a series of activities

to carryout their craft successfully.
There are seven activities that the researcher must accomplish to make the
endeavour successful:
• Select relevant variables
• Specify the level(s) of the treatment
• Control the experimental environment
• Choose the experimental design
• Select and assign the subjects
• Pilot-test, revise, and test
• Analyze the data

Selecting Relevant Variables. A hypothesis must be operationalized, which is
the transformation of concepts into variables to make them measurable and
subject to testing.
The researcher has to select variables that are the best operational
representations of the original concepts, determine how many variables to
test, and select or design appropriate measures for them.

Specifying the levels of treatment. The treatment levels of the independent
variable are the distinctions the researcher makes between different aspects of
the treatment condition.
For example, salary might be divided into high, middle and low ranges.
Alternatively, a control group could provide a base level for comparisons.
The control group is composed of subjects who are not exposed to the
independent variable (s).

Controlling the Experimental environment. Extraneous variables need to be

controlled or eliminated because they have potential for distorting the effect of
the treatment on the dependent variable.
Environment control is the holding constant of the physical environment of
the experiment.

These include the introduction of the experiment instructions, arrangement of

the room, time of administration, experimenter‟s contact with the subjects, etc.
These must all be consistent across each administration of the experiment.

When subjects do not know if they are receiving the experimental treatment,
they are said to be blind.
When the experimenters do not know if they are giving the treatment to the
experimental group or to the control, the experiment is double blind.

Choosing the Experimental design. Experimental designs are unique to the
experimental method.
They serve as positional and statistical plans to designate relationships
between experimental treatments and the experimenter's observations or
measurement points in the temporal scheme of the study.

Selecting and Assigning Subjects. The subjects selected for the experiment
should be representative of the population to which the researcher wishes to
generalize.
The procedure for random sampling of experimental subjects is similar in
principle to the selection of respondents for a survey.
The researcher first prepares a sampling frame and then assigns the subjects
for the experiment to groups using a randomization technique.
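Random assignment of recruited subjects to experimental and control groups can be sketched as a shuffle-and-split; the pool of 40 subjects and the 20/20 split are hypothetical:

```python
import random

subjects = [f"subject_{i:02d}" for i in range(40)]  # hypothetical recruited pool

rng = random.Random(11)
shuffled = subjects[:]   # copy so the original roster is untouched
rng.shuffle(shuffled)

experimental = shuffled[:20]  # will receive the treatment
control = shuffled[20:]       # base level for comparison
```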

Since the sampling frame is often small, experiment subjects are recruited,
thus they are a self-selecting sample.
When it is not possible to randomly assign subjects to groups, matching may
be used.
Matching employs a non-probability quota sampling approach.
The object of matching is to have each experimental and control subject
matched on every characteristic used in the research.

Pilot testing, Revising and Testing. The procedures for this stage are similar
to those of other forms of primary data collection.

Analyzing the Data. If adequate planning and pretesting have occurred, the
experimental data will take an order and structure uncommon to surveys and
unstructured observational studies.
Data from experiments are more conveniently arranged because of the levels
of the treatment condition, pretests and posttests, and the group structure.

Researchers have several measurement and instrument options with
experiments.
Among them are observational techniques and coding schemes; paper-and-
pencil tests; self-report instruments with open or closed questions; scaling
techniques (e.g. Likert scales); and physiological measures (e.g. voice pitch
analysis).


There is always a question about whether the results are true.

Validity refers to whether a measure accomplishes its claims.
Internal validity (do the conclusions we draw about a demonstrated
experimental relationship truly imply cause?) and external validity (does an

observed causal relationship generalize across persons, settings and times?)
are the two major varieties here.
Each type of validity has specific threats we need to guard against.

Internal validity factors cause confusion about whether the experimental

treatment (x) or extraneous factors are the source of observation differences.

External validity is concerned with the interaction of the experimental

treatment with other factors and the resulting impact on abilities to generalize.

Problems of internal validity can be solved by the careful design of
experiments.

As a rule of thumb, first seek internal validity.
Then try to secure as much external validity as is compatible with the internal
validity requirements by making experimental conditions as similar as
possible to conditions under which the results will apply.


Simple Randomized Design:

The most basic design of experiment is one involving a control group and an
experimental group.
A control group may consist of persons, retail stores, production runs of
product, etc., in a controlled condition, to serve as a base against which to
measure the changes that occur in the experimental group.

The experimental group is identical to the control group, except that the
experimental variable to be tested has been included in it.
Such a design is known as the simple randomized design.
The simple randomized design includes not only the control group-
experimental group test, but also the before-after test.
In the before-after experiment, the same group is tested before and after the
application of the experimental variable.

The measurement of the experimental effect may be accomplished by

computing the significance of the difference in the means of the two groups,
whether the control-experimental plan or the before-after plan is used, to test
the hypothesis: the difference between the means is zero.
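Computing the significance test described above can be sketched by hand with a pooled-variance t statistic, which assumes equal group variances; the two sets of scores are hypothetical:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance with the n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def two_sample_t(a, b):
    """t statistic for H0: the difference between the group means is zero."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

control = [20, 22, 19, 24, 21, 23, 20, 22]       # hypothetical control scores
experimental = [25, 27, 24, 28, 26, 27, 25, 29]  # hypothetical treated scores
t = two_sample_t(experimental, control)
```

The statistic is then compared with the t distribution with na + nb - 2 degrees of freedom to judge whether a difference this large is plausible under the null hypothesis.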

Completely Randomized Design:

While the simple randomized design is a two-sample design, since only two
samples are used, the completely randomized design permits testing when
more than two samples are involved.
It is thus a K-sample design.

Three or more treatments may be tested simultaneously for significance of the
difference among their means.
The technique involved is called the analysis of variance (ANOVA).
In this test, the F-distribution is used.
With this distribution, two variances are tested: the variance among the means
against the variance within the data, the latter variance measuring the chance
error, called the experimental error.
The results are stated in terms of probability.
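The F ratio described above (the variance among the treatment means against the variance within the data) can be computed directly; the three treatments with four observations each are hypothetical:

```python
def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n  # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)        # variance among the means
    ms_within = ss_within / (n - k)          # experimental error
    return ms_between / ms_within

# Three hypothetical treatments, four observations each
f = one_way_anova_f([[10, 12, 11, 13], [14, 15, 16, 15], [20, 19, 21, 22]])
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom indicates a significant difference among the treatment means.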

Replicated Randomized Design:

A change in the basic designs above may be effected by taking sub-samples
of each group or treatment (including the control group).
Each sub-sample would be chosen independently by random methods.
Each sub sample would have the same number of observations.
This design includes a larger total number of observations than otherwise
might be taken.
It permits more coverage of a geographical area and improves the control of
extraneous effects by making it more certain that all chance effects are
randomized.

The analysis of variance (ANOVA) is applied to this design but its

complicated nature is such that it is usually applied using computer analysis.
Most business experiments are conducted by the simple randomized design or
the completely randomized design.

Factorial design:

The above designs are said to have one basis of classification, whether in
groups or treatments.
Where there are two bases of classification, two classifications are tested, so
that one experiment will include two tests involving the F-distribution.
One test will test for significant difference in the column classification, the
second for significant difference in the row classification.

A diagram of this design would be identical to the diagram for the replicated
randomized design, except that the row classification would replace the
replication (sub-sample) classification.
For example, with three different treatments for the column classification of
fertilizers, the row classification may be three different grain crops: wheat,
barley, rice.

The factorial design employs the same ANOVA employed for the replicated
randomized design.

In each case there will be two F-tests: for the column classification and for the
row classification.
This design is called two bases of classification with more than one
observation in each class.

Replicated Randomized Design

                Treatment 1   Treatment 2   Treatment 3

Sample 1             x             x             x
(replication)        x             x             x
                     x             x             x
                     x             x             x

Sample 2             x             x             x
(replication)        x             x             x
                     x             x             x
                     x             x             x

Sample 3             x             x             x
(replication)        x             x             x
                     x             x             x
                     x             x             x

b) Descriptive Studies

Descriptive studies try to discover answers to the questions who, what, when, where and
sometimes, how.
The researcher attempts to describe or define a subject, often by creating a profile of a
group of problems, people or events.

Such studies may involve the collection of data and the creation of a distribution of the
number of times the researcher observes a single event or characteristic (known as a
research variable), or they may involve relating the interaction of two or more variables.
Descriptive studies may or may not have the potential for drawing powerful inferences.
Organizations that maintain databases of their employees, customers and suppliers
already have significant data to conduct descriptive studies using internal information
(data mining).

A descriptive study, however, does not explain why an event occurred or why the
variables interact the way they do.
The descriptive study is popular in business research because of its versatility across
disciplines. Such studies have a broad appeal to the administrator and policy analyst for planning, monitoring
and evaluating purposes.
In this context, how questions address issues such as quantity, cost, efficiency,
effectiveness and adequacy.

In every discipline, but particularly in its early stages of development, purely descriptive
research is indispensable.
The more complete the description, the greater is the likelihood that the units derived from
the description will be useful in subsequent theory building.

Descriptive research is concerned with all of the following: hypothesis formulation and
testing, the analysis of relationships between non-manipulated variables, and the
development of generalizations.
Unlike the experimental method, in which variables are deliberately arranged and
manipulated through the intervention of the researcher, in descriptive research the
variables that exist or have already occurred are selected and observed.
This process is described as ex post facto, explanatory observational or causal-
comparative research.

Both descriptive and experimental methods employ careful sampling procedures so that
generalizations may be extended to other individuals, groups, times or settings.
Descriptive research describes "what is": describing, recording, analyzing and interpreting
conditions that exist.

It involves some type of comparison or contrast and attempts to discover relationships
between existing non-manipulated variables.
It is sometimes known as non-experimental or correlational research.

The expectation is that if variable A is systematically associated with variable B,
prediction of future phenomena may be possible and the results may suggest additional or
competing hypotheses to test.
The method of descriptive research is particularly appropriate in the behavioral sciences
because many of the types of behavior that interest the researcher cannot be arranged in a
realistic setting. Introducing significant variables may be harmful or threatening to
human subjects, raising ethical considerations.
Descriptive research can be divided into two broad categories: quantitative and
qualitative research.

Quantitative Research

This research consists of those studies in which the data concerned can be analyzed in
terms of numbers.

An example of quantitative research might be a study comparing two methods of

teaching reading to first-grade children.
This is because the data used to determine which method is more successful will be test
scores.
The average score of the children receiving one method will be compared to the average
score of children receiving the other method.
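A comparison of the two groups' average scores is often carried out with a two-sample t test. The sketch below uses invented scores and a pooled-variance t statistic; it illustrates the idea rather than any actual study data.

```python
import math

# Hypothetical first-grade reading scores under two teaching methods.
method_a = [72, 85, 78, 90, 66, 81]
method_b = [65, 70, 74, 60, 68, 71]

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = pooled_t(method_a, method_b)   # compared to t with na+nb-2 df
```

The resulting statistic is then referred to the t-distribution to decide whether the difference in average scores is significant.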


The researcher has carefully planned the study, including the tests or other data-collection
instruments to be used.
Each subject is studied in an identical manner and there is little room for human bias to
create problems with the data.

The research is based more directly on its original plans and its results are more readily
analyzed and interpreted.
Examples of quantitative research are the experimental and quasi-experimental designs.

Quasi - Experimental Designs

These designs provide control of when and to whom the measurement is applied, but
because random assignment to experimental and control treatments has not been applied,
the equivalence of the groups is not assured.
There are many quasi-experimental designs but the most common are:

i. Non-equivalent Control Group Design

This is a strong and widely used quasi-experimental design.

It differs from the pretest-posttest control group design because the test and control
groups are not randomly assigned.
The design is diagrammed as follows:

O   X   O

O       O

Where an O identifies a measurement or observation activity,

an X represents the introduction of an experimental stimulus to a group.

There are two varieties.

One is the intact equivalent design, in which the membership of the experimental and
control groups is naturally assembled.
For example, we may use different classes in a school, membership in similar clubs or
customers from similar stores.
Ideally, the two groups are as alike as possible.
This design is especially useful when any type of individual selection process would be
reactive.
The second variation, the self-selected experimental group design, is weaker because
volunteers are recruited to form the experimental group, while non-volunteer subjects are
used for control.
Such a design is likely when subjects believe it would be in their interest to be a subject
in an experiment, e.g. an experimental training program.

Comparison of pretest results (O1 – O3) is one indicator of equivalence between test and
control groups.
If pretest observations are similar between groups, there is a basis for the groups'
comparability and more reason to believe the internal validity of the experiment is good.

ii. Separate Sample Pretest-Posttest Design

This design is most applicable when we cannot know when and to whom to introduce the
treatment but we can decide when and who to measure.
The basic design is:

R  O1  (X)
R  X   O2

Where R indicates that the group members have been randomly assigned to a group.

The bracketed treatment (x) is irrelevant to the purpose of the study but is shown to
suggest that the experiments cannot control the treatment.
This is not a strong design because several threats to internal validity are not handled adequately.
In contrast, it is considered superior to true experiments in external validity.

Its strength results from its being a field experiment in which the samples are usually
drawn from the population to which we wish to generalize our findings.
We would find this design more appropriate if the population is large, a before-after
measurement is reactive, or there is no way to restrict the application of the treatment.

iii. Group Time Series

A time series introduces repeated observations before and after the treatment and allows
subjects to act as their own controls.
The single treatment group design has before-after measurements as the only controls.

There is also a multiple design with two or more comparison groups as well as the
repeated measurements in each treatment group.
The time series format is especially useful where regularly kept records are a natural part
of the environment and are unlikely to be reactive.
This approach is also a good way to study unplanned events in an ex post facto manner.
The internal validity problem for this design is history.
To reduce this risk, we keep a record of possible extraneous factors during the
experiment and attempt to adjust the results to reflect their influence.
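The single-group time series logic can be sketched by comparing the repeated observations taken before and after the treatment; the weekly figures below are invented.

```python
# Hypothetical weekly observations around a treatment introduced
# between week 5 and week 6; the subjects act as their own control.
before = [20, 22, 21, 23, 22]
after = [27, 29, 28, 30, 28]

shift = sum(after) / len(after) - sum(before) / len(before)
# A shift well outside the pre-treatment fluctuation suggests an
# effect, though history remains the main internal validity threat.
```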

c. Qualitative Research Methods

Research can also be qualitative, that is, it can describe events, persons and so forth
scientifically without the use of numerical data.
A study consisting of interviews is an example of qualitative research.
Such a study would carefully and logically analyze the responses and report those
responses that are consistent as well as areas of disagreement.

Exploration relies more heavily on qualitative techniques. It is particularly useful when

researchers lack a clear idea of the problems they will meet during the study.
Through exploration researchers develop concepts more clearly, establish priorities,
develop operational definitions, and improve the final research design.

Exploration may also save time and money.

When we consider the scope of qualitative research, several approaches are adaptable for
exploratory investigations of management questions:

i. In-depth interviewing (usually conversational, not structured)

ii. Participant observation (to perceive firsthand what participants in the setting
experience).
iii. Films, photographs, and videotape (to capture life of a group under study).

iv. Projective techniques and psychological testing (e.g. games or role-playing).

v. Case studies (for an in-depth contextual analysis of a few events or conditions).

vi. Document analysis (to evaluate historical or contemporary confidential or public

records, reports, government documents, and opinions)

vii. Street ethnography.

When these approaches are combined, four exploratory techniques emerge with wide
applicability for the management researcher: secondary data analysis, experience
surveys, focus groups and two-stage designs.

Secondary Data Analysis:

The first step in an exploratory study is a search of the secondary literature.

Studies made by others for their own purposes represent secondary data.
Within secondary data exploration, a researcher should start first with an organization's
own data archives.
Data from secondary sources help us decide what needs to be done and can be a rich
source of hypotheses.

A search of secondary sources provides an excellent background and will supply many
good leads if one is creative.

Experience Surveys:

While published data are a valuable resource, seldom is more than a fraction of the
existing knowledge in a field put into writing.
A significant portion of what is known on a topic, while in writing, may be proprietary to a
given organization and thus unavailable to an outside researcher.

Also, internal data archives are rarely well organized, making secondary sources, even
when known, difficult to locate.
Thus, we will profit by seeking information from persons experienced in the area of
study, tapping into their collective memories and experiences.

When we interview persons in an experience survey, we should seek their ideas about
important issues or aspects of the subject and discover what is important across the
subject's range.
The investigative format we use should be flexible enough so that we can explore various
avenues that emerge during the interview.

The product of such questioning may be a new hypothesis, the discarding of an old one, or
information about the practicality of doing the study.

People who might provide insightful information include:

i. Newcomers to the scene – employees or personnel who may have recently been
transferred to this plant from similar plants.

ii. Marginal or peripheral individuals – persons whose jobs place them on the margin
between contending groups, e.g. first-line supervisors and lead workers.

iii. Individuals in transition – recently promoted employees who have been transferred to
new departments.

iv. Deviants and isolates – those in a given group who hold a different position from the
majority.
v. “Pure” cases or cases that show extreme examples of the conditions under study – the
most unproductive departments, the most antagonistic workers, etc.

vi. Those who fit well into the organization and those who do not.

vii. Those who represent different positions in the system.

Focus Groups:

The most common application of focus group research continues to be in the consumer
arena (market research).
However, many corporations are using focus group results for diverse exploratory
applications.
The topical objective of a focus group is often a new product or product concept.
The output of the session is a list of ideas and behavioral observations with
recommendations of the moderator.
These are often used for later quantitative testing.

A focus group is a panel of people led by a trained moderator who meet for 90 minutes to
2 hours.
The facilitator or moderator uses group dynamics principles to focus or guide the group
in an exchange of ideas, feelings, and experiences on a specific topic.
Typically the focus group panel is made up of 6 to 10 respondents.

Two-Stage Design:

A useful way to design a research study is a two-stage design.

With this approach, exploration becomes a separate first stage with limited objectives:
clearly defining the research question, and

developing the research design.

In arguing for a two-stage approach, we recognize that much about the problem is not
known but should be known before effort and resources are committed.

The end of an exploratory study comes when the researchers are convinced they have
established the major dimensions of the research task.

The Case Study:

The case study is a way of organizing social data for the purpose of viewing social
reality.
It examines a social unit as a whole and the unit may be a person, a family, a social
group, a social institution or a community.
The purpose is to understand the life cycle or an important part of the life cycle of the
unit.
The case study probes deeply and analyzes interactions between the factors that explain
present status or that influence change or growth.
It is a longitudinal approach, showing development over a period of time.
The element of typicalness, rather than uniqueness, is the focus of attention.

Thus, the selection of the subject of the case study needs to be done carefully in order to
ensure that it is typical of those to whom we wish to generalize.
Data may be gathered by a wide variety of methods, including:

Observation by the researcher or his informants of physical characteristics, social
qualities or behaviour.
Interviews with the subject(s), relatives, friends, teachers, counselors and others.
Questionnaires, opinionnaires, psychological tests and inventories.
Recorded data from newspapers, schools, courts, clinics, government agencies or other
sources.
A single case study emphasizes analysis in depth.

Though it may be fruitful in developing hypotheses to be tested it is not directed toward
broad generalizations.
One cannot generalize from a number (N) of 1.
To the extent that a single case may represent an atypical situation, the observation is
of limited value.
But if the objective analysis of an adequate sample of cases leads researchers to

consistent observations of significant variable relationships, hypotheses may be
confirmed, leading to valid generalizations.

Characteristics of the case study

The method may look deceptively simple.

To use it effectively, the researcher must be thoroughly familiar with existing theoretical
knowledge of the field of inquiry, and skillful in isolating the significant variables from
many that are irrelevant.
Subjective bias is a constant threat to objective data-gathering and analysis.
Effects may be wrongly attributed to factors that are merely associated rather than cause-
and-effect related.

Case studies place more emphasis on a full contextual analysis of fewer events or
conditions and their interrelations.
Although hypotheses are often used, the reliance on qualitative data makes support or
rejection more difficult.
An emphasis on detail provides valuable insight for problem solving, evaluation and
strategy.
This detail is secured from multiple sources of information.

It allows evidence to be verified and avoids missing data.
A single, well-designed case study can provide a major challenge to a theory and provide
a source of new hypotheses and constructs simultaneously.


When the problem has been accurately defined and hypotheses as to the possible causes
or solutions have been established, the researcher is ready to begin compiling a written
list of the specific information necessary to substantiate or reject the hypotheses.
The problem as defined and the hypotheses or questions that must be tested and answered
will determine the exact data that will be sought.
The kind of analysis to which the data are to be subjected in testing the hypotheses must
be related to both the methods of collection of the data and to the hypotheses themselves.
The analysis, while usually statistical in nature, may be qualitative, involving value
judgements and the experience of the analyst rather than the numerical analysis of
quantitative variables.
Research designs can be classified by the approach used to gather primary data.
There are really only two alternatives.
We can observe conditions, behavior, events, people or processes.
Or we can communicate with people about various topics.
Three communication data collection methods are self-administered surveys,
questionnaires and personal interviewing.

a) Surveying, Questionnaire and Interview

The communication approach is questioning or surveying people and recording their
responses for analysis.
The great strength of conducting a survey as a primary data collection technique is its
versatility.
Abstract information of all types can be gathered by questioning others, including
opinions, attitudes, intentions and expectations.
Questioning is more efficient and economical than observation.
But its major weakness is that the quality and quantity of information secured depends
heavily on the ability and willingness of respondents to cooperate.

Even if respondents do participate, they may not have the knowledge sought or even have
an opinion on the topic of concern.
Respondents may also interpret a question or concept differently from what was intended
by the researcher.
Thus, survey responses should be accepted for what they are – statements by others that
reflect varying degrees of truth.
Surveys may be used for descriptive, explanatory and exploratory purposes.

One of the most common research methods used in the social sciences these days
involves the administration of a questionnaire – either by interview or through the mail –
to a sample of respondents.
A central element in survey research is the standardized questionnaire.
In terms of the fundamental issue of measurement, this kind of questionnaire ensures that
exactly the same observation technique is used with each and every respondent in the
survey.
A list containing all conceivable items of information that might be helpful in the solution
of the particular problem being studied should be compiled.
The questionnaire should be long enough to contain all the essential questions that must
be asked to obtain the information needed.

Guidelines for Asking Questions:

Variables are often operationalized when researchers ask people questions as a way of
getting data for analysis and interpretation.
Sometimes the questions are written down and given to respondents for completion.
These are called self-administered questionnaires.

۰ Questions and Statements

Though the term questionnaire suggests a collection of questions, an examination of a
typical questionnaire will probably reveal as many statements as questions.

Rensis Likert has greatly formalized the procedure of asking respondents whether they
agree or disagree with a statement.
He created the Likert scale, a format in which respondents are asked to strongly agree,
agree, disagree, or strongly disagree, or perhaps strongly approve, approve, etc.
Using both questions and statements in a given questionnaire gives one more flexibility
in the design of items and can make the questionnaire more interesting.
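A minimal sketch of how Likert-type responses are quantified for analysis; the category weights and the batch of responses below are invented for illustration.

```python
# Hypothetical scoring: agreement categories mapped to 1-4 weights.
SCALE = {"strongly disagree": 1, "disagree": 2,
         "agree": 3, "strongly agree": 4}

responses = ["agree", "strongly agree", "disagree",
             "agree", "strongly agree"]
scores = [SCALE[r] for r in responses]
mean_score = sum(scores) / len(scores)   # summary on the 1-4 scale
```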

۰ Open-Ended and Closed-Ended Questions

The open-form or unrestricted questionnaire calls for a free response in the respondent's
own words.
Researchers may ask open-ended questions, in which case the respondent is asked to
provide her own answer to the question.
No clues are given and this form probably provides for greater depth of response.
But returns are often meager and the open-form item can sometimes be difficult to
interpret, tabulate, and summarize in the research report.
Researchers may also ask closed-ended questions, where the respondent is asked to select
an answer from among a list provided by the researcher.
Questionnaires that call for short, tick-mark responses are known as the restricted or
closed-form type.
Here you mark a yes or no, write a short response or tick an item from a list of suggested
answers.
But providing an “Other” category permits respondents to indicate what might be their
most important reason, one that the questionnaire builder had not anticipated.
Closed-ended questions are very popular because they provide a greater uniformity of
responses and are more easily processed.
Many questionnaires include both open- and closed-type items.
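Closed-form responses lend themselves to direct tabulation, as in this sketch with invented tick-mark answers:

```python
from collections import Counter

# Hypothetical tick-mark answers to a closed-form item with an
# "Other" category for unanticipated responses.
answers = ["yes", "no", "yes", "other", "yes", "no"]
tally = Counter(answers)   # frequency of each response category
```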

۰ Make Items Clear

Questionnaire items should be clear and unambiguous because the possibilities for
misunderstanding are endless.

۰ Avoid Double-Barreled Questions
Frequently, researchers ask respondents for a single answer to a combination of
questions.
This can result in a variety of answers from the respondents.

۰ Respondents Must be Competent to Answer

In asking respondents to provide information, you should continually ask yourself
whether they are able to do so reliably.

۰ Respondents Must Be Willing to Answer

Often, we would like to learn things from people that they are unwilling to share with us.

۰ Questions Should Be Relevant

Similarly, questions asked in a questionnaire should be relevant to most respondents.
When attitudes are requested on a topic that few respondents have thought about or really
care about, the results are not likely to be very useful.

۰ Short Items Are Best

In general, you should assume that respondents will read items quickly and give quick
answers.
Therefore, you should provide clear, short items that will not be misinterpreted under
those conditions.
Hence, avoid long and complicated items because respondents are often unwilling to
study an item in order to understand it.

۰ Avoid Negative Items

The appearance of a negation in a questionnaire item paves the way for easy
misinterpretation.
۰ Avoid Biased Items and Terms

The meaning of someone's response to a question depends in large part on its wording.

Some questions seem to encourage particular responses more than other questions do.
Questions that encourage respondents to answer in a particular way are biased.
Be generally wary of the social desirability of questions and answers.
Whenever you ask people for information, they answer through a filter of what will make
them look good.
Other guidelines to improve the quality of questions are:
۰ Define or qualify terms that could easily be misinterpreted.
۰ Be careful in using descriptive adjectives and adverbs that have no agreed-upon meaning.
۰ Beware of double negatives.
۰ Be careful of inadequate alternatives.
۰ Avoid the double-barreled question.
۰ Underline a word if you wish to indicate special emphasis.
۰ When asking for ratings or comparisons, a point of reference is necessary.
۰ Avoid unwanted assumptions.
۰ Phrase questions so that they are appropriate for all respondents.
۰ Design questions that will give a complete response.
۰ Provide for the systematic quantification of responses.
۰ Consider the possibility of classifying the responses yourself, rather than having the
respondent choose categories.

۰ General Questionnaire Format

As a general rule, the questionnaire should be spread out and uncluttered.
An improperly laid out questionnaire can lead respondents to miss questions, confuse
them about the nature of the data desired, and even lead them to throw the questionnaire
away.
۰ The Self-Administered Questionnaire

The self-administered questionnaire has become ubiquitous in modern living.
Often a short questionnaire is left to be completed by the respondent in a convenient
location or is packed with a product.

The mail survey is a self-administered questionnaire delivered by the Postal Service, fax
or a courier service.
Other delivery modalities include computer-delivered and intercept studies.
Computer-delivered self-administered questionnaires use organizational intranets, the
Internet or online services to reach their respondents.
Intercept studies may use a traditional questionnaire or a computerized instrument in a
predetermined environment without interviewer assistance.

۰ Mail Surveys
Mail surveys typically cost less than personal interviews.
Using mail can also enable us to contact respondents who might otherwise be
inaccessible, e.g. CEOs.
In a mail survey, the respondent can take more time to collect facts, talk with others, or
consider replies at length than is possible with the telephone, personal interview or
intercept studies.
Mail surveys are perceived as more impersonal, providing more anonymity than the other
communication modes.
Their major weakness is non-response error.
A high percentage of those who reply to a given survey have usually replied to others,
while a large share of those who do not respond are habitual non-respondents.
In general, we usually know nothing about how those who answer differ from those who
do not answer.
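The non-response problem can be made concrete with a simple response-rate calculation; the counts and the follow-up cutoff below are hypothetical.

```python
# Hypothetical mail survey: 400 questionnaires sent, 112 returned.
sent, returned = 400, 112
response_rate = returned / sent
# A low rate signals possible non-response error, since those who
# reply may differ systematically from those who do not; a vigorous
# follow-up mailing is one common remedy.
needs_followup = response_rate < 0.5   # illustrative cutoff only
```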

۰ Preparing and Administering the Questionnaire

 Preparing the Questionnaire
Get all the help you can in planning and constructing your questionnaire.
Study other questionnaires and submit your items for criticism to other people, especially
those who have had experience in questionnaire construction.
Revise the instrument based upon the feedback, if any.

 Pretesting the Questionnaire
No matter how carefully you design a data-collection instrument such as a questionnaire,
there is always the possibility of error.
The surest protection against such errors is to pretest the questionnaire in full and/or in
part.
It is not usually essential that the pretest subjects comprise a representative sample,
although you should use people to whom the questionnaire is at least relevant.
This small group of people should be similar to those who will be used in the study.
Give the questionnaire to about 10 such people.
It is better to ask people to complete the questionnaire rather than reading through it
looking for errors.
Revise the instrument accordingly using the feedback obtained during the pretest phase.

 Administering the Questionnaire

Choose the respondents carefully.
It is important that questionnaires be sent only to those who possess the desired
information and are likely to be sufficiently interested to respond conscientiously and
objectively.
A better return is obtained when the original request is sent to the administrative head of
an organization rather than directly to the person who has the desired information.
It is likely that when a superior officer gives a staff member a questionnaire to fill out,
there is an implied feeling of obligation.
Try to get endorsement as recipients are more likely to answer if a person, organization,
or institution of prestige has endorsed the project.
If the desired information is delicate or intimate in nature, consider the possibility of
providing for anonymous responses.
The anonymous instrument is most likely to produce objective and honest responses.
If identification is needed, it is essential to convince the respondents that their responses
will be held in strict confidence and that their answers will not jeopardize the status and
security of their position or their organization.

Be sure to include a courteous, carefully constructed cover letter to explain the purpose of
the study.
The cover letter should assure the respondent that all information will be held in strict
confidence or that the questionnaire is anonymous.
Since recipients are often slow to return completed questionnaires, to increase the number
of returns may require a vigorous follow-up procedure.
The inclusion of a stamped, self-addressed return envelope encourages response because
it simplifies questionnaire return.

۰ Personal Interviewing:
A personal interview (i.e. face-to-face) is a two-way conversation initiated by an
interviewer to obtain information from a respondent.
If the interview is carried off successfully, it is an excellent data collection technique.
The greatest value lies in the depth of information and detail that can be secured.
It far exceeds the information secured from telephone and self-administered studies via
intercepts (e.g. in shopping malls).
Interviewers can note conditions of the interview, probe with additional questions, and
gather supplemental information through observation.
But interviewing is a costly method, in both time and money.
Many people have become reluctant to talk with strangers or permit visits in their homes.
Interviewers are also reluctant to visit unfamiliar neighborhoods alone, especially for
evening interviewing.
Interviewers can also influence respondents or ask questions in ways that bias the results.

۰ Requirements for Success

Three broad conditions must be met to have a successful personal interview.
i. Availability of the needed information from the respondent.
ii. An understanding by the respondent of his role.
iii. Adequate motivation by the respondent to cooperate.

۰ Increasing Respondents' Receptiveness
The first goal in an interview is to establish a friendly relationship with the respondent.
Three factors will help with respondent receptiveness.
The respondents must,
i. believe the experience will be pleasant and satisfying,
ii. think answering the survey is an important and worthwhile use of their time, and
iii. have any mental reservations satisfied.
Whether the experience will be pleasant and satisfying depends heavily on the
interviewer.
۰ Gathering the Data

The interview should center on a prearranged questioning sequence (a structured
questioning procedure).
The interviewer should follow the exact wording of the questions, ask them in the order
presented, and ask every question that is specified.
When questions are misunderstood or misinterpreted, they should be repeated.
A difficult task in interviewing is to make certain the answers adequately satisfy the
question‟s objectives.
To do this, the interviewer must learn the objectives of each question beforehand.
The technique of stimulating respondents to answer more fully and relevantly is termed
probing.
۰ Interview Problems
Non-response Error. In personal interviews, non-response error occurs when you cannot
locate the person whom you are supposed to study or when you are unsuccessful in
encouraging the person to participate.
The most reliable solution to non-response problems is to make callbacks.

Response Error. When the data reported differ from the actual data, response error
occurs.
Errors can be made in the processing and tabulating of data.
Errors occur when the respondent fails to report fully and accurately.
Interviewer error is also a major source of response bias. Throughout the interview, there
are many points where the interviewer's control of the process can affect the quality of
the data. The most insidious form of interviewer error is cheating.
Falsification of an answer to an overlooked question is perceived as an easy solution to
counterbalance the incomplete data.
An interviewer can also distort the results of any survey by inappropriate suggestions,
word emphasis, tone of voice and question rephrasing.
Older interviewers are also often seen as authority figures by young respondents, who
modify their responses accordingly.

Costs. Interviewing is costly, and these costs continue to rise. Much of the cost results
from the substantial interviewer time taken up with administrative and travel tasks
(respondents are usually geographically scattered).
Repeated contacts are also expensive.


Much of what we know comes from observation.

One of the main virtues of observation is that the human element can ordinarily be
reduced to the minimum.
Greater objectivity usually can be obtained by these techniques than is possible in most
questionnaire surveys.
Mechanical measuring and recording devices can be relied on rather extensively.
Despite the utilization of human senses and judgement, the systematic procedures
followed place less reliance on the human factor in the investigator or the persons being
observed.
Observation qualifies as scientific inquiry when it is specifically designated to answer a
research question, is systematically planned and executed, uses proper controls, and
provides a reliable and valid account of what happened.
The versatility of observation makes it an indispensable primary source method and a
supplement for other methods.
Besides collecting data visually, observation involves listening, reading, smelling and
touching.
Observation includes the full range of monitoring behavioral and nonbehavioral activities
and conditions, which can be classified roughly as follows:

i. Nonbehavioral Observation
a) Record Analysis
This is the most prevalent form of observational research.
It may involve historical or current records and public or private records.
They may be written, printed, sound-recorded, photographed or videotaped.
Historical statistical data are often the only sources used for a study.
Analysis of current financial records and economic data also provides a major data source
for studies.
Other examples of this type of observation are the content analysis of competitive
advertising and the analysis of personnel records.

b) Physical Condition Analysis

This is typified by store audits of merchandise available, studies of plant safety
compliance, analysis of inventory conditions, and analysis of financial statements.

c) Physical Process Analysis

Process or activity analysis includes time/motion studies of manufacturing processes,
analysis of traffic flows in a distribution system, paperwork flows in an office, and
financial flows in the banking system.

ii. Behavioral Analysis
The observational study of persons can be classified into four major categories.
a) Nonverbal Analysis
This is the most prevalent of these and includes body movement, motor expressions, and
even exchanged glances.
b) Linguistic Analysis
One simple type of linguistic behavior is the tally of “ahs” or other annoying sounds or
words a lecturer makes or uses during a class.
More serious applications are the study of a sales presentation's content or the study of
what, how and how much information is conveyed in a training situation.
A third form involves interaction processes that occur between two people or in small groups.
c) Extralinguistic Analysis
Behavior may also be analyzed on an extralinguistic level.
Sometimes extralinguistic behavior is as important a means of communication as
linguistic behavior.
Four dimensions of extralinguistic activity are vocal, including pitch, loudness, and
timbre; temporal, including the rate of speaking, duration of utterance, and rhythm;
interaction, including the tendencies to interrupt, dominate or inhibit; and verbal stylistic,
including vocabulary and pronunciation peculiarities, dialect, and characteristic expressions.

d) Spatial Analysis
This fourth type of behavior study involves spatial relationships, especially how a person
relates physically to others.
One form of this study, proxemics, concerns how people organize the territory about
them and how they maintain discrete distances between themselves and others.
Often in a study, the researcher will be interested in two or more of these types of
information and will require more than one observer.


Observation is the only method available to gather certain types of information, e.g. study
of records.
Another value of observation is that we can collect the original data at the time they occur
without depending on reports by others.
A third strength is that we can secure information that most participants would ignore
either because it is so common and expected or because it is not seen as relevant.
The fourth advantage of observation is that it alone can capture the whole event as it
occurs in its natural environment.
Finally, subjects seem to accept an observational intrusion better than they respond to questioning.

The observer normally must be at the scene of the event when it takes place, yet it is
often impossible to predict where and when the event will occur.
Observation is a slow and expensive process that requires either human observers or
costly surveillance equipment.
Observation's most reliable results are restricted to information that can be learned by
overt action or surface indicators but to go below the surface demands that the observer
make inferences.
The research environment is more likely suited to subjective assessment and recording of
data than to controls and quantification of events.
Observation is limited as a way to learn about the past.
It is similarly limited as a method by which to learn what is going on in the present at
some distant place.
It is also difficult to gather information on such topics as intentions, attitudes, opinions,
or preferences.


The relationship between observer and subject may be viewed from three perspectives:
(1) whether the observation is direct or indirect, (2) whether the observer's presence is
known or unknown to the subject, and (3) what role the observer plays.

Directness of Observation:
Direct observation occurs when the observer is physically present and personally
monitors what takes place.
This approach is very flexible because it allows the observer to react to and report subtle
aspects of events and behavior as they occur.
He is also free to shift places, change the focus of the observation, or concentrate on
unexpected events if they occur.
But this approach can overload observers' perception circuits as events move quickly.
Also, observer fatigue, boredom, and distracting events can reduce the accuracy and
completeness of observation.
Indirect observation occurs when the recording is done by mechanical, photographic, or
electronic means.
Indirect observation is less flexible than direct observation, but it is also less biasing and
less erratic in accuracy.
Its other advantage is that the permanent record can be reanalyzed to include many
different aspects of an event.

Concealment:
This concerns whether the presence of the observer should be known to the subjects.
Observers use concealment to shield themselves from the object of their observation.
When the observer is known, there is a risk of atypical activity by the subjects.
Often, technical means are used such as one-way mirrors, hidden cameras or microphones.
These methods reduce the risk of observer bias but bring up a question of ethics since
hidden observation is a form of spying.

A modified approach involves partial concealment whereby the presence of the observer
is not concealed, but the objectives and subject of interest are.

Participation:
This concerns whether the observer should participate in the situation while observing.
A more involved arrangement, participant observation, exists when the observer enters
the social setting and acts as both an observer and a participant.
While reducing the potential for bias, this again raises an ethical issue.
Participant observation makes a dual demand on the observer – recording can interfere
with participation, and participation can interfere with observation.


The Type of Study:
Observation is found in almost all research studies, at least at the exploratory stage.
Such data collection is known as simple observation.
Its practice is not standardized, as one would expect, because of the discovery nature of
exploratory research.
If the study is to be something other than exploratory, systematic observation employs
standardized procedures, trained observers, schedules for recording, and other devices for
the observer that mirror the scientific procedures of other primary data methods.

Observational studies can be classified by the degree of structure in the environmental
setting and the amount of structure imposed on the environment by the researcher.
   Researcher                Environment
1. Completely unstructured   Natural setting
2. Unstructured              Laboratory
3. Structured                Natural setting
4. Completely structured     Laboratory

The researcher conducting a class 1, completely unstructured, study would be in a natural
setting or field setting endeavoring to adapt to the culture, e.g. ethnographic study.
With other purposes in mind, business researchers may use this type of study for
hypothesis generation.
Class 4 studies – completely structured research – are at the opposite end of the
continuum from completely unstructured field investigations.
The research purpose of class 4 studies is to test hypotheses; therefore, a definitive plan
for observing specific, operationalized behavior is known in advance.
This requires a measuring instrument, called an observational checklist, analogous to a questionnaire.
Many team-building, decision-making, and assessment center studies follow this
structural pattern.
The two middle classes emphasize the best characteristics of either researcher-imposed
controls or the setting.
In class 2, the researcher uses the facilities of a laboratory – videotape recording, two-
way mirrors, props, and stage sets – to introduce more control into the environment while
simultaneously reducing the time needed for observation.
In contrast, a class 3 study takes advantage of a structured observational instrument in a
natural setting.

Content Specification:
Specific conditions, events, or activities that we want to observe determine the
observational reporting system (and correspond to measurement questions).
To specify the observation content, we should include both the major variables of interest
and any other variables that may affect them.
From this cataloging, we then select those items we plan to observe.
For each variable chosen, we must provide an operational definition if there is any
question of concept ambiguity or special meanings.
Observation may be at either a factual or an inferential level.

Observer Training:
Observer trials with the instrument and sample videotapes should be used until a high
degree of reliability is apparent in their observations.
Data Collection:
The data collection plan specifies the details of the task.
In essence it answers the questions who, what, when, and how.
Who Are the Targets?- What qualifies a subject to be observed?
What?- The characteristics of the observation must be set as sampling elements and units
of analysis.
In event sampling, the researcher records selected behavior that answers the investigative questions.
In time sampling, the researcher must choose among a time-point sample (fixed points for
a specified length), continuous real-time measurement (behavior or the elapsed time of
the behavior), or a time-interval sample (every behavior in real time counted only once
during the interval).
When?- Is the time of the study important, or can any time be used?
How? – Will the data be directly observed? How will the results be recorded for later analysis?


Raw data are rarely useful in management decision making.
Managers need information.
Researchers generate information by analyzing data after its collection.
Data analysis usually involves reducing accumulated data to a manageable size,
developing summaries, looking for patterns, and applying statistical techniques.
Scaled responses on questionnaires and experimental instruments often require the
analyst to derive various functions, and relationships among variables are frequently
explored after that.
Conceptualization is the refinement and specification of abstract concepts.
Operationalization is the development of specific research procedures (operations) that
will result in empirical observations representing those concepts in the real world.
An attribute is a characteristic or quality of something.
Variables are logical sets of attributes.
Thus, gender is a variable composed of the attributes female or male.
The conceptualization and operationalization processes can be seen as the specification of
variables and the attributes composing them.

Characteristics of Variables. Every variable must have two important qualities.

First, the attributes composing it should be exhaustive ( i.e. one must be able to classify
every observation in terms of one of the attributes composing the variable).
Second, attributes composing a variable must be mutually exclusive ( i.e. one must be
able to classify every observation in terms of one and only one attribute).
The attributes composing variables may represent different levels of measurement.

Levels of Measurement. There are four levels of measurement or types of data: nominal,
ordinal, interval and ratio.

Nominal Measures. Variables whose attributes have only the characteristics of
exhaustiveness and mutual exclusiveness are nominal measures.
Examples include gender, religious affiliation, political party affiliation, birthplace,
college major, and hair color.
Nominal measures merely offer names or labels for characteristics.
Ordinal Measures. Variables with attributes we can logically rank-order are ordinal measures.
The different attributes represent relatively more or less of the variable.
Variables of this type are social class, conservatism, alienation, prejudice, intellectual
sophistication, etc.

Interval Measures. For the attributes composing some variables, the actual distance
separating those attributes does have meaning.
Such variables are interval measures.
For these, the logical distance between attributes can be expressed in meaningful standard intervals.
The zero point is arbitrary, e.g. in Fahrenheit and Celsius temperature scales.
Another example of these measures are constructed measures such as standardized
intelligence tests, e.g. IQ scores.

Ratio Measures. Most interval measures also meet the requirements for ratio measures.
In ratio measures, the attributes composing a variable are based on a true zero point.
Examples include age, length of residence in a given place, number of organizations
belonged to, number of times attending church during a particular period of time, number
of times married, number of American friends, etc.

Implications of Levels of Measurement. Certain quantitative analysis techniques require
variables that meet certain minimum levels of measurement.
To the extent that the variables to be examined in your research project are limited to a
particular level of measurement, you should plan your analytical techniques accordingly.

More precisely, you should anticipate drawing research conclusions appropriate to the
levels of measurement used in your variables.

Univariate Analysis: Univariate analysis is the examination of the distribution of cases on
only one variable at a time.
Univariate analyses describe the units of analysis of a study and, if they are a sample drawn
from some larger population, allow us to make descriptive inferences about that larger population.
Bivariate and multivariate analyses, by contrast, are aimed primarily at explanation (e.g. comparing subgroups).

Continuous and Discrete Variables:

There are two types of variables – continuous and discrete.
Age is a continuous, ratio variable.
It increases steadily in tiny fractions instead of jumping from category to category as does
a discrete variable such as gender or military rank.
If discrete variables were being analyzed – a nominal or ordinal variable, for example –
then some of these techniques would not be applicable.
Medians require at least ordinal data, and means should be calculated only for interval and ratio data.

a) Processing Data

Once the data begin to flow in, attention turns to data analysis.
The first step is data preparation, which includes editing, coding and data entry.
These activities ensure the accuracy of the data and its conversion from raw form to
reduced and classified forms that are more appropriate for analysis.

The customary first step in analysis is to edit the raw data.
Editing detects errors and omissions, corrects them when possible, and certifies that
minimum data quality standards are achieved.

The editor's purpose is to guarantee that data are (1) accurate, (2) consistent with other
information, (3) uniformly entered, (4) complete, and (5) arranged to simplify coding and tabulation.

Field Editing. During the stress of data collection, the researcher often uses ad hoc
abbreviations and special symbols.
Soon after the interview, experiment, or observation the investigator should review the
reporting forms while the memory is still fresh.
Central Editing. At this point, the data should get a thorough editing.
For a small study, the use of a single editor produces maximum consistency.
In large studies, the tasks may be broken down so each editor can deal with one entire section.

Coding:
This involves assigning numbers or other symbols to answers so the responses can be
grouped into a limited number of classes or categories.
The classifying of data into limited categories sacrifices some data detail but is necessary
for efficient analysis.
For example, M and F could be used as codes for Male and Female.
If the coding system uses a combination of numbers and symbols, the code is alphanumeric.
When numbers are used exclusively, the code is numeric.
Coding helps the researcher to reduce several thousand replies to a few categories
containing the critical information needed for analysis.

Coding Rules. The categories should be (1) appropriate to the research problem and
purpose, (2) exhaustive (i.e. adequate list of alternatives), (3) mutually exclusive, and (4)
derived from one classification principle (i.e. single dimension or concept).
The "don't know" (DK) response presents special problems for data preparation.
When it is the most frequent response received, it can be of major concern.
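As a sketch of these coding rules, the hypothetical codebooks below map text answers to numeric codes; the category names and code values are illustrative, not drawn from any particular study:

```python
# Hypothetical codebooks: each set of categories is mutually exclusive,
# and a catch-all "missing" code keeps the scheme exhaustive.
GENDER_CODES = {"Male": 1, "Female": 2}
RESPONSE_CODES = {"Yes": 1, "No": 2, "DK": 9}  # "don't know" coded separately

def code_responses(raw, codebook, missing=9):
    """Map raw text answers to numeric codes; unlisted answers get `missing`."""
    return [codebook.get(answer, missing) for answer in raw]

raw = ["Yes", "No", "DK", "Maybe"]
print(code_responses(raw, RESPONSE_CODES))  # -> [1, 2, 9, 9]
```

Reducing "Maybe" to the missing code sacrifices detail, as the text notes, but keeps every reply classifiable into one and only one category.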

Data Entry:
This converts information gathered by secondary or primary methods to a medium for
viewing and manipulation.
Keyboard entry remains a mainstay for researchers who need to create a data file
immediately and store it in a minimal space on a variety of media.
Optical scanning instruments have improved efficiency.
The cost of technology has allowed most researchers access to desktop or portable
computers or a terminal linked to a large computer.
This technology enables computer-assisted telephone or personal interviews to be
completed with answers entered directly for processing, eliminating intermediate steps
and errors.
Voice recognition and response systems are providing some interesting alternatives for
the telephone interviewer.
Bar code readers are used in several applications: at point-of-sale terminals, for inventory
control, for product and brand tracking, etc.
This simplifies the interviewer's role as data recorder since the data are recorded in a
small, lightweight unit for translation later.

Data Entry Formats. A full-screen editor, where an entire data file can be edited or
browsed, is a viable means of data entry for statistical packages like SPSS or SAS.
The same software makes accessing data from databases, spreadsheets, data warehouses
or data marts effortless.
For large projects, database programs serve as valuable data entry devices.
A database is a collection of data organized for computerized retrieval.
Programs allow users to define data fields and link files so storage, retrieval, and
updating are simplified.
Descriptive statistics and tables are readily generated from within the base.
Spreadsheets are a specialized type of database for data that need organizing, tabulating
and simple statistics.

Data entry on a spreadsheet uses numbered rows and letter columns with a matrix of
thousands of cells into which an entry may be placed.
Spreadsheets allow you to type numbers, formulas, and text into appropriate cells.
A data warehouse organizes large volumes of data into categories to facilitate retrieval,
interpretation, and sorting by end-users.
It provides an accessible archive.
Data marts are intermediate storage facilities that compile locally required information.

Descriptive Statistics:
Descriptive statistics is a method for presenting quantitative descriptions in a manageable form.
Sometimes we want to describe single variables, and sometimes we want to describe the
associations that connect one variable with another.
The following are some of the ways to do this.

Data Reduction. Scientific research often involves collecting large masses of data.
Thus, much scientific analysis involves the reduction of data from unmanageable details
to manageable summaries.
Some of the ways of summarizing univariate data are averages ( e.g. mode, median,
mean) and measures of dispersion ( e.g. range, standard deviation, etc.).
It's also possible to summarize the associations among variables.
If interval or ratio variables are being associated, one appropriate measure of association
is Pearson's product-moment correlation, r.
r is based on guessing the value of one variable by knowing the other.
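The calculation behind r can be sketched directly from its definition (the covariance of the two variables divided by the product of their standard deviations); this is a minimal illustration, not a substitute for a statistical package:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # covariation
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))          # spread of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y))          # spread of y
    return cov / (sx * sy)

# A perfect positive linear relationship yields r = 1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0
```

Values near +1 or -1 mean knowing one variable lets us guess the other well; values near 0 mean it helps little.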

b) Qualitative Data Analysis

This second step consists of exploring, displaying, and examining data; it involves
breaking down, inspecting, and rearranging data to begin the search for meaningful
descriptions, patterns, and relationships.
Then data mining is used to extract patterns and predictive trends from databases.

Data mining combines exploration and discovery with confirmatory analysis.
It also bridges primary and secondary types of data.

An Exploratory Data Analysis Approach:

Exploratory data analysis (EDA) is a data analysis perspective and set of techniques.
In exploratory data analysis, the data guide the choice of analysis – or a revision of the
planned analysis – rather than the analysis presuming to overlay its structure on the data
without the benefit of the analyst‟s scrutiny.
The flexibility to respond to the patterns revealed by successive iterations in the
discovery process is an important attribute of this approach.
By comparison, confirmatory data analysis occupies a position closer to classical
statistical inference in its use of significance and confidence.
But confirmatory analysis may also differ from traditional practices by using information
from a closely related data set or by validating findings through the gathering and
analyzing of new data.
Exploratory data analysis is the first step in the search for evidence, without which
confirmatory analysis has nothing to evaluate.
A major contribution of the exploratory approach lies in the emphasis on visual
representations and graphical techniques over summary statistics.

1. Frequency Tables, Bar Charts and Pie Charts

Several useful techniques for displaying data are not new to EDA.
They are essential to any preliminary examination of the data.
For example, a frequency table is a simple device for arraying data.
It arrays data from the lowest value to the highest, with columns for percent, percent
adjusted for missing values, and cumulative percent.
The same information can be displayed using a bar chart and a pie chart.
In these graphic formats, the values and percentages are more readily understood, and
visualization of the categories and their relative sizes is improved.
When the variable of interest is measured on an interval or ratio scale and has many
potential values, these techniques are not particularly informative.
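A frequency table of the kind described can be sketched in a few lines; the sample data are hypothetical, and the column for missing-value adjustment is omitted for brevity:

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows, arrayed
    from lowest value to highest."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / n
        cum += pct
        rows.append((value, counts[value], round(pct, 1), round(cum, 1)))
    return rows

data = ["A", "B", "A", "C", "A", "B"]
for row in frequency_table(data):
    print(row)
```

The same counts could feed a bar or pie chart, where the relative sizes of the categories are easier to see.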

2. Histograms
The histogram is a conventional solution for the display of interval-ratio data.
Histograms are used when it is possible to group the variable's values into intervals.
Histograms are constructed with bars where each value occupies an equal amount of area
within the enclosed area.
Data analysts find histograms useful for (1) displaying all intervals in a distribution, even
those without observed values, and (2) examining the shape of the distribution for
skewness, kurtosis and the modal pattern.
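Grouping values into equal-width intervals, including intervals with no observed values, can be sketched as follows (the age data are invented for illustration):

```python
def histogram_bins(values, low, high, width):
    """Count values into equal-width intervals [low, low+width), ...,
    keeping empty intervals so the full shape of the distribution shows."""
    nbins = int((high - low) / width)
    counts = [0] * nbins
    for v in values:
        if low <= v < high:
            counts[int((v - low) / width)] += 1
    return counts

ages = [21, 23, 25, 34, 35, 36, 37, 48]
print(histogram_bins(ages, 20, 50, 10))  # -> [3, 4, 1]
```

Plotting these counts as bars of equal width gives the conventional histogram, from which skewness and the modal pattern can be read.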

3. Stem-and-Leaf Displays
The stem-and-leaf display is an EDA technique that is closely related to the histogram.
In contrast to histograms, which lose information by grouping data values into intervals,
the stem-and-leaf presents actual data values that can be inspected directly without the
use of enclosed bars or asterisks as the representation medium.
This feature reveals the distribution of values within the interval and preserves their rank
order for finding the median, quartiles and other summary statistics.
Visualization is the second advantage of stem-and-leaf displays.
The range of values is apparent at a glance, and both shape and spread impressions are immediate.
Patterns in the data are easily observed, such as gaps where no values exist, areas where
values are clustered, or outlying values that differ from the main body of the data.
To develop a stem-and-leaf display for a given data set, the first digits of each data item
are arranged to the left of a vertical line.
Next, we pass through the percentages given in the order they were recorded and place
the last digit for each item (the unit position) to the right of the vertical line.
The digit to the right of the decimal point is ignored.
The last digit for each item is placed on the horizontal row corresponding to its first digit.
Then rank-order the digits in each row, creating the stem-and-leaf display.
Each line or row in the display is referred to as the stem, and each piece of information
on the stem is called a leaf.

The first line or row could be
5 | 4556667888899
This means there are 13 items in the data set whose first digit is five:
54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59 and 59.
When the stem-and-leaf display is turned upright ( rotated 90 degrees to the left), the
shape is the same as that of the histogram with the same data.
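The construction steps above can be sketched in code; the data set is invented so that its first row reproduces the example line shown:

```python
def stem_and_leaf(data):
    """Build a stem-and-leaf display: tens digit as the stem, units digit
    as the leaf, leaves rank-ordered within each stem (decimals ignored)."""
    stems = {}
    for x in sorted(int(x) for x in data):   # sorting rank-orders the leaves
        stems.setdefault(x // 10, []).append(x % 10)
    return ["%d | %s" % (s, "".join(str(leaf) for leaf in leaves))
            for s, leaves in sorted(stems.items())]

data = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59, 59, 61, 63]
for line in stem_and_leaf(data):
    print(line)
# 5 | 4556667888899
# 6 | 13
```

Unlike a histogram bar, each row preserves the actual data values, so the median and quartiles can be read straight from the display.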

4. Boxplots
The boxplot, or box-and-whisker plot, is another technique used frequently in EDA.
A boxplot reduces the detail of the stem-and-leaf display and provides a different visual
image of the distribution's location, spread, shape, tail length and outliers.
Boxplots are extensions of the five-number summary of a distribution which consists of
the median, upper and lower quartiles, and the largest and smallest observations.
The basic ingredients of the plot are the (1) rectangular plot that encompasses 50 percent
of the data values, (2) a center line marking the median and going through the width of
the box, (3) the edges of the box, called hinges, and (4) the whiskers that extend from the
right and left hinges to the largest and smallest values.
These values may be found within 1.5 times the interquartile range (IQR) from either
edge of the box.
With the five-number summary, we have the basis for a skeletal plot: minimum, lower
hinge, median, upper hinge and maximum.
Beginning with the box, the ends are drawn using the lower and upper quartile (hinge) values.
The median is drawn in, then the IQR is calculated and from this, we can locate the lower
and upper fences.
Next, the smallest and largest data values from the distribution within the fences are used
to determine the whisker length.
Outliers are data points that exceed ±1.5 IQRs from a boxplot's hinges.
Data values for the outliers are added, and identifiers may be provided for interesting cases.
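A minimal sketch of the five-number summary and the 1.5 IQR outlier rule follows; it uses Tukey-style hinges (medians of the lower and upper halves), which is one of several quartile conventions, and the data are hypothetical:

```python
from statistics import median

def five_number_summary(data):
    """Minimum, lower hinge, median, upper hinge, maximum."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2                      # median included when n is odd
    lower, upper = xs[:half], xs[n - half:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

def outliers(data):
    """Points beyond 1.5 IQRs from the hinges (outside the 'fences')."""
    _, lo_hinge, _, hi_hinge, _ = five_number_summary(data)
    iqr = hi_hinge - lo_hinge
    lo_fence, hi_fence = lo_hinge - 1.5 * iqr, hi_hinge + 1.5 * iqr
    return [x for x in data if x < lo_fence or x > hi_fence]

data = [10, 12, 13, 14, 15, 16, 18, 40]
print(five_number_summary(data))  # -> (10, 12.5, 14.5, 17.0, 40)
print(outliers(data))             # -> [40]
```

The whiskers would then run from each hinge to the most extreme data value still inside its fence.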

Data Transformations
When data depart from normality, they pose special problems in preliminary analysis.
Transformation is one solution to this problem.
Transformation is the reexpression of data on a new scale using a single mathematical
function for each data point.
The fact that data collected on one scale are found to depart from the assumptions of
normality and constant variance does not preclude reexpressing them on another scale.

We transform data for several reasons:

(i) to improve interpretation and compatibility with other data sets;
(ii) to enhance symmetry and stabilize spread; and,
(iii) to improve linear relationships between and among variables.
We improve interpretation when we find alternate ways to understand the data and
discover patterns or relationships that may not have been revealed on the original scales.
A standard score, or Z score, may be calculated to improve compatibility among
variables that come from different scales and require comparison.
Z scores convey distance in standard deviation units with a mean of 0 and a standard
deviation of 1.
This is accomplished by converting the raw score, Xi, to
Z = (Xi – X̄) / s
where X̄ is the mean and s is the standard deviation.
Conversion of centimeters to inches, stones to pounds, liters to gallons or Celsius to
Fahrenheit are examples of linear conversions that change the scale but do not change
symmetry or spread.
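A minimal sketch of the Z-score reexpression follows; it uses the sample standard deviation, which is an assumption (a population standard deviation could be used instead):

```python
from statistics import mean, stdev

def z_scores(xs):
    """Reexpress raw scores in standard deviation units: Z = (Xi - mean) / s."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

scores = [50, 60, 70, 80, 90]
z = z_scores(scores)
print([round(v, 2) for v in z])  # -> [-1.26, -0.63, 0.0, 0.63, 1.26]
```

By construction the transformed scores have mean 0 and standard deviation 1, so variables from different scales become directly comparable.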
Nonlinear transformations are often needed to satisfy the other two reasons for
reexpressing data.
Normality and constancy of variance are important assumptions for many parametric
statistical techniques.
A transformation to reduce skewness and stabilize variance makes it possible to use
various confirmatory techniques without violating their assumptions.

Transformations are defined with power, p.
The most frequently used power transformations are given below:
Power   Transformation
 3      Cube
 2      Square
 1      No change: existing data
 1/2    Square root
 0      Logarithm (usually log10)
-1/2    Reciprocal root
-1      Reciprocal
-2      Reciprocal square
-3      Reciprocal cube
When researchers communicate their findings to management, the advantages of
reexpression must be balanced against pragmatism: Some transformed scales have no
familiar analogies.
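The transformation ladder can be expressed as a small table of functions; the income figures below are invented simply to show how the log transform compresses a long right tail:

```python
import math

# The power-transformation ladder, for positive x.
LADDER = {
    3: lambda x: x ** 3,
    2: lambda x: x ** 2,
    1: lambda x: x,                    # no change: existing data
    0.5: lambda x: math.sqrt(x),
    0: lambda x: math.log10(x),        # p = 0 is the logarithm by convention
    -0.5: lambda x: 1 / math.sqrt(x),
    -1: lambda x: 1 / x,
    -2: lambda x: 1 / x ** 2,
    -3: lambda x: 1 / x ** 3,
}

# A log transform pulls in a long right tail, reducing positive skewness.
incomes = [10, 100, 1000, 10000]
print([round(LADDER[0](x), 6) for x in incomes])  # -> [1.0, 2.0, 3.0, 4.0]
```

Moving down the ladder (square root, log, reciprocal) pulls in a right tail; moving up (square, cube) pulls in a left tail.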

c) Using Statistics

Statistical Inference:
Statistical inference can be defined as inferring the characteristics of a universe under
investigation from the evidence of a sample representative of that universe.
The practical solution is to select samples that are representative of the population of interest.
Then, through observations and analysis of the sample data, the researcher may infer
characteristics of the population.
Estimating or inferring a population characteristic (parameter) from a random sample
(statistic) is not an exact process.
Fortunately, an advantage of random selection is that the sample statistic will be an
unbiased estimate of the population parameter.

The Central Limit Theorem:
An important principle, known as the central limit theorem, describes the characteristics
of sample means.
If a large number of equal-sized samples (greater than 30 subjects) is selected at random
from an infinite population:
(i) The means of the samples will be normally distributed.
(ii) The mean value of the sample means will be the same as the mean of the population.
(iii) The distribution of sample means will have its own standard deviation.
This is in actuality the distribution of the expected sampling error, known as the
standard error of the mean.
As the sample is reduced in size and approaches 1, the standard error of the mean
approaches the standard deviation of the individual scores.
As sample size increases, the magnitude of the error decreases.
Sample size and sampling error are negatively correlated.
In general, as the number of independent observations increases, the error involved in
generalizing from sample values to population values decreases and accuracy of
prediction increases.
Thus the value of a population mean, inferred from a randomly selected sample mean,
can be estimated on a probability basis.
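The relationship between sample size and sampling error described above can be sketched with the formula for the standard error of the mean, sd / sqrt(n):

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: the standard deviation of the
    distribution of sample means, sd / sqrt(n)."""
    return sd / math.sqrt(n)

# With n = 1 the standard error equals the standard deviation of the
# individual scores; it shrinks as sample size grows.
sd = 10
for n in (1, 25, 100):
    print(n, standard_error(sd, n))  # -> 10.0, 2.0, 1.0
```

Note that quadrupling the sample size only halves the standard error, which is why precision gains become progressively more expensive.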

Decision Making:
Statistical decisions about parameters based upon evidence observed in samples always
involve the possibility of error.
Rejection of a null hypothesis when it is really true is known as a Type I error.
The level of significance (alpha) selected determines the probability of a Type I error.
For example, when the researcher rejects a null hypothesis at the .05 level, she is taking a
5 percent risk of rejecting the sampling-error explanation when it is probably true.
Not rejecting a null hypothesis when it is really false is known as a Type II error.
This decision errs in accepting a sampling error explanation when it is probably false.

Setting a level of significance as high as the .01 level minimizes the risk of a Type I error,
but this more conservative level increases the risk of a Type II error.
The researcher sets the level of significance based upon the relative seriousness of
making a Type I or Type II error.

Degrees of Freedom:
The number of degrees of freedom in a distribution is the number of observations or
values that are independent of each other, that cannot be deduced from each other.
The strength of a prediction is increased as the number of independent observations or
degrees of freedom is increased.


Testing Approaches
There are two approaches to hypothesis testing.
The more established is the classical or sampling-theory approach and the second is
known as the Bayesian approach.

Classical statistics are found in all major statistics books and are widely used in research applications.
This approach represents an objective view of probability in which the decision making
rests totally on an analysis of available sampling data.
A hypothesis is established; it is rejected or fails to be rejected, based on the sample data collected.

Bayesian statistics are an extension of the classical approach.

They also use sampling data for making decisions, but they go beyond them to consider
all other available information.
This additional information consists of subjective probability estimates stated in terms of
degrees of belief.

These subjective estimates are based on general experience rather than on specific
collected data and are expressed as a prior distribution that can be revised after sample
information is gathered.
Various decision rules are established, cost and other estimates can be introduced, and the
expected outcomes of combinations of these elements are used to judge decision alternatives.

Statistical Significance
Following the sampling-theory approach, we accept or reject a hypothesis on the basis of
sampling information alone.
Since any sample will almost surely vary somewhat from its population, we must judge
whether these differences are statistically significant or insignificant.
A difference has statistical significance if there is good reason to believe the difference
does not represent random sampling fluctuations only.

Example: The controller of a large retail chain may be concerned about a possible
slowdown in payments by the company’s customers.
She measures the rate of payment in terms of the average age of accounts receivable.
Generally, the company has maintained an average of about 50 days with a standard
deviation of 10 days.
Suppose the controller has all of the customer accounts analyzed and finds the average
now is 51 days.
Is this difference statistically significant from 50?
Of course it is because the difference is based on a census of the accounts and there is no
sampling involved.
It is a fact that the population average has moved from 50 to 51 days.
Since it would be too expensive to analyze all of a company’s receivables frequently, we
normally resort to sampling.
Assume a sample of 25 accounts is randomly selected and the average number of days
outstanding is calculated to be 54.

Is this statistically significant? The answer is not obvious.
It is significant if there is good reason to believe the average age of the total group of
receivables has moved up from 50.

Since the evidence consists of only a sample, consider the second possibility, that this is
only a random sampling error and thus not significant.
The task is to judge whether such a result from this sample is or is not statistically significant.
To answer this question, we need to consider further the logic of hypothesis testing.
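Before turning to the formal logic, the judgment involved can be sketched numerically. A minimal Python check, assuming the population standard deviation is still the 10 days stated earlier, of how unlikely a sample mean of 54 from 25 accounts would be if the true average were still 50:

```python
import math
from statistics import NormalDist

# Source figures: population mean 50 days, sigma assumed still 10,
# sample of 25 accounts averaging 54 days.
mu0, sigma, n, xbar = 50, 10, 25, 54

se = sigma / math.sqrt(n)               # standard error of the mean: 10 / 5 = 2
z = (xbar - mu0) / se                   # (54 - 50) / 2 = 2.0
p_one_tailed = 1 - NormalDist().cdf(z)  # chance of a sample mean this high if H0 is true

print(round(z, 2), round(p_one_tailed, 4))  # → 2.0 0.0228
```

A result this extreme would occur by sampling fluctuation only about 2 percent of the time, which is the kind of evidence hypothesis testing formalizes.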

The Logic of Hypothesis Testing

In classical tests of significance, two kinds of hypotheses are used.
The null hypothesis is used for testing.
It is a statement that no difference exists between the parameter and the statistic being
compared to it.
Analysts usually test to determine whether there has been no change in the population of
interest or whether a real difference exists.

A second, or alternative hypothesis, holds that a difference exists between the parameter
and the statistic being compared to it.
The alternative hypothesis is the logical opposite of the null hypothesis.

The accounts receivable example above can be explored further to show how these
concepts are used to test for significance.
The null hypothesis states that the population parameter of 50 days has not changed.
The alternative hypothesis holds that there has been a change in average days outstanding
(i.e. the sample statistic of 54 indicates the population value probably is no longer 50).

Null hypothesis, Ho: There has been no change from the 50-day average age of accounts receivable.

The alternative hypothesis may take several forms, depending on the objective of the researcher.
The HA may be of the “not the same” form:
Alternative hypothesis, HA: The average age of accounts has changed from 50 days.
A second variety may be of the “greater than” or “less than” form:
HA: The average age of receivables has increased (decreased) from 50 days.

These types of alternative hypotheses correspond with two-tailed and one-tailed tests.
A two-tailed test, or nondirectional test, considers two possibilities:
The average could be more than 50 days, or it could be less than 50 days.
To test this hypothesis, the regions of rejection are divided between the two tails of the
distribution.
A one-tailed test, or directional test, places the entire probability of an unlikely outcome
into the tail specified by the alternative hypothesis.
In Fig. below, the first diagram represents a nondirectional hypothesis, and the second
is a directional hypothesis of the “greater than” variety.
Hypotheses for the example may be expressed in the following form:

Null hypothesis          Ho: µ = 50 days

Alternative hypothesis   HA: µ ≠ 50 days (not-the-same case)
Or                       HA: µ > 50 days (greater-than case)
Or                       HA: µ < 50 days (less-than case)

In testing these hypotheses, adopt this decision rule: Take no corrective action if the
analysis shows that one cannot reject the null hypothesis.

Fig. : One- and Two-Tailed Tests at the 5% Level of Significance

[Two-tailed test: “Reject Ho” regions in both tails beyond Z = -1.96 and Z = +1.96, with the “Do not reject Ho” region centered on 50.]

If we reject a null hypothesis (finding a statistically significant difference), then we are
accepting the alternative hypothesis.
In either accepting or rejecting a null hypothesis, we can make incorrect decisions.
A null hypothesis can be accepted when it should have been rejected or rejected when it
should have been accepted.

Let us illustrate these problems with an analogy to the Zambian legal system.

[One-tailed test: the entire rejection region (α = .025) lies in the upper tail beyond Z = 1.96, with the “Do not reject Ho” region centered on 50.]

In our system of justice, the innocence of an accused person is presumed until proof of
guilt beyond a reasonable doubt can be established.
In hypothesis testing, this is the null hypothesis; there should be no difference between
the presumption and the outcome unless contrary evidence is furnished.
Once evidence establishes beyond reasonable doubt that innocence can no longer be
maintained, a just conviction is required.
This is equivalent to rejecting the null hypothesis and accepting the alternative hypothesis.
Incorrect decisions and errors are the other two possible outcomes.
We can unjustly convict an innocent person, or we can acquit a guilty person.

The Table below compares the statistical situation to the legal one.

Table : Comparison of Statistical Decisions to Legal Analogy

                              State of Nature
                        Ho is true             HA is true
Decision                (innocent of crime)    (guilty of crime)

Accept Ho (acquit)      Correct decision       Type II error
                        Justly acquitted       Unjustly acquitted
                        Probability = 1 - α    Probability = β

Reject Ho (convict)     Type I error           Correct decision
                        Unjustly convicted     Justly convicted
                        Significance level     Power of test
                        Probability = α        Probability = 1 - β

One of two conditions exists in nature – either the null hypothesis is true or the
alternative hypothesis is true.
An accused person is innocent or guilty.
Two decisions can be made about these conditions: one may accept the null hypothesis or
reject it (thereby accepting the alternative).
Two of these situations result in correct decisions; the other two lead to decision errors.

When a Type I error (α) is committed, a true null hypothesis is rejected; the innocent
person is unjustly convicted.
The α value is called the level of significance and is the probability of rejecting the true null hypothesis.
With a Type II error (β), one fails to reject a false null hypothesis; the result is an unjust
acquittal with the guilty person going free.
In our system of justice, it is more important to reduce the probability of convicting the
innocent than acquitting the guilty.

Similarly, hypothesis testing places a greater emphasis on Type I errors than on Type II.


Where variables are composed of category data (frequency counts of nominally scaled
variables), there may be a need to inspect the relationships between and among those variables.
This analysis is commonly done with cross tabulation.

Cross tabulation is a technique for comparing two classification variables, such as gender
and selection by one's company for an overseas assignment.

The technique uses tables having rows and columns that correspond to the levels or
values of each variable's categories.
For this example, the computer-generated cross tabulation will have two rows for gender
and two columns for assignment selection.
The combination of the variables with their values produces 4 cells.

Each cell contains a count of the cases of the joint classification and also the row, column
and total percentages.

The number of row cells and column cells is often used to designate the size of the table,
as in this 2 x 2 table.
The cells are individually identified by their row and column numbers.

Row and column totals, called marginals, appear at the bottom and right “margins” of the table.
They show the counts and percentages of the separate rows and columns.

When tables are constructed for statistical testing, we call them contingency tables, and
the test determines if the classification variables are independent.
Of course, tables may be larger than 2 x 2.
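A cross tabulation like the gender-by-assignment example can be produced directly. The sketch below uses pandas with invented illustrative records (the individual observations are hypothetical, not from the text):

```python
import pandas as pd

# Invented records for the gender-by-overseas-assignment example
# (these particular observations are illustrative, not from the text).
data = pd.DataFrame({
    "gender":   ["M", "F", "M", "F", "M", "M", "F", "F", "M", "F"],
    "selected": ["yes", "no", "no", "yes", "yes", "no", "no", "yes", "yes", "no"],
})

# 2 x 2 table; margins=True adds the row/column totals (the "marginals").
table = pd.crosstab(data["gender"], data["selected"],
                    margins=True, margins_name="Total")
print(table)
```

Each cell holds the joint count; the Total row and column are the marginals described above.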

Tests of Significance
Types of Tests: There are two general classes of significance tests: parametric and nonparametric.
Parametric tests are more powerful because their data are derived from interval and
ratio measurements.
Nonparametric tests are used to test hypotheses with nominal and ordinal data.
Parametric techniques are the tests of choice if their assumptions are met.


Probably the most widely used nonparametric test of significance is the chi-square (χ²) test.

It is particularly useful in tests involving nominal data but can be used for higher scales
(e.g. ordinal).

Typical are cases where persons, events, or objects are grouped in two or more nominal
categories such as “yes-no”, “favor-undecided-against”, or class “A,B, C, or D”.

Using this technique, we test for significant differences between the observed distribution
of data among categories and the expected distribution based on the null hypothesis.
Chi-square is useful in cases of one-sample analysis, two independent samples, or K
independent samples.
It must be calculated with actual counts rather than percentages.
In the one-sample case, we establish a null hypothesis based on the expected frequency of
objects in each category.
Then the deviations of the actual frequencies in each category are compared with the
hypothesized frequencies.

The greater the difference between them, the less is the probability that these differences
can be attributed to chance.
The value of χ² is the measure that expresses the extent of this difference.
The larger the divergence, the larger is the χ² value.
There is a different distribution for χ² for each number of degrees of freedom (d.f.),
defined as (k-1) or the number of categories in the classification minus 1.

d.f. = k – 1

With chi-square contingency tables of the two-sample or k-sample variety, we have both
rows and columns in the cross-classification table.
In that instance, d.f. is defined as rows minus 1 (r-1) times columns minus 1 (c-1).

d.f. = (r-1) (c-1)

In a 2x2 table there is 1 d.f., and in a 3x2 table there are 2 d.f.

Depending on the number of degrees of freedom, we must be certain the numbers in each
cell are large enough to make the χ² test appropriate.

When d.f. = 1, each expected frequency should be at least 5 in size.

If d.f. > 1, the χ² test should not be used if more than 20 percent of the expected
frequencies are smaller than 5 or if any expected frequency is smaller than 1.
Expected frequencies can often be increased by combining adjacent categories.

If there are only two categories and still there are too few in a given class, it is better to
use the binomial test.
For example, do the variations in the living arrangement (type and location of student
housing and eating arrangements) indicate there is a significant difference among the
subjects, or are they sampling variations only?

Proceed as follows:

1. Null hypothesis. Ho: Oi =Ei .

The proportion in the population who intend to join the club is independent of
living arrangement.
In HA:Oi ≠ Ei, the proportion in the population who intend to join the club is
dependent on living arrangement.

2. Statistical Test. Use the one-sample χ² to compare the observed distribution to a

hypothesized distribution.
The χ² test is used because the responses are classified into four categories.

3. Significance level. Let α = .05

4. Calculate value

χ² = Σ (Oi − Ei)² / Ei    (summed over i = 1, ..., k)

In which

Oi = Observed number of cases categorized in the ith category

Ei = Expected number of cases in the ith category
k = The number of categories

Calculate the expected distribution by determining what proportion of the 200

students interviewed were in each group.

Then apply these proportions to the number who intend to join the club.

Then calculate the following:

χ² = (16 − 27)²/27 + (13 − 12)²/12 + (16 − 12)²/12 + (15 − 9)²/9

   = 4.48 + 0.08 + 1.33 + 4.0 = 9.89

d.f. = k − 1 = 4 − 1 = 3

5. Critical test value. Enter the table of critical values of χ² (Appendix Table F-3)
with 3 d.f. and secure a value of 7.82 for α = .05.

6. Decision. The calculated value (9.89) is greater than the critical value (7.82), so
the null hypothesis is rejected.

Living Arrangement    Intend to Join    No. Interviewed    Percent Interviewed    Expected Frequencies

Dorm/Fraternity             16                 90                  45                      27
House, nearby               13                 40                  20                      12
House, distant              16                 40                  20                      12
Live at home                15                 30                  15                       9

Total                       60                200                 100                      60
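The one-sample calculation above can be verified with scipy, passing the observed and expected counts straight from the table:

```python
from scipy import stats

# Observed joiners per living arrangement and the expected frequencies
# from the table (each group's interview proportion applied to the 60 joiners).
observed = [16, 13, 16, 15]
expected = [27, 12, 12, 9]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), p < 0.05)  # chi-square ≈ 9.9 with k - 1 = 3 d.f.; p < .05
```

The computed χ² exceeds the tabled 7.82 at α = .05 with 3 d.f., matching the decision to reject.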

Two Independent Sample Case

Non parametric Test:

The chi-square (χ²) test is appropriate for situations in which a test for
differences between samples is required.
When parametric data have been reduced to categories, they are frequently
treated with χ² although this results in a loss of information.

Preparing to solve this problem is the same as presented earlier, although the
formula differs slightly:

χ² = Σ Σ (Oij − Eij)² / Eij    (summed over all rows i and columns j)

in which

Oij = Observed number of cases categorized in the ijth cell
Eij = Expected number of cases under Ho to be categorized in the ijth cell

Suppose a manager implementing a smoke-free workplace policy is interested

in whether smoking affects worker accidents.
Since the company had complete reports of on-the-job accidents, she draws a
sample of names of workers who were involved in accidents during the year.

A similar sample from among workers who had no reported accidents in the
last year is drawn.
She interviews members of both groups to determine if they are smokers or nonsmokers.
The results appear in the following table.

The expected values have been calculated and are shown. The testing
procedure is:

1. Null hypothesis. Ho: There is no difference in on-the-job accident

occurrences between smokers and non smokers.

2. Statistical test. χ² is appropriate, but it may waste some of the data
because the measurement appears to be ordinal.

3. Significance level. α = .05, with d.f = (3-1)(2-1)=2

4. Calculated value. The expected distribution is provided by the marginal

totals of the table.

If there is no relationship between accidents and smoking, there will

be the same proportion of smokers in both accident and non-
accidents classes.
The numbers of expected observations in each cell are calculated by
multiplying the two marginal totals common to a particular cell and
dividing this product by n.

For example,

(34 x 16) / 66 = 8.24, the expected value in cell (1,1)

χ² = (12 − 8.24)²/8.24 + (4 − 7.75)²/7.75 + (9 − 7.73)²/7.73
   + (6 − 7.27)²/7.27 + (13 − 18.03)²/18.03 + (22 − 16.97)²/16.97

   = 6.86

5. Critical test value. Enter Appendix Table F-3 and find the critical value
5.99 with α = .05 and d.f. = 2.

6. Decision. Since the calculated value is greater than the critical
value (6.86 > 5.99), the null hypothesis is rejected.

On-the-Job Accident

Smoker         Count / Expected     Yes       No     Row Total
Heavy          Count                 12        4        16
               Expected            8.24     7.75
Moderate       Count                  9        6        15
               Expected            7.73     7.27
Non-smoker     Count                 13       22        35
               Expected           18.03    16.97

Column Total                         34       32        66
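The same test can be run from the raw contingency table with scipy, which derives the expected cell values from the marginals exactly as described:

```python
from scipy import stats

# Counts from the smoking-by-accident table (rows: heavy, moderate, non-smoker).
observed = [[12, 4],
            [9,  6],
            [13, 22]]

# chi2_contingency computes each expected value as
# (row total x column total) / n, as in the worked example.
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 2), dof)  # → 6.86 2
```

With χ² = 6.86 exceeding the critical 5.99 at α = .05 and 2 d.f., the null hypothesis is again rejected.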


One-sample tests are used when we have a single sample and wish to test
the hypothesis that it comes from a specified population.

Parametric Tests:

The Z or t test is used to determine the statistical significance between a
sample distribution mean and a parameter.
The t test compensates for the lack of information about the population
standard deviation.
Although the sample standard deviation is used as a proxy figure, the
imprecision makes it necessary to go farther away from 0 to include the
percentage of values in the t distribution that would be found in the
standard normal distribution.
When sample sizes approach 120, the sample standard deviation becomes
a very good estimate of σ.

Beyond 120, the t and Z distributions are virtually identical.
When we substitute s for σ, we use the t distribution, especially if the
sample size is less than 30.
The following is an example of the application of the test to the one-sample case.

t = (x̄ − µ) / (s / √n)

This significance test is conducted as follows:

1. Null hypothesis.  Ho: µ = 50 days
                     HA: µ > 50 days (one-tailed test)

2. Statistical test. Choose the t test because the data are

ratio measurements.
Assume the underlying population is normal
and we have randomly selected the sample
from the population of customer accounts.
3. Significance level. Let α = .05, n = 100
4. Calculated value. t = (52.5 − 50) / (14 / √100) = 2.5 / 1.4 = 1.786, d.f. = n − 1 = 99
5. Critical test value. We obtain this by entering the table of critical
values of t (Appendix Table F-2) with 99 d.f. and a level of
significance of .05.
We secure a critical value of about 1.66 (interpolated between d.f.
= 60 and d.f. = 120).

6. Decision. In this case, the calculated value is greater than the
critical value (1.786 > 1.66), so we reject the null hypothesis.
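The six steps above can be sketched in Python, using the sample values from the example (mean 52.5, s = 14, n = 100):

```python
import math
from scipy import stats

# Example values: H0 mu = 50 days, sample mean 52.5, s = 14, n = 100.
mu0, xbar, s, n = 50, 52.5, 14, 100

t = (xbar - mu0) / (s / math.sqrt(n))   # (52.5 - 50) / 1.4 = 1.786
critical = stats.t.ppf(0.95, df=n - 1)  # one-tailed critical value at alpha = .05
p = stats.t.sf(t, df=n - 1)             # one-tailed p value with 99 d.f.

# Reject H0 when the calculated t exceeds the critical value.
print(round(t, 3), t > critical)  # → 1.786 True
```

Note that `stats.t.ppf` returns the exact critical value (about 1.660), so no interpolation between tabled d.f. values is needed.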


K independent samples tests are often used when three or more samples are involved.

Under this condition, we are interested in learning whether the samples might have come
from the same or identical populations.
When the data are measured on an interval-ratio scale and we can meet the necessary
assumptions, analysis of variance (ANOVA) and the F test are used.

As with the two-samples case, the samples are assumed to be independent.

Parametric Tests:

The statistical method for testing the null hypothesis that the means of several
populations are equal is ANOVA.

One - way analysis of variance uses a single-factor, fixed-effects model to compare the
effects of one factor on a continuous dependent variable.

In a fixed-effects model, the levels of the factor are established in advance, and the
results are not generalizable to other levels of treatment.

To use ANOVA, certain conditions must be met.

The samples must be randomly selected from normal populations, and the populations
should have equal variances.

In addition, the distance from one value to its group's mean should be independent of
the distances of other values to that mean.

ANOVA is reasonably robust, and minor variations from normality and equal variance
are tolerable.

Analysis of variance, as the name implies, breaks down or partitions total variability into
component parts.

Unlike the t test, which uses sample standard deviations, ANOVA uses squared
deviations of the variance so computation of distances of the individual data points from
their own mean or from the grand mean can be summed.

In an ANOVA model, each group has its own mean and values that deviate from that mean.
Similarly, all the data points from all of the groups produce an overall grand mean.

The total deviation is the sum of the squared differences between each data point and the
overall grand mean.
The total deviation of any particular data point may be partitioned into between-groups
variance and within-groups variance.

The differences of between-group means imply that each group was treated differently,
and the treatment will appear as deviations of the sample means from the grand mean.
The between-groups variance represents the effect of the treatment or factor.

The within-groups variance describes the deviations of the data points within each group
from the sample mean.
This results from variability among subjects and from random variation.
It is often called error.

When the variability attributable to the treatment exceeds the variability arising from
error and random fluctuations, the viability of the null hypothesis begins to diminish.
This is how the test statistic for ANOVA works.

The test statistic for ANOVA is the F ratio.
It compares the variance from the last two sources:

F = Between-groups variance / Within-groups variance

  = Mean square between / Mean square within

Mean square between = Sum of squares between / Degrees of freedom between

Mean square within = Sum of squares within / Degrees of freedom within

To compute the F ratio, the sums of the squared deviations for the numerator and
denominator are divided by their respective degrees of freedom.
By dividing, we are computing the variance as an average or mean; thus the term mean square.
The degrees of freedom for the numerator, the mean square between groups, is one less
than the number of groups (K-1).
The degrees of freedom for the denominator, the mean square within groups, is the total
number of observations minus the number of groups (n-k).

If the null hypothesis is true, there should be no difference between the populations, and
the ratio should be close to 1.
If the population means are not equal, the numerator should manifest this difference, and
the F ratio should be larger than 1.
Tables of critical F values show how large the ratio must be to reject the null hypothesis
for a particular sample size and level of significance.

Testing Procedures:

1. Null hypothesis. Ho: µA1 = µA2 = µA3

   HA: The means are not all equal (µA1 ≠ µA2 ≠ µA3)

2. Statistical test. The F test is chosen because we have K independent samples,
accept the assumptions of analysis of variance, and have interval data.
3. Significance level. Let α = .05 and determine d.f.
4. Calculated value. Compute F from the sample data.

5. Critical test value. Enter Appendix Table F-9 with α and d.f.
6. Make decision.
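A worked sketch of this procedure, with invented scores for three treatment groups (the text supplies no data here, so these values are illustrative only):

```python
from scipy import stats

# Invented scores for three treatment groups (illustrative only;
# the text supplies no data for this procedure).
group_a1 = [21, 24, 19, 27, 25]
group_a2 = [30, 28, 33, 29, 31]
group_a3 = [22, 26, 20, 24, 23]

# One-way ANOVA: F = mean square between / mean square within,
# here with k - 1 = 2 and n - k = 12 degrees of freedom.
f_stat, p_value = stats.f_oneway(group_a1, group_a2, group_a3)

if p_value < 0.05:
    print("Reject H0: the group means are not all equal")
```

With these values the between-groups variance dwarfs the within-groups variance, so F is well above 1 and the null hypothesis of equal means is rejected.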

Probability Values (p Values)

According to the “make the decision” step of the statistical test procedure, the conclusion
is stated in terms of rejecting or not rejecting the null hypothesis based on the rejection
region selected before the test is conducted.

A second method of presenting the results of a statistical test reports the extent to which
the test statistic disagrees with the null hypothesis.
This indicates the percentage of the sampling distribution that lies beyond the sample
statistic on the curve.

Most statistical computer programs report the results of statistical tests as probability
values (p values).
The p value is the probability of observing a sample value as extreme as, or more extreme
than, the value actually observed, given that the null hypothesis is true.
This area represents the probability of a Type I error that must be assumed if the null
hypothesis is rejected.

The p value is compared to the significance level (α), and on this basis the null
hypothesis is either rejected or not rejected.
If the p value is less than the significance level, the null hypothesis is rejected (if p value
< α, reject null).
If the p value is greater than or equal to the significance level, the null hypothesis is not
rejected (if p value ≥ α, do not reject null).
The critical value of 53.29 was computed based on a standard deviation of 10, a sample
size of 25, and the controller's willingness to accept a 5 percent α risk.
Suppose that the sample mean equaled 55. Is there enough evidence to reject the null
hypothesis?
The p value is computed as follows:

The standard deviation of the distribution of sample means is 10 / √25 = 2.

The appropriate z value is,


55  5 

 2.5

The p value is determined using the standard normal table.

The area between the mean and a z value of 2.5 is 0.4938.
The p value is the area above the z value.

The probability of observing a z value at least as large as 2.5 is only 0.5000 − 0.4938
= 0.0062 if the null hypothesis is true.

This small p value represents the risk of rejecting the null hypothesis.
It is the probability of a type I error if the null hypothesis is rejected.

Since the p value (0.0062) is smaller than α = .05, the null hypothesis is rejected.

The controller can conclude that the average age of the accounts receivable has increased.

The probability that this conclusion is wrong is 0.0062.
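The p value computed above can be reproduced with the Python standard library alone:

```python
from statistics import NormalDist

# sigma = 10 and n = 25 give a standard error of 10 / sqrt(25) = 2,
# so a sample mean of 55 corresponds to z = (55 - 50) / 2 = 2.5.
z = (55 - 50) / 2

# One-tailed p value: the area of the standard normal curve above z.
p_value = 1 - NormalDist().cdf(z)

print(round(p_value, 4))  # → 0.0062
```

This is the same 0.0062 obtained from the standard normal table, without the intermediate 0.5000 − 0.4938 subtraction.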


The general formula for describing the association between two variables can be given as
Y = f(X)
This indicates that X causes Y, so the value of X determines the value of Y.
Regression analysis is a method for determining the specific function relating Y to X.

Linear regression:
This is the simplest form of regression analysis.
It depicts the case of a perfect linear association between two variables.
The relationship between the two variables is described by an equation called the
regression equation.
Because all the points on a scattergram indicating the values of X and Y lie on a straight
line, we could superimpose that line over the points.

This is the regression line.
The regression line offers a graphic picture of the association between X and Y, and the
regression equation is an efficient form for summarizing that association.
The general format for this equation is
Y‟ = a + b(X)

where a and b are computed values (intercept and slope, respectively)

X is a given value on one variable
Y‟ is the estimated value on the other

The values of a and b are computed to minimize the differences between actual values of
Y and the corresponding estimates (Y‟) based on the known value of X.
The sum of squared differences between actual and estimated values of Y is called the
unexplained variation because it represents errors that still exist even when estimates are
based on known values of X.
The explained variation is the difference between the total variation and the unexplained variation.
Dividing the explained variation by the total variation produces a measure of the
proportionate reduction of error due to the known value of X.
This quantity is the correlation squared, r2.
In practice, we compute r and square it to obtain r2.

Multiple Regression:

Very often, researchers find that a given dependent variable is affected simultaneously by
several independent variables.
Multiple regression analysis provides a means of analyzing such situations.
A multiple regression equation can take the form:
Y = b0 + b1X1 + b2X2 + b3X3 +……………………………+ bnXn + e

where b = regression weight (slope)

e = residual (variance in Y that is not accounted for by the X variables
analyzed, also called error)

The values of the several b‟s show the relative contributions of the several independent
variables in determining the value of Y.
The multiple-correlation coefficient is calculated here as an indicator of the extent to
which all the independent variables predict the final value of the dependent variable.
This follows the same logic as the simple bivariate correlation discussed above but is
given as a capital R2 and indicates the percent of the total variance explained by the
independent variables.
Hence, when R = .877, R2 = .77, meaning that 77 percent of the variance in the
dependent variable has been explained.
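A least-squares sketch of the multiple-regression equation using numpy, again with invented data (variable names and values are illustrative):

```python
import numpy as np

# Invented data: Y modeled from two X variables (values are illustrative only).
X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
Y  = np.array([4.1, 4.9, 9.2, 9.8, 14.9, 15.1])

# Design matrix with a leading column of 1s for the intercept b0.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution for [b0, b1, b2] in Y = b0 + b1*X1 + b2*X2 + e.
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)

# R squared: proportion of Y's variance explained by the fitted equation.
residuals = Y - A @ coefs
r_squared = 1 - residuals.var() / Y.var()
print(np.round(coefs, 2), round(r_squared, 3))
```

The fitted b values play the role of the several b's in the equation above, and r_squared corresponds to the capital R² that reports the percent of total variance explained.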

c) Interpretation of Results

The interpretation method aims at elaborating on an empirical relationship among

variables in order to interpret that relationship.
Interpretation is similar to explanation, except for the time placement of the test variable
and the implications that follow from that difference.
An interpretation does not deny the validity of the original, causal relationship but simply
clarifies the process through which that relationship functions.
Researchers must interpret their findings in light of the research question(s) or determine
if the results are consistent with their hypotheses and theories.
In such cases, several key pieces of information make interpretation possible:
 A statement of the functions the instrument was designed to measure and the
procedures by which it was developed.
 Detailed instructions for administration.
 Scoring keys and scoring instructions.
 Norms for appropriate reference groups.
 Evidence about reliability.
 Evidence on the intercorrelations of subscores.

 Evidence on the relationship of the instrument to other measures.
 Guides for instrument use.


a) Research Report, Content, Format and Style

It may seem unscientific and even unfair, but a poor final report or presentation can
destroy a study.
Most readers will be influenced by the quality of the reporting.
This fact should prompt researchers to make special efforts to communicate clearly and fully.
The research report contains findings, analysis of findings, interpretations, conclusions,
and sometimes recommendations.
Reports may be defined by their degree of formality and design.
The formal report follows a well-delineated and relatively long format.
Usually headings and subheadings divide the sections.
The technical report follows the flow of the research.
The prefatory materials, such as a letter of authorization and a table of contents, are first.
An introduction covers the purpose of the study and is followed by a section on methodology.
The findings are presented next, including tables and other graphics.
The conclusion section includes recommendations.
Finally, the appendices contain technical information, instruments, glossaries, and other supporting materials.

Prefatory Items
Title Page: The title page should include four items – the title of the report, the date, and
for whom and by whom it was prepared.
The title should be brief but include the following three elements – (i) the variables
included in the study, (ii) the type of relationship among the variables, and (iii) the
population to which the results may be applied.

Here are three acceptable ways to word report titles:
Descriptive study - The Five-Year Demand Outlook for Plastic Pipe in Zambia.
Correlation study - The Relationship Between the Value of the Dollar in World Markets
and Relative National Inflation Rates.
Causal Study - The Effect of Various Motivation Methods on Worker Attitudes among
Textile Workers.

Table of Contents: As a rough guide, any report of several sections that totals more than
6 to 10 pages should have a table of contents.
If there are many tables, charts, or other exhibits, they should also be listed after the table
of contents in a separate table of illustrations.
Introduction: The introduction prepares the reader for the report by describing the parts of
the project: the problem statement, research objectives, and background material.

Background: Background material may be the preliminary results of exploration from an

experience survey, focus group, or another source.
Alternatively, it could be secondary data from the literature review.
Previous research, theory, or situations that led to the management question are also
discussed in this section.
The literature should be organized, integrated, and presented in a way that connects it
logically to the problem.
The background includes definitions, qualifications, and assumptions.
If background material is composed primarily of literature review and related research, it
should follow the objectives.
If it contains information pertinent to the management problem or the situation that led to
the study, it can be placed before the problem statement.

For a technical report, the methodology is an important section, containing at least five
parts: sampling design, research design, data collection, data analysis, and limitations.

Sampling Design:
The researcher explicitly defines the target population being studied and the sampling
methods used.
Explanations of the sampling methods, uniqueness of the chosen parameters, or other
points that need explanation should be covered with brevity.
Calculations should be placed in an appendix instead of in the body of the report.

Research Design:
The coverage of the design must be adapted to the purpose.
In an experimental study, the materials, tests, equipment, control conditions, and other
devices should be described.
In descriptive or ex post facto designs, it may be sufficient to cover the rationale for using
one design instead of competing alternatives.
Even with a sophisticated design, the strengths and weaknesses should be identified and
the instrumentation and materials discussed.
Copies of materials are placed in an appendix.

Data Collection:
This part of the report describes the specifics of gathering the data.
Its contents depend on the selected design.
Survey work generally uses a team with field and central supervision.
In an experiment, we would want to know about subject assignment to groups, the use of
standardized procedures and protocols, the administration of tests or observational forms,
manipulation of the variables, etc.
Typically, you would include a discussion on the relevance of secondary data that guided
these decisions.
Again, detailed materials such as field instructions should be included in an appendix.

Data Analysis:
This section summarizes the methods used to analyze the data.

Describe data handling, preliminary analysis, statistical tests, computer programs, and
other technical information.
The rationale for the choice of analysis approaches should be made clear.
A brief commentary on assumptions and appropriateness of use should be presented.

Limitations:
This section should be a thoughtful presentation of significant methodology or
implementation problems.
All research studies have their limitations, and the sincere investigator recognizes that
readers need aid in judging the study's validity.

Findings:
This is generally the longest section of the report.
The objective is to explain the data rather than draw interpretations or conclusions.
When quantitative data can be presented, this should be done as simply as possible with
charts, graphics, and tables.
The data need not include everything you have collected but only that which is important
to the reader's understanding of the problem and the findings.
However, make sure to show both the findings that support your hypotheses and those
unfavorable to them.
It is useful to present findings in numbered paragraphs or to present one finding per page
with the quantitative data supporting the findings presented in a small table or chart on
the same page.

Summary and Conclusions:
The summary is a brief statement of the essential findings.
Sectional summaries may be used if there are many specific findings.
These may be combined into an overall summary.
Findings state facts; conclusions represent inferences drawn from the findings.
Conclusions may be presented in a tabular form for easy reading and reference.

Summary findings may be subordinated under the related conclusion statement.
These may be numbered to refer the reader to pages or tables in the findings sections.

Recommendations:
There are usually a few ideas about corrective actions.
In academic research, the recommendations are often further study suggestions that
broaden or test understanding of the subject area.
In applied research, the recommendations will usually be for managerial action rather
than research action.
The writer may offer several alternatives with justifications.

Appendices:
The appendices are the place for complex tables, statistical tests, supporting documents,
copies of forms and questionnaires, detailed descriptions of the methodology, instructions
to field workers, lists of respondent information, and other evidence important for later
reference.

Bibliography:
The use of secondary data requires a complete bibliography.
All sources used by the researcher must be indicated in the bibliography.
Proper citation style and format depend on the purpose of the report.
Examples are given at the end of this document.
Recommended style for journal articles:
Maliti, B. (2007): Research Methods for MBA Students. Journal of Strategic
Management, 8(2), 125 – 135.
Recommended style for books:
Maliti, B. (2007): Research Methods for MBA Students, 2nd ed. Kitwe: CBU Press.
In general:
Last name of author, first name. Title of article or document. Title of journal, newsletter,
or conference. Volume. Issue number (year) or date of publication, page numbers.

We recommend the following: Publication Manual of the American Psychological
Association; Kate L. Turabian, A Manual for Writers of Term Papers, Theses, and
Dissertations; and Joseph Gibaldi, MLA Handbook for Writers of Research Papers.

Whenever you are reporting on the work of others, you must be clear about who said
what.
That is, you must avoid plagiarism: the theft of another's words and/or ideas –
whether intentional or accidental – and the presentation of those words and ideas as
your own.
Here are the main ground rules regarding plagiarism:
• You cannot use another writer's exact words without using quotation marks and
giving a complete citation, which indicates the source of the quotation such that
your reader could locate that quotation in its original context.
As a general rule, taking a passage of eight or more words without a citation is a
violation of federal copyright laws.
• It's also not acceptable to edit or paraphrase another's words and present the
revised edition as your own work.
• Finally, it's not even acceptable to present another's ideas as your own – even if
you use totally different words to express those ideas.
Plagiarism represents a serious offense.
Admittedly, there are some gray areas.
Some ideas are more or less in the public domain, not “belonging” to any one person.
Or you may reach an idea on your own that someone else has already put in writing.
If you have a question about a specific situation, discuss it with your advisor or instructor
in advance.
Though you must place your research in the context of what others have done and said,
the improper use of their materials is a serious offense.
Mastering this matter is a part of your “coming of age” as a scholar.

Thus, all the materials presented in this document have been reproduced verbatim from
the following sources, summarized by the compiler, and are exclusively for teaching
purposes.

Babbie, E. (1998): The Practice of Social Research (8th Ed.). New York: Wadsworth
Publishing Company.
Best, J.W., Kahn, J.V. (1989): Research in Education (6th Ed.). New Delhi: Prentice Hall.
Clover, V.T., Balsley, H.L. (1984): Business Research Methods (3rd Ed.). Columbus,
Ohio: Grid Publishing, Inc.
Cooper, D.R., Schindler, P.S. (1999): Business Research Methods (6th Ed.). New Delhi:
Tata McGraw-Hill Publishing Company.