What is research?
Nature of Research
1. Originates with a question or problem.
Scope of Research
Business Environment
Maturing of management
Powerful computations
Advanced analytical tools
Types of research
Basic research: The basic premise is the need to KNOW and the concern
is primarily academic in nature.
Sampling considerations
Research applications
Market & consumer analysis
Product research
Pricing research
Promotional research
Place research
MUST have: a logical and explicitly stated justification for the selected
methods
Independent variable
Dependent variable
Intervening variables
Moderating variables
RESEARCH DESIGNS
Exploratory research designs are the simplest, most flexible and most loosely structured
designs. As the name suggests, the basic objective of the study is to explore and obtain clarity
on the problem situation.
Descriptive research designs are more structured and formal in nature. As the name implies,
the objective of these studies is to provide a comprehensive and detailed explanation of the
phenomena under study.
a scientific manner in order to collect valuable data related to the topic under study.
A research design is the framework or plan for a study used as a guide in collecting
and analysing data. There are three basic types of research design: exploratory,
descriptive, and causal. The names of the three types of research design describe their
purpose very well. The goal of exploratory research is to discover ideas and insights.
Descriptive research is usually concerned with describing a population with respect to
important variables. Causal research is used to establish cause-and-effect relationships
between variables. Experiments are commonly used in causal research designs because
they are best suited to determine cause and effect.
EXPLORATORY RESEARCH
Descriptive research is very common in business and other aspects of life. In fact, most
of the marketing research you’ve heard about or participated in can be categorized as
descriptive research. With a descriptive research design, we are usually trying to
describe some group of people or other entities. For example, Scion, a division of
Toyota Motor Corporation, was designed from the start to be “cool” and “edgy” in an
attempt to appeal to younger consumers. After the radical Scion xB had been in limited
release in southern California for a while, researchers used descriptive research to
confirm that the average Scion customer was 39 years old, the youngest in the
automotive industry, compared with Toyota customers, who were 54 years old on
average.
Classification of data
Primary data is original, problem- or project-specific, and collected for the specific objectives
and needs spelt out by the researcher. Its authenticity and relevance are reasonably high.
Secondary data is information that is not topical or research specific and has been collected
and compiled by some other researcher or investigative body. It is recorded and published in
a structured format.
Hypothesis design: earlier work done on the topic, market data, industry trends and market
facts can help in developing assumptions that can be translated into testable hypotheses for
the study.
Primary base: can be used to design questionnaires for the primary study.
Validation and authentication board: earlier records and studies, as well as data pools, can
also be used to support or validate the information collected through primary sources.
Resource advantage
Accessibility of data
Assessment of data
Applicability of data: the purpose for which the information was earlier
collected was unique to that study and thus the information might not be
absolutely applicable or relevant for the current study objective.
Accuracy of data: the source and data credibility is a serious issue of
concern when using past data and records.
Questionnaire
Observations
Experiments
Example:
Ordinal scale: This is the next higher level of measurement. One of the
limitations of nominal scale measurement is that we cannot say
whether the number assigned to an object is higher or lower than the
one assigned to another object. Ordinal scale measurement takes
care of this limitation. An ordinal scale measurement tells whether an
object has more or less of a characteristic than some other object.
Interval scale: The interval scale measurement is the next higher level of
measurement.
In the interval scale the difference of the score on the scale has
meaningful interpretation.
The mathematical form of the data on the interval scale may be written
as
Y=a+bX where a ≠ 0
Ratio scale: This is the highest level of measurement and takes care of
the limitations of the interval scale measurement, where the ratio of the
measurements on the scale does not have a meaningful interpretation.
All mathematical and statistical operations can be carried out using the
ratio scale data.
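The difference between interval and ratio scales can be illustrated with a small sketch. The temperature example is an assumption added for illustration (it is not in the notes): Fahrenheit is a transformation of Celsius of the form Y = a + bX with a = 32 and b = 1.8, so differences keep their meaning but ratios do not.

```python
def celsius_to_fahrenheit(c):
    """Interval-scale transformation of the form Y = a + bX with a = 32, b = 1.8."""
    return 32 + 1.8 * c

# Illustrative temperatures (hypothetical values, chosen for easy arithmetic).
c_low, c_mid, c_high = 10.0, 20.0, 30.0
f_low, f_mid, f_high = (celsius_to_fahrenheit(c) for c in (c_low, c_mid, c_high))

# Ratios of scores change under the transformation, so they carry no meaning.
ratio_celsius = c_mid / c_low        # 20 / 10
ratio_fahrenheit = f_mid / f_low     # 68 / 50

# Ratios of differences are unchanged, so differences are interpretable.
diff_ratio_celsius = (c_high - c_mid) / (c_mid - c_low)
diff_ratio_fahrenheit = (f_high - f_mid) / (f_mid - f_low)

print(ratio_celsius, ratio_fahrenheit)
print(diff_ratio_celsius, diff_ratio_fahrenheit)
```

On a ratio scale (a = 0, for example weight in kg versus grams) both kinds of ratio would be preserved.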
Can be conducted to collect useful data from a large population in a short duration of time
The spelt out research objectives need to be converted into specific questions
Types of questionnaire
Formalized & unconcealed questionnaire: self-explanatory, with most response categories
predefined
Formalized & concealed questionnaire: most response categories are predefined, but latent
causes of behaviour are derived from indirect questions
Non-formalized & unconcealed questionnaire: undisguised, and most response categories are
not predefined
---------------------------------------------------------------------------
Give three reasons why you believe that the 2010 Commonwealth Games in India are going
to help the country?
Population spread
Study area
Type of questions
Open ended questions:
How would you evaluate the work done by the present government?
1. Dichotomous questions
- Between Rs 2500-5000/-
- You are interested but you do not know how to procure it.
- It is not profitable.
- any other--------------------
Questionnaire structure
Instructions
Opening questions
Study questions
Classification information
Acknowledgement
Sampling Concepts
A decision-maker may not have much time to wait until all the
information is available.
Sampling Design
Probability Sampling Design - Probability sampling designs are used in conclusive research. In
a probability sampling design, each and every element of the population has a known chance
of being selected in the sample.
Systematic sampling
Convenience sampling
Judgmental sampling
Snowball sampling
Quota sampling
Non-sampling error: This error arises not because a sample is not representative of the
population but because of other reasons. Some of these reasons are listed below:
The error can arise while transferring the data from the questionnaire to the spreadsheet on
the computer.
Probability sampling
The sample method involves taking a representative selection of the population and
using the data collected as research information. A sample is a “subgroup of a
population”. It has also been described as a representative “taste” of a group. The
sample should be “representative in the sense that each sampled unit will represent the
characteristics of a known number of units in the population”.
Simple random sampling is often called straight random sampling. The name used for
this type of probability sampling method is not indicative of the discipline but
depends on the researcher or author of the various books and articles referenced.
That is to say, the two terms are interchangeable and do not depend on a specific
discipline within academia. Simple random sampling requires that each member of the
population have an equal chance of being selected (as is the main goal of probability
sampling). A simple random sample is selected by assigning a number to each member in
the population list, after which the researcher can “use a random number table to draw
out the members of the sample”. Another way of viewing simple random sampling
presupposes that “all members of the study population are either physically present or
listed, and the members are selected at random until a previously specified number of
members or units has been selected”. Each member of the population is “selected one
at a time, independent of one another and without replacement; once a unit is
selected, it has no further chance to be selected”. Regardless of the process used for
simple random sampling, the process can be laborious if the list of the population is
long or it is completed manually without the aid of a computer.
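With a computer, the numbering-and-drawing procedure described above takes only a few lines. The following is a minimal sketch, assuming a hypothetical population list numbered 1 to 100 and using Python's standard pseudo-random generator in place of a random number table.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size units without replacement, each with equal chance."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(population, sample_size)

# Hypothetical population list: members numbered 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```

Because the draw is made without replacement, a unit that has been selected has no further chance of being selected, as the definition requires.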
Stratified random sampling is “one in which the population is divided into subgroups
or ‘strata,’ and a random sample is then selected from each subgroup”. When a few
characteristics are known about a population, stratified random sampling is preferable
because the population may be arranged in subgroups and then a random sample may
be selected from each of these subgroups, “arranging the original unit into categories
so that the distribution of a particular group in the population of interest will be closely
replicated in the sample”. These subgroups can exhibit characteristics including but
not limited to gender, race, ethnicity, religion, and age groups.
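A proportionate stratified sample can be sketched as follows. The population of 60 "F" and 40 "M" units, the 10% sampling fraction and the helper name `stratified_sample` are all hypothetical, chosen only to show that the sample replicates the strata proportions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same sampling fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # arrange units into subgroups
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 units in stratum "F" and 40 in stratum "M".
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(population, lambda unit: unit[0], 0.10, seed=1)

count_f = sum(1 for stratum, _ in sample if stratum == "F")
count_m = sum(1 for stratum, _ in sample if stratum == "M")
print(count_f, count_m)
```

The 60:40 split of the population is reproduced as 6:4 in the sample, which is the point of stratifying before drawing.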
Cluster sampling
Cluster sampling, on the surface, is very similar to stratified sampling in that “survey
population members are divided into unique, non-overlapping groups prior to
sampling”. These groups are referred to as clusters instead of strata because they are
“naturally occurring groupings such as schools, households, or geographic units”.
Whereas a stratified sample “involves selecting a few members from each group or
stratum,” cluster sampling involves “the selection of a few groups and data are
collected from all group members”. This sampling method is used when no master list
of the population exists but “cluster” lists are obtainable.
Convenience sampling
Convenience sampling includes participants who are readily available and agree to
participate in a study. Convenience sampling is often called accidental sampling, and
haphazard sampling is another alternate title. Some authors do not use the specific
title of convenience at all for this type of non-probability sample.
All of these alternate names for convenience non-probability sampling share the
same definition. Convenience is just that: convenient. This is a relatively easy choice
for researchers when a group of people cannot be found to survey or question.
Convenience sampling may include going to a place of business.
Judgmental sampling
A purposive non-probability sample is also known as a judgment or judgmental sample.
No matter the naming convention used, all authors agree on the definition of this
non-probability sampling method. Purposive sampling is selecting a sample “on the basis
of your own knowledge of the population, its elements, and the nature of your research
aims”. That is, the population is “non-randomly selected based on a particular
characteristic”. The individual characteristics are selected to answer necessary
questions about a “certain matter or product”. The researcher is then able to select
participants based on internal knowledge of said characteristic. This method is useful
if a researcher wants to study “a small subset of a larger population in which many
members of the subset are easily identified but the enumeration of all is nearly
impossible”. Pilot studies are well suited to this type of non-probability sampling
method.
Snowball sampling
Snowball sampling is also called network sampling. Snowball sampling is used “in
those rare cases when the population of interest cannot be identified other than by
someone who knows that a certain person has the necessary experience or
characteristics to be included”. Snowball sampling also includes relying on previously
identified group members to identify others who may share the same characteristics as
the group already in place.
Quota sampling
In quota sampling we divide the “population group being studied into subgroups.
Then based on the proportions of the subgroups needed for the final sample,
interviewers are given a number of units from each subgroup that they are to select and
interview”. Quota sampling is a good method to use to non-randomly select groups
based on gender, age, race, and ethnicity. In quota sampling, “respondents are
selected non-randomly on the basis of their known proportion to the population”.
Quota non-probability sampling and stratified probability
sampling are different in that “quota sampling allows the interviewer discretion in the
selection of the individuals for the sample”. There are a number of problems that
researchers should be aware of when choosing quota sampling.
Data Processing
data editing
The data obtained is complete in all respects.
The data is structured in a manner that entering the information will not be a
problem.
Field editing: usually done at the end of every field day by the field investigator(s),
who must review the filled forms for any inconsistencies, non-response, illegible
responses or incomplete questionnaires.
Backtracking
Plug value
Data coding
The process of identifying the responses given by the respondent and assigning a numeral
to each of them is called coding.
Field
Record
File
Data matrix
Comprehensive
Mutually exclusive
Testing of Hypothesis
You make a Type I error when you reject the null hypothesis when it is true.
You make a Type II error when you fail to reject the null hypothesis when
the alternate hypothesis is actually true.
Q.1. A company hypothesizes that the life of its bulbs is 145 hrs. with a std. dev. of 210 hrs.
A sample of 25 bulbs gave a mean of 130 hrs. Using the 0.05 level of significance, can the
company conclude that the avg. is less than 145 hrs.?
ANS.
H0 : µ = 145
H1 : µ < 145
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{210}{\sqrt{25}} = 42 \]
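The written answer stops at the standard error. One way the test might be completed is sketched below, assuming a left-tailed Z test with the usual table value of -1.645 at the 0.05 level; the variable names are illustrative only.

```python
import math

# Values taken from the question above.
mu0 = 145.0     # hypothesized mean life (hours)
sigma = 210.0   # population standard deviation (hours)
n = 25          # sample size
x_bar = 130.0   # sample mean (hours)

std_error = sigma / math.sqrt(n)   # 210 / 5
z = (x_bar - mu0) / std_error      # (130 - 145) / standard error

# Standard normal table value for a left-tailed test at the 0.05 level.
z_critical = -1.645

reject_h0 = z < z_critical
print(f"standard error = {std_error:.2f}, Z = {z:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the sample Z does not fall below the critical value, so the sample would not support the conclusion that the average life is less than 145 hrs.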
Q.1. The fuel efficiency in km per litre is given below. Use the 5% level of significance to test the
fuel efficiency of four blends of gasoline.
Driver    Car model
          A       B       C       D
1         13      9.4     10.6    12
2         12.4    10.2    13.6    8.7
3         9.9     12.6    9.3     13.4
4         9.8     14      12.7    10.5
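T.. = 182.1
\[ CF = \frac{T_{\cdot\cdot}^2}{m \times m} = \frac{182.1^2}{4 \times 4} = 2072.526 \]
\[ TSS = \sum_{i=1}^{m} \sum_{j=1}^{m} x_{ij}^2 - CF = 2118.57 - 2072.53 = 46.04 \]
\[ TrSS = \frac{1}{m} \sum_{k=1}^{m} T_k^2 - CF = \left[ \frac{54^2}{4} + \frac{49.7^2}{4} + \frac{41.2^2}{4} + \frac{37.2^2}{4} \right] - 2072.53 = 2116.84 - 2072.53 = 44.31 \]
\[ RSS = \frac{1}{m} \sum_{i=1}^{m} R_i^2 - CF = \left[ \frac{45^2}{4} + \frac{44.9^2}{4} + \frac{45.2^2}{4} + \frac{47^2}{4} \right] - 2072.53 = 2073.263 - 2072.526 = 0.736 \]
\[ CSS = \frac{1}{m} \sum_{j=1}^{m} C_j^2 - CF = \left[ \frac{45.1^2}{4} + \frac{46.2^2}{4} + \frac{46.2^2}{4} + \frac{44.6^2}{4} \right] - 2072.53 = 2073.013 - 2072.53 = 0.483 \]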
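ANOVA TABLE
Source of Variation     Degrees of Freedom    Sum of Squares    Mean Square    F
Treatments (blends)     3                     44.31             14.77          173.4
Rows (drivers)          3                     0.736
Columns (car models)    3                     0.483
Error                   6                     0.511             0.085
Total                   15                    46.04
Assume α = 0.05. The error sum of squares is 46.04 − 44.31 − 0.736 − 0.483 = 0.511 on
(m − 1)(m − 2) = 6 degrees of freedom, and the total has m² − 1 = 15 degrees of freedom.
The calculated F(3, 6) = (44.31 / 3) / (0.511 / 6) ≈ 173.4 is greater than the tabulated
F(3, 6) = 4.76, so reject H0. Therefore the average fuel efficiency of the four different
blends of gasoline is different.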
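The sums of squares in this question can be checked with a short sketch. It assumes a 4 × 4 Latin square layout (drivers as rows, car models as columns, blends as treatments). The allocation of blends to cells is not shown in the text, so the treatment totals are taken as given from the worked solution rather than recomputed.

```python
# Rows are drivers 1-4, columns are car models A-D (values from the table).
data = [
    [13.0, 9.4, 10.6, 12.0],
    [12.4, 10.2, 13.6, 8.7],
    [9.9, 12.6, 9.3, 13.4],
    [9.8, 14.0, 12.7, 10.5],
]
# Treatment (blend) totals as quoted in the worked solution (taken as given).
treatment_totals = [54.0, 49.7, 41.2, 37.2]

m = len(data)
grand_total = sum(sum(row) for row in data)
cf = grand_total ** 2 / (m * m)                      # correction factor

tss = sum(x * x for row in data for x in row) - cf   # total sum of squares
rss = sum(sum(row) ** 2 for row in data) / m - cf    # rows (drivers)
css = sum(sum(row[j] for row in data) ** 2 for j in range(m)) / m - cf  # columns
trss = sum(t * t for t in treatment_totals) / m - cf # treatments (blends)
ess = tss - rss - css - trss                         # error, by subtraction

df_treatment = m - 1
df_error = (m - 1) * (m - 2)
f_ratio = (trss / df_treatment) / (ess / df_error)

print(f"TSS={tss:.3f} TrSS={trss:.3f} RSS={rss:.3f} CSS={css:.3f} ESS={ess:.3f}")
print(f"F({df_treatment}, {df_error}) = {f_ratio:.1f}")
```

Because the code carries unrounded intermediate values, its F ratio differs slightly from a hand calculation based on rounded sums of squares; the conclusion is the same either way.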
Chi- square test
Q.1. The table shows the frequencies of digits chosen at random from a telephone directory. Test whether
the digits may be taken to occur equally frequently or not.
Digit        0      1      2     3     4      5     6      7     8     9     Total
Frequency    1026   1107   997   966   1075   933   1107   972   964   853   10,000
ANS.
H0 : Digits occur equally in the directory.
H1 : Digits don’t occur equally in the directory.
Under the assumption that null hypothesis is true, the expected frequency for each digit would be
1000.
Digit    O         E       (O-E)^2    (O-E)^2/E
0        1026      1000    676        0.676
1        1107      1000    11449      11.449
2        997       1000    9          0.009
3        966       1000    1156       1.156
4        1075      1000    5625       5.625
5        933       1000    4489       4.489
6        1107      1000    11449      11.449
7        972       1000    784        0.784
8        964       1000    1296       1.296
9        853       1000    21609      21.609
Total    10,000                       58.542
\[ \chi^2_{k-1} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 58.542 \]
Assume α = 0.05
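The goodness-of-fit calculation above can be reproduced with a few lines. This is a sketch only; the critical value 16.919 is the standard chi-square table value for 9 degrees of freedom at α = 0.05, and the decision it leads to is not stated in the text.

```python
# Observed digit frequencies from the table above.
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]

# Under H0 every digit is equally likely, so each expected frequency is n / k.
expected = sum(observed) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

# Chi-square table value for 9 degrees of freedom at alpha = 0.05.
critical_value = 16.919

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Since the computed statistic far exceeds the table value, this sketch points towards rejecting H0, that is, the digits do not appear to occur equally often.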
Frequencies
Cigarette    Male    Female    Total
A            25      30        55
B            40      15        55
C            30      10        40
Total        95      55        150
ANS.
H0 : There is no relationship between the type of cigarettes smoked and gender.
H1 : There is a relationship between type of cigarettes smoked and gender.
Assume α = 0.05
\[ E_{ij} = \frac{R_i \times C_j}{G} \]
Reject H0. There is a relationship between type of cigarette smoked and gender.
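The intermediate steps of this test of independence are not shown in the answer. A minimal sketch of them follows, using the expected-frequency formula above; the critical value 5.991 is the standard chi-square table value for 2 degrees of freedom at α = 0.05.

```python
# Observed frequencies: rows are cigarette types A, B, C; columns are Male, Female.
observed = [
    [25, 30],
    [40, 15],
    [30, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(row[j] for row in observed) for j in range(len(observed[0]))]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # E_ij = R_i * C_j / G
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square table value for 2 degrees of freedom at alpha = 0.05.
critical_value = 5.991

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic exceeds the table value, which agrees with the conclusion that the type of cigarette smoked and gender are related.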
Sign Test
If weight before is greater than weight after, we assign a plus sign and if weight before is less than
weight after, we assign a negative sign. If weights before and after are equal, the observation is
omitted from analysis and correspondingly sample size is reduced.
H0 : p = ½
H1 : p > ½
At α = 0.05, the table value of Z = 1.645. Since the sample value of Z is less than the tabulated value
of Z, we do not reject H0. Therefore the prescribed programme of physical exercise is not effective.
Run Test
Running a test of randomness is a non-parametric method that is used in cases when the
parametric test is not in use. In this test, two different random samples from different
populations with different continuous cumulative distribution functions are obtained. Running
a test for randomness is carried out in a random model in which the observations vary around
a constant mean. The observations in the random model in which the run test is carried out have
a constant variance, and the observations are also probabilistically independent. A run in a
run test is defined as a consecutive sequence of ones or of twos. This test checks whether or
not the number of runs is the appropriate number of runs for a randomly generated series.
The observations from the two independent samples are ranked in increasing order, each
value is coded as a 1 or 2, and the total number of runs is summed up and used as the test
statistic. Small values suggest different populations and large values suggest
identical populations (the arrangement of the values should be random). The Wald-Wolfowitz
run test is commonly used.
Generally, in non-parametric tests, no underlying distribution is assumed. This holds for the
run test as well, but if the number of observations is more than twenty, then it is assumed (in
the run test) that the number of runs is approximately normally distributed, with the mean and
variance given by the formulas below.
Formulas for the mean and the variance of the observed number of runs R, where Ha and Hb are
the numbers of observations coded 1 and 2 and H = Ha + Hb:
\[ E(R) = \frac{H + 2 H_a H_b}{H} \]
\[ V(R) = \frac{2 H_a H_b (2 H_a H_b - H)}{H^2 (H - 1)} \]
The researcher should note that in the run test for the random type of model, if the number of
observations is larger than twenty, then the distribution of the observed number of runs would
approximately follow the normal distribution. The value of the standard normal variate of the observed
number of runs in the run test is given by the following:
\[ Z = \frac{R - E(R)}{\sqrt{V(R)}} \]
This follows the normal distribution that has mean zero and variance 1, which is also called
the standard normal distribution.
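The run-test formulas can be sketched in code as follows. The coded sequence is hypothetical and deliberately short (ten observations, fewer than the twenty needed for the normal approximation), so it only illustrates the arithmetic; `runs_test` is an illustrative helper name.

```python
import math

def runs_test(sequence):
    """Return (R, E(R), V(R), Z) for a sequence containing two distinct codes."""
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two distinct codes")
    h_a = sum(1 for s in sequence if s == labels[0])  # observations coded 1
    h_b = len(sequence) - h_a                         # observations coded 2
    h = h_a + h_b
    # A new run starts wherever the code changes from one element to the next.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    expected = (h + 2 * h_a * h_b) / h
    variance = 2 * h_a * h_b * (2 * h_a * h_b - h) / (h ** 2 * (h - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, variance, z

# Hypothetical coded sequence: runs are 11, 22, 1, 2, 11, 22.
coded = [1, 1, 2, 2, 1, 2, 1, 1, 2, 2]
runs, expected, variance, z = runs_test(coded)
print(runs, expected, round(variance, 4), round(z, 4))
```

Here the observed number of runs equals its expected value, so Z is zero and the sequence gives no evidence against randomness.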
Kruskal-Wallis Test
\[ Z = \frac{T - \mu_T}{\sigma_T} = \frac{45 - 95}{24.85} = \frac{-50}{24.85} = -2.01 \]
Reject H0. The prescribed programme of physical exercise is effective. We find just the opposite
results. However, as mentioned in the text, rank test for paired sample makes use of more information
than two sample sign test and therefore should be more reliable.
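The values µT = 95 and σT = 24.85 used above are consistent with the usual large-sample formulas for the rank test for paired samples, µT = n(n + 1)/4 and σT = sqrt(n(n + 1)(2n + 1)/24), with n = 19 pairs. The value n = 19 is inferred from those numbers and is not stated in the text, so the sketch below should be read under that assumption.

```python
import math

def signed_rank_z(t_statistic, n_pairs):
    """Large-sample Z for the signed-rank statistic T with n_pairs non-tied pairs."""
    mean_t = n_pairs * (n_pairs + 1) / 4
    sd_t = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    z = (t_statistic - mean_t) / sd_t
    return mean_t, sd_t, z

# T = 45 is taken from the calculation above; n = 19 is an inferred assumption.
mean_t, sd_t, z = signed_rank_z(45, 19)
print(f"mean = {mean_t:.2f}, sd = {sd_t:.2f}, Z = {z:.2f}")
```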
MODULE-III
Factor analysis
To interpret the rotated solution the student should have done the following
- Establish whether this is a robust, statistically significant factor analysis solution that the
decision maker can accept.
- As we can see from Table 1, the KMO value is > 0.5 and Bartlett’s test of sphericity
also has a p-value of 0.000. Thus the solution is a robust solution.
- To interpret the rotated solution we consider the factor loadings given for the two-factor
solution. The cut-off could be discretionary, depending on what the student decides on. In
our case we have taken the cut-off as 0.75.
- Factor 1: I enjoy working in the company; my company is well respected and we have a
very relaxed working atmosphere in the company. Thus, we name the factor as
organization affiliation
- Factor 2: the fellow workers are helpful; team working is recognized by the company.
Co-worker affiliation.
To calculate the eigenvalue the student should attempt the following method:
- Eigenvalue of factor 1 = (-.028)^2 + (.928)^2 + (.976)^2 + (.376)^2 + (.375)^2 + (.953)^2
= 3.0048
To calculate the communalities for each variable the student should do the following
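- Percentage of variance explained by factor 1 = (3.0048 / 6) × 100 = 50.08%
- Percentage of variance explained by factor 2 = (1.757 / 6) × 100 = 29.28%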
To interpret the rotated solution the student should have done the following
- Establish whether this is a robust, statistically significant factor analysis solution that the
decision maker can accept.
- As we can see from Table 1, the KMO value is > 0.5 and Bartlett’s test of sphericity
also has a p-value of 0.000. Thus the solution is a robust solution.
- To interpret the rotated solution we consider the factor loadings given for the two-factor
solution. The cut-off could be discretionary, depending on what the student decides on. In
our case we have taken the cut-off as 0.75.
- Factor 1: I get heart palpitations when my boss calls me; a sitting job leads to digestive
problems; and when I retire I might not be physically fit to enjoy my retired life. Thus we
name the factor physiological job stress.
- Factor 2: work life also spills over to personal life; I do not feel like meeting people
after I go home from office; and, negatively related, I always like to stay back after work
hours in the office. Thus we call this factor work-life balance.
To calculate the eigenvalue the student should attempt the following method:
- Eigenvalue of factor 1 = (.971)^2 + (.050)^2 + (.085)^2 + (.975)^2 + (-.083)^2 + (.977)^2
= 2.8645
- Eigenvalue of factor 2 = (.088)^2 + (.974)^2 + (.920)^2 + (.095)^2 + (-.971)^2 + (.040)^2
= 2.7562
To calculate the communalities for each variable the student should do the following
To calculate the contribution of each individual factor to the total variance, the student
should do the following calculations:
- Percentage of variance explained by factor 1
= (Eigenvalue of factor 1 / Sum total of the eigenvalues) × 100
= (2.8645 / 6) × 100 = 47.74%
- Percentage of variance explained by factor 2
= (2.7562 / 6) × 100 = 45.94%
The total variance explained by both factors = 47.74 + 45.94 = 93.68%
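The eigenvalue and variance calculations can be checked with the sketch below, which uses the rotated loadings quoted in the eigenvalue lines above. The communality step assumes the standard definition (the sum of a variable's squared loadings across the retained factors), since the communality working is not reproduced in the text.

```python
# Rotated factor loadings for the six variables, as quoted in the text.
factor_1 = [0.971, 0.050, 0.085, 0.975, -0.083, 0.977]
factor_2 = [0.088, 0.974, 0.920, 0.095, -0.971, 0.040]
n_variables = len(factor_1)

# Eigenvalue of a factor = sum of squared loadings down its column.
eigen_1 = sum(loading ** 2 for loading in factor_1)
eigen_2 = sum(loading ** 2 for loading in factor_2)

# Communality of a variable = sum of squared loadings across the factors
# (standard definition, assumed here).
communalities = [a ** 2 + b ** 2 for a, b in zip(factor_1, factor_2)]

# With standardized variables the total variance equals the number of variables.
pct_1 = eigen_1 / n_variables * 100
pct_2 = eigen_2 / n_variables * 100

print(f"eigenvalues: {eigen_1:.4f}, {eigen_2:.4f}")
print(f"variance explained: {pct_1:.2f}% + {pct_2:.2f}% = {pct_1 + pct_2:.2f}%")
print([round(c, 3) for c in communalities])
```

The communalities add up to the same total as the two eigenvalues, which is a quick consistency check on a two-factor solution.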
Assumptions:
Regression residuals must be normally distributed.
A linear relationship is assumed between the dependent variable and the independent
variables.
The residuals are homoscedastic and approximately rectangular-shaped.
Absence of multicollinearity is assumed in the model, meaning that the independent variables
are not too highly correlated.
At the center of the multiple linear regression analysis is the task of fitting a single line
through a scatter plot. More specifically the multiple linear regression fits a line through a
multi-dimensional space of data points. The simplest form has one dependent and two
independent variables. The dependent variable may also be referred to as the outcome
variable or regressand. The independent variables may also be referred to as the predictor
variables or regressors.
There are 3 major uses for multiple linear regression analysis. First, it might be used to
identify the strength of the effect that the independent variables have on a dependent variable.
Second, it can be used to forecast effects or impacts of changes. That is, multiple linear
regression analysis helps us to understand how much will the dependent variable change when
we change the independent variables. For instance, a multiple linear regression can tell you
how much GPA is expected to increase (or decrease) for every one-point increase (or
decrease) in IQ.
Third, multiple linear regression analysis predicts trends and future values. The multiple
linear regression analysis can be used to get point estimates. An example question may be
“what will the price of gold be 6 months from now?”
When selecting the model for the multiple linear regression analysis, another important
consideration is the model fit. Adding independent variables to a multiple linear regression
model will never decrease, and will usually increase, the amount of explained variance in the
dependent variable (typically expressed as R²). Therefore, adding too many independent
variables without any theoretical justification may result in an over-fit model.
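One common guard against this kind of over-fitting, not discussed in the notes and offered here only as an illustration, is the adjusted R², which penalizes each additional predictor. The R² values, sample size and predictor counts below are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical comparison on 30 observations: four extra predictors raise
# R-squared only from 0.60 to 0.61.
adj_small_model = adjusted_r_squared(0.60, n_obs=30, n_predictors=2)
adj_large_model = adjusted_r_squared(0.61, n_obs=30, n_predictors=6)

print(f"2 predictors: {adj_small_model:.4f}")
print(f"6 predictors: {adj_large_model:.4f}")
```

In this made-up case the larger model has the higher R² but the lower adjusted R², which is the pattern that signals predictors added without real explanatory value.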
Discriminant analysis
To calculate the similarity between each pair of respondents the students need to calculate the
distance between every pair using the Euclidean distance formula. As an illustration let us
calculate the distance between consumer 1 & 2:
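The respondents' original ratings are not reproduced in this text, so the sketch below uses hypothetical scores on three attributes purely to show how the Euclidean distance formula is applied to one pair of consumers.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences across all attributes."""
    if len(a) != len(b):
        raise ValueError("both respondents need the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical ratings on three attributes (not the original data).
consumer_1 = [5, 3, 4]
consumer_2 = [2, 7, 4]

distance_12 = euclidean_distance(consumer_1, consumer_2)  # sqrt(9 + 16 + 0)
print(distance_12)
```

Repeating the calculation for every pair of respondents gives the distance matrix from which the clusters are formed.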
[Dendrogram: cases listed in the order 2, 4, 1, 3, 5 against a distance axis running from 0 to 25]
However, the answer cannot be concrete because:
1. The size of the sample is too small.
2. There is no secondary data about the market size and structure of the two
possibilities.
3. There must be demographic data available about the sample to take a decision.
REPORT WRITING
Introduction
Research Methodology
Sample section
Measurement section
Results
Formatting
Title Page
Sample (1 page)
References
Appendices
Preliminary Section
I. Title Page
II. Letter of transmittal
III. Contents
IV. Acknowledgements
V. Abstract (on a separate single page)
Background Section
i. Problem statement
ii. Objectives
iii. Review of literature
i. Research design
Finding Sections
I. Results
II. Interpretation of result
References
Appendices
Glossary
Bibliography
SHORT QUESTIONS
Ans- Solution or action oriented research, that is contextual and practical in approach.
Ans- Exploratory research is loosely structured and the basic premise is to provide
direction to subsequent, more structured method of enquiry.
2. MUST have: a logical and explicitly stated justification for the selected methods
Descriptive research designs are more structured and formal in nature. As the name
implies the objective of these studies is to provide a comprehensive and detailed
explanation of the phenomena under study
Primary data is original, problem- or project-specific, and collected for the specific
objectives and needs spelt out by the researcher. Its authenticity and relevance are
reasonably high.
Secondary data is information that is not topical or research specific and has been
collected and compiled by some other researcher or investigative body. It is recorded
and published in a structured format.
1. Resource advantage
2. Accessibility of data
4. Assessment of data
Unit of analysis
Independent variable
Dependent variable
Intervening variables
Q.14. What is a hypothesis?
1. Concomitant variation
Test units: Test units are those entities on which treatments are applied.
In quasi-experimental design, the researcher can control when measurements are taken
and on whom they are taken. However, this design lacks complete control of
scheduling of treatment and also lacks the ability to randomize test units’ exposure to
treatments.
Population refers to any group of people or objects that form the subject of study in a
particular survey and are similar in one or more ways.
Sampling frame comprises all the elements of a population with proper identification
that is available to us for selection at any stage of sampling.
Q.25. What is a Sample?
Q.27. What is Sampling?
2. A decision-maker may not have much time to wait until all the information
is available.
Sampling error: This error arises when a sample is not representative of the population.
3. Systematic sampling
5. Cluster sampling
1. Convenience sampling
2. Judgmental sampling
3. Snowball sampling
4. Quota sampling
The sampling process in which the first respondent is chosen by any probability
sampling method and the subsequent respondents are chosen through referrals from the
first respondent.
The term ‘measurement’ means assigning numbers or some other symbols to the
characteristics of certain objects. When numbers are used, the researcher must have a
rule for assigning a number to an observation in a way that provides an accurate
description.