
RESEARCH METHODOLOGY

MODULE 3
Data Collection, Analysis and
Interpretation

Assoc Prof Dr. Maran Marimuthu


11 May 2019
Learning Objectives

•Know the methods and how statistical techniques are connected.
•Know the methods of collecting data.
•Know the various sampling schemes and data-collection methods. Know the advantages and disadvantages of each method.
•Learn how to deal with data using appropriate statistical methods.
•Know parametric and non-parametric testing.
•Modelling: learn how to analyze the data and interpret the results.
Research Defined

Research is defined as the systematic and objective process of generating information to aid in making decisions.

[Figure: a decision maker wonders, "I don't know if we should buy a new product?" Information reduces uncertainty.]
Three Basic Types of Research
 Exploratory research is done via case studies and learning the names of variables. (Note: used when no one has studied the topic before.)
 Descriptive research is done via surveys and learning how variables are distributed. (Note: the topic has already been studied, but more detail is needed, e.g. why something went wrong.)
 Explanatory research is done via experiments and learning how variables can be explained. (Note: a cause study. The topic is well studied, but the aim is to find relationships. Explanatory variables are the independent variables or predictors, e.g. salesperson or quality.)
Determining When to Conduct Research
Four questions, asked in sequence, determine whether to conduct research:
 Time Constraints: Is sufficient time available before a managerial decision must be made?
 Availability of Data: Is the information already on hand inadequate for making the decision?
 Nature of the Decision: Is the decision of considerable strategic or tactical importance?
 Benefits vs. Costs: Does the value of the research information exceed the cost of conducting research?
If every answer is Yes: Conducting Research.
If any answer is No: Do Not Conduct Research.


[Figure: flowchart of the stages of the research process]
 Problem Discovery and Definition: problem discovery; selection of exploratory research technique (secondary (historical) data, experience survey, pilot study, case study); problem definition (statement of research objectives).
 Research Design: selection of basic research method (survey: interview or questionnaire; experiment: laboratory or field; secondary data study; observation).
 Sampling: probability or nonprobability.
 Data Gathering: collection of data (fieldwork).
 Data Processing and Analysis: editing and coding data; data processing.
 Conclusions and Report: interpretation of findings; report.
Types of Data and Measurement Scales

Data
 Non-metric or Qualitative: Nominal Scale, Ordinal Scale
 Metric or Quantitative: Interval Scale, Ratio Scale
Types of Variables

 Dichotomous: Male/Female; Engineering/non-engineering
 Discrete: Engineering background; Educational level
 Continuous: Production Units; Costs
Description of HBAT Primary Database Variables

Variable Description Variable Type


Data Warehouse Classification Variables
X1 Customer Type nonmetric
X2 Industry Type nonmetric
X3 Firm Size nonmetric
X4 Region nonmetric
X5 Distribution System nonmetric
Performance Perceptions Variables
X6 Product Quality metric
X7 E-Commerce Activities/Website metric
X8 Technical Support metric
X9 Complaint Resolution metric
X10 Advertising metric
X11 Product Line metric
X12 Salesforce Image metric
X13 Competitive Pricing metric
X14 Warranty & Claims metric
X15 New Products metric
X16 Ordering & Billing metric
X17 Price Flexibility metric
X18 Delivery Speed metric
Outcome/Relationship Measures
X19 Satisfaction metric
X20 Likelihood of Recommendation metric
X21 Likelihood of Future Purchase metric
X22 Current Purchase/Usage Level metric
X23 Consider Strategic Alliance/Partnership in Future nonmetric
Selecting a Sample

Sample: subset of a larger population.

[Figure: a SAMPLE drawn as a subset of the POPULATION]
Sampling

 Who is to be sampled?
 How large a sample?
 How will sample units be selected?
Two Major Categories of Sampling

 Probability sampling
• Known, nonzero probability for every
element
 Nonprobability sampling
• Probability of selecting any particular
member is unknown
Nonprobability Sampling

 Convenience
 Judgment
 Quota
 Snowball
Probability Sampling

 Simple random sample


 Systematic sample
 Stratified sample
 Cluster sample
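
The four probability schemes differ in how units are drawn from the sampling frame. The following is a minimal sketch using Python's standard `random` module. The frame of 100 numbered units, the "small"/"large" strata, the clusters of 10, and the sample size of 10 are all hypothetical values chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical sampling frame: 100 units, the first 60 labelled "small"
# firms and the remaining 40 labelled "large" firms.
frame = [{"id": i, "size": "small" if i < 60 else "large"} for i in range(100)]
n = 10

# Simple random sample: every unit has the same chance of selection.
simple = random.sample(frame, n)

# Systematic sample: random start, then every k-th unit.
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k]

# Stratified sample (proportional): draw separately within each stratum.
stratified = []
for stratum in ("small", "large"):
    members = [u for u in frame if u["size"] == stratum]
    share = round(n * len(members) / len(frame))
    stratified.extend(random.sample(members, share))

# Cluster sample: pick whole clusters at random and keep every unit in them.
clusters = [frame[i:i + 10] for i in range(0, len(frame), 10)]
cluster = [u for c in random.sample(clusters, 2) for u in c]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

In every scheme each unit has a known, nonzero chance of selection, which is what separates probability from nonprobability sampling.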
Stages in the Selection of a Sample

1. Define the target population
2. Select a sampling frame
3. Determine if a probability or nonprobability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork
Research Design

 Master plan
 Framework for action
 Specifies methods and procedures
Basic Research Design- Data Collection

 Surveys
 Experiments
 Secondary data
 Observation
The Major Decisions in Questionnaire
Design

1. What should be asked?


2. How should each question be
phrased?
3. In what sequence should the
questions be arranged?
4. What questionnaire layout will best
serve the research objectives?
5. How should the questionnaire be
pretested? Does the questionnaire
What Should Be Asked?

 Questionnaire relevance
 Questionnaire accuracy
Phrasing Questions

 Open-ended questions
 Fixed-alternative questions
IMPORTANT!

 Research Question → Research Objective → Research Hypothesis
 Univariate- dispersion, fluctuation
within the variable.
 Bivariate – difference, relationship,
association, causal
 Multivariate- relationship,
association, causal, modeling
Illustration

 RQ1 (univariate) - Has the O&G industry been stable in the past 5 years? RO? RH?
 RQ2 (bivariate) - Is there a difference between the energy sector and the finance sector with regard to financial performance? RO? RH?
 RQ3 (multivariate) - What are the factors that affect the performance of the energy sector? RO? RH?
ANALYSIS
Analysis of Quantitative Data-Dealing
with Data
 Dealing with Data: Coding, Entering, and
Cleaning
 Results with One Variable-UNIVARIATE
 Results with Two Variables- BIVARIATE
 Results with More than Two- MULTIVARIATE
 Relevant for Inferential Statistics
Dealing with Data

 Coding is systematically reorganizing raw data into a format that is easily entered into a computer or is machine-readable.
 Entering data can vary, but in most
computer databases, each row is a case,
participant or subject, and each column is
a variable. An entire row or set of
contiguous rows is a record for a single
case.
 Cleaning data is the term for checking the
accuracy of coding and data entry.
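
The three steps can be sketched in a few lines of Python. The codebook, the variable names, the valid range, and the three raw records below are hypothetical and serve only to show coding (answers mapped to numeric codes), entering (one dictionary per case, one key per variable), and cleaning (flagging values that cannot be right).

```python
# Hypothetical codebook: maps raw answers to numeric codes.
CODEBOOK = {"gender": {"male": 1, "female": 2}}
VALID_RANGE = {"satisfaction": (1, 10)}

# Raw data as collected; each dictionary is one case (row).
raw_rows = [
    {"id": "001", "gender": "Male", "satisfaction": "7"},
    {"id": "002", "gender": "female ", "satisfaction": "12"},  # out of range
    {"id": "003", "gender": "F", "satisfaction": "9"},         # not in codebook
]

def code_row(row):
    """Coding: turn one raw record into machine-readable values."""
    gender = CODEBOOK["gender"].get(row["gender"].strip().lower())
    return {"id": row["id"], "gender": gender,
            "satisfaction": int(row["satisfaction"])}

def find_problems(case):
    """Cleaning: list the variables whose coded value looks wrong."""
    problems = []
    if case["gender"] is None:
        problems.append("gender")
    low, high = VALID_RANGE["satisfaction"]
    if not low <= case["satisfaction"] <= high:
        problems.append("satisfaction")
    return problems

coded = [code_row(r) for r in raw_rows]
flagged = {c["id"]: find_problems(c) for c in coded if find_problems(c)}
print(flagged)
```

Flagged cases would then be checked against the original questionnaires rather than silently corrected.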
Results with One Variable

 Descriptive Statistics describe numerical data one variable at a time (univariate), two variables at a time (bivariate), or more than two (multivariate).
 Frequency Distributions summarize
information including counts and
percentages, and cumulative counts and
percentages for nominal, ordinal, interval,
or ratio measurements.
 Graphic representations include the
histogram, bar chart, and pie chart.
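
A frequency distribution can be built directly from the raw scores. The sketch below assumes a small hypothetical list of scores and returns, for each distinct value, the count, percent, cumulative count, and cumulative percent.

```python
from collections import Counter

# Hypothetical scores for one variable.
scores = [5, 7, 7, 8, 8, 8, 9, 10, 10, 10]

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative count, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n,
                     running, 100 * running / n))
    return rows

for row in frequency_table(scores):
    print(row)
```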
Measures of Central Tendency

 Mode is the most common or frequently occurring number.
 Median is the middle point or 50th
percentile used with ordinal, interval or
ratio data.
 Mean is the arithmetic average used with
interval or ratio level data (but it is very
sensitive to extreme values).
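
The sensitivity of the mean to extreme values is easy to see with Python's standard `statistics` module. The wastage figures below are hypothetical; the last value (50) plays the role of the extreme value.

```python
import statistics

# Hypothetical wastage figures (units); 50 is an extreme value.
wastage = [5, 6, 6, 7, 8, 9, 50]
without_extreme = wastage[:-1]

mode_value = statistics.mode(wastage)
median_with = statistics.median(wastage)
mean_with = statistics.mean(wastage)

median_without = statistics.median(without_extreme)
mean_without = statistics.mean(without_extreme)

print("mode:", mode_value)
print("median with / without extreme value:", median_with, median_without)
print("mean with / without extreme value:", mean_with, round(mean_without, 2))
```

Dropping the single extreme value roughly halves the mean while the median barely moves, which is why the median is often preferred for skewed data.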
Measures of Variation

 In general, variation is defined as the spread, dispersion, or variability around the center of the distribution.
 Range is the distance between the smallest and largest scores; e.g. wastage might range from 5 to 50 units.
 Percentiles are scores at a specific place within the distribution: a 25th percentile might indicate that 25% of wastage values fall below 5 units.
Measures of Variation continued…

 Standard deviation provides an average distance of each score from the mean, requiring interval or ratio level data.
 Z score is a standardized score, and
it represents the number of standard
deviations of a particular score
above or below the mean.
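
As a worked illustration, the sketch below computes the standard deviation and a Z score for a small hypothetical data set. It uses the population formula (`statistics.pstdev`); for a sample drawn from a larger population, `statistics.stdev` (which divides by n - 1) would be the usual choice.

```python
import statistics

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
std_dev = statistics.pstdev(scores)  # population standard deviation

def z_score(value):
    """Number of standard deviations that value lies above or below the mean."""
    return (value - mean) / std_dev

print(mean, std_dev, z_score(9), z_score(2))
```

A positive Z score means the score lies above the mean, a negative one below it.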
Example-dataset

 Are the cars with additive running more efficiently?
 H0: mileage = 10.5
H1: mileage ≠ 10.5
Frequency Distribution: Variable X6 – Product Quality
X6 - Product Quality

Valid value   Frequency   Percent   Valid Percent   Cumulative Percent
5.0           1           1.0       1.0             1.0
5.1           1           1.0       1.0             2.0
5.2           1           1.0       1.0             3.0
5.5           2           2.0       2.0             5.0
5.6           1           1.0       1.0             6.0
5.7           4           4.0       4.0             10.0
5.8           1           1.0       1.0             11.0
5.9           2           2.0       2.0             13.0
6.0           1           1.0       1.0             14.0
6.1           2           2.0       2.0             16.0
6.2           1           1.0       1.0             17.0
6.3           1           1.0       1.0             18.0
6.4           5           5.0       5.0             23.0
6.5           2           2.0       2.0             25.0
6.6           1           1.0       1.0             26.0
6.7           4           4.0       4.0             30.0
6.9           3           3.0       3.0             33.0
7.0           1           1.0       1.0             34.0
7.1           2           2.0       2.0             36.0
7.4           2           2.0       2.0             38.0
Histograms and The Normal Curve
This is the distribution for HBAT database variable X19 – Satisfaction.

[Figure: histogram of X19 - Satisfaction. Mean = 6.92, Std. Dev = 1.19, N = 100.]
Results with Two Variables

 Bivariate statistics indicate relationships between two variables that may exist due to covariation or independence.
 Covariation is when two variables go together or are associated statistically.
 Independence means that there is no association between two variables; it is the opposite of covariation.
Results with Two Variables continued…

 Scattergram is a graph on which a social researcher plots each case or observation; each axis represents the value of one variable, and can be used for variables that are measured at the interval or ratio level.
What can be learned from a scattergram?

 Form - relationships can take three forms: independence (no relationship), linear (forming a straight line), or curvilinear (forming either a ‘u’ or an ‘s’ curve).
 Direction - can be one of two values:
either positive, higher values on one
variable go with higher values on the
other; or negative, higher values on one
variable go with lower values on the other.
HBAT Scatterplot: Variables X19 and X6
[Figure: scatterplot of X19 - Satisfaction (vertical axis) against X6 - Product Quality (horizontal axis).]
Bivariate Table

 A bivariate table presents the same information as a scattergram but in a more condensed fashion.
 A table is ordinarily based on a cross
tabulation of two variables at the same
time.
 A contingency table is formed by cross
tabulating two or more variables. It is
contingent because cases in each
category of a variable get distributed
according to their co-occurrence with
Reading a Percentage Table

 First look at the title, variable names, and any background information.
 Next, look at the direction in which
percentages have been computed, in
rows or columns.
 Now look at the comparisons relevant to
the cross tabulation. Comparisons are
made in the opposite direction from that
in which percentages are computed.
Compare across if the table is
percentaged down, compare down if
percentaged across.
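
The reading rule can be illustrated with a small cross tabulation. The 100 cases below are hypothetical: sector is treated as the independent (column) variable and performance as the dependent (row) variable. Percentages are computed down each column, so the comparison is made across the columns.

```python
from collections import Counter

# Hypothetical cases: (sector, performance).
cases = (
    [("energy", "high")] * 30 + [("energy", "low")] * 20
    + [("finance", "high")] * 10 + [("finance", "low")] * 40
)

cell_counts = Counter(cases)
column_totals = Counter(sector for sector, _ in cases)

def column_percent(sector, performance):
    """Percent of the cases in one column (sector) that fall in the given row."""
    return 100 * cell_counts[(sector, performance)] / column_totals[sector]

# Percentaged down (within each sector), compared across (between sectors).
for performance in ("high", "low"):
    print(performance,
          column_percent("energy", performance),
          column_percent("finance", performance))
```

Comparing the "high" row across the two columns shows the difference between sectors; reading down a single column only shows how that one sector is split.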
Measures of Association

 A measure of association is a single number that expresses the strength, and often the direction, of a relationship between two or more variables.
 Common measures of association include lambda, gamma, tau, chi (squared), and rho.
 If there is a strong association it means
that there is a definite pattern in predicting
scores on the dependent variable from
variations in the independent variable.
Measures of Association continued…

 If there is a weak association it means that there is not much of a pattern between scores on the dependent variable and variations in the independent variable.
 Measures of association normally range from 0.0 to +1.0, or from –1.0 through 0.0 to +1.0. In either case, the closer the measure is to 1.0 (+ or -), the stronger the relationship is, and the closer to 0.0, the weaker the association.
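
To make the –1.0 to +1.0 range concrete, the sketch below computes Pearson's correlation coefficient r, used here as an assumption because it is simple to compute by hand; it is not one of the measures named in the list above. The IQ and production figures are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariation of x and y scaled to lie between -1 and +1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data.
iq = [90, 100, 110, 120, 130]
production_up = [20, 25, 30, 35, 40]     # rises with IQ
production_down = [40, 35, 30, 25, 20]   # falls as IQ rises
production_mixed = [30, 20, 40, 20, 30]  # no linear pattern

print(pearson_r(iq, production_up))
print(pearson_r(iq, production_down))
print(pearson_r(iq, production_mixed))
```

The three series give a perfect positive, a perfect negative, and a zero linear association respectively.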
Example

 IQ → Production
 Scattergram
 Cross-tab
 correlation
Correlation Matrix for Store Image Elements

V1 V2 V3 V4 V5 V6 V7 V8 V9
V1 Price Level 1.00
V2 Store Personnel .427 1.00
V3 Return Policy .302 .771 1.00
V4 Product Availability .470 .497 .427 1.00
V5 Product Quality .765 .406 .307 .472 1.00
V6 Assortment Depth .281 .445 .423 .713 .325 1.00
V7 Assortment Width .354 .490 .471 .719 .378 .724 1.00
V8 In-Store Service .242 .719 .733 .428 .240 .311 .435 1.00
V9 Store Atmosphere .372 .737 .774 .479 .326 .429 .466 .710 1.00

Correlation Matrix of Variables After
Grouping Using Factor Analysis

V3 V8 V9 V2 V6 V7 V4 V1 V5
V3 Return Policy 1.00
V8 In-store Service .733 1.00
V9 Store Atmosphere .774 .710 1.00
V2 Store Personnel .741 .719 .787 1.00
V6 Assortment Depth .423 .311 .429 .445 1.00
V7 Assortment Width .471 .435 .468 .490 .724 1.00
V4 Product Availability .427 .428 .479 .497 .713 .719 1.00
V1 Price Level .302 .242 .372 .427 .281 .354 .470 1.00
V5 Product Quality .307 .240 .326 .406 .325 .378 .472 .765 1.00

Shaded areas represent variables likely to be grouped together by factor analysis.

Results with More than two Variables: The
Elaboration Model

 To test for whether an alternative explanation accounts for a relationship found in bivariate analysis, social researchers sometimes attempt to rule out another variable.
 A trivariate table is built from a bivariate
table on the independent and dependent
variable for each category of a third, or
control, variable.
Multiple Regression Analysis

 Multiple regression is a statistical technique for interval or ratio level analysis that accounts for multiple independent variables and their combined influence on one dependent variable.
 Multiple regression can also be used to test for the effects of one or more control variables.
Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.848584     (coefficient of correlation)
R Square            0.72009481   (coefficient of determination)
Adjusted R Square   0.67344395
Standard Error      91.4789339
Observations        8

ANOVA
             df   SS            MS           F          Significance F
Regression   1    129173.1279   129173.128   15.43583   0.00772299
Residual     6    50210.37209   8368.39535
Total        7    179383.5

            Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept   44.3139535     108.5086985      0.40839079   0.69716178   -221.197461   309.825368
Years       38.755814      9.864427133      3.92884589   0.00772299   14.6184126    62.8932153

 F = 15.43583 is the global F test statistic for the test of H0: b1 = 0.
 t Stat = 3.92884589 (Years) is the calculated t for the test of H0: b1 = 0.
 Note that: (1) both t and F have the same p-value, and (2) t² = F.
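
The relationships among the figures in this output can be checked with a few lines of arithmetic. The sketch below only re-derives values from numbers printed in the output; the variable names are chosen here for illustration.

```python
import math

# Values copied from the regression output.
ss_regression, ss_residual, ss_total = 129173.1279, 50210.37209, 179383.5
df_regression, df_residual = 1, 6
n_obs = 8
t_years = 3.92884589

r_square = ss_regression / ss_total                  # coefficient of determination
multiple_r = math.sqrt(r_square)                     # coefficient of correlation
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
standard_error = math.sqrt(ms_residual)

print("R Square:", round(r_square, 6))
print("Multiple R:", round(multiple_r, 6))
print("Adjusted R Square:", round(adjusted_r_square, 6))
print("Standard Error:", round(standard_error, 4))
print("F:", round(f_stat, 4), " t squared:", round(t_years ** 2, 4))
```

With a single independent variable the global F test and the t test on the slope are the same test, which is why t² equals F and both share one p-value.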


t-Test
Excel output
t-Test: Two-Sample Assuming Unequal Variances

                               Japan      United Kingdom
Mean                           1953       1783
Variance                       378300.7   142561.9
Observations                   31         34
Hypothesized Mean Difference   0
df                             49
t Stat                         1.327629   (the test statistic)
P(T<=t) one-tail               0.095226   (the p-value, one tail)
t Critical one-tail            1.676551   (the critical bound, one tail)
P(T<=t) two-tail               0.190453
t Critical two-tail            2.009574

Note that the degrees of freedom for this t-test is 49, not 63.
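
The degrees of freedom of 49 can be reproduced from the summary figures. The sketch below assumes the output uses the Welch-Satterthwaite approximation, the standard formula for a two-sample t-test with unequal variances, and compares it with the pooled (equal variances) value of n1 + n2 - 2. All inputs are copied from the output; the printed means are rounded, so the t statistic matches only approximately.

```python
import math

# Values copied from the Excel output.
mean_japan, var_japan, n_japan = 1953, 378300.7, 31
mean_uk, var_uk, n_uk = 1783, 142561.9, 34

# Squared standard error contributed by each sample.
se2_japan = var_japan / n_japan
se2_uk = var_uk / n_uk

t_stat = (mean_japan - mean_uk) / math.sqrt(se2_japan + se2_uk)

# Welch-Satterthwaite degrees of freedom (unequal variances).
welch_df = (se2_japan + se2_uk) ** 2 / (
    se2_japan ** 2 / (n_japan - 1) + se2_uk ** 2 / (n_uk - 1)
)

# Degrees of freedom if equal variances had been assumed.
pooled_df = n_japan + n_uk - 2

print(round(t_stat, 3), round(welch_df, 2), pooled_df)
```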

© 2008 Thomson South-Western


What does Statistics Mean?
Interpretation!

 Descriptive statistics
• Number of people
• Trends in employment
• Data
 Inferential statistics
• Make an inference about a population
from a sample
Stem & Leaf Diagram – HBAT Variable X6
X6 - Product Quality
Stem-and-Leaf Plot

Frequency   Stem & Leaf
 3.00        5 . 012
10.00        5 . 5567777899
10.00        6 . 0112344444
10.00        6 . 5567777999
 5.00        7 . 01144
11.00        7 . 55666777899
 9.00        8 . 000122234
14.00        8 . 55556667777778
18.00        9 . 001111222333333444
 8.00        9 . 56699999
 2.00       10 . 00

Stem width: 1.0
Each leaf: 1 case(s)

Each stem is shown by the numbers, and each number is a leaf. A stem with a frequency of 10.00 has 10 leaves. The length of the stem, indicated by the number of leaves, shows the frequency distribution. For the stem 8 . 55556667777778, the frequency is 14.

This table shows the distribution of X6 with a stem and leaf diagram (Figure 2.2). The first category is from 5.0 to 5.5, thus the stem is 5.0. There are three observations with values in this range (5.0, 5.1 and 5.2). This is shown as three leaves of 0, 1 and 2. These are also the three lowest values for X6. In the next stem, the stem value is again 5.0 and there are ten observations, ranging from 5.5 to 5.9. These correspond to the leaves of 5.5 to 5.9. At the other end of the figure, the stem is 10.0. It is associated with two leaves (0 and 0), representing two values of 10.0, the two highest values for X6.
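
A stem-and-leaf plot of this kind can be produced with a short function. The sketch below is one possible implementation, not the statistical package's own: it assumes values with one decimal place and splits each stem into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9), as in the plot above. The input is the 13 lowest X6 values listed in the frequency distribution.

```python
def stem_and_leaf(values):
    """Return (stem, leaves, frequency) rows, each stem split into 0-4 and 5-9."""
    rows = {}
    for value in sorted(values):
        tenths = round(value * 10)        # 5.7 -> 57
        stem, leaf = divmod(tenths, 10)   # 57 -> stem 5, leaf 7
        key = (stem, leaf >= 5)           # lower or upper half of the stem
        rows[key] = rows.get(key, "") + str(leaf)
    return [(stem, leaves, len(leaves))
            for (stem, _), leaves in sorted(rows.items())]

# The 13 lowest X6 values (5.0 to 5.9) from the frequency distribution.
x6_lowest = [5.0, 5.1, 5.2, 5.5, 5.5, 5.6, 5.7, 5.7, 5.7, 5.7, 5.8, 5.9, 5.9]

for stem, leaves, frequency in stem_and_leaf(x6_lowest):
    print(f"{frequency:5.2f}  {stem:2d} . {leaves}")
```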
HBAT Diagnostics: Box & Whiskers Plots
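[Figure: box-and-whiskers plots by X1 - Customer Type for three groups: Less than 1 year (N = 32), 1 to 5 years (N = 35), Over 5 years (N = 33). The median is marked in the box, and case #13 is flagged as an outlier.]

Group 2 has substantially more dispersion than the other groups.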
One-Way ANOVA - An Example
Compare calculated values to those in the Excel output:
Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Alone 10 637 63.7 87.56666667
WithPass 12 683 56.91666667 63.53787879

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 250.9833333 1 250.9833333 3.37566268 0.081071382 4.351250027
Within Groups 1487.016667 20 74.35083333

Total 1738 21

In the ANOVA table, F is the test statistic, P-value is the p-value, and F crit is the critical bound.
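
The calculated values can be reproduced from the SUMMARY block alone. The sketch below rebuilds the sums of squares, degrees of freedom, and F from the group counts, sums, and variances, then compares F with the critical bound printed in the output. The decision rule assumes the critical bound corresponds to a 5% significance level.

```python
# Values copied from the SUMMARY block of the Excel output.
groups = {
    "Alone": {"count": 10, "sum": 637, "variance": 87.56666667},
    "WithPass": {"count": 12, "sum": 683, "variance": 63.53787879},
}
f_crit = 4.351250027  # critical bound copied from the output

n_total = sum(g["count"] for g in groups.values())
grand_mean = sum(g["sum"] for g in groups.values()) / n_total

# Between groups: spread of the group means around the grand mean.
ss_between = sum(
    g["count"] * (g["sum"] / g["count"] - grand_mean) ** 2
    for g in groups.values()
)
# Within groups: spread of the observations around their own group mean.
ss_within = sum((g["count"] - 1) * g["variance"] for g in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

reject_h0 = f_stat > f_crit
print(round(ss_between, 4), round(ss_within, 4), round(f_stat, 4), reject_h0)
```

Because F falls below the critical bound (equivalently, the p-value of 0.081 exceeds 0.05), the difference between the two group means is not significant at that level.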



Q&A
Thank you!
