Data Analysis
Additional sources

Compilation of sources:
- http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
- http://web.utk.edu/~dap/Random/Order/Start.htm
- Data: http://rkb.home.cern.ch/rkb/titleA.html
- Exploratory: http://www.drtomoconnor.com/3760/3760lect07.htm and http://www.itl.nist.gov/div898/handbook/eda/eda.htm
- Statistical Data Analysis: http://itl.nist.gov/div898/handbook/eda/eda.htm and http://home.ubalt.edu/ntsbarsh/stat-data/topics.htm
Copyright 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
FIGURE 12.1
Data Analysis
- Obtain the mean, variance, and standard deviation of each variable
- See if, for all items, responses range over the whole scale rather than being restricted to one end alone
- Obtain Pearson correlations among the variables under study
- Get frequency distributions for all the variables
- Tabulate your data
- Describe your sample's key characteristics (demographic details: sex composition, education, age, length of service, etc.)
- Examine histograms, frequency polygons, etc.
Quantitative Data
Each measurement scale supports different operations:
- Nominal: categorization
- Ordinal: ranking
- Interval: relationships (distances between values)
Descriptive Statistics
Describing key features of data:
- Central tendency: mean, median, mode
- Spread: variance, standard deviation, range
- Distribution (shape): skewness, kurtosis
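The measures above can be computed with Python's standard library; a minimal sketch (the data values are made up for illustration, and skewness is computed by hand since `statistics` does not provide it):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Central tendency
mean = statistics.mean(data)      # 5.0
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4

# Spread
variance = statistics.variance(data)   # sample variance
stdev = statistics.stdev(data)         # sample standard deviation
data_range = max(data) - min(data)     # 7

# Shape: population skewness, m3 / s^3
n = len(data)
m = statistics.fmean(data)
s = statistics.pstdev(data)
skewness = sum((x - m) ** 3 for x in data) / (n * s ** 3)

print(mean, median, mode, variance, data_range, skewness)
```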
Descriptive Statistics
Describing key features of data:
- Nominal scale: identification only (counts, mode)
- Interval scale: parametric statistics (mean, etc.)

Goodness of measures:
- Reliability (consistency/stability): Cronbach's alpha
- Validity: criterion, convergent, discriminant, factorial
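Cronbach's alpha can be computed directly from its definition, alpha = k/(k−1) × (1 − Σ item variances / variance of totals). A minimal sketch with hypothetical questionnaire data (three items, four respondents; the items move together perfectly, so alpha comes out at 1.0):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: one column per questionnaire item, each holding
    one score per respondent.
    """
    k = len(items)
    item_vars = sum(variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

items = [
    [1, 2, 3, 4],  # item 1
    [1, 2, 3, 4],  # item 2
    [1, 2, 3, 4],  # item 3
]
print(cronbach_alpha(items))  # 1.0
```

In practice alpha of about 0.7 or higher is usually taken as acceptable internal consistency.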
Testing Hypotheses
Use the appropriate test for:
- the significance of the difference between the means of two groups (the t-test)
- the significance of differences among the means of more than two groups, using the F test
- the variance explained in the DV by the variance in the IVs
ANOVA
Tests for differences among group means.
http://itl.nist.gov/div898/handbook/eda/section3/scatterp.htm
Statistical Power
Errors in Methodology:
- Type 1 error: rejecting the null hypothesis when you should not, i.e., claiming a significant difference that is not there. Called an alpha error.
- Type 2 error: failing to reject the null hypothesis when you should. Called a beta error.
Statistical power is about avoiding Type 2 errors.
Statistical Power
(see discussion at http://my.execpc.com/4A/B7/helberg/pitfalls/)
Depends on 4 issues:
- Sample size
- The effect size you want to detect
- The alpha (Type 1 error rate) you specify
- The variability of the sample
Too little power: you may overlook a real effect. Too much power: any difference, however trivial, is significant.
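The interplay of these four quantities can be illustrated with a standard normal-approximation sample-size formula for a two-group comparison of means, n per group = 2((z_alpha + z_beta)/d)². A minimal sketch using only the standard library (the effect sizes below are hypothetical inputs):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample comparison of means,
    using the normal approximation (two-tailed test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium standardized effect (d = 0.5) at alpha = .05 and power = .80
print(n_per_group(0.5))  # 63 per group
print(n_per_group(0.8))  # larger effects need smaller samples
```

Note how halving the effect size roughly quadruples the required sample size.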
Parametric tests rest on assumptions such as the independence of samples. Nonparametric tests are sometimes used when these assumptions cannot be met.
t-tests
Used to compare two means, or one observed mean against a hypothesized mean.

Statistical programs will give you a choice between a matched-pair and an independent t-test. Your choice depends on whether the two sets of scores come from the same cases.
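All three variants are available in SciPy; a minimal sketch with hypothetical scores (the group values are made up for illustration):

```python
from scipy import stats

# Hypothetical scores for two groups of five cases each
group_a = [23, 25, 28, 30, 32]
group_b = [20, 22, 24, 25, 27]

# Independent-samples t-test (two separate groups)
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Matched-pair (dependent) t-test: only appropriate when the
# two sets of scores come from the same cases
t_rel, p_rel = stats.ttest_rel(group_a, group_b)

# One-sample test: observed mean against a hypothesized mean of 25
t_one, p_one = stats.ttest_1samp(group_a, popmean=25)

print(t_ind, p_ind)
```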
Chi-square Test
Analyze nominal data by counting occurrences of each value. From the counts, calculate proportions. Compare the proportion of occurrence in the sample to the proportion of occurrence in the population.

Hypotheses: H0: p = k, where k is a value between 0 and 1.

Reference tables and worked examples:
http://www.tutor-homework.com/statistics_tables/statistics_tables.html#normal
http://ncalculators.com/math-worksheets/how-to-calculate-t-test.htm

The test statistic is

    χ² = Σi (Oi − Ei)² / Ei

where Oi are the observed and Ei the expected counts; compare the result to the chi-square distribution (df = 1 for a single proportion).
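A minimal sketch of the goodness-of-fit version of this statistic with SciPy (the counts are hypothetical: 45 "successes" in 100 trials tested against H0: p = 0.5):

```python
from scipy.stats import chisquare

observed = [45, 55]   # observed counts in each category
expected = [50, 50]   # counts implied by H0: p = 0.5

# chi2 = sum((O - E)^2 / E) over the categories
chi2, p = chisquare(observed, f_exp=expected)
print(chi2, p)  # chi2 = 1.0 with df = 1
```

Here chi2 = 25/50 + 25/50 = 1.0, well below the df = 1 critical value of 3.84, so H0 is not rejected.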
Univariate z Test
Tests one observed mean against a hypothesized population mean when the population standard deviation is known:
H0: μ = μ0
H1: μ ≠ μ0
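A minimal standard-library sketch of this test (the sample mean, hypothesized mean, sigma, and n below are hypothetical numbers):

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-tailed univariate z test; sigma is the known population sd."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Sample of 25 with mean 52 against H0: mu = 50, known sigma = 10
z, p = z_test(52, 50, 10, 25)
print(z, p)  # z = 1.0, p ≈ 0.317, so fail to reject H0 at alpha = .05
```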
Univariate Tests
Some univariate tests are different in that they are among the statistical procedures where you, the researcher, set the null hypothesis; in many other statistical tests the null hypothesis is implied by the test itself.
Contingency Tables
Relationship between nominal variables
http://www.psychstat.smsu.edu/introbook/sbk28m.htm
If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is called the chi-square test of independence. The null hypothesis is that there is no relationship between row and column frequencies.
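The chi-square test of independence is a one-liner in SciPy; a minimal sketch with a hypothetical 2×2 table (rows and columns are imagined nominal categories):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of observed counts
table = [[10, 20],
         [20, 10]]

# H0: row and column frequencies are independent
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p, dof)
print(expected)  # counts implied by independence
```

With these counts every expected cell is 15, chi2 = 20/3 ≈ 6.67 on df = 1, and H0 is rejected at alpha = .05.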
Correlations
A statistical summary of the degree and direction of association between two variables. Correlation itself does not distinguish between independent and dependent variables. Most common: Pearson's r.
Correlations
Use when you believe that a linear relationship exists between two variables. The range is from −1 to +1. R², the coefficient of determination, is the % of variance explained in each variable by the other.
Correlations
r = Sxy / (Sx Sy), the covariance between x and y divided by the product of their standard deviations.
Calculations needed:
- The means, x̄ and ȳ
- The deviations from the means, (x − x̄) and (y − ȳ), for each case
- The squares of the deviations from the means for each case, (x − x̄)² and (y − ȳ)², to ensure positive distance measures when added
- The cross product for each case, (x − x̄)(y − ȳ)
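The calculation steps above translate directly into code; a minimal standard-library sketch (the paired scores are hypothetical):

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson's r from the deviation cross-products, as outlined above."""
    x_bar, y_bar = fmean(x), fmean(y)
    dx = [xi - x_bar for xi in x]          # deviations from the mean of x
    dy = [yi - y_bar for yi in y]          # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross products
    sxx = sum(a * a for a in dx)              # sum of squared deviations of x
    syy = sum(b * b for b in dy)              # sum of squared deviations of y
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
```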
Correlations
The null hypothesis for correlations is H0: ρ = 0 and the alternative is usually H1: ρ ≠ 0. However, if you can justify it prior to analyzing the data, you might also use H1: ρ > 0 or H1: ρ < 0, a one-tailed test.
Correlations
Alternative measures:
- Spearman rank correlation, r_ranks, for ranked (ordinal) data; r_ranks and r are nearly always equivalent measures for the same data (even when not, the differences are trivial)
- Phi, for dichotomous variables
ANOVA
For two groups only, the t-test and ANOVA yield the same results. When working with three or more groups, you must do paired comparisons to know where the mean differences lie.
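Both claims can be checked directly with SciPy; a minimal sketch with hypothetical group scores:

```python
from scipy.stats import f_oneway, ttest_ind

# Hypothetical scores for three groups
g1 = [1, 2, 3]
g2 = [2, 3, 4]
g3 = [3, 4, 5]

# Omnibus F test: do any of the three group means differ?
f_stat, p = f_oneway(g1, g2, g3)
print(f_stat, p)

# With two groups, t-test and ANOVA agree: t^2 equals F
t, _ = ttest_ind(g1, g2)
f2, _ = f_oneway(g1, g2)
print(t ** 2, f2)
```

A significant omnibus F only says the means are not all equal; pairwise t-tests (with a correction such as Bonferroni) are then needed to locate which means differ.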
Multivariate Techniques
- Dependent variable techniques: regression in its various forms, MANOVA, discriminant analysis
- Classificatory or data reduction techniques: cluster analysis
Linear Regression
Simple linear regression predicts scores on one variable (y) from scores on another (x).

The statistic for assessing the overall fit of a regression model is R², the overall % of variance explained by the model:

    R² = 1 − (unpredictable variance / total variance)
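A minimal standard-library sketch of simple least-squares regression and the R² formula above (the x and y values are made-up data scattered around a line):

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares slope b and intercept c for y = b*x + c, plus R^2."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    c = y_bar - b * x_bar
    # R^2 = 1 - (unpredictable/residual variance) / (total variance)
    ss_res = sum((yi - (b * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return b, c, r2

# Hypothetical data scattered around y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 9.0, 10.8]
b, c, r2 = fit_line(x, y)
print(b, c, r2)
```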
Linear Regression
The multiple regression model is y = b1x1 + b2x2 + c. Each regression coefficient b is assessed independently for its statistical significance, H0: b = 0. So, in a statistical program's output, a statistically significant b rejects the notion that the variable associated with b contributes nothing to predicting y.
Linear Regression
- In multiple regression, R² still tells us the amount of variation in y explained by all of the predictors (x) together
- The F-statistic tells us whether the model as a whole is statistically significant
- Several other types of regression models are available for data that do not meet the assumptions needed for least-squares models (such as logistic regression for dichotomous dependent variables)
Variable entry methods:
- Stepwise: automatically enters and removes variables, leaving out those not significant and reexamining variables in the model at each step
- Enter: the researcher specifies that all variables will be used in the model
- Forward, backward: begin with all (backward) or none (forward) of the variables and automatically add or remove variables without reconsideration of variables already in the model
Multicollinearity
- The best regression model has uncorrelated IVs
- Model stability is low with excessively correlated IVs
- Collinearity diagnostics identify problems, suggesting variables to be dropped
- High tolerance and a low variance inflation factor (VIF) are desirable
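The variance inflation factor can be computed from first principles: regress each IV on the remaining IVs and take VIF = 1/(1 − R²), where tolerance is 1 − R². A minimal NumPy sketch with hypothetical predictors (x2 is deliberately almost a copy of x1):

```python
import numpy as np

def vif(target, others):
    """VIF of one IV, from regressing it on the remaining IVs.

    VIF = 1 / (1 - R^2); tolerance is 1 - R^2.
    """
    X = np.column_stack([np.ones(len(target))] + list(others))  # intercept + other IVs
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ beta
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - np.mean(target)) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 / (1 - r2)

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.1, 2.0, 2.9, 4.2, 5.1])   # nearly a copy of x1
x3 = np.array([1.0, -2.0, 0.0, 2.0, -1.0])  # unrelated to x1

print(vif(x1, [x2, x3]))  # large value: a multicollinearity problem
print(vif(x3, [x1, x2]))  # near 1: fine
```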
Discriminant Analysis
- Regression requires the DV to be interval or ratio scaled
- If the DV is categorical (nominal), you can use discriminant analysis
- IVs should be interval or ratio scaled
- A key result is the number of cases classified correctly
MANOVA
Compares groups on several DVs simultaneously (ANOVA compares groups on only one DV at a time). Pure MANOVA is available in SPSS only from command syntax, but you can use the general linear model instead.
Factor Analysis
A data reduction technique: a large set of variables can be reduced to a smaller set while retaining the information from the original data set. Data must be on an interval or ratio scale. E.g., a variable called socioeconomic status might be constructed from variables such as household income, educational attainment of the head of household, and average per capita income of the census block in which the person resides.
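Factor extraction typically begins with the correlation matrix of the variables. A minimal NumPy sketch of that first step (eigen-decomposition, as in principal components) with made-up indicator variables; v2 is built to be perfectly correlated with v1, so only two components carry variance:

```python
import numpy as np

# Three hypothetical interval-scaled indicators
v1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v2 = 2 * v1 + 1                           # perfectly correlated with v1
v3 = np.array([1.0, -2.0, 0.0, 2.0, -1.0])  # uncorrelated with v1 and v2

R = np.corrcoef([v1, v2, v3])  # the correlation matrix that seeds the process
eigenvalues, loadings = np.linalg.eigh(R)

# Eigenvalues (largest first) show how much variance each component captures
print(np.round(eigenvalues[::-1], 3))
```

Here the v1/v2 pair collapses onto a single component (eigenvalue 2), v3 keeps its own (eigenvalue 1), and the third component is empty: three variables reduce to two.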
Cluster Analysis
Cluster analysis seeks to group cases rather than variables; it too is a data reduction technique. Data must be on an interval or ratio scale. E.g., a marketing group might want to classify people into psychographic profiles regarding their tendencies to try or adopt new products: pioneers or early adopters, early majority, late majority, laggards.

Factor analysis begins with a correlation matrix to seed the process, and the number of variables with which we must work is then reduced; cluster analysis, in contrast, focuses on cases.
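One common way to group cases is k-means clustering; a bare-bones standard-library sketch (the eight two-dimensional cases and the starting centroids are hypothetical, chosen so two clusters are obvious):

```python
def kmeans(points, centroids, iters=20):
    """Assign each case to its nearest centroid, then move each
    centroid to the mean of its cases; repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            idx = dists.index(min(dists))  # nearest centroid
            clusters[idx].append(p)
            labels.append(idx)
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return labels, centroids

# Hypothetical interval-scaled measurements for eight cases
cases = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 10), (10, 9), (10, 10)]
labels, centers = kmeans(cases, centroids=[cases[0], cases[4]])
print(labels)   # first four cases fall in one cluster, last four in the other
print(centers)
```

Real analyses would standardize the variables first and try several values of k; libraries such as scikit-learn provide production implementations.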
Potential Biases
- Asking inappropriate or wrong research questions
- An insufficient literature survey and hence an inadequate theoretical model
- Measurement problems
- Samples not being representative
- Problems with data collection
Questions to ask (adapted from Robert Niles):
- Where did the data come from?
- How (and by whom) were the data reviewed, verified, or substantiated?
- How were the data collected?
- How are the data presented?
- What is the context?
- Be wary of cherry-picking and of spurious correlations
Copyright 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
FIGURE 11.2