
INTRODUCTION TO

HYPOTHESIS TESTING
AND DATA ANALYSIS
Prof. Yogesh Funde

Types of Analysis
Univariate analysis (involving one variable)

Measures of central tendency (mean, median, and mode)
Measures of dispersion (mean deviation and standard deviation)

Bivariate analysis (involving two variables)

Measures of association (e.g., correlation, regression)

Multivariate analysis (involving many variables)

Factor analysis
Discriminant analysis
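The univariate measures listed above are all one-liners in Python's standard library; the mean deviation needs a short expression of its own. A minimal illustration with made-up data:

```python
import statistics as st

data = [4, 7, 7, 8, 10, 12]

# Measures of central tendency
print(st.mean(data))    # 8
print(st.median(data))  # 7.5
print(st.mode(data))    # 7

# Measures of dispersion
mean_dev = sum(abs(x - st.mean(data)) for x in data) / len(data)
print(mean_dev)                   # mean deviation: 2.0
print(round(st.pstdev(data), 2))  # population standard deviation: 2.52
```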

The Basics
Measures of Association

Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.
The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.
Correlation analysis is most appropriate for interval or ratio variables.
Regression can accommodate either less-than-interval or interval independent variables, but the dependent variable must be continuous.

EXHIBIT 23.1  Bivariate Analysis: Common Procedures for Testing Association

Simple Correlation Coefficient (continued)

Correlation coefficient

A statistical measure of the covariation, or association, between two at-least-interval variables.

Covariance

Extent to which two variables are associated systematically with each other.

r_xy = r_yx = Σⁿᵢ₌₁ (Xᵢ − X̄)(Yᵢ − Ȳ) / √[ Σⁿᵢ₌₁ (Xᵢ − X̄)² · Σⁿᵢ₌₁ (Yᵢ − Ȳ)² ]
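The formula above translates directly into code. A minimal sketch (the data values are made up for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation: the sum of cross-products of
    deviations, divided by the square root of the product of the two
    sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)

# A perfectly linear positive relationship yields r = +1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
print(pearson_r(x, y))  # 1.0
```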

Simple Correlation Coefficient

Correlation coefficient (r)

Ranges from +1 to −1
Perfect positive linear relationship = +1
Perfect negative (inverse) linear relationship = −1
No correlation = 0

Correlation coefficient for two variables (X,Y)

Scatter Diagram to Illustrate Correlation Patterns

Correlation, Covariance, and Causation


When two variables covary, they display concomitant variation.
This systematic covariation does not in and of itself establish causality.
Example: roosters crow at the rising of the sun, but the rooster does not cause the sun to rise.

Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate

Coefficient of Determination
Coefficient of Determination (R²)

A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
Measures that part of the total variance of Y that is accounted for by knowing the value of X.

R² = Explained variance / Total variance
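Both routes to R² agree: squaring r, or dividing the variance explained by a fitted line by the total variance. A quick numerical check with made-up data:

```python
def fit_line(x, y):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
a, b = fit_line(x, y)
my = sum(y) / len(y)

explained = sum((a + b * xi - my) ** 2 for xi in x)   # variance explained by the line
total = sum((yi - my) ** 2 for yi in y)               # total variance of Y
print(round(explained / total, 3))  # 0.81, which equals r**2 = 0.9**2
```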

Correlation Matrix
Correlation matrix

The standard form for reporting correlation coefficients for more than two variables.

Correlation and the Significance of the Correlation

The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.
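This t-test uses t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. A sketch (the r and n values here are made up for illustration):

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# e.g., r = 0.80 observed in a sample of n = 20
t = t_for_correlation(0.80, 20)
print(round(t, 2))  # 5.66 -- compare against the critical t for df = 18
```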

EXHIBIT 23.4  Pearson Product-Moment Correlation Matrix for Salesperson Example(a)

a. Coefficients below the diagonal are for the sample; those above the diagonal are omitted.
b. p < .001.
c. p < .01.
d. p < .05.

Regression Line and Slope

[Figure: scatter plot with a fitted regression line following Ŷ = a + βX; both the X and Y axes range from roughly 80 to 170.]

Regression Analysis
Simple (Bivariate) Linear Regression

A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

The Regression Equation (Y = α + βX)

Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (where the regression line intercepts the Y axis)
β = the slope coefficient (rise over run)

EXHIBIT 23.5  The Advantage of Standardized Regression Weights

The Regression Equation

Parameter Estimate Choices

1. Standardized Regression Coefficient (β)

β indicates the strength and direction of the relationship between the independent and dependent variable.
α (the Y intercept) is a fixed point that is considered a constant (how much of Y can exist without X).
β is expressed on a standardized scale where higher absolute values indicate stronger relationships (the range is from −1 to +1).
Standardized regression estimates have the advantage of a constant scale.
Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.

The Regression Equation (cont'd)

2. Raw Regression Estimates (b₁)

Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
This is another way of saying the raw estimates apply when the researcher is interested only in prediction.

EXHIBIT 23.6  Relationship of Sales Potential to Building Permits Issued

EXHIBIT 23.7  The Best Fit Line or Knocking Out the Pins

Scatter Diagram of Explained and Unexplained Variation

[Figure: scatter plot showing, for one observation, the total deviation from the mean of Y partitioned into the deviation explained by the regression and the deviation not explained.]

Ordinary Least-Squares Method of Regression Analysis (OLS)

OLS

Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.
Generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.
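For simple regression, the OLS estimates have closed forms: b = Σ(Xᵢ − X̄)(Yᵢ − Ȳ)/Σ(Xᵢ − X̄)² and a = Ȳ − bX̄. A minimal sketch with made-up data, checking the minimum-squared-error property directly:

```python
def ols(x, y):
    """Closed-form OLS estimates (intercept a, slope b) for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

def sse(x, y, a, b):
    """Sum of squared deviations of actual values from the line a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [80, 100, 120, 140, 160]
y = [85, 104, 118, 143, 159]
a, b = ols(x, y)

# No perturbed line does better: the OLS line minimizes squared error
assert sse(x, y, a, b) <= sse(x, y, a + 1, b)
assert sse(x, y, a, b) <= sse(x, y, a, b + 0.1)
```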

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont'd)

The Logic behind the Least-Squares Technique

No straight line can completely represent every dot in the scatter diagram.
There will be a discrepancy between most of the actual scores (each dot) and the predicted score (Ŷ).
OLS uses the criterion of attempting to make the least amount of total error in predicting Y from X.

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont'd)

Ŷᵢ = â + β̂Xᵢ + eᵢ

The equation means that the predicted value for any value of X (Xᵢ) is determined as a function of the estimated slope coefficient plus the estimated intercept coefficient, plus some error.

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont'd)

Statistical Significance of Regression Model

ANOVA Table:

© 2007 Thomson/South-Western. All rights reserved.

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont'd)

R²

The proportion of variance in Y that is explained by X (or vice versa).
A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.

R² = 3,398.49 / 3,882.40 = .875
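The slide's arithmetic, using the two sums of squares from its ANOVA table:

```python
ss_regression = 3398.49  # variation explained by the regression
ss_total = 3882.40       # total variation of Y
r_squared = ss_regression / ss_total
print(round(r_squared, 3))  # 0.875
```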

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont'd)

Simple Regression and Hypothesis Testing

The explanatory power of regression lies in hypothesis testing. Regression is often used to test relational hypotheses.
The outcome of the hypothesis test involves two conditions that must both be satisfied:
The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient and negative relationships require a negative coefficient.
The t-test associated with the regression weight must be significant. Compare each t-value with the critical t-value to test for significance.
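The t statistic for the slope is the estimate divided by its standard error, t = b / se(b), with n − 2 degrees of freedom. A sketch with made-up data:

```python
import math

def slope_t(x, y):
    """t statistic for the OLS slope: b divided by its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b / se_b

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
t = slope_t(x, y)
# Compare |t| against the critical t for n - 2 = 4 degrees of freedom
# (about 2.776 at the .05 level, two-tailed)
print(t > 2.776)  # True -- the slope is significant in this toy example
```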

EXHIBIT 23.8  Simple Regression Results for Building Permit Example

Regression Output
Interpret the overall significance of the model.

The output will include the model F and a significance value. When the model F is significant (a low p-value, less than 0.05), the independent variable explains a significant portion of the variation in the dependent variable.
The coefficient of determination (R²) can then be interpreted. As mentioned earlier, this is the percentage of total variation in the dependent variable accounted for by the independent variable. Another way to think of this is as the extent to which the variances of the independent and dependent variable overlap.

Regression Output
The individual parameter coefficient is interpreted.

The t-value associated with the slope coefficient can be interpreted. In this case, the t of 9.555 is associated with a very low p-value (0.000 to 3 decimal places). Therefore, the slope coefficient is significant. For simple regression, the p-value for the model F and for the t-test of the individual regression weight will be the same.
A t-test for the intercept term (constant) is also provided. However, this is seldom of interest since the explanatory power rests in the slope coefficient.
If a need to forecast sales exists, the estimated regression equation is needed, using the raw coefficients.
The regression coefficient (slope) indicates that for every building permit issued, sales increase 0.546. Moreover, the standardized regression coefficient of 0.936 would allow the researcher to compare the explanatory power of building permits versus some other potential independent variable. For simple regression, β equals r.
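That last claim is easy to verify numerically: regressing the z-scores of Y on the z-scores of X yields a slope equal to r. A sketch with made-up data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                           sum((yi - my) ** 2 for yi in y))

def standardized_slope(x, y):
    """OLS slope after converting both variables to z-scores."""
    def z(v):
        n = len(v)
        m = sum(v) / n
        s = math.sqrt(sum((vi - m) ** 2 for vi in v) / n)
        return [(vi - m) / s for vi in v]
    zx, zy = z(x), z(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

x = [10, 20, 30, 40, 50]
y = [15, 22, 41, 38, 60]
print(round(standardized_slope(x, y), 6) == round(pearson_r(x, y), 6))  # True
```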
