
# 26/05/12

## Linear and Logistic Regression Analysis

By: Nurita Andayani

Introduction

Difference between chi-square and regression: the chi-square test of independence determines whether a statistical relationship exists between two variables. The chi-square test tells us whether such a relationship exists, but it does not tell us anything about its nature. Regression and correlation analyses, by contrast, show how to determine both the nature and the strength of a relationship between two variables.
Regression analysis is a body of statistical methods
dealing with the formulation of mathematical models that
depict relationships among variables, and the use of these
modeled relationships for the purpose of prediction and other
statistical inferences.
The word regression was first used in its present technical context by Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents.


Models

The independent or controlled variable is also called the predictor variable and is denoted by x. The effect or response variable is denoted by y. If the relation between y and x is exactly a straight line, then the variables are connected by the formula:

y = α + βx

where α indicates the intercept of the line with the y axis and β represents the slope of the line, or the change in y per unit change in x.

[Figure: the line y = α + βx in the (x, y) plane, with the fitted value α + βx_i and the observed y_i marked at x = x_i.]

Statistical Model

Y_i = α + βx_i + e_i,   i = 1, …, n

where:

a) x_1, x_2, …, x_n are the set values of the controlled variable x that the experimenter has selected for the study.

b) e_1, e_2, …, e_n are the unknown error components that are superimposed on the true linear relation. These are unobservable random variables, which we assume are independently and normally distributed with a mean of zero and unknown variance σ².

c) The parameters α and β, which together locate the straight line, are unknown.


Basic Notations

x̄ = (1/n) Σ x_i,   ȳ = (1/n) Σ y_i

S_x² = Σ (x_i − x̄)² = Σ x_i² − n x̄²

S_y² = Σ (y_i − ȳ)² = Σ y_i² − n ȳ²

S_xy = Σ (x_i − x̄)(y_i − ȳ) = Σ x_i y_i − n x̄ ȳ
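These notations can be checked with a short Python sketch (the function and variable names are mine, not from the slides):

```python
def basic_notations(x, y):
    """Compute the sample means and the sums S_x^2, S_y^2, S_xy defined above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_x^2 = sum((x_i - x_bar)^2) = sum(x_i^2) - n * x_bar^2
    s_x2 = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    # S_y^2 = sum((y_i - y_bar)^2) = sum(y_i^2) - n * y_bar^2
    s_y2 = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    # S_xy = sum((x_i - x_bar)(y_i - y_bar)) = sum(x_i * y_i) - n * x_bar * y_bar
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    return x_bar, y_bar, s_x2, s_y2, s_xy
```

Both forms of each sum (deviation form and shortcut form) give the same value; the shortcut form above is the one used for hand computation.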

Example

Zippy Cola is studying the effect of its advertising. People chosen at random were called and asked how many cans of Zippy Cola they had bought in the past week and how many Zippy Cola ads they had read or seen in the past week.

| X (number of ads) | 3 | 7 | 4 | 2 | 0 | 4 | 1 | 2 |
|---|---|---|---|---|---|---|---|---|
| Y (cans purchased) | 11 | 18 | 9 | 4 | 7 | 6 | 3 | 8 |


## Least squares regression line:

ŷ = a + bx

Least squares estimate of β:

b = S_xy / S_x²

Least squares estimate of α:

a = ȳ − b x̄

The sum of squared errors is:

SSE = Σ_{i=1}^{n} (y_i − a − bx_i)² = S_y² − b² S_x²
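Applying these formulas to the Zippy Cola data (as transcribed from the example table; the helper names are mine), a minimal sketch:

```python
def least_squares(x, y):
    """b = S_xy / S_x^2, a = y_bar - b * x_bar, SSE = S_y^2 - b^2 * S_x^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_x2
    a = y_bar - b * x_bar
    sse = s_y2 - b ** 2 * s_x2
    return a, b, sse

# Zippy Cola data from the example above
ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
a, b, sse = least_squares(ads, cans)  # b is roughly 1.71, a roughly 3.33
```

Note that the fitted line always passes through the point (x̄, ȳ), since a = ȳ − b x̄.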

Estimators

a) E(a) = α and E(b) = β, so both estimators are unbiased.

b) Var(b) = σ² / S_x² and Var(a) = σ² (1/n + x̄² / S_x²).

c) The distributions of a and b are normal with means α and β, respectively; the standard deviations are the square roots of the variances given in b).

d) s² = SSE/(n − 2) is an unbiased estimator of σ², i.e. E(s²) = σ². Also, (n − 2)s²/σ² is distributed as χ² with d.f. = n − 2, and it is independent of a and b.


e) Replacing σ² in b) with its sample estimate s² and taking the square roots of the variances, we obtain the estimated standard errors of a and b:

estimated standard error of a:  s √(1/n + x̄² / S_x²)

estimated standard error of b:  s / √(S_x²)

## Inference Concerning the Slope

The test of H₀: β = β₀ vs H₁: β ≠ β₀ is based on

t = √(S_x²) (b − β₀) / s,   d.f. = n − 2

A p% confidence interval for β:

b ± t_{(1−p/100)/2} · s / √(S_x²),   d.f. = n − 2
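A minimal Python sketch of this t statistic (helper names are mine; the critical-value lookup is left out, since the slides assume a t table):

```python
import math

def slope_t(x, y, beta0=0.0):
    """t = sqrt(S_x^2) * (b - beta0) / s with d.f. = n - 2, where s^2 = SSE / (n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_x2
    sse = s_y2 - b * b * s_x2
    s = math.sqrt(sse / (n - 2))   # residual standard deviation
    t = math.sqrt(s_x2) * (b - beta0) / s
    return t, n - 2
```

The statistic is just (b − β₀) divided by the estimated standard error of b, since s.e.(b) = s/√(S_x²).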


Similarly, the test of H₀: α = α₀ vs H₁: α ≠ α₀ is based on

t = (a − α₀) / (s √(1/n + x̄² / S_x²)),   d.f. = n − 2

and a p% confidence interval for α is

a ± t_{(1−p/100)/2} · s √(1/n + x̄² / S_x²)

## Checks on The Straight Line Model

y_i = (a + bx_i) + (y_i − a − bx_i)

observed y value = explained by linear relation + residual (deviation from the linear relation)

S_y² = b² S_x² + SSE

total SS of y = SS explained by linear relation + residual SS (unexplained)


| Source | Sum of Squares | d.f. | Mean Squares | F |
|---|---|---|---|---|
| Regression | SSR | 1 | MSR = SSR/1 | MSR/MSE |
| Error | SSE | n − 2 | MSE = SSE/(n − 2) | |
| Total | SST | n − 1 | | |

## Inference for regression model

H₀: β = 0 vs H₁: β ≠ 0
Rejection region (with significance level α):
R: F ≥ F_α(1, n − 2)

## The coefficient of determination

The sample coefficient of determination is developed from the relationship between two kinds of variation in the Y values of a data set: variation around the fitted regression line, and variation around their own mean.

R² = SSR / SST = 1 − SSE / SST

0 ≤ R² ≤ 1, or 0% ≤ R² ≤ 100%. R² = 1 corresponds to a perfectly fitted regression line; R² = 0 corresponds to an unfitted regression model.

## The coefficient of correlation

The correlation coefficient (r) indicates the direction of the relationship between the two variables X and Y:

- If an inverse relationship exists (Y decreases as X increases), then r falls between 0 and −1.
- If there is a direct relationship (Y increases as X increases), then r is a value within the range 0 to 1.

r = S_xy / √(S_x² · S_y²)
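A minimal sketch of both coefficients (names are mine; for a single predictor, R² is simply r squared):

```python
import math

def determination_and_correlation(x, y):
    """R^2 and r, with r = S_xy / sqrt(S_x^2 * S_y^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_x2 * s_y2)
    return r * r, r   # R^2 = SSR/SST equals r^2 in simple regression
```

The sign of r comes entirely from S_xy: a negative S_xy gives an inverse relationship, a positive one a direct relationship.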


Exercise

PUSKESMAS PANCORAN MAS (a community health center) wants to know the relationship between patients' age and blood pressure. A sample of 10 patients gave the following results:

| Age | 38 | 36 | 72 | 42 | 68 | 63 | 49 | 56 | 60 | 55 |
|---|---|---|---|---|---|---|---|---|---|---|
| Blood pressure | 115 | 118 | 160 | 140 | 152 | 149 | 145 | 147 | 155 | 150 |

a) Build the regression model.
b) If a patient's age is 40, predict the blood pressure.
c) Test the regression model you have built.
d) Test whether the parameters α = 0 and β = 0.
e) Construct 90% confidence intervals for α and β.
f) Compute the coefficient of determination and the correlation coefficient, and explain what they mean.

## What is Logistic Regression?

A form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors. It addresses the same questions that discriminant function analysis and multiple regression do, but with no distributional assumptions on the predictors (the predictors do not have to be normally distributed, linearly related, or have equal variance in each group).


## Logistic regression is often used because the relationship between a discrete variable and a predictor is non-linear

Example from the text: the probability of heart disease changes very little with a ten-point difference in blood pressure among people with low blood pressure, but a ten-point change can mean a drastic change in the probability of heart disease in people with high blood pressure.

Assumptions

- Absence of multicollinearity
- No outliers
- Independence of errors: this assumes a between-subjects design. There are other forms if the design is within subjects.


Background

Odds are like probability. Odds are usually written as "5 to 1 odds," which is equivalent to 1 out of five, or a .20 probability, or a 20% chance, etc.

The problem with probabilities is that they are non-linear: going from .10 to .20 doubles the probability, but going from .80 to .90 barely increases the probability.

Background

Odds ratio: the ratio of the probability over 1 minus the probability, that is, the probability of winning over the probability of losing. 5 to 1 odds equates to an odds ratio of .20/.80 = .25.


Background

Logit: this is the natural log of an odds ratio; often called a log odds even though it really is a log odds ratio. The logit scale is linear and functions much like a z-score scale.

Background

Logits are continuous, like z scores:

- p = 0.50, then logit = 0
- p = 0.70, then logit ≈ 0.85
- p = 0.30, then logit ≈ −0.85
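The list above can be reproduced with a one-line helper (the name is mine):

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p); 0 at p = .50, symmetric around it."""
    return math.log(p / (1 - p))
```

Note the symmetry: logit(0.30) = −logit(0.70), which is what makes the logit scale behave like a z-score scale.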


1 = POSITIVE RESPONSE (success), with probability P
0 = NEGATIVE RESPONSE (failure), with probability Q = (1 − P)

MEAN(Y) = P, the observed proportion of successes.
VAR(Y) = PQ, maximized when P = .50; the variance depends on the mean (P).
X_J = any type of predictor: continuous, dichotomous, polytomous.

Ŷ|X = B₀ + B₁X₁

and it is assumed that the errors are normally distributed, with mean = 0 and constant variance (i.e., homogeneity of variance).


## Plain old regression

E(Y | X) = B₀ + B₁X₁

An expected value is a mean, so

E(Y) = P(Y = 1 | X)

The predicted value equals the proportion of observations for which Y|X = 1; P is the probability of Y = 1 (a success) given X, and Q = 1 − P (a failure) given X.

## An alternative: the ogive function

An ogive function is a curved S-shaped function, and the most common is the logistic function, which looks like:


Ŷ_i = e^u / (1 + e^u)

where Ŷ (Y-hat) is the estimated probability that the ith case is in a category and u is the regular linear regression equation:

u = A + B₁X₁ + B₂X₂ + … + B_K X_K
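A minimal sketch of this pair of equations (names are mine, not from the slides):

```python
import math

def logistic(u):
    """Y-hat = e^u / (1 + e^u): the estimated probability for linear predictor u."""
    return math.exp(u) / (1 + math.exp(u))

def linear_predictor(a, bs, xs):
    """u = A + B1*X1 + ... + BK*XK."""
    return a + sum(b * x for b, x in zip(bs, xs))
```

Whatever value u takes, the output is squeezed into (0, 1), which is what makes the curve usable as a probability.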


## The logistic function

Ŷ_i = e^(b₀ + b₁X₁) / (1 + e^(b₀ + b₁X₁))

The change in probability is not constant (linear) with constant changes in X. This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function of X, specifically a logistic function.


It is not obvious how the regression coefficients for X are related to changes in the dependent variable (Y) when the model is written this way. The change in Y (in probability units) given X depends on the value of X: look at the S-shaped function.

The values in the regression equation, b₀ and b₁, take on slightly different meanings:

- b₀: the regression constant (moves the curve left and right)
- b₁: the regression slope (steepness of the curve)
- −b₀/b₁: the threshold, where the probability of success = .50
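The threshold reading follows directly from setting the linear predictor to zero; a minimal sketch (names are mine):

```python
import math

def threshold(b0, b1):
    """X at which b0 + b1*X = 0, so the predicted probability of success is .50."""
    return -b0 / b1

def prob_success(b0, b1, x):
    """Logistic probability at a given X."""
    u = b0 + b1 * x
    return math.exp(u) / (1 + math.exp(u))
```

For example, with b0 = -4.00 and b1 = 0.05, the threshold falls at X = 80, where the curve crosses probability .50.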


Logistic Function

[Figure: logistic curves with a constant regression constant but different slopes, plotted for X from 30 to 100 against probability 0.0 to 1.0. v2: b0 = -4.00, b1 = 0.05 (middle); v3: b0 = -4.00, b1 = 0.15 (top); v4: b0 = -4.00, b1 = 0.025 (bottom).]

Logistic Function

[Figure: logistic curves with constant slopes but different regression constants, plotted for X from 30 to 100 against probability 0.0 to 1.0. v2: b0 = -3.00, b1 = 0.05 (top); v3: b0 = -4.00, b1 = 0.05 (middle); v4: b0 = -5.00, b1 = 0.05 (bottom).]


The Logit

By algebraic manipulation, the logistic regression equation can be written in terms of an odds ratio for success:

P(Y = 1 | X_i) / (1 − P(Y = 1 | X_i)) = π / (1 − π) = exp(b₀ + b₁X_{1i})

The Logit

Odds ratios range from 0 to positive infinity. P/Q is an odds ratio; a value less than 1 means less than .50 probability, and a value greater than 1 means greater than .50 probability.


The Logit

Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds). For a single predictor:

ln[P(Y = 1 | X) / (1 − P(Y = 1 | X))] = ln[π / (1 − π)] = b₀ + b₁X₁

For multiple predictors:

ln[π / (1 − π)] = b₀ + b₁X₁ + b₂X₂ + … + b_k X_k


The Logit

Log-odds are a linear function of the predictors, and the regression coefficients go back to their old interpretation (kind of):

b₀ is the expected value of the logit (log-odds) when X = 0.

b₁ is called a logit difference: the amount the logit (log-odds) changes with a one-unit change in X, i.e., the amount the logit changes in going from X to X + 1.

Conversion

EXP(logit) = odds ratio
Probability = odds ratio / (1 + odds ratio)
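This conversion can be sketched in a couple of lines (the function name is mine):

```python
import math

def logit_to_probability(logit_value):
    """odds = exp(logit); probability = odds / (1 + odds)."""
    odds = math.exp(logit_value)
    return odds / (1 + odds)
```

Converting a probability to a logit and back recovers the original probability, so the two scales carry the same information.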


THANK YOU
GOOD LUCK
