
1/14/2019

DISCRIMINANT ANALYSIS
Chapter 5

MULTIVARIATE DATA ANALYSIS
Hair, Anderson, Tatham, Black

discriminant analysis by lilik

Chapter Preview
• Multiple regression is the most widely used multivariate dependence technique
• The primary basis for the popularity of regression has been its ability to predict and explain metric variables
• Multiple regression is not suitable for nonmetric variables
• Discriminant analysis and logistic regression are used when the dependent variable is nonmetric

• The general form of MDA:

Y1 = X1 + X2 + X3 + ... + Xn
(nonmetric)   (metric)

What is discriminant analysis?


• Discriminant analysis is the appropriate statistical technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric
• DA is used for testing the hypothesis that the group means of a set of independent variables for two (or more) groups are equal
• The dependent variable consists of two groups or classifications: male-female or high-low


Two-group or multiple discriminant analysis?
• When two classifications are involved → two-group discriminant analysis
• When there are three or more classifications → multiple discriminant analysis (MDA)
• The linear combination for a discriminant analysis, also known as the discriminant function:

Z = a + W1X1 + W2X2 + ... + WnXn


What is a discriminant Z score?
• DA is the appropriate statistical technique for testing the hypothesis that the group means of a set of independent variables for two or more groups are equal
• DA multiplies each independent variable by its corresponding weight and adds these products together
• The result is a single composite discriminant Z score for each individual in the analysis
• By averaging the discriminant scores for all individuals within a particular group → the group mean, which is referred to as a centroid
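The averaging step just described can be sketched in a few lines of Python; the scores and group labels below are hypothetical illustration data, not from the text:

```python
def centroids(scores, groups):
    """Group centroid = the mean discriminant Z score of all
    individuals within each group."""
    sums, counts = {}, {}
    for z, g in zip(scores, groups):
        sums[g] = sums.get(g, 0.0) + z
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical discriminant Z scores for six individuals in two groups.
scores = [1.2, 0.8, 1.5, -0.9, -1.1, -1.3]
groups = ["A", "A", "A", "B", "B", "B"]
print(centroids(scores, groups))  # A ≈ 1.167, B ≈ -1.1
```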

What is a centroid?
• Centroids indicate the most typical location on the discriminant function of an individual from a particular group
• A comparison of the group centroids shows how far apart the groups are along the dimension being tested
• The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids, computed by comparing the distributions of the discriminant scores for the groups

The test for statistical significance
• The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids
• It is computed by comparing the distributions of the discriminant scores for the groups
• If the overlap in the distributions is small, the discriminant function separates the groups well
• If the overlap is large, the function is a poor discriminator between the groups


Stages in Discriminant Analysis

Stage 1: Objectives of DA
• Determining whether statistically significant differences exist between the average score profiles on a set of variables for two (or more) a priori defined groups
• Determining which of the independent variables account the most for the differences in the average score profiles of the two or more groups
• Establishing procedures for classifying objects (individuals, firms, products) into groups on the basis of their scores on a set of independent variables
• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of independent variables


Stage 2: Research Design for DA
• Selection of dependent and independent variables
– Specify which variables are to be independent (metric) and which variable is to be dependent (categorical)
– Focus on the dependent variable first
– The number of dependent variable groups (categories) can be two or more, but these groups must be mutually exclusive


Sample size
• Overall sample size: DA is sensitive to the ratio of sample size to the number of predictor variables; a ratio of 20 observations per predictor variable is suggested, with a minimum of 5 observations per predictor
• Group size: each group should have at least 20 observations, and the group sizes should not vary widely


Division of the Sample
• The usual procedure is to divide the total sample of respondents randomly into two groups
• The first group, the analysis sample, is used to develop the discriminant function
• The second group, the holdout sample, is used to test the discriminant function
• This method of validating the function is referred to as the split-sample or cross-validation approach (50-50, 60-40, 75-25)
• Selecting individuals for the analysis and holdout groups → following a proportionately stratified sampling procedure

Procedure for Selecting Individuals
• Selecting individuals for the analysis and holdout groups → following a proportionately stratified sampling procedure
• If the categorical groups for the DA are equally represented in the total sample, an equal number of individuals is selected
• Ex: a sample consists of 50 M and 50 F, so the holdout sample would have 25 M and 25 F
• If the categorical groups are unequal, the sizes of the groups selected for the holdout sample should be proportionate to the total sample distribution
• Ex: the sample contained 70 F and 30 M, then the holdout sample would have 35 F and 15 M

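The proportionately stratified selection described above can be sketched as follows; this is a minimal illustration, with the 70 F / 30 M figures mirroring the slide's example:

```python
import random

def stratified_split(labels, holdout_fraction=0.5, seed=1):
    """Randomly split case indices into analysis and holdout groups so
    that each category's share of the holdout sample is proportionate
    to its share of the total sample."""
    rng = random.Random(seed)
    by_group = {}
    for i, g in enumerate(labels):
        by_group.setdefault(g, []).append(i)
    holdout = set()
    for idx in by_group.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        holdout.update(shuffled[: round(len(shuffled) * holdout_fraction)])
    analysis = [i for i in range(len(labels)) if i not in holdout]
    return analysis, sorted(holdout)

# 70 F and 30 M with a 50% holdout -> 35 F and 15 M in the holdout sample.
labels = ["F"] * 70 + ["M"] * 30
analysis, holdout = stratified_split(labels)
print(sum(labels[i] == "F" for i in holdout),
      sum(labels[i] == "M" for i in holdout))  # 35 15
```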

Stage 3: Assumptions of Discriminant Analysis
• Key assumptions: multivariate normality of the independent variables, and equal covariance (dispersion and correlation) structures for the groups
• Violation of the assumption of equal covariance matrices can adversely affect the estimation process
• Unequal covariance matrices can also distort the classification

Stage 4: Estimation of the Discriminant Function
• Two computational methods to derive a discriminant function:
– Simultaneous (direct) method: the discriminant function is computed using the entire set of independent variables, regardless of the discriminating power of each variable; appropriate when, for theoretical reasons, the researcher wants the results based on all of the independent variables
– Stepwise estimation: the independent variables are entered into the discriminant function one at a time, on the basis of their discriminating power, so that the function contains only the most discriminating variables

Stepwise Estimation

• It is an alternative to the simultaneous approach
• It begins by choosing the single best discriminating variable
• The initial variable is then paired with each of the other independent variables, one at a time, and the variable that is best able to improve the discriminating power of the function in combination with the first variable is chosen


• The process continues: the variables already in the function are reevaluated, together with each remaining independent variable, for inclusion in the function
• Variables that are not useful in discriminating between the groups are eliminated and a reduced set of variables is identified
• But the researcher should note that stepwise estimation becomes less stable and generalizable as the ratio of sample size to independent variables declines → it is important to validate the results
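The stepwise logic above amounts to a greedy forward search. A minimal sketch, assuming a caller-supplied `score` function that measures the discriminating power of a variable set (the toy scores below are made up for illustration):

```python
def stepwise_select(candidates, score):
    """Greedy stepwise selection: start with the single best
    discriminating variable, then repeatedly add the variable that
    most improves the score of the current set; stop when no
    remaining variable improves the function."""
    selected, remaining = [], list(candidates)
    best = float("-inf")
    while remaining:
        trial_score, trial_var = max((score(selected + [v]), v) for v in remaining)
        if trial_score <= best:  # no candidate improves the function
            break
        best = trial_score
        selected.append(trial_var)
        remaining.remove(trial_var)
    return selected

# Toy criterion: each variable adds a fixed amount of "power", with a
# small penalty per extra variable (purely illustrative numbers).
power = {"x1": 0.6, "x2": 0.3, "x3": 0.05}
toy_score = lambda vs: sum(power[v] for v in vs) - 0.1 * (len(vs) - 1)
print(stepwise_select(list(power), toy_score))  # ['x1', 'x2']; x3 adds nothing
```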

Statistical significance
• A number of different statistical criteria are available to evaluate the statistical significance of the discriminatory power of the discriminant functions:
– Wilks' Lambda
– Hotelling's Trace
– Pillai's Criterion


Mahalanobis Distance
• When the stepwise estimation procedure is used, the Mahalanobis D² measure is the most appropriate
• Mahalanobis D² takes into account unequal variances
• Its advantage is that it is computed in the original space of the predictor variables rather than the collapsed space used in other measures → an increase in the number of predictor variables does not result in any reduction in dimensionality

• A loss of dimensionality would cause a loss of information because it decreases the variability of the independent variables
• In general, Mahalanobis D² is the preferred procedure when the researcher is interested in the maximal use of the available information
• The conventional significance criterion of 0.05 or beyond is often used
• Many researchers believe that if the function is not significant at or beyond the 0.05 level, there is little justification for going further (some researchers, however, disagree)
• Their decision rule for continuing to a higher significance level (0.10 or more) is the cost versus the value of the information
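As a worked illustration of the distance measure for two predictor variables (hand-rolled 2×2 matrix inverse; all numbers are hypothetical):

```python
def mahalanobis_d2(mean_a, mean_b, cov):
    """Mahalanobis D^2 between two group centroids for two predictors:
    D^2 = d' S^-1 d, where d is the centroid difference and S the
    pooled 2x2 covariance matrix of the predictors."""
    d = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    t = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    return d[0] * t[0] + d[1] * t[1]

# With an identity covariance, D^2 reduces to squared Euclidean distance.
print(mahalanobis_d2([2.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
# A larger variance on the first predictor shrinks its contribution.
print(mahalanobis_d2([2.0, 1.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 1.0]]))  # 3.0
```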

Calculating Discriminant Z Scores

Zjk = a + W1X1k + W2X2k + ... + WnXnk

Zjk = discriminant Z score of discriminant function j for object k
a = intercept
Wi = discriminant coefficient for independent variable i
Xik = independent variable i for object k
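The formula above translates directly to code; the intercept and coefficients below are hypothetical values for a two-variable function:

```python
def discriminant_z(a, weights, x):
    """Z = a + W1*X1 + W2*X2 + ... + Wn*Xn for a single object."""
    return a + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical intercept and unstandardized coefficients, applied to
# one object's values on the two independent variables.
print(discriminant_z(0.5, [1.2, -0.8], [2.0, 1.0]))  # ≈ 2.1
```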

• There are two versions of the discriminant function:
– Standardized weights and values: more useful for interpretation purposes
– Unstandardized: easier to use in calculating the discriminant Z score


The discriminant functions vs the classification functions
• The classification function, also known as Fisher's linear discriminant function:
– is used in classifying observations
– the observation is classified into the group with the highest classification score
• The discriminant function is used as the means of classification because:
– it provides a concise and simple representation of each discriminant function
– it simplifies the interpretation process and the assessment of the contribution of the independent variables

Why are classification matrices developed?
• The statistical tests for assessing the significance of the discriminant functions do not tell how well the function predicts
• Thus the level of significance can be a very poor indication of the function's ability to discriminate between the two groups
• To determine the predictive ability of a discriminant function, the researcher must construct classification matrices
• To clarify further the usefulness of the classification matrix procedure, we shall relate it to the concept of R square in regression analysis

The R square and the hit ratio
• The R square
– is a concept in regression analysis
– indicates how much variance the regression equation explained
– the statistical significance of R square is tested with the F test
• The hit ratio
– is the corresponding concept in discriminant analysis
– indicates how well the discriminant function classified the objects
– the test of significance in discriminant analysis is chi-square (or D square)


Cutting Score Determination
• The cutting score is the criterion (score) against which each object's discriminant score is compared to determine into which group the object should be classified
• Before a classification matrix can be constructed, however, the researcher has to determine the cutting score
• In constructing classification matrices, the researcher will want to determine the optimum cutting score (critical Z value)
• The optimal cutting score will differ depending on whether the sizes of the groups are equal or unequal
• If the groups are of equal size, the optimal cutting score will be halfway between the two group centroids

The optimal cutting score for equal group sizes
• If the groups are of equal size, the optimal cutting score will be halfway between the two group centroids:

ZCE = (ZA + ZB) / 2

ZCE = critical cutting score value for equal group sizes
ZA = centroid for group A
ZB = centroid for group B

Cutting score determination for unequal group sizes
• If the groups are of unequal size, the optimal cutting score is the weighted average of the two group centroids:

ZCU = (NA·ZB + NB·ZA) / (NA + NB)

ZCU = critical cutting score value for unequal group sizes
NA = number in group A
NB = number in group B
ZA = centroid for group A
ZB = centroid for group B
• Both formulas assume that the distributions are normal and the group dispersion structures are known

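Both cutting-score formulas can be checked numerically; the centroids and group sizes below are illustrative values:

```python
def cutting_score_equal(z_a, z_b):
    """Z_CE = (Z_A + Z_B) / 2: halfway between the two centroids."""
    return (z_a + z_b) / 2

def cutting_score_unequal(n_a, n_b, z_a, z_b):
    """Z_CU = (N_A*Z_B + N_B*Z_A) / (N_A + N_B): each centroid is
    weighted by the size of the *opposite* group, pulling the cutting
    score toward the smaller group's centroid."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)

print(cutting_score_equal(1.0, -1.0))            # 0.0
print(cutting_score_unequal(70, 30, 1.0, -1.0))  # -0.4, nearer group B's centroid
```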

Constructing Classification Matrices
• To validate the discriminant function through the use of classification matrices, the sample should be randomly divided into two groups
• The individual scores for the holdout sample are compared with the critical cutting score value:
– classify an individual into group A if Zn < Zct
– classify an individual into group B if Zn > Zct

Zn = discriminant Z score for the nth individual
Zct = critical cutting score value


Example of a classification matrix for two-group DA

Actual group           Predicted group 1   Predicted group 2   Actual group size   % correctly classified
1                             22                   3                  25                    88
2                              5                  20                  25                    80
Predicted group size          27                  23                  50                    84 (hit ratio)

From the Table
• The entries on the diagonal of the matrix represent the number of individuals correctly classified (22, 20)
• The numbers off the diagonal represent the incorrect classifications (3, 5)
• The entries under the column labeled "actual group size" represent the number of individuals actually in each group (25, 25)
• The entries at the bottom of the columns → the number of individuals assigned to the groups by the discriminant function (27, 23)

• The percentage correctly classified for each group is shown at the right side of the matrix (88, 80)
• The overall percentage correctly classified, or the hit ratio, is shown at the bottom (84)
• Percent correctly classified = (number correctly classified / total number of observations) × 100

[(22 + 20) / 50] × 100 = 84

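The hit-ratio arithmetic from the slide can be reproduced directly:

```python
def hit_ratio(matrix):
    """Percent correctly classified from a square classification
    matrix whose rows are actual groups and columns are predicted
    groups; correct classifications sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

# The two-group matrix from the example: 22 and 20 on the diagonal.
m = [[22, 3],
     [5, 20]]
print(hit_ratio(m))  # [(22 + 20) / 50] * 100 = 84.0
```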

Stage 5: Interpreting the Results
• Discriminant weights
– the traditional approach to interpreting discriminant functions examines the sign and magnitude of the standardized discriminant weights or discriminant coefficients
– when the sign is ignored, each weight represents the relative contribution of its associated variable to the function
– independent variables with relatively larger weights contribute more to the discriminating power of the function than variables with smaller weights

Discriminant loadings
• Discriminant loadings, sometimes referred to as structure correlations, measure the simple linear correlation between each independent variable and the discriminant function
• Discriminant loadings may be subject to instability
• Loadings are considered relatively more valid than weights as a means of interpreting the discriminating power of independent variables, because of their correlational nature

Partial F values
• Large F values indicate greater discriminatory power
• In practice, rankings using the F values approach are the same as the rankings derived from using discriminant weights
• But the F values indicate the associated level of significance for each variable





Stage 6: Validation of the Results
• Use split-sample or cross-validation procedures

