
1/14/2019

DISCRIMINANT ANALYSIS
Chapter 5

MULTIVARIATE DATA ANALYSIS
Hair, Anderson, Tatham, Black

discriminant analysis by lilik

Chapter Preview
• Multiple regression is the most widely used multivariate dependence technique
• The primary basis for the popularity of regression has been its ability to predict and explain metric variables
• Multiple regression is not suitable for nonmetric variables
• Discriminant analysis and logistic regression are used when the dependent variable is nonmetric

• The general form of MDA:

Y1 = X1 + X2 + X3 + ... + Xn
(nonmetric)   (metric)

What is discriminant analysis?


• Discriminant analysis is the appropriate statistical technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric
• DA is used for testing the hypothesis that the group means of a set of independent variables for two (or more) groups are equal
• The dependent variable consists of two groups or classifications: male-female or high-low


Two-group or multiple discriminant analysis?
• When two classifications are involved → two-group discriminant analysis
• When there are three or more classifications → multiple discriminant analysis (MDA)
• The linear combination for a discriminant analysis, also known as the discriminant function:

Z = a + W1X1 + W2X2 + ... + WnXn


What is a discriminant Z score?
• DA is the appropriate statistical technique for testing the hypothesis that the group means of a set of independent variables for two or more groups are equal
• DA multiplies each independent variable by its corresponding weight and adds these products together
• The result is a single composite discriminant Z score for each individual in the analysis
• By averaging the discriminant scores for all individuals within a particular group → the group mean, which is referred to as a centroid
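The averaging step just described can be sketched in a few lines of Python; the scores and group labels below are hypothetical illustration data, not from the text:

```python
def centroids(scores, groups):
    """Group centroid = the mean discriminant Z score of all
    individuals within each group."""
    sums, counts = {}, {}
    for z, g in zip(scores, groups):
        sums[g] = sums.get(g, 0.0) + z
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical discriminant Z scores for six individuals in two groups.
scores = [1.2, 0.8, 1.5, -0.9, -1.1, -1.3]
groups = ["A", "A", "A", "B", "B", "B"]
print(centroids(scores, groups))  # A ≈ 1.167, B ≈ -1.1
```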

What is a centroid?
• Centroids indicate the most typical location on the discriminant function of an individual from a particular group
• A comparison of the group centroids shows how far apart the groups are along the dimension being tested
• The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids, computed by comparing the distributions of the discriminant scores for the groups

The test for statistical significance
• The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids
• It is computed by comparing the distributions of the discriminant scores for the groups
• If the overlap in the distributions is small, the discriminant function separates the groups well
• If the overlap is large, the function is a poor discriminator between the groups


Stages in Discriminant Analysis

Stage 1: Objectives of DA
• Determining whether statistically significant differences exist between the average score profiles on a set of variables for two (or more) a priori defined groups
• Determining which of the independent variables account the most for the differences in the average score profiles of the two or more groups
• Establishing procedures for classifying objects (individuals, firms, products) into groups on the basis of their scores on a set of independent variables
• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of independent variables


Stage 2: Research Design for DA
• Selection of dependent and independent variables
– Specify which variables are to be independent (metric) and which variable is to be dependent (categorical)
– Focus on the dependent variable first
– The number of dependent variable groups (categories) can be two or more, but these groups must be mutually exclusive


Sample size
• Overall sample size: DA is sensitive to the ratio of sample size to the number of predictor variables; a ratio of 20 observations per predictor variable is suggested, with a minimum of 5 observations per predictor
• Group size: each group should have at least 20 observations, and the group sizes should not vary widely


Division of the Sample
• The usual procedure is to divide the total sample of respondents randomly into two groups
• The first group, the analysis sample, is used to develop the discriminant function
• The second group, the holdout sample, is used to test the discriminant function
• This method of validating the function is referred to as the split-sample or cross-validation approach (50-50, 60-40, 75-25)
• Selecting individuals for the analysis and holdout groups → following a proportionately stratified sampling procedure

Procedure for Selecting Individuals
• Selecting individuals for the analysis and holdout groups → following a proportionately stratified sampling procedure
• If the categorical groups for the DA are equally represented in the total sample, an equal number of individuals is selected
• Ex: a sample consists of 50 M and 50 F, so the holdout sample would have 25 M and 25 F
• If the categorical groups are unequal, the sizes of the groups selected for the holdout sample should be proportionate to the total sample distribution
• Ex: the sample contained 70 F and 30 M, then the holdout sample would have 35 F and 15 M

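The proportionately stratified selection described above can be sketched as follows; this is a minimal illustration, with the 70 F / 30 M figures mirroring the slide's example:

```python
import random

def stratified_split(labels, holdout_fraction=0.5, seed=1):
    """Randomly split case indices into analysis and holdout groups so
    that each category's share of the holdout sample is proportionate
    to its share of the total sample."""
    rng = random.Random(seed)
    by_group = {}
    for i, g in enumerate(labels):
        by_group.setdefault(g, []).append(i)
    holdout = set()
    for idx in by_group.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        holdout.update(shuffled[: round(len(shuffled) * holdout_fraction)])
    analysis = [i for i in range(len(labels)) if i not in holdout]
    return analysis, sorted(holdout)

# 70 F and 30 M with a 50% holdout -> 35 F and 15 M in the holdout sample.
labels = ["F"] * 70 + ["M"] * 30
analysis, holdout = stratified_split(labels)
print(sum(labels[i] == "F" for i in holdout),
      sum(labels[i] == "M" for i in holdout))  # 35 15
```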

Stage 3: Assumptions of Discriminant Analysis
• Key assumptions: multivariate normality of the independent variables, and equal covariance (dispersion and correlation) structures for the groups
• Violation of the assumption of equal covariance matrices can adversely affect the estimation process
• Unequal covariance matrices can also distort the classification

Stage 4: Estimation of the Discriminant Function
• Two computational methods to derive a discriminant function:
– Simultaneous (direct) method: the discriminant function is computed using the entire set of independent variables, regardless of the discriminating power of each variable; appropriate when, for theoretical reasons, the researcher wants the results based on all of the independent variables
– Stepwise estimation: the independent variables are entered into the discriminant function one at a time, on the basis of their discriminating power, so that the function contains only the most discriminating variables

Stepwise Estimation

• It is an alternative to the simultaneous approach
• It begins by choosing the single best discriminating variable
• The initial variable is then paired with each of the other independent variables, one at a time, and the variable that is best able to improve the discriminating power of the function in combination with the first variable is chosen


• The process continues: the variables already in the function are reevaluated, together with each remaining independent variable, for inclusion in the function
• Variables that are not useful in discriminating between the groups are eliminated and a reduced set of variables is identified
• But the researcher should note that stepwise estimation becomes less stable and generalizable as the ratio of sample size to independent variables declines → it is important to validate the results
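The stepwise logic above amounts to a greedy forward search. A minimal sketch, assuming a caller-supplied `score` function that measures the discriminating power of a variable set (the toy scores below are made up for illustration):

```python
def stepwise_select(candidates, score):
    """Greedy stepwise selection: start with the single best
    discriminating variable, then repeatedly add the variable that
    most improves the score of the current set; stop when no
    remaining variable improves the function."""
    selected, remaining = [], list(candidates)
    best = float("-inf")
    while remaining:
        trial_score, trial_var = max((score(selected + [v]), v) for v in remaining)
        if trial_score <= best:  # no candidate improves the function
            break
        best = trial_score
        selected.append(trial_var)
        remaining.remove(trial_var)
    return selected

# Toy criterion: each variable adds a fixed amount of "power", with a
# small penalty per extra variable (purely illustrative numbers).
power = {"x1": 0.6, "x2": 0.3, "x3": 0.05}
toy_score = lambda vs: sum(power[v] for v in vs) - 0.1 * (len(vs) - 1)
print(stepwise_select(list(power), toy_score))  # ['x1', 'x2']; x3 adds nothing
```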

Statistical significance
• A number of different statistical criteria are available to evaluate the statistical significance of the discriminatory power of the discriminant functions:
– Wilks' Lambda
– Hotelling's Trace
– Pillai's Criterion


Mahalanobis Distance
• When the stepwise estimation procedure is used, the Mahalanobis D² measure is the most appropriate
• Mahalanobis D² takes into account unequal variances
• Its advantage is that it is computed in the original space of the predictor variables rather than the collapsed space used in other measures → an increase in the number of predictor variables does not result in any reduction in dimensionality

• A loss of dimensionality would cause a loss of information because it decreases the variability of the independent variables
• In general, Mahalanobis D² is the preferred procedure when the researcher is interested in the maximal use of the available information
• The conventional significance criterion of 0.05 or beyond is often used
• Many researchers believe that if the function is not significant at or beyond the 0.05 level, there is little justification for going further (some researchers, however, disagree)
• Their decision rule for continuing to a higher significance level (0.10 or more) is the cost versus the value of the information
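As a worked illustration of the distance measure for two predictor variables (hand-rolled 2×2 matrix inverse; all numbers are hypothetical):

```python
def mahalanobis_d2(mean_a, mean_b, cov):
    """Mahalanobis D^2 between two group centroids for two predictors:
    D^2 = d' S^-1 d, where d is the centroid difference and S the
    pooled 2x2 covariance matrix of the predictors."""
    d = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    t = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    return d[0] * t[0] + d[1] * t[1]

# With an identity covariance, D^2 reduces to squared Euclidean distance.
print(mahalanobis_d2([2.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
# A larger variance on the first predictor shrinks its contribution.
print(mahalanobis_d2([2.0, 1.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 1.0]]))  # 3.0
```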

Calculating Discriminant Z Scores

Zjk = a + W1X1k + W2X2k + ... + WnXnk

Zjk = discriminant Z score of discriminant function j for object k
a = intercept
Wi = discriminant coefficient for independent variable i
Xik = independent variable i for object k
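The formula above translates directly to code; the intercept and coefficients below are hypothetical values for a two-variable function:

```python
def discriminant_z(a, weights, x):
    """Z = a + W1*X1 + W2*X2 + ... + Wn*Xn for a single object."""
    return a + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical intercept and unstandardized coefficients, applied to
# one object's values on the two independent variables.
print(discriminant_z(0.5, [1.2, -0.8], [2.0, 1.0]))  # ≈ 2.1
```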

• There are two versions of the discriminant function:
– Standardized weights and values: more useful for interpretation purposes
– Unstandardized: easier to use in calculating the discriminant Z score


The discriminant functions vs the classification functions
• The classification function, also known as Fisher's linear discriminant function:
– is used in classifying observations
– the observation is classified into the group with the highest classification score
• The discriminant function is used as the means of classification because:
– it provides a concise and simple representation of each discriminant function
– it simplifies the interpretation process and the assessment of the contribution of the independent variables

Why are classification matrices developed?
• The statistical tests for assessing the significance of the discriminant functions do not tell how well the function predicts
• Thus the level of significance can be a very poor indication of the function's ability to discriminate between the two groups
• To determine the predictive ability of a discriminant function, the researcher must construct classification matrices
• To clarify further the usefulness of the classification matrix procedure, we shall relate it to the concept of R square in regression analysis

The R square and the hit ratio
• The R square
– is a concept in regression analysis
– indicates how much variance the regression equation explained
– the statistical significance of R square is tested with the F test
• The hit ratio
– is the corresponding concept in discriminant analysis
– indicates how well the discriminant function classified the objects
– the test of significance in discriminant analysis is chi-square (or D square)


Cutting Score Determination
• The cutting score is the criterion (score) against which each object's discriminant score is compared to determine into which group the object should be classified
• Before a classification matrix can be constructed, however, the researcher has to determine the cutting score
• In constructing classification matrices, the researcher will want to determine the optimum cutting score (critical Z value)
• The optimal cutting score will differ depending on whether the sizes of the groups are equal or unequal
• If the groups are of equal size, the optimal cutting score will be halfway between the two group centroids

The optimal cutting score for equal group sizes
• If the groups are of equal size, the optimal cutting score will be halfway between the two group centroids:

ZCE = (ZA + ZB) / 2

ZCE = critical cutting score value for equal group sizes
ZA = centroid for group A
ZB = centroid for group B

Cutting score determination for unequal group sizes
• If the groups are of unequal size, the optimal cutting score is the weighted average of the two group centroids:

ZCU = (NA·ZB + NB·ZA) / (NA + NB)

ZCU = critical cutting score value for unequal group sizes
NA = number in group A
NB = number in group B
ZA = centroid for group A
ZB = centroid for group B
• Both formulas assume that the distributions are normal and the group dispersion structures are known

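Both cutting-score formulas can be checked numerically; the centroids and group sizes below are illustrative values:

```python
def cutting_score_equal(z_a, z_b):
    """Z_CE = (Z_A + Z_B) / 2: halfway between the two centroids."""
    return (z_a + z_b) / 2

def cutting_score_unequal(n_a, n_b, z_a, z_b):
    """Z_CU = (N_A*Z_B + N_B*Z_A) / (N_A + N_B): each centroid is
    weighted by the size of the *opposite* group, pulling the cutting
    score toward the smaller group's centroid."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)

print(cutting_score_equal(1.0, -1.0))            # 0.0
print(cutting_score_unequal(70, 30, 1.0, -1.0))  # -0.4, nearer group B's centroid
```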

Constructing Classification Matrices
• To validate the discriminant function through the use of classification matrices, the sample should be randomly divided into two groups
• The individual scores for the holdout sample are compared with the critical cutting score value:
– classify an individual into group A if Zn < Zct
– classify an individual into group B if Zn > Zct

Zn = discriminant Z score for the nth individual
Zct = critical cutting score value


Example of a classification matrix for two-group DA

Actual group           Predicted group 1   Predicted group 2   Actual group size   % correctly classified
1                             22                   3                  25                    88
2                              5                  20                  25                    80
Predicted group size          27                  23                  50                    84 (hit ratio)

From the Table
• The entries on the diagonal of the matrix represent the number of individuals correctly classified (22, 20)
• The numbers off the diagonal represent the incorrect classifications (3, 5)
• The entries under the column labeled "actual group size" represent the number of individuals actually in each group (25, 25)
• The entries at the bottom of the columns → the number of individuals assigned to the groups by the discriminant function (27, 23)

• The percentage correctly classified for each group is shown at the right side of the matrix (88, 80)
• The overall percentage correctly classified, or the hit ratio, is shown at the bottom (84)
• Percent correctly classified = (number correctly classified / total number of observations) × 100

[(22 + 20) / 50] × 100 = 84

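The hit-ratio arithmetic from the slide can be reproduced directly:

```python
def hit_ratio(matrix):
    """Percent correctly classified from a square classification
    matrix whose rows are actual groups and columns are predicted
    groups; correct classifications sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

# The two-group matrix from the example: 22 and 20 on the diagonal.
m = [[22, 3],
     [5, 20]]
print(hit_ratio(m))  # [(22 + 20) / 50] * 100 = 84.0
```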

Stage 5: Interpreting the Results
• Discriminant weights
– the traditional approach to interpreting discriminant functions examines the sign and magnitude of the standardized discriminant weights or discriminant coefficients
– when the sign is ignored, each weight represents the relative contribution of its associated variable to the function
– independent variables with relatively larger weights contribute more to the discriminating power of the function than variables with smaller weights

Discriminant loadings
• Discriminant loadings, sometimes referred to as structure correlations, measure the simple linear correlation between each independent variable and the discriminant function
• Discriminant loadings may be subject to instability
• Loadings are considered relatively more valid than weights as a means of interpreting the discriminating power of independent variables, because of their correlational nature

Partial F values
• Large F values indicate greater discriminatory power
• In practice, rankings using the F values approach are the same as the rankings derived from using discriminant weights
• But the F values indicate the associated level of significance for each variable





Stage 6: Validation of the Results
• Use split-sample or cross-validation procedures

