
CHAID Analysis

Made by:
Abhay Prabhu (A003)
Ashish Aggarwal (A016)
Omm Sanghani (A033)
Pratyush Sharma (A040)
Rahul Khemka (A041)
Saurabh Daga (A044)
PGDM 07
NMIMS, Bangalore

Q1. CHAID analysis, interpretation and implications.

CHAID Results:

Risk

Estimate   Std. Error
.111       .001

Growing Method: CHAID
Dependent Variable: y

Classification

                     Predicted
Observed             no       yes     Percent Correct
no                   78608    1236    98.5%
yes                  8782     1796    17.0%
Overall Percentage   96.6%    3.4%    88.9%

Growing Method: CHAID
Dependent Variable: y
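
The figures in the Risk and Classification tables can be cross-checked from the four cell counts alone. The sketch below is a minimal illustration in Python (the function and variable names are ours, not SPSS output). It recomputes the percent correct per class, the overall accuracy and the risk estimate as 1 minus the overall accuracy. The standard error line assumes a simple binomial formula; it happens to reproduce the reported .001, but that SPSS uses exactly this formula is an assumption.

```python
import math

# Cell counts from the Classification table (observed class -> predicted class).
observed_no = {"no": 78608, "yes": 1236}
observed_yes = {"no": 8782, "yes": 1796}


def classification_summary(obs_no, obs_yes):
    """Recompute the summary figures of a 2x2 classification table."""
    total = sum(obs_no.values()) + sum(obs_yes.values())
    correct = obs_no["no"] + obs_yes["yes"]
    risk = 1 - correct / total
    return {
        "pct_correct_no": obs_no["no"] / sum(obs_no.values()),
        "pct_correct_yes": obs_yes["yes"] / sum(obs_yes.values()),
        "pct_predicted_no": (obs_no["no"] + obs_yes["no"]) / total,
        "overall_correct": correct / total,
        "risk": risk,
        # Assumption: binomial standard error of the misclassification rate.
        "risk_std_error": math.sqrt(risk * (1 - risk) / total),
    }


summary = classification_summary(observed_no, observed_yes)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Only 17.0% of the observed "yes" cases are predicted correctly, so the 88.9% overall accuracy is driven mostly by the much larger "no" class.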

Model summary for the above tree:

Model Summary

Specifications
  Growing Method                   CHAID
  Dependent Variable               y
  Independent Variables            LINT(balance), LINT(age), job, marital, default, housing, loan, day, month, education4, contact4
  Validation                       None
  Maximum Tree Depth               3
  Minimum Cases in Parent Node     10000
  Minimum Cases in Child Node      5000
Results
  Independent Variables Included   month, contact4, marital, LINT(balance), housing, day
  Number of Nodes                  19
  Number of Terminal Nodes         12
  Depth                            3

Model summary for the original tree:

Model Summary

Specifications
  Growing Method                   CHAID
  Dependent Variable               y
  Independent Variables            LINT(balance), LINT(age), job, marital, default, housing, loan, day, month, education4, contact4
  Validation                       None
  Maximum Tree Depth               3
  Minimum Cases in Parent Node     100
  Minimum Cases in Child Node      50
Results
  Independent Variables Included   month, day, housing, contact4, marital, job, LINT(balance), LINT(age), education4, loan
  Number of Nodes                  145
  Number of Terminal Nodes         99
  Depth                            3

CHAID Analysis:

• Taking the customer response (y) as the dependent variable, we observe from the tree that month is the most significant independent variable.
• Node 1 covers the months April, March, December, October and September; 30.7% of its customers responded yes.
• From node 7, 11.5% of the customers contacted by cellular or telephone responded yes, compared with 3.3% of the customers whose contact type was unknown.
• Further, among customers with unknown contact type, 2.5% of the married customers responded yes, against 4.5% of the customers who were single or divorced.
• For customers in the months May, June, November and January, contact is the most significant variable.
• For customers in August, day is the most significant variable.
• For customers in July, balance is the most significant variable.
• For customers in June, November and January who were contacted by cellular or telephone, housing is the most significant variable; 19.5% of these customers responded yes.

Q2. Group the customers into 3 groups based on propensity to purchase and profile them. Use charts to present findings. Ensure that all leaves are covered in the groups.

The following assumptions have been made:

Assumptions
Category   Customer response as yes
Best       >=50%
Medium     10-50%
Worst      <=10%
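
The grouping rule can be written as a small function that maps a leaf's share of "yes" responses to a category. This is a minimal sketch, not the procedure actually run in SPSS: the function name is ours, and it assumes that Best denotes the leaves with the highest "yes" share (50% or more) and Worst those with 10% or less, which is the reading the profiles in this section support (for example, customers with unknown contact type fall almost entirely into Worst). The sample rates are the node percentages quoted in Q1, used only as illustrative inputs.

```python
def propensity_group(yes_rate):
    """Map a leaf's share of 'yes' responses (0.0 to 1.0) to a customer category.

    Assumed thresholds: Best >= 50%, Medium between 10% and 50%, Worst <= 10%.
    """
    if yes_rate >= 0.50:
        return "Best"
    if yes_rate > 0.10:
        return "Medium"
    return "Worst"


# Illustrative inputs: "yes" shares quoted in the Q1 interpretation.
for rate in (0.307, 0.115, 0.033, 0.025):
    print(f"{rate:.1%} -> {propensity_group(rate)}")
```

Because every rate between 0 and 1 falls into exactly one branch, every leaf of the tree is assigned to one of the three groups.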

1. Customer Category v/s Age of Customers:

Report
LINT(age)

Node_cat   Mean    N       Std. Deviation
Best       46.96   2594    16.549
Medium     44.53   30143   12.667
Worst      43.47   57685   9.942
Total      43.92   90422   11.177
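
As a consistency check on the report above, the Total mean should equal the N-weighted average of the three category means. The following sketch (names are ours) recomputes it from the Best, Medium and Worst rows.

```python
# (mean age, N) per customer category, taken from the report above.
age_by_group = {
    "Best": (46.96, 2594),
    "Medium": (44.53, 30143),
    "Worst": (43.47, 57685),
}


def pooled_mean(groups):
    """Return the N-weighted mean and the total N across groups."""
    total_n = sum(n for _, n in groups.values())
    weighted_sum = sum(mean * n for mean, n in groups.values())
    return weighted_sum / total_n, total_n


mean_age, total_n = pooled_mean(age_by_group)
print(f"pooled mean age = {mean_age:.2f} over {total_n} cases")
```

The result agrees with the Total row (43.92 over 90422 cases). The same check can be applied to any other scale variable profiled by category.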

[Chart: Customer Category v/s Age of Customers. x-axis: Customer Category (Best, Medium, Worst); y-axis: Mean of Age. Values: Best 46.96, Medium 44.53, Worst 43.47.]

2. Customer Category v/s Job Category

job * Node_cat Crosstabulation
(Count, with % within job in parentheses)

job             Best            Medium           Worst            Total
admin           302 (2.9%)      3379 (32.7%)     6661 (64.4%)     10342 (100.0%)
blue-collar     152 (.8%)       4878 (25.1%)     14434 (74.2%)    19464 (100.0%)
entrepreneur    42 (1.4%)       720 (24.2%)      2212 (74.4%)     2974 (100.0%)
housemaid       50 (2.0%)       804 (32.4%)      1626 (65.6%)     2480 (100.0%)
management      714 (3.8%)      7216 (38.1%)     10986 (58.1%)    18916 (100.0%)
retired         438 (9.7%)      2358 (52.1%)     1732 (38.3%)     4528 (100.0%)
self-employed   84 (2.7%)       1040 (32.9%)     2034 (64.4%)     3158 (100.0%)
services        138 (1.7%)      2237 (26.9%)     5933 (71.4%)     8308 (100.0%)
student         198 (10.6%)     1162 (61.9%)     516 (27.5%)      1876 (100.0%)
technician      346 (2.3%)      5153 (33.9%)     9695 (63.8%)     15194 (100.0%)
unemployed      104 (4.0%)      1007 (38.6%)     1495 (57.4%)     2606 (100.0%)
unknown         26 (4.5%)       189 (32.8%)      361 (62.7%)      576 (100.0%)
Total           2594 (2.9%)     30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Customer Category v/s Job Category. x-axis: Job Category; y-axis: Percentage; series: Best, Medium, Worst.]

3. Default v/s Customer Category

default * Node_cat Crosstabulation
(Count, with % within default in parentheses)

default   Best           Medium           Worst            Total
no        2590 (2.9%)    29889 (33.7%)    56313 (63.4%)    88792 (100.0%)
yes       4 (.2%)        254 (15.6%)      1372 (84.2%)     1630 (100.0%)
Total     2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)
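
The "% within" figures in these crosstabulations are row percentages: each count divided by its row total. A minimal sketch (names are ours) that recomputes them for the default table from the counts alone:

```python
# Counts from the default * Node_cat crosstabulation.
default_counts = {
    "no": {"Best": 2590, "Medium": 29889, "Worst": 56313},
    "yes": {"Best": 4, "Medium": 254, "Worst": 1372},
}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


pct = row_percentages(default_counts)
for row, cells in pct.items():
    print(row, {col: f"{value:.1f}%" for col, value in cells.items()})
```

Reading across rows, customers with a credit default are concentrated in the Worst category (84.2%) far more than customers without one (63.4%).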

[Chart: Default v/s Customer Category. x-axis: Default (no, yes); y-axis: Percentage of customers; series: Best, Medium, Worst.]

4. Month v/s Customer Category

month * Node_cat Crosstabulation
(Count, with % within month in parentheses)

month   Best           Medium           Worst            Total
apr     276 (4.7%)     2878 (49.1%)     2710 (46.2%)     5864 (100.0%)
aug     0 (.0%)        7099 (56.8%)     5395 (43.2%)     12494 (100.0%)
dec     196 (45.8%)    212 (49.5%)      20 (4.7%)        428 (100.0%)
feb     396 (7.5%)     2706 (51.1%)     2196 (41.4%)     5298 (100.0%)
jan     96 (3.4%)      1077 (38.4%)     1633 (58.2%)     2806 (100.0%)
jul     0 (.0%)        3417 (24.8%)     10373 (75.2%)    13790 (100.0%)
jun     240 (2.2%)     1953 (18.3%)     8489 (79.5%)     10682 (100.0%)
mar     52 (5.5%)      902 (94.5%)      0 (.0%)          954 (100.0%)
may     0 (.0%)        7738 (28.1%)     19794 (71.9%)    27532 (100.0%)
nov     300 (3.8%)     689 (8.7%)       6951 (87.5%)     7940 (100.0%)
oct     420 (28.5%)    1014 (68.7%)     42 (2.8%)        1476 (100.0%)
sep     618 (53.4%)    458 (39.6%)      82 (7.1%)        1158 (100.0%)
Total   2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)
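
One way to read the month table is to rank the months by the share of their customers that fall into the Best category. The sketch below does this with the counts above; the ranking step is our illustration and not part of the SPSS output.

```python
# month: (Best, Medium, Worst) counts from the month * Node_cat crosstabulation.
month_counts = {
    "apr": (276, 2878, 2710),
    "aug": (0, 7099, 5395),
    "dec": (196, 212, 20),
    "feb": (396, 2706, 2196),
    "jan": (96, 1077, 1633),
    "jul": (0, 3417, 10373),
    "jun": (240, 1953, 8489),
    "mar": (52, 902, 0),
    "may": (0, 7738, 19794),
    "nov": (300, 689, 6951),
    "oct": (420, 1014, 42),
    "sep": (618, 458, 82),
}

# Share of each month's customers that belong to the Best category.
best_share = {
    month: best / (best + medium + worst)
    for month, (best, medium, worst) in month_counts.items()
}

# Months ordered from the highest to the lowest Best share.
ranked = sorted(best_share, key=best_share.get, reverse=True)
for month in ranked[:3]:
    print(f"{month}: {best_share[month]:.1%} Best")
```

September, December and October lead the ranking, while May, July and August contain no Best customers at all, even though they are among the months with the most customers.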

[Chart: Month v/s Customer Category. x-axis: Months (Jan to Dec); y-axis: Percentage of customers; series: Best, Medium, Worst.]

5. Marital v/s Customer Category

marital * Node_cat Crosstabulation
(Count, with % within marital in parentheses)

marital    Best           Medium           Worst            Total
divorced   318 (3.1%)     3172 (30.5%)     6924 (66.5%)     10414 (100.0%)
married    1328 (2.4%)    17346 (31.9%)    35754 (65.7%)    54428 (100.0%)
single     948 (3.7%)     9625 (37.6%)     15007 (58.7%)    25580 (100.0%)
Total      2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Marital v/s Customer Category. x-axis: Marital (divorced, married, single); y-axis: Percentage of customers; series: Best, Medium, Worst.]

6. Education v/s Customer Category

education4 * Node_cat Crosstabulation
(Count, with % within education4 in parentheses)

education4   Best           Medium           Worst            Total
primary      298 (2.2%)     3930 (28.7%)     9474 (69.1%)     13702 (100.0%)
secondary    1098 (2.4%)    14263 (30.7%)    31043 (66.9%)    46404 (100.0%)
tertiary     1062 (4.0%)    10701 (40.2%)    14839 (55.8%)    26602 (100.0%)
unknown      136 (3.7%)     1249 (33.6%)     2329 (62.7%)     3714 (100.0%)
Total        2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Education v/s Customer Category. x-axis: Education (primary, secondary, tertiary, unknown); y-axis: Percentage of Customers; series: Best, Medium, Worst.]

7. Housing v/s Customer Category

housing * Node_cat Crosstabulation
(Count, with % within housing in parentheses)

housing   Best           Medium           Worst            Total
no        2056 (5.1%)    17589 (43.8%)    20517 (51.1%)    40162 (100.0%)
yes       538 (1.1%)     12554 (25.0%)    37168 (74.0%)    50260 (100.0%)
Total     2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Housing v/s Customer Category. x-axis: Housing (no, yes); y-axis: Percentage of Customers; series: Best, Medium, Worst.]

8. Contact v/s Customer Category

contact4 * Node_cat Crosstabulation
(Count, with % within contact4 in parentheses)

contact4    Best           Medium           Worst            Total
cellular    2268 (3.9%)    26699 (45.6%)    29603 (50.5%)    58570 (100.0%)
telephone   244 (4.2%)     2306 (39.7%)     3262 (56.1%)     5812 (100.0%)
unknown     82 (.3%)       1138 (4.4%)      24820 (95.3%)    26040 (100.0%)
Total       2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Contact v/s Customer Category. x-axis: Contact (cellular, telephone, unknown); y-axis: Percentage of Customers; series: Best, Medium, Worst.]

9. Balance v/s Customer Category

Report
LINT(balance)

Node_cat   Mean      N       Std. Deviation
Best       2152.00   2594    3717.766
Medium     1760.87   30143   3555.669
Worst      1131.43   57685   2673.144
Total      1370.54   90422   3045.363

[Chart: Balance v/s Customer Category. x-axis: Category of Customers (Best, Medium, Worst); y-axis: Mean of Balance. Values: Best 2152.00, Medium 1760.87, Worst 1131.43.]

10. Loan v/s Customer Category

loan * Node_cat Crosstabulation
(Count, with % within loan in parentheses)

loan    Best           Medium           Worst            Total
no      2440 (3.2%)    26676 (35.1%)    46818 (61.7%)    75934 (100.0%)
yes     154 (1.1%)     3467 (23.9%)     10867 (75.0%)    14488 (100.0%)
Total   2594 (2.9%)    30143 (33.3%)    57685 (63.8%)    90422 (100.0%)

[Chart: Loan v/s Customer Category. x-axis: Loan (no, yes); y-axis: Percentage of Customers; series: Best, Medium, Worst.]

Q3. Conduct a chi-square test and show that the chi-square value reported by the CHAID algorithm matches the test value. Choose any node other than the root node for this.
Ans3. Nodes 110, 111 and 112 are considered for the validation of the chi-square value.
For these 3 nodes the chi-square value is 30.242, which is verified by the chi-square test.

Chi-Square Tests

                     Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square   30.242a   2    .000
Likelihood Ratio     27.433    2    .000
N of Valid Cases     1804

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 24.56.
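
The Pearson statistic in this table is the sum of (observed - expected)^2 / expected over all cells of the node-by-response table, with (rows - 1) x (columns - 1) degrees of freedom. The sketch below implements that calculation in plain Python. The cell counts of nodes 110, 111 and 112 are not reproduced in this report, so the 3 x 2 table in the example is hypothetical and serves only to show the mechanics. The p-value helper uses the closed form exp(-x/2), which holds only for exactly 2 degrees of freedom, as in the test above.

```python
import math


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows, one row per node, one column per response.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Hypothetical (no, yes) counts for three sibling nodes; NOT the counts of nodes 110-112.
hypothetical_nodes = [[90, 10], [80, 20], [70, 30]]
statistic, df = pearson_chi_square(hypothetical_nodes)
print(f"chi-square = {statistic:.3f}, df = {df}")

# p-value implied by the reported value of 30.242 with df = 2.
print(f"p = {p_value_df2(30.242):.2e}")
```

For the reported value of 30.242 with 2 degrees of freedom, the p-value is far below .0005, which is why SPSS prints it as .000.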
