Mathematics - Ijmcar - Improved Chaid Algorithm For Classifying Customer Groups

International Journal of Mathematics and
Computer Applications Research (IJMCAR)

ISSN(P): 2249-6955; ISSN(E): 2249-8060
Vol. 5, Issue 6, Dec 2015, 17-26
TJPRC Pvt. Ltd.
IMPROVED CHAID ALGORITHM FOR CLASSIFYING CUSTOMER GROUPS: A MARKETING APPLICATION
C. P. BALASUBRAMANIAM¹ & V. THIGARASU²
¹Research Scholar, Karpagam University, Coimbatore, Tamil Nadu, India
²Associate Professor, Department of Computer Science, Gopi Arts & Science College, Humanities, Coimbatore, Tamil Nadu, India
ABSTRACT
Supply Chain Management (SCM) makes use of a decision-tree based approach to learn and recognize the logical elements of a tree structure, and it provides the coding rules and logical rules needed by the marketing application system. The decision-tree attributes are classified and tested during the analysis of SCM. Agent-based modeling is carried out with an Improved CHi-squared Automatic Interaction Detection (I-CHAID) method applied to SCM. The users' needs were tested with the I-CHAID method against around 50 dataset attributes, and the final error rate for determining the logical labels is less than 5%. Efficiency quality parameters such as precision, recall and error rate are calculated and used to explain the functioning of I-CHAID with respect to supply chain management.

KEYWORDS: Decision-Tree Based Approach, Agent-Based Modeling, Improved Chi-Squared Automatic Interaction Detection, Error Rate
Received: Oct 13, 2015; Accepted: Oct 26, 2015; Published: Oct 31, 2015; Paper Id.: IJMCARDEC20152
Original Article
INTRODUCTION
Supply Chain Management is the system that coordinates the interrelations and interactions among networked business functions, whose components include suppliers, customers and agents. SCM controls the information flows between these business functions, from suppliers to customers, by way of the agents. Within a particular company, SCM relies on the available information and communication technologies to support planning decisions effectively and efficiently across the business. Supply chain management is also useful for improving long-term performance, for example by identifying improvement opportunities when developing plans for an organization's growth.
Every SCM process follows a planning process, which aims to balance supply and demand from primary suppliers to final customers across the networked business functions. Supply and demand between suppliers and customers are balanced by delivering superior services and goods through the supply chain assets in an optimized manner.
On the other hand, balancing supply and demand is a difficult task when a large number of complex decisions must be synchronized simultaneously. Issues such as late deliveries, carrier performance, stock availability and quality problems complicate the process, as do conflicting objectives and stochastic behaviors [Camarinha-Matos, LM and Afsarmanesh H]. In the supply chain management domain, experimentation through agent-based systems is now a widely known line of research.
www.tjprc.org
editor@tjprc.org
Agent-based systems are simulated and tested using I-CHAID with the involvement of high-level practitioners, such as supply chain partners, at the frontier of SCM. Problems arise in SCM from, for example, the lack of planning and scheduling integration, supply chain coordination, supply chain dynamics, information sharing, supply chain control structures, and the intelligent behavior of supply chain members. Before these problems can be reduced, practitioners first analyze the platforms on which they can be resolved efficiently. Following a given modeling approach, practitioners first test the affected SCM data with I-CHAID. The error rate of I-CHAID is reduced by using agent-based supply chain planning systems. This mechanism provides efficient tools for measuring precision and recall and for calculating the error rate.
Figure 1 shows the relation between a company, its suppliers and its customers. Each process has three attributes, namely select, source and deliver to the end user, and is carried out with the help of agents: the company agent, the supplier agents and the customer agents. Each level of processing contains planning and resource sharing based on the relationship between suppliers and consumers. The SCM framework is used to manage the supplier materials and the reputation materials separately when framing the decision tree.
Figure 1: Supply Chain Model for Three Level of Management Mechanism

The Improved Chi-Square Automatic Interaction Detector (I-CHAID) decision tree method is used to discover useful decision-tree patterns in a large amount of data and to determine their efficiency, which provides a road map for the supply chain management process. As Figure 1 highlights, the supply chain model has three levels of management mechanism, namely the company agent system, the supplier agent system and the consumer agent system. The Improved CHAID method is applied to this framework and performs feature selection [Hongwen Zheng and Yanxia Zhang] for high-dimensional data. The decision tree is then constructed over the company, suppliers and customers. Deduction and coding rules [C. P Balasubramaniam and V. Thigarasu] as well as logical rules are applied to the tree. Finally, structure recognition [C. P Balasubramaniam and V. Thigarasu] is carried out for the supply chain.
Impact Factor (JCC): 4.6257
NAAS Rating: 3.80
LITERATURE REVIEW
Bolstorff and Rosenbaum (2003) present an SCM model and observe that the meaning of SCM depends on the motivation and interest of those involved in it. A technology provider may associate SCM with software, such as enterprise resource planning and advanced planning and scheduling systems. Third-party logistics providers align SCM with distribution practices, and consulting companies may align it with their intellectual property, according to the Bolstorff SCM model.
Halldorsson et al. (2007) state that "there is no such thing as a unified theory of SCM. Depending on each situation, one can choose a theory as the dominant explanatory theory, and then complement it with one or several of the other theoretical perspectives". Their aim is to establish a frame of reference that narrows the gap between current SCM research and practice and the theoretical explanations, and that improves the understanding of SCM in practice. Their theory describes the structure of the supply chain and how its goals are achieved.
The extended tree-structured methods of Davis and Anderson (1989) and LeBlanc and Crowley (1992) are based on the definition of a within-node homogeneity measure, unlike Segal's algorithm [E. Segal et al.], which tries to maximize between-node separation and is referred to as an expectation maximization (EM) algorithm. Decision trees in survival analysis are popularly known as survival trees and are a type of classification and regression tree. Survival-tree analysis is a powerful non-parametric method of clustering survival data [Su, X. G.; Fan, J. J.] for prognostication, used to determine the importance and effect of various covariates.
Su and Fan (2004) extended CART, the classification and regression trees designed by Breiman et al. (1984), to address tree size selection and other issues in forming the tree structure over related attributes. The CART algorithm consists of three procedures: growing a large tree, pruning it into a sequence of nested sub-trees, and finally selecting the best-sized tree. In addition, two approaches are available for analyzing correlated failure times when forming a CART tree. In the marginal approach, the marginal distribution of the correlated failure times is formulated for classification. The other approach applies the frailty model in the regression setting.
Gordon and Olshen (1985) described an adaptation of the CART algorithm. They show concisely how tree-structured recursive partitioning schemes for classification, probability class estimation and regression can be adapted to cover censored survival analysis. The only assumptions required are those that guarantee identifiability of the conditional distributions of lifetime given covariates. Thus, the techniques are applicable to more general situations than the well-known semi-parametric model of Cox.
Allen Zaklad et al. (2003) discussed sustainable supply chain improvement in terms of business process improvement, enabling technology and social system transformation. They presented a model of supply chain intervention that addresses the hidden side of supply chain operations in the context of business processes and technology.
Osman, A. H. and Naomie (2012), in "Improved Semantic Plagiarism Detection Scheme Based on Chi-squared Automatic Interaction Detection", discussed a semantic text plagiarism detection technique built on Chi-squared Automatic Interaction Detection. A further feature of their method is the use of the CHAID algorithm [Gilbert Ritschard] to select the important arguments.
Figure 2: Agent Decision Network for Selecting the Supplier

Figure 2 shows the agent decision network for selecting the supplier that maximizes the utility associated with the supply chain. The company is modeled as an agent that needs to buy two different types of material to manufacture its product. Each material is associated with a list of possible supplier agents. The reputation node and the offer node are conditioned on the choice of supplier. The offer node is a deterministic node that characterizes the cost of the offer proposed by the supplier (sufficient, fair, good, not available). Its value is determined by comparing the received offer with the market price and/or by evaluating the quality and characteristics of the offered products. The choice of supplier agent influences the commercial transaction to acquire the specific material, represented by the node Transaction Status. The transaction utility nodes of all the materials are combined to determine the utility of the entire sub-chain. The Transaction probabilistic nodes for the raw materials, in turn, influence the probabilistic node Supply chain, which expresses the probability that a supply chain can be successfully established.
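The selection logic of such a decision network can be illustrated with a small Python sketch. It is not the network of Figure 2: it reduces each supplier to two hypothetical numbers (a transaction success probability and a utility), picks the supplier with the highest expected utility for each material, and assumes the transactions are independent when estimating the probability that the chain is established. All names and values below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplierOffer:
    supplier: str      # hypothetical supplier agent name
    p_success: float   # assumed probability that the transaction succeeds
    utility: float     # assumed utility of a successful transaction


def expected_utility(offer):
    """Expected utility of buying from this supplier."""
    return offer.p_success * offer.utility


def select_supplier(offers):
    """Pick the offer with the highest expected utility for one material."""
    return max(offers, key=expected_utility)


def chain_success_probability(selected):
    """Probability that every transaction succeeds, assuming independence."""
    prob = 1.0
    for offer in selected:
        prob *= offer.p_success
    return prob


# Two materials, each with its own list of candidate supplier agents.
material_a = [SupplierOffer("A1", 0.9, 10.0), SupplierOffer("A2", 0.6, 18.0)]
material_b = [SupplierOffer("B1", 0.8, 5.0), SupplierOffer("B2", 0.95, 4.0)]

chosen = [select_supplier(material_a), select_supplier(material_b)]
sub_chain_utility = sum(expected_utility(offer) for offer in chosen)
p_chain = chain_success_probability(chosen)
```

A full decision network would also condition the offer and reputation nodes on the supplier; the sketch keeps only the utility-maximizing choice to show the principle.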
CHI-SQUARED AUTOMATIC INTERACTION DETECTION

CHAID, the Chi-Square Automatic Interaction Detector, is a tree classification method that models the relationship between a dependent variable and a series of predictor variables. CHAID builds non-binary trees, in which more than two branches can attach to a single root or node, and it has been particularly popular in marketing research. Both CHAID and CART construct trees for marketing research whose dependent variable may be qualitative or quantitative. CHAID works according to the following steps:
Step 1: Select a set of predictors and their interactions
Step 2: Predict the optimal value of the dependent variable
Step 3: Identify the qualitative variables
Step 4: Identify the quantitative variables
Step 5: Obtain the classification tree
Step 6: Following the CHAID method, form the main stem of the tree, which splits into branches and sub-branches according to supplier and customer availability
Step 7: Study a series of predictor variables to see whether splitting the sample on these predictors leads to a statistically significant discrimination in the dependent variable
Step 8: Carry out chi-square tests and F tests and calculate their p-values. If the p-values are not statistically significant, the algorithm merges the respective predictor variables (or categories, in the case of categorical data). If statistical significance is observed, a split is made. This becomes the first branching of the tree. The same procedure is then applied to each of the resulting groups under the same constraints
Step 9: At the end of the tree-building process, a series of groups is obtained that are significantly different from one another on the dependent variable
These steps are not concerned with the fairness of CHAID. Rather, CHAID aims at optimum prediction, and its tree-based classification is also useful for finding patterns in complicated datasets. CHAID is one of the oldest tree classification methods, proposed by Kass in 1980.
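The merge-and-split logic of Steps 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single categorical predictor, uses only the Pearson chi-square test (no F test and no Bonferroni adjustment), and the function names, the 0.05 thresholds and the city counts are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a contingency table.

    Assumes every row total and every column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def chi2_p_value(stat, dof):
    """Upper-tail probability of a chi-square variable with integer dof >= 1."""
    if stat <= 0:
        return 1.0
    h = stat / 2.0
    if dof % 2 == 0:
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(dof // 2))
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5)
               for k in range(1, (dof - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def pair_p_value(groups, a, b):
    """p-value of the test comparing the class counts of two category groups."""
    return chi2_p_value(*chi_square([groups[a], groups[b]]))


def merge_categories(counts, alpha_merge=0.05):
    """Repeatedly merge the two least different categories (Step 8, merge part)."""
    groups = {(name,): list(row) for name, row in counts.items()}
    while len(groups) > 2:
        keys = list(groups)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = max(pairs, key=lambda ab: pair_p_value(groups, *ab))
        if pair_p_value(groups, a, b) <= alpha_merge:
            break  # every remaining pair differs significantly
        groups[a + b] = [x + y for x, y in zip(groups.pop(a), groups.pop(b))]
    return groups


def should_split(groups, alpha_split=0.05):
    """Split the node only if the merged predictor is significant (Step 8, split part)."""
    return chi2_p_value(*chi_square(list(groups.values()))) < alpha_split


# Hypothetical class counts [buys, does not buy] per customer city.
counts = {"Coimbatore": [30, 70], "Chennai": [32, 68], "Madurai": [70, 30]}
groups = merge_categories(counts)
split = should_split(groups)
```

In this example the two cities with nearly identical purchase rates are merged, and the merged predictor still separates the classes strongly enough to justify a split.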
IMPROVED CHAID METHOD

CHi-squared Automatic Interaction Detection is a process for testing mined data within a decision-tree structured framework. The new skeleton proposed in Improved CHAID has two processing schemas: the insertion of logical rules and the insertion of knowledge to support the decision tree. Two setups of Improved CHAID have been tested: one in which each logical element has its own decision tree, and one in which a single decision tree [Xiaogang Su, Juanjuan Fan] serves all the logical elements. Both kinds of tree are recognized by Improved CHAID with respect to the kernel of the system. Improved CHAID relies with confidence on the decision tree.
The Improved CHAID algorithm is a data-driven process in which two or more tree elements are distinguished by the decision tree structure. The decision tree is a common method used in data mining [Anita Prinzie, Dirk Van den Poel] to extract predictive information. Morgan and Sonquist used regression trees for prediction and explanation with AID (Automatic Interaction Detection). CHi-squared Automatic Interaction Detection belongs to the discrimination and classification methods that followed, all based on the same paradigm of representation by trees.
Many methods intended to improve on these approaches have been proposed in the past, notably the improvements to Quinlan's system that led to the well-known C4.5 method. This line of work also gave rise to the concept of lattice graphs, which was popularized by the induction graphs of the SIPINA method.
Rakotomalala et al. (2005) explain the principle of decision tree construction for classification and discrimination problems based on the SIPINA software. The idea is to represent, at each node of the decision tree, the empirical distribution of the attribute to be predicted. The tree is thus built so as to favor the most discriminating attributes. The difficulty is to choose, among the N attributes characterizing the structure elements, those that give the best discrimination rate.
The choice depends on the variable to be predicted and the candidate descriptor variable, and is made with a segmentation criterion. In general, statistical criteria or criteria derived from Shannon's entropy and its alternatives are used in supply chain management. The segmentation makes it possible to define a contingency table crossing the variable to be predicted and the candidate descriptor.
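To make the entropy-based criterion concrete, the following Python sketch builds the contingency table from (class, descriptor) pairs and computes the reduction in Shannon entropy obtained by segmenting on the descriptor. The function names and the toy records are hypothetical, and information gain is used here only as one representative of the entropy family of criteria.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    counts = list(counts)
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def information_gain(pairs):
    """Entropy of the class minus its entropy after segmenting on the descriptor.

    `pairs` is an iterable of (class_value, descriptor_value) tuples.
    """
    table = Counter(pairs)  # contingency table as {(class, descriptor): count}
    class_tot, desc_tot = Counter(), Counter()
    for (cls, desc), count in table.items():
        class_tot[cls] += count
        desc_tot[desc] += count
    n = sum(table.values())
    conditional = 0.0
    for desc, n_desc in desc_tot.items():
        column = [c for (_, d), c in table.items() if d == desc]
        conditional += n_desc / n * entropy(column)
    return entropy(class_tot.values()) - conditional


# Hypothetical records: (customer buys?, region of the customer).
informative = [("yes", "north"), ("yes", "north"), ("no", "south"), ("no", "south")]
uninformative = [("yes", "north"), ("no", "north"), ("yes", "south"), ("no", "south")]

gain_informative = information_gain(informative)
gain_uninformative = information_gain(uninformative)
```

A descriptor that separates the classes perfectly recovers the full class entropy, while a descriptor that is independent of the class yields no gain.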
These operations are easier to follow with some notation. Let the class attribute have V modalities and the candidate descriptor have U modalities. Crossing the two variables gives the counts $n_{vu}$, the number of records with class modality $v$ and descriptor modality $u$, with row totals $n_{v\cdot}$, column totals $n_{\cdot u}$ and grand total $n$.
Table 1: Number Table during the Crossing of Two Variables
To evaluate the relevance of a variable in the segmentation, CHAID uses the deviation from independence defined by the following equation.

$$\chi^2 = \sum_{v=1}^{V} \sum_{u=1}^{U} \frac{\left(n_{vu} - \dfrac{n_{v\cdot}\, n_{\cdot u}}{n}\right)^2}{\dfrac{n_{v\cdot}\, n_{\cdot u}}{n}} \qquad (1)$$

The values of $\chi^2$ are not bounded; they lie in the range $[0, +\infty)$. The main drawback is that descriptors with a large number of modalities are strongly favored. To reduce this negative impact, it is more suitable to normalize by the number of degrees of freedom. Tschuprow's T takes values in the range [0; 1]. This new equation gives the Improved CHAID algorithm.

$$T = \sqrt{\frac{\chi^2}{n \sqrt{(V-1)(U-1)}}} \qquad (2)$$
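As a worked illustration of Equations (1) and (2), the following Python sketch computes the chi-square deviation from independence of a contingency table and then normalizes it. It assumes the standard definition of Tschuprow's T, that is, the square root of chi-square divided by n times the square root of (V-1)(U-1). The function names and the small tables are hypothetical.

```python
import math


def chi_square(table):
    """Chi-square deviation from independence; rows are classes, columns descriptor modalities.

    Assumes every row total and every column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    total = 0.0
    for v, row in enumerate(table):
        for u, observed in enumerate(row):
            expected = row_tot[v] * col_tot[u] / n
            total += (observed - expected) ** 2 / expected
    return total


def tschuprow_t(table):
    """Tschuprow's T: chi-square normalized by n and the degrees of freedom."""
    n = sum(sum(row) for row in table)
    v, u = len(table), len(table[0])
    return math.sqrt(chi_square(table) / (n * math.sqrt((v - 1) * (u - 1))))


independent = [[25, 25], [25, 25]]          # descriptor tells nothing about the class
perfect = [[50, 0], [0, 50]]                # descriptor determines the class
wide = [[20, 5, 15, 10], [5, 20, 10, 15]]   # descriptor with four modalities

t_independent = tschuprow_t(independent)
t_perfect = tschuprow_t(perfect)
t_wide = tschuprow_t(wide)
```

Because T divides by the square root of the degrees of freedom, a descriptor does not gain an advantage simply by having more modalities, which is the motivation given above for the improved criterion.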
EXPERIMENTAL RESULTS
The proposed I-CHAID method is tested on a marketing dataset whose attribute groups are named Supplier Agent, Production Manager Agent, Dealer Agent, Client Agent and Inventory Agent. Each group contains the attributes (Name, Quantity, Id, Price, Cost, Phone number, City, State), and the data is split into blocks according to these attributes. Two existing approaches [Jose Manuel Serra, Laurent A. Baumes] are compared with the proposed system using the measures recall, precision and error rate.
Table 2 compares the three approaches on the efficiency quality metrics recall, precision and error rate.
Table 2: Average Efficiency Quality Metrics

Approach                                                            Recall    Precision    Error rate
Flexible and Generic Approach (Multi-Tagging Method)                95.7%     93.5%        5.9%
Decision-Tree Based Approach (Data-Mining Method)                   93.0%     94.0%        6.4%
Agent-Based Modeling in Supply Chain Management (I-CHAID method)    97.5%     92.2%        4.5%
Agent-Based Modeling in Supply Chain Management (the I-CHAID method) achieves the highest recall and the lowest error rate of the three approaches for Supply Chain Management, although its precision is slightly lower. The results of the two existing methods are close to those of the proposed system, but the proposed system is the more efficient overall.
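The paper does not spell out how the three metrics are computed. Assuming the usual definitions (recall and precision taken with respect to one label of interest, error rate as the fraction of misclassified records), they can be obtained as in the Python sketch below. The function name and the label sequences are hypothetical and are not the experimental data.

```python
def efficiency_metrics(actual, predicted, positive):
    """Recall, precision and error rate for one label of interest.

    Assumes at least one record is actually `positive` and at least one
    record is predicted `positive`, so that no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    errors = sum(1 for a, p in pairs if a != p)
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "error_rate": errors / len(pairs),
    }


# Hypothetical logical labels assigned to eight records.
actual = ["supplier", "supplier", "supplier", "dealer", "dealer", "dealer", "dealer", "dealer"]
predicted = ["supplier", "supplier", "dealer", "supplier", "dealer", "dealer", "dealer", "dealer"]

metrics = efficiency_metrics(actual, predicted, positive="supplier")
```

Averaging these values over all labels would give figures comparable in form to the averages reported in Table 2.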
Figure 3: Efficiency Quality Measure

Agent-Based Modeling in Supply Chain Management (the I-CHAID method) is also more accurate than the existing methods for Supply Chain Management.
Figure 4: Various SCM Accuracy Comparisons

CONCLUSIONS
The nature and performance of the proposed work, an improved CHAID for SCM, were evaluated on a synthetic dataset of bounded extent comprising the Supplier, Production Manager, Dealer, Client and Inventory attribute sets. The synthetic dataset is managed and classified by the proposed Improved CHAID framework. Accuracy and the performance measures recall, precision and error rate are computed and compared with existing techniques on the input supply chain management dataset. The classification produced by the algorithm is presented as a decision tree. Figure 4 gives the accuracy comparison between the proposed framework and the existing techniques, the Multi-Tagging Method and the Data-Mining Method. Accuracy here is measured by sensitivity and specificity. Figure 3 shows the comparison graph for the efficiency quality measures, which are likewise calculated from sensitivity and specificity. The metrics obtained by the proposed Improved CHAID for SCM are a recall of 97.5%, a precision of 92.2% and an error rate of 4.5%; recall and error rate are better than those of the existing Flexible and Generic Approach and Decision-Tree Based Approach. Figure 4 shows that the accuracy of the proposed Improved CHAID for SCM is the highest among the compared techniques.
REFERENCES
1. Gordon and Olshen, Tree-structured survival analysis, Cancer Treatment Reports, 1985, 69(10): 1065-1069.
2. Xiaogang Su, Juanjuan Fan, Multivariate Survival Trees: A Maximum Likelihood Approach Based on Frailty Models, Biometrics, Journal of the International Biometric Society, Volume 60, Issue 1, pages 93-99, March 2004.
3. A. Belaïd, T. Moinel, Y. Rangoni, Improved CHAID Algorithm for Document Structure Modeling.
4. Camarinha-Matos, L. M., and Afsarmanesh, H., Collaborative networked organizations: a research agenda for emerging business models. Massachusetts, Kluwer Academic, 2004.
5. Lee, J.-H., and Kim, C.-O. (2008). "Multi-agent systems applications in manufacturing systems and supply chain management: a review paper." International Journal of Production Research 46(1): 233-265.
6. Moyaux, T., Chaib-Draa, B., and D'Amours, S. (2007). "Information sharing as a coordination mechanism for reducing the bullwhip effect in a supply chain." Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews 37(3): 396-409.
7. Frayret, J.-M., D'Amours, S., and Montreuil, B. (2004a). "Coordination and control in distributed and agent-based manufacturing systems." Production Planning & Control 15(1): 1-13.
8. Forget, P., D'Amours, S., Frayret, J.-M., and Gaudreault, J. (2008b). Design of multi-behavior agents for supply chain planning: an application to the lumber industry. Supply Chains: Theory and Application. V. Kordic, I-TECH Education and Publishing: 551-568.
9. Bolstorff, R. R., and Rosenbaum, R. (2003). Supply chain excellence. New York, AMACOM.
10. Halldorsson, A., Kotzab, H., Mikkola, J. H., and Skjott-Larsen, T. (2007). "Complementary theories to supply chain management." Supply Chain Management: An International Journal 12(4): 284-296.
11. Davis, R. and Anderson, J. (1989): Exponential survival trees, Statistics in Medicine 8, pp. 947-962.
12. LeBlanc, M.; Crowley, J. (1992): Relative risk trees for censored survival data, Biometrics 48, pp. 411-425.
13. Su, X. G.; Fan, J. J. (2004): Multivariate survival trees: a maximum likelihood approach based on frailty models, Biometrics 60, pp. 93-99.
14. Kaplan, E. L.; Meier, Paul. (1958): Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53, 457-481.
15. Kass, G. (1980): An exploratory technique for investigating large quantities of categorical data, Applied Statistics 29(2), 119-127.
16. Anita Prinzie, Dirk Van den Poel, WITHDRAWN: Constrained optimization of data-mining problems to improve model performance: A direct-marketing application, http://www.researchgate.net/publication/222555006.
17. Gilbert Ritschard, CHAID and Earlier Supervised Tree Methods, Publications récentes du Département d'économétrie, http://www.unige.ch/ses/metri/cahiers.
18. Hongwen Zheng, Yanxia Zhang, Feature selection for high dimensional data in astronomy, accepted for publication in Advances in Space Research, arXiv:0709.0138v1 [astro-ph], 3 Sep 2007.
19. Jose Manuel Serra, Laurent A. Baumes, Zeolite synthesis modelling with support vector machines: A combinatorial approach, http://www.researchgate.net/publication/6537094.
20. C. P. Balasubramaniam, V. Thigarasu, Agent-based Modeling in Supply Chain Management using Improved C4.5, Research Journal of Applied Sciences, Engineering and Technology 9(2): 91-97, 2015, Maxwell Scientific Organization.
21. E. Segal, M. Shapira, A. Regev, D. Pe'er, D. Botstein, D. Koller and N. Friedman, Nat Genet, 2003, 34, 166-176.
22. Allen Zaklad, Richard McKnight, Alan Kosansky, Jim Piermarini, A New Approach to Sustainable Supply Chain Excellence, www.profitpt.com, Profit Point Inc, (866) 347-1130, 2003.
23. Rakotomalala, R., TANAGRA, une plate-forme d'expérimentation pour la fouille de données, Revue MODULAD, 32, pp. 70-85, 2005.