
2008 International Conference on Information Management, Innovation Management and Industrial Engineering

Application of Data Mining Classification Algorithms in Customer Membership Card Classification Model

Lin Zhang, Yan Chen, Yan Liang, Nan Li


College of Transportation and Management,
Dalian Maritime University, Dalian, LiaoNing, 116026, China
joslin_1984@yahoo.cn

978-0-7695-3435-0/08 $25.00 © 2008 IEEE
DOI 10.1109/ICIII.2008.168

Abstract

This paper applies two data mining classification algorithms, C5.0 and CART, to extract information useful for decision-making from customers' transaction behavior. Following the stages of business understanding, data understanding and data preparation, modeling, and evaluation, we obtain results from both algorithms. Comparing them shows that both can be applied to the customer membership card classification model and both give quite accurate results. We then introduce the application of this model. The analysis shows that customers' income level and number of children are the two main factors affecting their choice of card. Knowing this, enterprises can take corresponding measures, such as dividing customers into groups and recommending the corresponding card to customers with similar characteristics. In this way, enterprises can provide special services to users of different card ranks in order to attract more customers.

1. Introduction

Data mining [1] is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. In practice, the data to be analyzed by data mining techniques are often incomplete (lacking attribute values or certain attributes of interest, or containing only aggregate data), noisy (containing errors, or outlier values which deviate from the expected), and inconsistent (e.g. containing discrepancies in the department codes).

Classification analysis examines the data in a sample database in order to produce an accurate description, build an accurate model, or mine a classification rule for each category, and then uses these rules to classify records in other databases.

The Food Mart is an international chain store. Its product mix is "70% food + 20% daily necessities + 10% styles", and it is distinctive in style and taste. The store has created an upscale shopping environment for its customers and has put considerable effort into its operations and staffing in order to attract more high-end consumers.

The chain store runs a membership card system, which helps not only to accumulate customer information but also to offer corresponding services to users of different card ranks. In this way the store can enhance customers' loyalty. Therefore, in order to recommend the right card to the right customer, senior managers want to know the characteristics of customers at each card rank and which factor most strongly affects a customer's choice of one card over another.

SPSS Clementine is an open data mining tool that has twice won the British government's SMART innovation prize. It supports the entire data mining flow, which consists of acquiring data, transforming data, modeling, evaluating, and deploying, and it also supports the accepted data mining standard CRISP-DM (Cross-Industry Standard Process for Data Mining). Clementine's visual interface makes it possible to work at the level of "thought": users can concentrate on the problem to be solved rather than on technical work such as coding. It also provides a variety of graphical techniques that help users understand the key relationships in the data and guide them to a solution in the most convenient way.

2. Classification Algorithm

The decision tree is an important model for classification. It originated with CLS, a learning system built by
Hunt et al. when they studied human concept modeling in the early 1960s. In the late 1970s, J. Ross Quinlan put forward the ID3 algorithm. CHAID (1980) and the CART algorithm (1984) followed. In 1986, J. C. Schlimmer put forward the ID4 algorithm. In 1988, P. E. Utgoff put forward the ID5R algorithm. In 1993, Quinlan developed C4.5, the predecessor of C5.0, on the basis of the ID3 algorithm [2-6]. The C5.0 and CART algorithms used in this paper are explained below.

① C5.0 algorithm

C5.0 is the algorithm behind Clementine's decision tree model. It is a classification algorithm suited to large data sets, and it improves on C4.5 in speed and memory use.

The C5.0 model splits the sample on the field that gives the largest information gain. Each subset produced by a split is then split again, usually on a different field, and the process continues until the subsets cannot be split further. Finally, the lowest-level splits are examined, and subsets that do not contribute significantly to the model are removed or pruned.

② CART algorithm

Classification and Regression Trees (CART) is another classification algorithm. It is a flexible method for describing how the variable Y is distributed given the predictor vector X.

The model uses a binary tree to divide the predictor space into subsets on which the distribution of Y is approximately homogeneous. The tree's leaf nodes correspond to the regions of the partition, which are determined by the splitting rules attached to each internal node. Moving from the root to a leaf, a sample to be predicted is assigned to exactly one leaf node, and the distribution of Y on that node is thereby determined.

This paper uses the SPSS Clementine tool and the accepted CRISP-DM standard, and applies the classification algorithms to the international chain store.

3. Business Understanding

In commercial operation, a membership card service is the best way for businesses to accumulate customer information. On the one hand, they obtain customers' basic information and can maintain long-term contact with them; on special days, such as holidays or a customer's birthday, they can raise customer satisfaction by sending a warm greeting. On the other hand, they can use customers' transaction information, such as purchase volume and purchase frequency, to analyze each customer's value to the enterprise and the characteristics of the customers of each card type, which helps the enterprise recommend cards to new customers with a clear goal.

This chain store has four kinds of membership cards. The business's concern is how to recommend each card. It therefore needs to analyze the historical data of customers who already hold cards in order to find the common characteristics of the users of each card type. When recommending a card to a new customer, it can then assume that a customer who shares those characteristics is likely to be a user of the corresponding card.

4. Data Understanding and Data Preparing

Before modeling, we need to know which fields the data set contains, how these fields are distributed, whether there are hidden correlations among them, and so on. Only with this information can we decide which fields are meaningful for data mining.

After analyzing the attributes of the customer basic information table, we add the Filter module from Field Ops to the data stream and remove the unnecessary fields. For the selected fields, we set their directions in the Direction option: the membership card type is the output field, and the others are input fields.

Figure1. Member card classification type setting

In the data understanding stage, analyzing the characteristics of the raw data gives a better understanding of the data distribution and of the effect of various factors on membership card type. A preliminary analysis of the customer records in the database shows that golden card customers account for 11.65%, silver card customers 9.34%, bronze card customers 55.47%, and normal card customers 23.54%.

The following charts show the influence of each condition attribute on the decision attribute.
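The class proportions and per-factor breakdowns described in Section 4 (Data Understanding and Data Preparing) can also be computed outside Clementine. The following is a minimal sketch, not the authors' procedure: the field names `member_card`, `marital_status` and `country`, and the toy records, are hypothetical and only illustrate the calculation.

```python
from collections import Counter, defaultdict


def class_distribution(records, target="member_card"):
    """Return {card type: percentage of records} for the target field."""
    counts = Counter(r[target] for r in records)
    total = sum(counts.values())
    return {card: 100.0 * n / total for card, n in counts.items()}


def cross_tab(records, factor, target="member_card"):
    """Count records for each (factor value, card type) pair."""
    table = defaultdict(Counter)
    for r in records:
        table[r[factor]][r[target]] += 1
    return {value: dict(cards) for value, cards in table.items()}


# Hypothetical toy records; names and values are illustrative only.
records = [
    {"member_card": "Bronze", "marital_status": "M", "country": "USA"},
    {"member_card": "Bronze", "marital_status": "S", "country": "USA"},
    {"member_card": "Golden", "marital_status": "M", "country": "Canada"},
    {"member_card": "Normal", "marital_status": "S", "country": "Mexico"},
]

print(class_distribution(records))
print(cross_tab(records, "marital_status"))
```

Each per-factor chart in this section corresponds to one call of `cross_tab` with a different `factor` argument.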

Figure2. Effect of member card's classification by country factor

Among the customers surveyed, Americans make up the largest proportion, followed by Canadians, and Mexicans the smallest.

Figure3. Effect of member card's classification by marriage status factor

Among golden card customers, married customers outnumber unmarried ones. In the other groups, the numbers of married and unmarried customers are roughly equal.

Figure4. Effect of member card's classification by income factor

Normal card users are mainly low-income customers whose yearly income is from 10,000 to 30,000, whereas bronze card users are mainly medium-income customers whose yearly income is from 30,000 to 90,000.

Figure5. Effect of member card's classification by number of children at home factor

Most silver, bronze, and normal card users have no children, whereas most golden card customers have more than three children.

Figure6. Effect of member card's classification by education factor

Normal card users have a lower level of education than users of the other cards.

Figure7. Effect of member card's classification by house owner status factor

Most golden card and silver card customers own houses. Among bronze and normal card customers, however, the number who own houses is roughly equal to the number who do not.

5. Modeling

In the data source, after setting the Type and Filter options, we select the attributes needed for classification and add the Sample module to draw a random sample, taking 70% of the source data as the training set and the remaining 30% as the test set. We then apply the C5.0 and CART algorithms separately to perform the classification. The membership card classification flow chart is as follows:

Figure8. Flow chart of member card classification model

Classification result of C5.0 algorithm:

Figure9. Classification result of C5.0 model

Rule summary: A customer whose yearly income is from 10,000 to 30,000 is a normal card customer; a customer

whose yearly income is from 30,000 to 150,000 is a bronze card customer if he or she has fewer than 3 children and a golden card customer if he or she has more than 3 children; a customer whose yearly income is more than 150,000 is a silver card customer if unmarried and a golden card customer if married. Thus, the main factors influencing customer rank are income, number of children, and marital status.

Classification result of C&RT (CART) algorithm:

Figure10. Classification result of C&RT model

Rule summary: A customer whose yearly income is from 10,000 to 30,000 is a normal card customer; a customer whose yearly income is from 30,000 to 150,000 and who has fewer than 3 children is a bronze card customer; a customer whose yearly income is more than 30,000 and who has more than 3 children is a golden card customer; a customer whose yearly income is more than 150,000 and who has fewer than 3 children is a silver card customer.

Comparing the two classification algorithms, we reach the following conclusion: the main factors influencing customer rank are income and number of children. The two methods identify essentially the same normal card, bronze card, and golden card customers. The results differ only slightly for silver card customers: under C5.0, a silver card customer is one whose yearly income is more than 150,000 and who is unmarried, while under C&RT it is one whose yearly income is more than 150,000 and who has fewer than 3 children.

6. Evaluating

In this paper, we evaluate and compare the two models from qualitative and quantitative perspectives.

(1) Qualitative analysis: We use the Gains and Response parameters for the comparison.

The Gains parameter:
Gains = (cumulative number of hits up to the quantile / total number of hits) × 100%

Figure11. Comparison result of C5.0 and C&RT model by gains parameter

The abscissa is the quantile (records sorted by descending confidence), and the ordinate is the cumulative Gains.

An ideal Gains chart rises steeply in the early quantiles, quickly approaches 100%, and then stays level.

The Response parameter:
Response = (cumulative number of hits up to the quantile / cumulative number of records up to the quantile) × 100%

Figure12. Comparison result of C5.0 and C&RT model by response parameter

The abscissa is the quantile (records sorted by descending confidence), and the ordinate is the cumulative Response.

An ideal Response chart stays high over the early quantiles and then drops rapidly.

Model evaluation: The charts show that the classification performance of the two methods is essentially the same. Therefore, both models can be used for card classification.

(2) Quantitative analysis: The following chart shows that the accuracy of the C5.0 model is 82.53% and that of the C&RT model is 82.24%, and that the agreement between the two models is 98.75%.

Figure13. Quantitative comparison result of C5.0 and C&RT model

Therefore, both algorithms are well suited to the customer membership card classification model and give quite accurate results.
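The evaluation measures used in this section (cumulative Gains, cumulative Response, accuracy, and agreement between two models) can be sketched as follows. This is an illustrative sketch rather than Clementine's implementation: the function names, the toy scores, and the convention that the caller supplies a boolean "hit" flag per record are all assumptions.

```python
def cumulative_gains_response(scored, n_quantiles=10):
    """Return [(gains %, response %)] for each cumulative quantile.

    scored: list of (confidence, is_hit) pairs; records are ranked by
    descending confidence before being cut into quantiles.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_hits = sum(1 for _, hit in ranked if hit)
    n = len(ranked)
    curve = []
    for q in range(1, n_quantiles + 1):
        cutoff = (q * n) // n_quantiles  # records in the first q quantiles
        hits = sum(1 for _, hit in ranked[:cutoff] if hit)
        gains = 100.0 * hits / total_hits if total_hits else 0.0
        response = 100.0 * hits / cutoff if cutoff else 0.0
        curve.append((gains, response))
    return curve


def accuracy(predicted, actual):
    """Percentage of records whose predicted label equals the actual one."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(pred_a, pred_b):
    """Percentage of records on which two models give the same label."""
    return 100.0 * sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical toy scores: (confidence, is_hit).
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False),
          (0.4, False), (0.3, False), (0.2, False), (0.1, False), (0.05, False)]
for gains, response in cumulative_gains_response(scored, n_quantiles=5):
    print(f"gains={gains:.1f}%  response={response:.1f}%")
```

On the toy data, Gains climbs to 100% within the first two quantiles while Response falls from its initial high value, which is the shape described above for a good model.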

7. Application

As shown above, the main factors affecting card rank are income and number of children. We can therefore use these two attributes as the main criteria when recommending a card to a new customer, and refer to other attributes to help decide which card to recommend.
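As an illustration, the C5.0 rule summary from Section 5 can be written as a small lookup function. This is a sketch under stated assumptions, not part of the original model: the function name `recommend_card` is hypothetical, the handling of the income boundaries at 30,000 and 150,000 is an assumption (the rule summary does not say which side they fall on), and cases the rules do not cover (income below 10,000, or exactly 3 children in the middle income band) return `None` rather than a guess.

```python
from typing import Optional


def recommend_card(income: float, children: int, married: bool) -> Optional[str]:
    """Apply the C5.0 rule summary; return None where the rules are silent."""
    if 10_000 <= income < 30_000:        # boundary handling is an assumption
        return "normal"
    if 30_000 <= income <= 150_000:      # boundary handling is an assumption
        if children < 3:
            return "bronze"
        if children > 3:
            return "golden"
        return None                      # exactly 3 children: not covered
    if income > 150_000:
        return "golden" if married else "silver"
    return None                          # income below 10,000: not covered


print(recommend_card(60_000, 1, True))
print(recommend_card(200_000, 0, False))
```

Returning `None` for uncovered cases keeps the gaps in the rule summary visible, so that other attributes can be consulted for those customers.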

8. Conclusions

Applying data mining classification algorithms to the customer membership card classification model helps the enterprise recommend the corresponding membership card to each customer with a clear goal, so that it can provide special services for users of each card type. This in turn supports the enterprise's development.

9. References

[1] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques" [M], Morgan Kaufmann Publishers, USA, 2001, 70-181.

[2] Hunt E B, Marin J, Stone P J, "Experiments in Induction" [M], Academic Press, 1966.

[3] Quinlan J R, "Induction of Decision Trees" [J], Machine Learning, 1986.

[4] Quinlan J R, "C4.5: Programs for Machine Learning" [M], San Mateo, CA: Morgan Kaufmann Publishers, 1993.

[5] Quinlan J R, "Programs for Machine Learning" [M], Morgan Kaufmann, 1992.

[6] Quinlan J R, "Bagging, Boosting and C4.5" [A], In Proceedings of the 13th National Conference on Artificial Intelligence, Portland, Ore, 1996: 725-730.
