Data Mining

DMBI Assignment
RFM Analysis
Group 8
13104 Abhishek Roy
13105 Abhishek Sekhri
13158 Varun Khanna
13255 Tushar Sharma
13623 Kumari Vandana
13710 Debmalya Paul
Objectives
Identify the most profitable customers from the transaction data of a fashion retail store using
the RFM model, and profile those customers.
Procedure Followed
Stage 1: Cleaning the transaction data
Stage 2: Designing the RFM Model
Stage 3: Segmenting the customers based on the RFM score
Stage 4: Cluster analysis of most profitable customers
Stage 5: Profiling of the customers
Cleaning the transaction data

The first step was to analyze the data sheet provided (the transaction data). The following
observations were made.
1. Some entries were repeated, with all data fields identical. Data cleaning was therefore
needed to remove the duplicate entries, taking care to remove only those entries in which
every column is identical. In MS Excel we selected the entire data and used the Remove
Duplicates command.
2. Some records had the default value 0 as the customer ID, so it was important to remove
them. We filtered the column CustomersID and deselected the entry 0, so that only the
records with valid customer IDs remained.
3. In the column IssuingplantPrice, some of the values were negative even though the records
were sales data. These ambiguous entries were eliminated.
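The three cleaning rules above can be sketched in code. The following is a minimal illustration, not the procedure we actually used (that was done in MS Excel). The record layout `Txn` and the sample rows are hypothetical; only the column meanings (customer ID, TransType, IssuingplantPrice) come from the data sheet.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    # Hypothetical record layout, assumed for illustration only.
    customer_id: int
    barcode: str
    trans_type: str   # 'Sale' or 'Return'
    price: float      # IssuingplantPrice


def clean(rows):
    """Apply the three cleaning rules in order and return the rows that remain."""
    seen = set()
    kept = []
    for row in rows:
        if row in seen:                 # rule 1: every field identical to an earlier row
            continue
        seen.add(row)
        if row.customer_id == 0:        # rule 2: default customer ID
            continue
        if row.trans_type == "Sale" and row.price < 0:   # rule 3: negative sale price
            continue
        kept.append(row)
    return kept


sample = [
    Txn(101, "B1", "Sale", 500.0),
    Txn(101, "B1", "Sale", 500.0),    # exact duplicate
    Txn(0, "B2", "Sale", 300.0),      # default customer ID
    Txn(102, "B3", "Sale", -50.0),    # negative price on a sale
    Txn(102, "B4", "Return", 200.0),
]
cleaned = clean(sample)
```

Only the first sale of customer 101 and the return of customer 102 survive in this sample.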
Designing the RFM Model

For designing the RFM model, we identified three fields:
Recency
Frequency
Monetary
The Recency column is the number of days between today's date and the date of the customer's
last purchase. We considered only the sales records by excluding the return records.
We also looked for entries having both sale and return records for the same barcode. We
compared the purchased and returned quantities and eliminated those entries where the customer
returned the complete purchase. This was done manually, and we found 92 such entries.
The Frequency column is the total number of purchases the customer made over the period.
The Monetary column is the total amount the customer purchased minus the total amount the
customer returned over the period.
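The check for completely returned purchases was done manually, but it could also be sketched in code. The following is only an illustration under assumptions: the row layout (customer ID, barcode, transaction type, quantity) and the function name `fully_returned` are hypothetical, and quantities are assumed to be stored as positive numbers for both sales and returns.

```python
from collections import defaultdict


def fully_returned(rows):
    """Return the (customer_id, barcode) pairs whose returned quantity covers the sale.

    rows: iterable of (customer_id, barcode, trans_type, quantity) tuples
    (an assumed layout, not the actual column order of the data sheet).
    """
    sold = defaultdict(int)
    returned = defaultdict(int)
    for customer_id, barcode, trans_type, quantity in rows:
        key = (customer_id, barcode)
        if trans_type == "Sale":
            sold[key] += quantity
        elif trans_type == "Return":
            returned[key] += quantity
    # A purchase counts as fully returned when nothing of it was kept.
    return sorted(key for key in sold if returned.get(key, 0) >= sold[key])


rows = [
    (101, "B1", "Sale", 2), (101, "B1", "Return", 2),   # fully returned
    (102, "B2", "Sale", 3), (102, "B2", "Return", 1),   # partly returned
    (103, "B3", "Sale", 1),                             # kept
]
to_eliminate = fully_returned(rows)
```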
The T-SQL code we used to create the RFM table was:
-- Recency and Frequency per customer, from the sale records only
INSERT INTO dbo.Transactions_RFM (CustomerID, RECENCY, FREQUENCY)
SELECT
    CustomerID,
    DATEDIFF(DAY, MAX(DocumentDate), GETDATE()) AS RECENCY,   -- days since the last sale
    COUNT(CustomerID) AS FREQUENCY                            -- number of sale records
    -- SUM([IssuingplantPrice]) AS Monetary  (filled in by the later step)
FROM
    dbo.Transactions_Data
WHERE
    TransType = 'Sale'
GROUP BY
    CustomerID
ORDER BY
    CustomerID;

-- Make the return amounts negative so that they are subtracted when summed
UPDATE dbo.Transactions_Data
SET IssuingplantPrice = 0 - IssuingplantPrice
WHERE TransType = 'Return';

ALTER TABLE dbo.Transactions_RFM
ADD Monetary float;
GO

-- Fill Monetary with the net amount (sales minus returns) per customer
UPDATE A
SET A.Monetary = temp.Monetary
FROM
    dbo.Transactions_RFM AS A JOIN
    (SELECT
        CustomerID, SUM([IssuingplantPrice]) AS Monetary
     FROM
        dbo.Transactions_Data
     GROUP BY
        CustomerID
    ) AS temp
    ON A.CustomerID = temp.CustomerID;
The records of the table Transactions_RFM are attached below.
Transactions_RFM.xlsx
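For readers without a SQL Server instance, the same aggregation can be sketched in Python. This is an illustrative equivalent, not the code we ran: the row layout, the function name `build_rfm` and the sample dates are assumptions, and the reference date is passed in explicitly instead of being read from the system clock.

```python
from datetime import date


def build_rfm(rows, today):
    """Aggregate transactions into {customer_id: (recency, frequency, monetary)}.

    rows: iterable of (customer_id, document_date, trans_type, amount) tuples,
    with amounts stored as positive numbers (an assumed layout).
    """
    last_sale, frequency, net = {}, {}, {}
    for customer_id, document_date, trans_type, amount in rows:
        if trans_type == "Sale":
            previous = last_sale.get(customer_id, document_date)
            last_sale[customer_id] = max(previous, document_date)
            frequency[customer_id] = frequency.get(customer_id, 0) + 1
            net[customer_id] = net.get(customer_id, 0.0) + amount
        elif trans_type == "Return":
            # Returns reduce the monetary value but do not count as purchases.
            net[customer_id] = net.get(customer_id, 0.0) - amount
    # Only customers with at least one sale get a row, as in the INSERT step.
    return {
        cid: ((today - last_sale[cid]).days, frequency[cid], net[cid])
        for cid in last_sale
    }


rows = [
    (101, date(2015, 1, 1), "Sale", 500.0),
    (101, date(2015, 1, 21), "Sale", 300.0),
    (101, date(2015, 1, 25), "Return", 100.0),
    (102, date(2014, 12, 2), "Sale", 1000.0),
]
rfm = build_rfm(rows, today=date(2015, 1, 31))
```

In this sample, customer 101 has a recency of 10 days, a frequency of 2 and a monetary value of 700.0.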
After creating the table, we sorted the records on each of the three columns in ascending order,
one column at a time, and assigned scores as follows:
The first 20% of records by Recency were given a score of 5 and the last 20% a score of 1.
The first 20% of records by Frequency were given a score of 1 and the last 20% a score of 5.
The first 20% of records by Monetary were given a score of 1 and the last 20% a score of 5.
After scoring the three factors, we chose the weight of each factor based on the literature,
where Recency, Frequency and Monetary below are the scores assigned above:
RFM score = 0.4*Recency + 0.25*Frequency + 0.35*Monetary
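The scoring and weighting can be illustrated with a short sketch. It assumes that the intermediate 20% bands receive the scores 4, 3 and 2 (the text above states only the first and last bands) and that ties are broken by the original record order. The function names and sample values are hypothetical.

```python
def quintile_scores(values, reverse=False):
    """Score each value from 1 to 5 by its position in ascending order.

    reverse=False: the lowest 20% score 1 and the highest 20% score 5
                   (Frequency, Monetary).
    reverse=True:  the lowest 20% score 5 and the highest 20% score 1 (Recency).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices in ascending order
    scores = [0] * n
    for rank, i in enumerate(order):
        band = rank * 5 // n + 1                        # band 1..5
        scores[i] = 6 - band if reverse else band
    return scores


def rfm_score(r, f, m):
    """Weighted RFM score using the weights from the formula above."""
    return 0.4 * r + 0.25 * f + 0.35 * m


recency = [10, 200, 50, 400, 90]            # days since last purchase
frequency = [8, 1, 5, 2, 3]                 # number of purchases
monetary = [9000, 500, 4000, 800, 1500]     # net amount purchased

r = quintile_scores(recency, reverse=True)
f = quintile_scores(frequency)
m = quintile_scores(monetary)
scores = [rfm_score(*rfm) for rfm in zip(r, f, m)]
```

The first customer in this sample is the most recent, most frequent and highest-spending one, and therefore receives the maximum score of 5.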
Segmenting the customers

The top 5% of customers by RFM score were assigned to the Platinum category, the next 5% to
the Gold category and the next 10% to the Silver category.
The customer segmentation is attached below:
Customer_Segmentation.xlsx
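The percentage cut-offs can be sketched as follows. This is an illustration only: the function name, the sample scores and the label "Unsegmented" for the remaining 80% of customers are assumptions, since only the three named categories are defined above.

```python
def assign_segments(scores):
    """Map each customer to a segment by rank of RFM score.

    scores: dict of customer_id -> RFM score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)   # best score first
    n = len(ranked)
    segments = {}
    for position, customer_id in enumerate(ranked):
        # Integer arithmetic avoids floating-point issues at the boundaries.
        if position * 100 < 5 * n:
            segments[customer_id] = "Platinum"      # top 5%
        elif position * 100 < 10 * n:
            segments[customer_id] = "Gold"          # next 5%
        elif position * 100 < 20 * n:
            segments[customer_id] = "Silver"        # next 10%
        else:
            segments[customer_id] = "Unsegmented"   # assumed label for the rest
    return segments


# Twenty hypothetical customers with strictly decreasing scores.
sample_scores = {1000 + i: 100 - i for i in range(20)}
segments = assign_segments(sample_scores)
```

With twenty customers this yields one Platinum, one Gold and two Silver customers.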
Cluster analysis of profitable customers

We defined profitable customers by how recently they visited the store and by the age group to
which they belong. The younger they are, the more profitable they are considered. Using the
rattle tool, we found that the optimum number of clusters for our case is three, so we divided the
sample records into three clusters.
First we load the rattle library and run the rattle function. We then define the input parameters,
select the k-means clustering algorithm, and set the desired number of clusters and the number of
iterations. Clicking the Execute button produces the clusters. Finally, we analyze the salient
characteristics of each cluster and name it as closely to its features as possible.
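To show what the k-means step does internally, here is a minimal pure-Python sketch. It is not the rattle implementation: the deterministic farthest-point initialisation, the function names and the sample points (age, days since last visit) are assumptions chosen for illustration, and the features are left unscaled for brevity.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Farthest-point initialisation (an assumed, deterministic choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point that is farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=10):
    """Alternate the assignment and update steps for a fixed number of iterations."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Hypothetical (age, days since last visit) records.
points = [(25, 30), (28, 45), (45, 400), (50, 420), (33, 2100), (36, 2200)]
labels, centers = kmeans(points, k=3)
```

Because the day counts are much larger than the ages, they dominate the distance here; in practice the inputs would normally be rescaled before clustering.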
Profiling of the customers

We profiled the customers on the basis of the probabilities of the various rules in the decision
tree. These probabilities are the chances of conversion of a newly arrived customer who has the
characteristics stated in the rule. If the probability is greater than 50%, we consider the
customer a prospective customer. The higher the probability of conversion, the higher the
customer's place in the priority list for being approached.
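The 50% threshold and the priority ordering can be sketched in a few lines. The function name and the sample probabilities below are hypothetical and are not taken from our decision tree.

```python
def prioritise(prospects, threshold=0.5):
    """Keep prospects whose conversion probability exceeds the threshold,
    ordered from the highest probability to the lowest.

    prospects: list of (label, probability) pairs.
    """
    selected = [p for p in prospects if p[1] > threshold]
    return sorted(selected, key=lambda p: p[1], reverse=True)


# Hypothetical conversion probabilities for four new customers.
prospects = [("A", 0.45), ("B", 0.80), ("C", 0.62), ("D", 0.50)]
priority_list = prioritise(prospects)
```

Customer D is excluded because the probability must be strictly greater than 50%.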
Observations
A total of 163 records were taken as the sample. We used three input variables: Age, Cityname
and TimeSinceLastVisited. The floating-point numbers in brackets are the averages of the three
input parameters. Two premium categories were created.
Gold: TimeSinceLastVisit < 2166.5, city is one of agra, ghaziabad, new delhi, and
Age >= 42.5.
Silver: TimeSinceLastVisit >= 2166.5, city is one of agra, baroda, faridabad, ghaziabad,
kanpur, mathura, new delhi, and Age < 42.5.
These are just the examples of nodes 1 and 4. The records are divided into several more groups
in a similar fashion.
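The two example rules can be written as a small function. This sketch covers only the two rules quoted above; the function name is hypothetical, and records matched by neither rule return None because the remaining nodes of the tree are not listed here.

```python
GOLD_CITIES = {"agra", "ghaziabad", "new delhi"}
SILVER_CITIES = {"agra", "baroda", "faridabad", "ghaziabad", "kanpur", "mathura", "new delhi"}


def classify(age, city, time_since_last_visit):
    """Apply the two example rules; return 'Gold', 'Silver' or None."""
    city = city.strip().lower()
    if time_since_last_visit < 2166.5 and city in GOLD_CITIES and age >= 42.5:
        return "Gold"
    if time_since_last_visit >= 2166.5 and city in SILVER_CITIES and age < 42.5:
        return "Silver"
    return None   # handled by other nodes that are not reproduced here


example_gold = classify(50, "Agra", 1000)
example_silver = classify(30, "Kanpur", 2500)
example_other = classify(30, "Agra", 1000)
```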
The root node error is 0.52.
The decision tree is built from rules, each of which has three parameters: segment, cover
and probability. The segment value has already been calculated as per the classification
made above. Prob stands for the probability of a new customer being Gold or Silver if he meets
the respective criteria.
The snapshot of the decision tree shows that nodes 16, 34, 70, 71, 18, 19, 5 and 3 have rules
applied to them, while the remaining nodes do not.
Recommendations
The most profitable customers should be offered a discount on their birthdays.
Most of the customers belong to Faridabad, so special offers should be made to them
during low-sales periods, i.e. non-festive periods.
Most of the data analyzed had customer ID = 0, so care should be taken while capturing the
customer ID, so that the most profitable customers can be included in the loyalty program
and special offers.
