RFM Analysis
Group 8
13104  Abhishek Roy
13105  Abhishek Sekhri
13158  Varun Khanna
13255  Tushar Sharma
13623  Kumari Vandana
13710  Debmalya Paul
Objectives
Identify the most profitable customers from the transaction data of a fashion retail store using
the RFM model, and profile those customers.
Procedure Followed
[Figure: procedure in five stages (Stage 1 to Stage 5); labels shown: Recency, Frequency, Monetary.]
The Recency column is the number of days between today's date and the date of the customer's
most recent sale. We kept only the sales records by eliminating the return details.
We identified the entries having both sale and return details for the same barcode, compared
the purchase and return quantities, and eliminated those entries where the customer returned
the complete purchase. This was done manually, and we found 92 such entries.
The Frequency column is the total number of purchases the customer made over the period.
The Monetary column is the total amount the customer purchased minus the total amount the
customer returned over the period.
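The manual check for fully returned purchases could also be scripted. The sketch below is only an illustration, not the procedure the group used: it assumes each record carries a customer ID, a barcode, a transaction type, and a quantity, and that a purchase is matched to its return by customer and barcode. The sample rows and the function names fully_returned and drop_fully_returned are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, barcode, trans_type, quantity).
# The layout and values are illustrative, not taken from the store data.
rows = [
    (101, "B001", "Sale", 2),
    (101, "B001", "Return", 2),   # whole purchase returned
    (101, "B002", "Sale", 1),
    (102, "B003", "Sale", 3),
    (102, "B003", "Return", 1),   # partial return only
]

def fully_returned(rows):
    """Return the (customer_id, barcode) pairs whose total returned
    quantity equals the total purchased quantity."""
    sold = defaultdict(int)
    returned = defaultdict(int)
    for customer_id, barcode, trans_type, qty in rows:
        key = (customer_id, barcode)
        if trans_type == "Sale":
            sold[key] += qty
        elif trans_type == "Return":
            returned[key] += qty
    return {key for key, qty in sold.items()
            if qty > 0 and returned.get(key, 0) == qty}

def drop_fully_returned(rows):
    """Keep only rows that do not belong to a fully returned purchase."""
    excluded = fully_returned(rows)
    return [r for r in rows if (r[0], r[1]) not in excluded]
```

With the sample rows, only the pair (101, "B001") is flagged, and the partial return for customer 102 is kept so that it can still reduce the Monetary value.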
The T-SQL code we used to create the RFM table was:
-- Recency and Frequency per customer, computed from sale records only
INSERT INTO dbo.Transactions_RFM (CustomerID, RECENCY, FREQUENCY)
SELECT
    CustomerID,
    DATEDIFF(DAY, MAX(DocumentDate), GETDATE()) AS RECENCY,
    COUNT(CustomerID) AS FREQUENCY
    -- SUM([IssuingplantPrice]) AS Monetary
FROM
    dbo.Transactions_Data
WHERE
    TransType = 'Sale'
GROUP BY
    CustomerID
ORDER BY
    CustomerID;
-- Make return amounts negative so that a plain SUM gives sales minus returns
UPDATE dbo.Transactions_Data
SET IssuingplantPrice = 0 - IssuingplantPrice
WHERE TransType = 'Return';
ALTER TABLE dbo.Transactions_RFM
ADD Monetary float;
-- Fill Monetary for the existing RFM rows; return prices are negative at this
-- point, so the sum per customer is sales minus returns
UPDATE A
SET A.Monetary = temp.Monetary
FROM
    dbo.Transactions_RFM AS A JOIN
    (SELECT
        CustomerID, SUM([IssuingplantPrice]) AS Monetary
    FROM
        dbo.Transactions_Data
    GROUP BY
        CustomerID
    ) AS temp
    ON A.CustomerID = temp.CustomerID;
Transactions_RFM.xlsx
After creating the table, we sorted the records on each of the three columns in ascending order,
one column at a time, and assigned scores as follows:
The first 20% of records by Recency were given a score of 5 and the last 20% a score of 1.
The first 20% of records by Frequency were given a score of 1 and the last 20% a score of 5.
The first 20% of records by Monetary were given a score of 1 and the last 20% a score of 5.
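The scoring scheme above can be sketched in a few lines. This is a minimal illustration, assuming the three middle 20% bands receive the intermediate scores 2 to 4 (the report only states the first and last bands) and that ties are broken by original row order. The function name quintile_scores is hypothetical.

```python
def quintile_scores(values, reverse=False):
    """Score each value from 1 to 5 by its 20% band after an ascending sort.

    reverse=False: lowest 20% get 1, highest 20% get 5 (Frequency, Monetary).
    reverse=True:  lowest 20% get 5, highest 20% get 1 (Recency).
    Intermediate bands get 2, 3, 4 (an assumption, see the lead-in).
    """
    n = len(values)
    # Indices of the records in ascending order of value (stable for ties).
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, idx in enumerate(order):
        band = rank * 5 // n + 1          # 1..5, one band per 20% of records
        scores[idx] = 6 - band if reverse else band
    return scores

# Hypothetical Recency values in days for ten customers.
recency_days = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
recency_scores = quintile_scores(recency_days, reverse=True)
```

Recency is scored in reverse because a smaller number of days since the last purchase indicates a more recent, and therefore more valuable, customer.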
After scoring the three factors, we chose a weight for each factor based on the literature.
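Combining the three scores with weights might look like the sketch below. The report does not state the weights that were used, so the values here are placeholders chosen only so that they sum to 1. The function names weighted_rfm and rank_customers are likewise hypothetical.

```python
# Placeholder weights for (Recency, Frequency, Monetary).
# These are NOT the weights used in the report, which are not stated.
PLACEHOLDER_WEIGHTS = (0.2, 0.3, 0.5)

def weighted_rfm(r, f, m, weights=PLACEHOLDER_WEIGHTS):
    """Weighted sum of the three 1..5 scores."""
    w_r, w_f, w_m = weights
    return w_r * r + w_f * f + w_m * m

def rank_customers(scores, weights=PLACEHOLDER_WEIGHTS):
    """scores maps customer_id -> (r, f, m); returns ids, best first."""
    return sorted(
        scores,
        key=lambda cid: weighted_rfm(*scores[cid], weights=weights),
        reverse=True,
    )

# Hypothetical scored customers.
example = {"A": (5, 1, 1), "B": (1, 5, 5), "C": (3, 3, 3)}
ranking = rank_customers(example)
```

Because the weights sum to 1, the combined score stays on the same 1 to 5 scale as the individual scores, which keeps it easy to compare across customers.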
Customer_Segmentation.xlsx
Observations
A sample of 163 records was taken. We used three input variables: Age, Cityname, and
TimeSinceLastVisited. The floating-point numbers in brackets are the averages of the three
input parameters. Two premium categories were created.
Gold: TimeSinceLastVisit < 2166.5, city is one of Agra, Ghaziabad, New Delhi, and
Age >= 42.5.
Silver: TimeSinceLastVisit >= 2166.5, city is one of Agra, Baroda, Faridabad, Ghaziabad,
Kanpur, Mathura, New Delhi, and Age < 42.5.
The two rules above are only examples, taken from nodes 1 and 4. The remaining records are
divided into several groups in a similar fashion.
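The two example rules can be written as a small function. This sketch covers only the Gold and Silver conditions quoted above and returns None for every other record, since the rules of the other nodes are not listed. The function name classify and the lower-casing of city names are assumptions.

```python
GOLD_CITIES = {"agra", "ghaziabad", "new delhi"}
SILVER_CITIES = {"agra", "baroda", "faridabad", "ghaziabad",
                 "kanpur", "mathura", "new delhi"}

def classify(age, city, time_since_last_visit):
    """Apply the two example rules; other tree nodes are not modelled."""
    city = city.strip().lower()
    if time_since_last_visit < 2166.5 and city in GOLD_CITIES and age >= 42.5:
        return "Gold"
    if time_since_last_visit >= 2166.5 and city in SILVER_CITIES and age < 42.5:
        return "Silver"
    return None  # falls under a rule that the report does not list

# Hypothetical customers.
print(classify(50, "Agra", 1000))     # matches the Gold rule
print(classify(30, "Kanpur", 3000))   # matches the Silver rule
```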
The root node error is 0.52.
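Assuming the usual definition for classification trees, the root node error is the share of records that would be misclassified if every record were assigned the most common class, before any split is made. The sketch below computes it for made-up labels; the class names and counts are hypothetical and chosen only so that the result matches the reported 0.52.

```python
from collections import Counter

def root_node_error(labels):
    """Misclassification rate when every record gets the majority class."""
    counts = Counter(labels)
    majority = max(counts.values())
    return 1 - majority / len(labels)

# Hypothetical labels: 48 records of class "A", 30 of "B", 22 of "C".
labels = ["A"] * 48 + ["B"] * 30 + ["C"] * 22
error = root_node_error(labels)   # 1 - 48/100
```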
The decision tree is built from rules, each of which has three parameters: segment, cover,
and probability. The segment value comes from the classification made above. Prob is the
probability that a new customer is Gold or Silver if he or she meets the respective criteria.
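One way to read the cover and probability of a rule is sketched below. It assumes cover is the fraction of all sample records that satisfy the rule, and probability is the fraction of those records that belong to the rule's segment. The report does not define the two terms exactly, so both definitions, the function name rule_stats, and the sample records are assumptions.

```python
def rule_stats(records, matches, segment):
    """Return (cover, prob) for one rule.

    records: list of dicts, each with a "segment" key
    matches: function taking a record and returning True if the rule applies
    segment: the segment the rule predicts
    """
    hits = [r for r in records if matches(r)]
    cover = len(hits) / len(records)
    if not hits:
        return cover, 0.0
    prob = sum(r["segment"] == segment for r in hits) / len(hits)
    return cover, prob

# Hypothetical labelled records.
records = [
    {"age": 50, "segment": "Gold"},
    {"age": 45, "segment": "Gold"},
    {"age": 60, "segment": "Silver"},
    {"age": 30, "segment": "Silver"},
]
cover, prob = rule_stats(records, lambda r: r["age"] >= 42.5, "Gold")
```

Here three of the four records satisfy the age condition, and two of those three are Gold, so the rule covers 75% of the sample and predicts Gold with probability 2/3.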
The snapshot of the decision tree shows that nodes 16, 34, 70, 71, 18, 19, 5, and 3 have
rules applied to them, while the remaining nodes have none.
Recommendations
The most profitable customers should be offered a discount on their birthday.
Most of the customers belong to Faridabad, so special offers should be made to them
during low-sales periods, i.e., non-festive periods.
Most of the records analyzed have a customer ID of 0, so more care should be taken when
capturing the customer ID. The most profitable customers can then be identified reliably
and included in the loyalty program and special offers.