
DATA MINING AND BUSINESS INTELLIGENCE

RFM Analysis
Using R and RStudio
Group 9

Alok Simha 13304, Pratheek PS 13338, Sailesh Kumar 13345, Sreenidhish N 13352, Sesank Numburi 13448

Introduction to RFM Analysis


RFM is widely used by direct marketers of all types to select which customers to target
with offers. The fundamental premise of RFM analysis is that customers who have purchased
recently, purchased more often and spent more are more likely to respond to an offer than
customers who have purchased less recently, less often and in smaller amounts. RFM analysis
can also be used to target special offers that welcome new customers, encourage small
purchasers to spend more, reactivate lapsed customers, or support other marketing initiatives.
RFM analysis uses information about customers' past behavior that is easily tracked and
readily available. Recency is how long ago the customer last made a purchase. Frequency is
how many purchases the customer has made (sometimes within a specified time period, such as
the average number of purchases per year). Monetary is the total dollars spent by the customer
(again, sometimes within a specified time period).
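To make the three definitions concrete, here is a minimal sketch in base R that derives Recency, Frequency and Monetary from a small transaction log. The data, the reference date `as_of` and the names `tx` and `rfm` are made up for illustration and are not taken from the report's data set. Monetary is computed here as the total spend, following the definition above.

```r
# Hypothetical transaction log: one row per purchase
tx <- data.frame(
  ID     = c(1, 1, 2, 3, 3, 3),
  Date   = as.Date(c("1998-01-10", "1998-06-01", "1997-03-15",
                     "1998-05-20", "1998-06-15", "1998-06-30")),
  Amount = c(20, 35, 120, 10, 15, 5)
)
as_of <- as.Date("1998-07-01")  # assumed reference date for Recency

# Recency: days between the reference date and the latest purchase per customer
recency <- tapply(as.numeric(as_of - tx$Date), tx$ID, min)
# Frequency: number of purchases per customer
frequency <- tapply(tx$Amount, tx$ID, length)
# Monetary: total amount spent per customer
monetary <- tapply(tx$Amount, tx$ID, sum)

rfm <- data.frame(
  ID        = as.numeric(names(recency)),
  Recency   = as.vector(recency),
  Frequency = as.vector(frequency),
  Monetary  = as.vector(monetary)
)
print(rfm)
```

In this toy data, customer 3 bought most recently and most often but spent the least in total, which is the kind of contrast the three measures are meant to separate.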

Methodology
1) RFM analysis was performed in RStudio. Data preparation included the following steps:
- Entries with customer ID 0 were removed, as these customers were not part of the
loyalty program and could not be uniquely identified.
- The date column, read in as character values, was converted to R's Date type.

2) Once the RFM analysis was done, clustering was performed using RDataMiner.
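The two data preparation steps can be sketched as follows. This is only an illustration: the data frame `raw`, its values and the name `clean` are hypothetical, and the yyyymmdd date layout is assumed from the format string used in the R code later in this report.

```r
# Hypothetical raw extract: customer ID, date as yyyymmdd text, amount
raw <- data.frame(
  ID     = c(0, 101, 0, 102, 101),
  Date   = c("19970105", "19970212", "19970301", "19980415", "19980620"),
  Amount = c(12.5, 40, 8, 22, 15),
  stringsAsFactors = FALSE
)

# Step 1: drop customer ID 0 (customers outside the loyalty program,
# who cannot be told apart from one another)
clean <- raw[raw$ID != 0, ]

# Step 2: convert the character column into R's Date class
clean$Date <- as.Date(clean$Date, format = "%Y%m%d")

str(clean)
```

Once the column is a Date, comparisons such as `clean$Date >= as.Date("1997-01-01")` and date differences behave as expected, which the Recency calculation relies on.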

Results of RFM Analysis

Results of K-Means Clustering

Naming of Clusters

Recency   Frequency   Monetary   Cluster Name     Percentage composition
High      Low         High       Heavy Spenders   21.3%
High      Low         Low        First Timers     16.7%
Low       High        High       Churn            34.1%
High      High        Low        Shoppers         27.9%

Recommendations

1) Heavy Spenders:

These are customers who spend heavily when they visit the store. The store cannot
afford to lose them. Care must be taken to retain these customers by providing
excellent customer service whenever they visit.

2) First Timers:

The goal is to make these customers visit the store again. This could be done by
providing an excellent customer experience on the first visit. Ensure there is direct
marketing for special sales, and encourage them to provide feedback.

3) Churn:

These are customers who are moving away from the store. They should be
encouraged to come back. The factors leading to churn could be analysed, and the
purchase data of these customers could be used to send personalised and
dedicated offers.

4) Shoppers:

These are customers who visit the store regularly, though they may not spend
much. They are cash cows for the store and could be retained by delivering value
on every visit. Minor offers, discounts and excellent customer service are reliable
ways to retain and grow this cluster of customers.

Limitations
Since the clustering was done through R, individual records belonging to each cluster could
not be identified.
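Whether cluster membership can be recovered depends on the tool used. As an illustration only, and assuming the clustering were rerun with base R's `kmeans()` rather than the RDataMiner workflow described in the Methodology, the fitted object carries one cluster label per input row. The data frame `rfm` and all its values below are hypothetical.

```r
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability

# Hypothetical RFM table: one row per customer
rfm <- data.frame(
  ID        = 1:8,
  Recency   = c(5, 10, 200, 220, 15, 300, 8, 250),
  Frequency = c(12, 10, 1, 2, 9, 1, 11, 2),
  Monetary  = c(30, 25, 400, 350, 20, 15, 28, 380)
)

# Scale the three measures so that none dominates the distance calculation
scaled <- scale(rfm[, c("Recency", "Frequency", "Monetary")])

# Four clusters, as in the report; nstart repeats the fit from several random starts
fit <- kmeans(scaled, centers = 4, nstart = 25)

# fit$cluster holds one label per input row, in the original row order
rfm$Cluster <- fit$cluster

# Share of customers in each cluster, in percent
shares <- round(100 * prop.table(table(rfm$Cluster)), 1)
print(rfm)
print(shares)
```

With the label attached to each row, the customers in a given cluster can be listed with an ordinary subset such as `rfm[rfm$Cluster == 1, ]`.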

R Code for RFM Analysis


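# getDataFrame, getIndependentScore and drawHistograms are defined under "Functions"

# read the raw transactions: customer ID, date (yyyymmdd) and amount
df <- read.table("RFM.txt", header = FALSE)
df <- as.data.frame(cbind(df[,1], df[,2], df[,3]))
names(df) <- c("ID", "Date", "Amount")

# convert the date column from yyyymmdd text to the Date type
df[,2] <- as.Date(as.character(df[,2]), "%Y%m%d")
head(df)

# analysis window
startDate <- as.Date("19970101", "%Y%m%d")
endDate <- as.Date("19980701", "%Y%m%d")

# one row per customer with Recency, Frequency and Monetary
df <- getDataFrame(df, startDate, endDate)
head(df)

# score each measure independently from 1 to 5
df1 <- getIndependentScore(df)
head(df1[-(2:3)])
drawHistograms(df1)

# distribution of each measure
par(mfrow = c(1,3))
hist(df$Recency)
hist(df$Frequency)
hist(df$Monetary)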

Functions

getDataFrame <- function(df, startDate, endDate, tIDColName = "ID",
                         tDateColName = "Date", tAmountColName = "Amount") {

  # order the data frame by date, most recent first
  df <- df[order(df[,tDateColName], decreasing = TRUE),]

  # remove the records before the start date and after the end date
  df <- df[df[,tDateColName] >= startDate,]
  df <- df[df[,tDateColName] <= endDate,]

  # keep one row per ID (the most recent purchase) in a new data frame
  newdf <- df[!duplicated(df[,tIDColName]),]

  # calculate the Recency (days) to the endDate; a smaller value means more recent
  Recency <- as.numeric(difftime(endDate, newdf[,tDateColName], units = "days"))

  # add the Recency column to the newdf data frame
  newdf <- cbind(newdf, Recency)

  # order the data frame by ID to fit the return order of table() and tapply()
  newdf <- newdf[order(newdf[,tIDColName]),]

  # calculate the Frequency (number of purchases per ID)
  fre <- as.data.frame(table(df[,tIDColName]))
  Frequency <- fre[,2]
  newdf <- cbind(newdf, Frequency)

  # calculate the Monetary value as the average amount per purchase
  m <- as.data.frame(tapply(df[,tAmountColName], df[,tIDColName], sum))
  Monetary <- m[,1] / Frequency
  newdf <- cbind(newdf, Monetary)

  return(newdf)
}

getIndependentScore <- function(df, r = 5, f = 5, m = 5) {

  # the number of score levels must be positive
  if (r <= 0 || f <= 0 || m <= 0) stop("r, f and m must all be positive")

  # order the rows, then score each measure independently
  df <- df[order(df$Recency, -df$Frequency, -df$Monetary),]
  R_Score <- scoring(df, "Recency", r)
  df <- cbind(df, R_Score)

  df <- df[order(-df$Frequency, df$Recency, -df$Monetary),]
  F_Score <- scoring(df, "Frequency", f)
  df <- cbind(df, F_Score)

  df <- df[order(-df$Monetary, df$Recency, -df$Frequency),]
  M_Score <- scoring(df, "Monetary", m)
  df <- cbind(df, M_Score)

  # order the data frame by R_Score, F_Score and M_Score, descending
  df <- df[order(-df$R_Score, -df$F_Score, -df$M_Score),]

  # calculate the total score
  Total_Score <- c(100*df$R_Score + 10*df$F_Score + df$M_Score)

  df <- cbind(df, Total_Score)

  return(df)
} # end of function getIndependentScore

scoring <- function(df, column, r = 5) {

  # get the number of rows of df
  len <- dim(df)[1]

  score <- rep(0, times = len)

  # get the number of rows per 1/r share, e.g. 1/5
  nr <- round(len / r)
  if (nr > 0) {

    # separate the rows into r equal parts
    rStart <- 0
    rEnd <- 0
    for (i in 1:r) {

      # set the start row number and end row number
      rStart <- rEnd + 1

      # skip this "i" if rStart already falls in the scope of i+1, i+2, ...
      if (rStart > i*nr) next

      if (i == r) {
        if (rStart <= len) rEnd <- len else next
      } else {
        rEnd <- i*nr
      }

      # set the score for this part (the first part gets the highest score)
      score[rStart:rEnd] <- r - i + 1

      # make sure customers who have the same value get the same score
      s <- rEnd + 1
      if (i < r && s <= len) {
        for (u in s:len) {
          if (df[rEnd, column] == df[u, column]) {
            score[u] <- r - i + 1
            rEnd <- u
          } else {
            break
          }
        }
      }
    }
  }
  return(score)
} # end of function scoring

getScoreWithBreaks <- function(df, r, f, m) {

  ## scoring the Recency: the lowest band gets the highest score
  len <- length(r)
  R_Score <- rep(1, nrow(df))
  df <- cbind(df, R_Score)
  for (i in 1:len) {
    # the first band has no lower bound, so that a Recency of 0 days is included
    if (i == 1) {
      p1 <- -Inf
    } else {
      p1 <- r[i-1]
    }
    p2 <- r[i]
    inBand <- p1 < df$Recency & df$Recency <= p2
    df$R_Score[inBand] <- len - i + 2
  }

  ## scoring the Frequency: the highest band gets the highest score
  len <- length(f)
  F_Score <- rep(1, nrow(df))
  df <- cbind(df, F_Score)
  for (i in 1:len) {
    if (i == 1) {
      p1 <- 0
    } else {
      p1 <- f[i-1]
    }
    p2 <- f[i]
    inBand <- p1 < df$Frequency & df$Frequency <= p2
    df$F_Score[inBand] <- i
  }
  df$F_Score[f[len] < df$Frequency] <- len + 1

  ## scoring the Monetary: the highest band gets the highest score
  len <- length(m)
  M_Score <- rep(1, nrow(df))
  df <- cbind(df, M_Score)
  for (i in 1:len) {
    if (i == 1) {
      p1 <- 0
    } else {
      p1 <- m[i-1]
    }
    p2 <- m[i]
    inBand <- p1 < df$Monetary & df$Monetary <= p2
    df$M_Score[inBand] <- i
  }
  df$M_Score[m[len] < df$Monetary] <- len + 1

  # order the data frame by R_Score, F_Score and M_Score, descending
  df <- df[order(-df$R_Score, -df$F_Score, -df$M_Score),]

  # calculate the total score
  Total_Score <- c(100*df$R_Score + 10*df$F_Score + df$M_Score)

  df <- cbind(df, Total_Score)

  return(df)
} # end of function getScoreWithBreaks

drawHistograms <- function(df, r = 5, f = 5, m = 5) {

  # set the layout of the plot window: one row per F score, one column per R score
  par(mfrow = c(f, r))

  labels <- rep("", times = m)
  for (i in 1:m) labels[i] <- paste("M", i)

  for (i in 1:f) {
    for (j in 1:r) {

      # count the customers at each M score for this combination of R and F scores
      counts <- rep(0, times = m)
      for (k in 1:m) {
        tmpdf <- df[df$R_Score == j & df$F_Score == i & df$M_Score == k,]
        counts[k] <- dim(tmpdf)[1]
      }

      # label the bars only in the first panel
      if (i == 1 && j == 1) {
        barplot(counts, col = "lightblue", names.arg = labels)
      } else {
        barplot(counts, col = "lightblue")
      }

      if (j == 1) title(ylab = paste("F", i))
      if (i == 1) title(main = paste("R", j))
    }
  }

  # restore the single-panel layout
  par(mfrow = c(1,1))
} # end of drawHistograms function

df <- getDataFrame(df, startDate, endDate)
head(df)
View(df)
df1 <- getIndependentScore(df)
View(df1)
head(df1[-(2:3)])
drawHistograms(df1)

# shrink the plot margins so that the 5 x 5 grid of bar plots fits the window
par(mar = c(1,1,1,1))
drawHistograms(df1)

par(mfrow = c(1,3))
hist(df$Recency)
hist(df$Frequency)
hist(df$Monetary)

# export the scored table to Excel
library(xlsx)
write.xlsx(df1, "c:/rfm.xlsx")
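
The listing defines getScoreWithBreaks but never calls it. The idea behind it, scoring by fixed break points rather than by equal-sized groups, can be sketched with base R's `cut()`. This is a compact illustration of the same banding, not the report's function: every break point and data value below is hypothetical, and the variable names are chosen only for this example.

```r
# Hypothetical break points; none of these values come from the report
r_breaks <- c(30, 90, 180)    # days since last purchase
f_breaks <- c(1, 3, 6)        # number of purchases
m_breaks <- c(20, 50, 100)    # amount per purchase

# Hypothetical customers
recency   <- c(0, 12, 45, 120, 400)
frequency <- c(8, 5, 2, 1, 1)
monetary  <- c(150, 60, 35, 20, 10)

# cut() places each value in a band (lower, upper];
# labels = FALSE returns the band index, starting at 1
r_bin <- cut(recency,   breaks = c(-Inf, r_breaks, Inf), labels = FALSE)
f_bin <- cut(frequency, breaks = c(-Inf, f_breaks, Inf), labels = FALSE)
m_bin <- cut(monetary,  breaks = c(-Inf, m_breaks, Inf), labels = FALSE)

# recent buyers should score high, so the Recency bands are reversed
R_Score <- length(r_breaks) + 2 - r_bin
F_Score <- f_bin
M_Score <- m_bin

# same weighting as in the listing: R in the hundreds, F in the tens, M in the units
Total_Score <- 100*R_Score + 10*F_Score + M_Score

scores <- data.frame(recency, frequency, monetary,
                     R_Score, F_Score, M_Score, Total_Score)
print(scores)
```

Because the hundreds digit is the Recency score, sorting by Total_Score ranks customers by recency first, then by frequency, then by monetary value.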
