RFM Analysis
Using R and RStudio
Group 9
Methodology
1) The RFM analysis was performed in RStudio. Data preparation included the
following steps:
- Entries with customer ID 0 were removed, as these customers were not part of the
loyalty program and could not be uniquely identified.
- Data type conversion was performed so that R would treat the character date
values as dates.
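The two preparation steps above can be sketched as follows. This is a minimal illustration on made-up data, not the original script: the column names ID, Date and Amount are assumed from the defaults of getDataFrame, and the date format "%Y-%m-%d" is an assumption about how the dates were stored.

```r
# Toy transactions (illustrative values; column names are assumptions).
raw <- data.frame(
  ID     = c(0, 101, 102, 0, 101),
  Date   = c("2014-01-05", "2014-01-07", "2014-02-11", "2014-02-12", "2014-03-01"),
  Amount = c(12.5, 40.0, 18.2, 7.9, 22.0),
  stringsAsFactors = FALSE
)

# Drop purchases recorded under customer ID 0 (no loyalty card).
prepared <- raw[raw$ID != 0, ]

# Convert the character column to Date so it can be compared and subtracted.
prepared$Date <- as.Date(prepared$Date, format = "%Y-%m-%d")

str(prepared)
```

If the source file stores dates in another layout, only the format string would need to change.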
Naming of Clusters
Recency   Frequency   Monetary   Cluster Name      Percentage composition
High      Low         High       Heavy Spenders    21.3%
High      Low         Low        First Time        16.7%
Low       High        High       Churn             34.1%
High      High        Low        Shoppers          27.9%
Recommendations
1) Heavy Spenders:
These are customers who spend heavily when they visit the store. The store cannot
afford to lose them. Care must be taken to retain these customers by providing
excellent customer service whenever they visit.
2) First Timers:
The goal is to make these customers visit the store again. This could be done by
providing an excellent customer experience on the first visit. Ensure there is direct
marketing for special sales, and encourage them to provide feedback.
3) Churn:
These are customers who are drifting away from the store. They should be
encouraged to come back. The factors leading to churn could be analysed, and the
purchase data of these customers could be used to send personalised and
dedicated offers.
4) Shoppers:
These are customers who visit the store regularly, though they may not spend
much. They are cash cows for the store and could be retained by delivering value
on every visit. Minor offers, discounts and excellent customer service are reliable
ways to retain and grow this cluster of customers.
Limitations
Since the clustering was done through R, individual records belonging to each cluster could
not be identified.
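The clustering code itself is not shown in this report. If the clusters were produced with base R's kmeans() (an assumption), the fitted object carries one cluster label per input row, which could be attached back to the customer table. A minimal sketch on made-up data, with illustrative column names and values:

```r
set.seed(42)  # k-means starts from random centres

# Toy RFM table (values and column names are illustrative assumptions).
rfm <- data.frame(
  ID        = 1:8,
  Recency   = c(5, 7, 200, 180, 10, 12, 210, 6),
  Frequency = c(1, 2, 9, 8, 10, 12, 7, 1),
  Monetary  = c(300, 280, 150, 140, 20, 25, 160, 15)
)

# Standardise so that no single variable dominates the distance.
scaled <- scale(rfm[, c("Recency", "Frequency", "Monetary")])

fit <- kmeans(scaled, centers = 4, nstart = 25)

# fit$cluster holds one label per row, in the same order as the input.
rfm$Cluster <- fit$cluster

table(rfm$Cluster)
```

Because the labels are in input order, rows of rfm can then be filtered by Cluster to list the customers in each group.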
Functions
> getDataFrame <- function(df,startDate,endDate,tIDColName="ID",tDateColName="Date",tAmountColName="Amount"){
+
+   # order the data frame by date, most recent first
+   df <- df[order(df[,tDateColName],decreasing = TRUE),]
+
+   # remove the records before the start date and after the end date
+   df <- df[df[,tDateColName]>= startDate,]
+   df <- df[df[,tDateColName]<= endDate,]
+
+   # remove the rows with duplicated IDs (keeping the most recent purchase per ID)
+   # and assign the result to a new data frame
+   newdf <- df[!duplicated(df[,tIDColName]),]
+
+   # calculate the Recency (days) to the endDate; a smaller value means more recent
+   Recency <- as.numeric(difftime(endDate,newdf[,tDateColName],units="days"))
+
+   # add the Recency column to the newdf data frame
+   newdf <- cbind(newdf,Recency)
+
+   # order the data frame by ID to fit the return order of table() and tapply()
+   newdf <- newdf[order(newdf[,tIDColName]),]
+
+   # calculate the Frequency (number of transactions per ID)
+   fre <- as.data.frame(table(df[,tIDColName]))
+   Frequency <- fre[,2]
+   newdf <- cbind(newdf,Frequency)
+
+   # calculate the Monetary value as the amount per transaction
+   m <- as.data.frame(tapply(df[,tAmountColName],df[,tIDColName],sum))
+   Monetary <- m[,1]/Frequency
+   newdf <- cbind(newdf,Monetary)
+
+   return(newdf)
+
+ }
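To make the three quantities concrete, the sketch below computes them for a small made-up transaction table. It is a compact illustration of the same definitions (days since the last purchase, number of transactions, average amount per transaction), not a replacement for getDataFrame; the data and the object names tx and rfm are hypothetical.

```r
# Toy transactions (illustrative values only).
tx <- data.frame(
  ID     = c(1, 1, 2, 3, 3, 3),
  Date   = as.Date(c("2014-01-10", "2014-03-01", "2014-02-15",
                     "2014-01-05", "2014-01-20", "2014-03-20")),
  Amount = c(20, 40, 15, 10, 30, 50)
)
endDate <- as.Date("2014-03-31")

# Last purchase per customer, as a day count (tapply drops the Date class).
last_purchase <- tapply(as.numeric(tx$Date), tx$ID, max)

rfm <- data.frame(
  ID        = as.numeric(names(last_purchase)),
  Recency   = as.numeric(endDate) - as.vector(last_purchase),  # days since last purchase
  Frequency = as.vector(table(tx$ID)),                         # transactions per customer
  Monetary  = as.vector(tapply(tx$Amount, tx$ID, mean))        # average amount per transaction
)
rfm
```

Here customer 1 has Recency 30, Frequency 2 and Monetary 30; customer 3 is the most recent (Recency 11) and the most frequent (Frequency 3).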
> getIndependentScore <- function(df,r=5,f=5,m=5) {
+
+   # stop early when any number of score levels is not positive
+   if (r<=0 || f<=0 || m<=0) return(NULL)
+
+   # [...]
+
+       # make sure the customers who have the same recency have the same score
+       s <- rEnd+1
+       if(i<r & s <= len){
+         for(u in s:len){
+           if(df[rEnd,column]==df[u,column]){
+             score[u] <- r-i+1
+             rEnd <- u
+           }else{
+             break
+           }
+         }
+       }
+
+     }
+
+   }
+   return(score)
+
+ } #end of function Scoring
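The idea behind independent scoring, splitting customers into roughly equal-sized groups per variable while giving tied values the same score, can be sketched with a rank-based helper. This is an illustrative alternative, not the scoring function used above; score_by_quantile and its arguments are hypothetical names.

```r
# Hypothetical helper: score a numeric vector from 1 to n using rank-based bins.
score_by_quantile <- function(x, n = 5, higher_is_better = TRUE) {
  # Ties receive the same (minimum) rank, so equal values land in the same bin.
  rk <- rank(if (higher_is_better) x else -x, ties.method = "min")
  # Map ranks 1..length(x) onto bins 1..n.
  ceiling(rk * n / length(x))
}

# Recency in days: a smaller value is better, so the ordering is reversed.
recency <- c(3, 3, 10, 45, 45, 90, 120, 200, 300, 365)
score_by_quantile(recency, n = 5, higher_is_better = FALSE)
# 5 5 4 3 3 3 2 2 1 1
```

The two customers with a recency of 3 days both score 5, and the two with 45 days both score 3, which mirrors the tie-handling loop in the fragment above.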
>
> getScoreWithBreaks <- function(df,r,f,m) {
+
+   ## scoring the Recency
+   len = length(r)
+   R_Score <- c(rep(1,length(df[,1])))
+   df <- cbind(df,R_Score)
+   for(i in 1:len){
+     if(i == 1){
+       p1 = -Inf   # open lower bound, so that Recency == 0 is scored as well
+     }else{
+       p1 = r[i-1]
+     }
+     p2 = r[i]
+
+     if(dim(df[p1<df$Recency & df$Recency<=p2,])[1]>0) df[p1<df$Recency & df$Recency<=p2,]$R_Score = len - i + 2
+   }
+
+   ## scoring the Frequency
+   len = length(f)
+   F_Score <- c(rep(1,length(df[,1])))
+   df <- cbind(df,F_Score)
+   for(i in 1:len){
+     if(i == 1){
+       p1 = 0
+     }else{
+       p1 = f[i-1]
+     }
+     p2 = f[i]
+
+     if(dim(df[p1<df$Frequency & df$Frequency<=p2,])[1]>0) df[p1<df$Frequency & df$Frequency<=p2,]$F_Score = i
+   }
+   if(dim(df[f[len]<df$Frequency,])[1]>0) df[f[len]<df$Frequency,]$F_Score = len+1
+
+   ## scoring the Monetary
+   len = length(m)
+   M_Score <- c(rep(1,length(df[,1])))
+   df <- cbind(df,M_Score)
+   for(i in 1:len){
+     if(i == 1){
+       p1 = 0
+     }else{
+       p1 = m[i-1]
+     }
+     p2 = m[i]
+
+     if(dim(df[p1<df$Monetary & df$Monetary<=p2,])[1]>0) df[p1<df$Monetary & df$Monetary<=p2,]$M_Score = i
+   }
+   if(dim(df[m[len]<df$Monetary,])[1]>0) df[m[len]<df$Monetary,]$M_Score = len+1
+
+ #order the dataframe by R_Score, F_Score, and M_Score desc
+ df <- df[order(-df$R_Score,-df$F_Score,-df$M_Score),]
+
+ # calculate the total score
+ Total_Score <- c(100*df$R_Score + 10*df$F_Score+df$M_Score)
+
+ df <- cbind(df,Total_Score)
+
+ return(df)
+
+ } # end of function of getScoreWithBreaks
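Scoring against fixed break points can also be written compactly with base R's findInterval(). The sketch below is an illustration under assumed break values, not the function above; score_with_breaks, f_breaks and r_breaks are hypothetical names.

```r
# Hypothetical helper: score values against ascending break points.
# Values <= breaks[1] score 1; values > the last break score length(breaks) + 1.
score_with_breaks <- function(x, breaks) {
  # left.open = TRUE makes each interval (lower, upper], as in the loops above.
  findInterval(x, breaks, left.open = TRUE) + 1
}

frequency <- c(1, 2, 3, 5, 8, 20)
f_breaks  <- c(1, 2, 5, 10)          # assumed break points
score_with_breaks(frequency, f_breaks)
# 1 2 3 3 4 5

# For Recency a smaller value is better, so the scale is reversed.
recency  <- c(0, 10, 45, 400)
r_breaks <- c(30, 90, 180)           # assumed break points
length(r_breaks) + 2 - score_with_breaks(recency, r_breaks)
# 4 4 3 1
```

With k break points the scores run from 1 to k + 1, so four breaks give the usual five-level scale.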
> drawHistograms <- function(df,r=5,f=5,m=5){
+
+   # set the layout of the plot window: one row per F score, one column per R score
+   par(mfrow = c(f,r))
+
+   names <- rep("",times=m)
+   for(i in 1:m) names[i] <- paste("M",i)
+
+   for (i in 1:f){
+     for (j in 1:r){
+       # count the customers at each M score within this (R, F) cell
+       c <- rep(0,times=m)
+       for(k in 1:m){
+         tmpdf <- df[df$R_Score==j & df$F_Score==i & df$M_Score==k,]
+         c[k] <- dim(tmpdf)[1]
+       }
+       if (i==1 & j==1)
+         barplot(c,col="lightblue",names.arg=names)
+       else
+         barplot(c,col="lightblue")
+       if (j==1) title(ylab=paste("F",i))
+       if (i==1) title(main=paste("R",j))
+     }
+   }
+
+   # restore the single-panel layout
+   par(mfrow = c(1,1))
+
+ } # end of drawHistograms function
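The counts that drawHistograms plots can also be inspected as a table, which is sometimes easier to read than a grid of small bar charts. A minimal sketch on made-up scores; the data frame scored and its values are hypothetical:

```r
# Toy scored customers (illustrative values only).
scored <- data.frame(
  R_Score = c(5, 5, 4, 1, 1, 3),
  F_Score = c(1, 1, 2, 5, 5, 3),
  M_Score = c(2, 3, 2, 1, 1, 4)
)

# Three-way contingency table, flattened to one row per (R, F, M) combination.
cell_counts <- as.data.frame(table(scored), stringsAsFactors = FALSE)

# Keep only the combinations that actually occur, largest first.
cell_counts <- cell_counts[cell_counts$Freq > 0, ]
cell_counts[order(-cell_counts$Freq), ]
```

Each remaining row corresponds to one bar of non-zero height in the histogram grid.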
> df <- getDataFrame(df,startDate,endDate)
> head(df)
> View(df)
> df1 <-getIndependentScore(df)
> View(df1)
> head(df1[-(2:3)])
> drawHistograms(df1)
> par(mar=c(1,1,1,1))
> drawHistograms(df1)
> par(mfrow = c(1,3))
> hist(df$Recency)
> hist(df$Frequency)
> hist(df$Monetary)
> library(xlsx)
> write.xlsx(df1,"c:/rfm.xlsx")