
Churn Management:

A Review of Techniques and Approaches for Detecting and Managing Churn

Satyam Computer Services Ltd. #14 Langford Avenue Lalbagh Road Bangalore 560 025 India September 2004


Abstract
In recent times, customer churn in the telecom sector has assumed great significance. Ever-increasing competition among service providers and growing customer expectations are the main reasons for increasing churn. A significant increase in the cost of acquiring new customers is forcing service providers to focus their efforts on effective churn management. This paper presents a review of churn management in terms of the available techniques and approaches for detecting and managing churn. Various data mining techniques have been suggested for the prediction of churn. These mainly depend on extracting behavior patterns of customers from historical data and identifying the patterns associated with possible churners. Reported techniques for building churn prediction models are based on approaches such as decision trees, logistic regression, neural networks, and rule-based learning. Although every approach has advantages and disadvantages in terms of ease of implementation, data requirements, and ease of interpretation of the results, some of these approaches have been reported to be specifically suitable for churn prediction in the telecom sector. This paper reviews the techniques based on these approaches for churn prediction.

Introduction

Until recently, telecom service providers enjoyed the benefits of a rapidly growing subscriber base. Although subscriber bases are growing day by day, the cost of customer acquisition is also progressively increasing. With more and more new service providers entering the scene, retaining existing customers has assumed paramount importance. Churn, though not a new problem, has assumed significant importance these days mainly due to the increasing cost of acquiring a new customer and ever-increasing churn rates. The attractions for a customer to switch from an existing service provider can be several: competitive price offerings, discounts and incentives, or better services in terms of quality or value addition. Service providers need to be vigilant to the market initiatives of competitors and respond proactively to contain the possible churn of their customers. In addition to external attractions from competitors, factors such as customers' lack of understanding of the service scheme, unfulfilled privacy promises, and failure of the service provider to fulfill SLAs all lead to churn. Table 1 illustrates some of the major factors influencing churn [3]. One of the recent initiatives, Wireless Local Number Portability, has further pushed up churn rates in the already stressed telecom segment. Studies have indicated that this will have a particularly adverse effect on the churn of medium-size business customers [1].

Churn Management

Churn is an unavoidable phenomenon, which needs to be managed to minimize the potential losses to the business. Though there may be different reasons for a customer to churn, there are some basic factors (SLA, privacy promises, service scheme understanding) that when unattended lead to churn. Fulfilling the agreed SLAs, addressing privacy issues with high importance, and proper communication of service offerings can prevent customer churn to a large extent.
Factor                                   Importance   Nature of data required for prediction
Call quality                             21%          Network
Pricing options                          18%          Market, Billing
Corporate capability                     17%          Market, Customer Service
Customer service                         17%          Customer Service
Credibility/customer communications      10%          Customer Service
Roaming/coverage                          7%          Network
Handset                                   4%          Application
Billing                                   3%          Billing
Cost of roaming                           3%          Market, Billing

Table-1. Factors influencing subscriber satisfaction [3]

Attractive service offerings and competitive pricing are only pre-requisites for retaining a customer; in reality, customers churn in spite of these. The positive fact is that a large part of the possible churn can be prevented by timely detection and effective retention efforts. A proper insight into the reasons behind churn is essential for designing effective retention efforts. Therefore, the effectiveness of a prediction model depends not only on its accuracy in predicting a churner but also on how well its results can be interpreted to infer the possible reasons for the churn. Proactive identification of churners based on prediction models gives a service provider ample opportunity to design and deploy retention efforts. The damage-control exercise that follows soon after a customer declares his intention to leave generally yields poor results in comparison with proactive retention efforts based on proper churn prediction.

Churn Prediction

Churn prediction deals with the identification of customers who are likely to churn in the near future. The basis for such identification is historical data containing information about past churners. A behavioral comparison is made between past churners and existing customers, and those customers for whom the comparison suggests similarity are flagged as likely churners. Service providers maintain huge volumes of data containing details of customer transactions. Broadly, the data may be categorized as personal and transactional. The personal category includes attributes representing customer demographics such as geographic area, age, income, credit rating, and family information. The transactional category includes attributes describing statistics such as duration of calls, type of calls, and call failures. Although huge databases are available with service providers, they pose quite a few challenges for use in churn prediction. The data is generally very large, and this is compounded by its high dimensionality; exploring such large, high-dimensional data sets is a complex and challenging task. In most dimensionality reduction techniques, the reduction in dimensionality is traded off against possible information loss. There is also the problem of missing values in such large datasets. Missing values can be dealt with either at the attribute level or at the record level. At the attribute level, an attribute is either selected or ignored for prediction based on its significance, which is evaluated using domain knowledge. Prediction accuracy suffers both from the inclusion of an attribute with many missing values and from the exclusion of an important attribute, so correct decisions must be made for good prediction results. In the record-level correction approach, the missing values are replaced with reasonable estimates based on interpolation methods. Providing a good estimate of the missing values is another key issue for good churn prediction.
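A record-level correction step of the kind just described can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a customer's monthly usage series with gaps, and fills each gap by linear interpolation between the nearest known months (falling back to the nearest known value at the ends).

```python
# Hypothetical record-level missing-value correction for a usage series.
# None marks a missing month; estimates come from linear interpolation
# between the nearest known neighbours.

def interpolate_missing(values):
    """Replace None entries with linearly interpolated estimates."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if not known:
        return values[:]
    result = values[:]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max(((j, w) for j, w in known if j < i), default=None)
        right = min(((j, w) for j, w in known if j > i), default=None)
        if left and right:
            (j1, w1), (j2, w2) = left, right
            result[i] = w1 + (w2 - w1) * (i - j1) / (j2 - j1)
        else:
            result[i] = (left or right)[1]  # nearest known endpoint
    return result

minutes = [120.0, None, 180.0, None, None, 300.0]
print(interpolate_missing(minutes))
# → [120.0, 150.0, 180.0, 220.0, 260.0, 300.0]
```

Whether interpolation is a reasonable estimator depends on the attribute; categorical attributes would need a different strategy, such as the mode of similar records.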

Types of Churn Prediction

Most of the early models proposed for churn prediction were aimed at classifying a customer as a possible churner or non-churner. Although these models had the advantage of being simple and robust with respect to defects in the input data, they had serious limitations in interpreting the reasons for churn. As mentioned earlier, an appropriate response to churn detection requires that the reasons for churn be inferred correctly. Subsequently, models were developed to predict churn in terms of the likelihood of churn instead of a two-class classification. These models predict the likelihood of churn for each customer, thus differentiating churners into more probable and less probable ones. This aids in strategizing the deployment of the limited resources available for retention efforts by choosing the more probable churners. The current focus is on developing models that provide not only a churn likelihood prediction but also a clear insight into the possible reasons for the prediction. The performance of a prediction model is evaluated primarily in terms of prediction accuracy. Table 2 illustrates a scheme for the performance evaluation of a model. True positives (churners predicted as churners) and true negatives (non-churners predicted as non-churners) represent the correct classifications made by a model. False positives (non-churners predicted as churners) and false negatives (churners predicted as non-churners) are the misclassifications made by a model.
                          Actual Churners    Actual Non-Churners
Predicted Churners        True Positive      False Positive
Predicted Non-Churners    False Negative     True Negative
Table-2. Churn Prediction Category Matrix

The two measures precision and recall, defined below, are also important in evaluating a model. Precision represents the fraction of actual churners amongst the predicted churners:

Precision = true positives / (true positives + false positives)

Good precision alone is not sufficient for a good model, because the model could have captured only a fraction of the actual churners and missed many of them. Recall captures this information; it represents the fraction of actual churners captured:

Recall = true positives / (true positives + false negatives)

It is desirable that the model capture as many churners as possible while minimizing the false positives amongst its predictions. By predicting a lot of customers as churners, we can achieve a higher recall, possibly at the cost of precision. It is indeed a challenge to develop a model with both good precision and high recall. In the case of likelihood prediction models, performance evaluation is not straightforward in terms of precision and recall; for these models, the concept of the lift curve is used. The lift in the top 10% of the predicted churners is used as a performance measure and is referred to as Top Decile Lift [2]. Top Decile Lift measures the proportion of churners in the top 10% of the predicted churners divided by the proportion of churners in the entire data set. Once the churn likelihood f(x) of each customer is determined, the customers can be ordered on decreasing likelihood of churn: f(x_1) >= f(x_2) >= ... >= f(x_n). The actual proportion of churn in the top 10% of the predicted churners is given by:

pi(10%) = (1/n) * sum(i = 1 to n) E[y_i = 1]

where E[y_i = 1] represents the event of a true positive, E[y_i = 0] represents the event of a false positive, and n represents the number of observations in the top 10% of most likely churners. The proportion of churn across the entire data set is given by:

pi = (1/N) * sum(i = 1 to N) E[y_i = 1]

Lift in Top Decile is then calculated as the ratio of the predicted churn rate to the actual churn rate:

Top Decile Lift = pi(10%) / pi
A high lift value indicates that the number of churners in the top decile is high, i.e. the model has captured a lot of churners in its top decile. For example, a lift value of 2 indicates that the proportion of churners to non-churners in the top decile is twice that of the entire population. Lift in Top Decile is one of the commonly used criteria for model evaluation, especially in cases where it is not feasible for a service provider to target a large fraction of the population with retention efforts; accordingly, the customers in the top decile are chosen for customer retention. Lift is a good criterion for checking a model's performance in the chosen decile. It is nevertheless necessary to evaluate a model over the entire spectrum, because some models are very good at identifying the top likely churners but not as good at identifying mid-range churners. For example, if we include the top 2 or 3 deciles, the accuracy of a model that performed well in the top decile may go down. So there is a need for measures that evaluate a model over the entire range. Measures like the area under the lift curve give good insight into the performance of the model over the entire spectrum [7].
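The evaluation measures above can be sketched in code. The following Python illustration (toy data, not from the paper) computes precision, recall, and Top Decile Lift from actual churn labels and predicted likelihoods:

```python
# Sketch of the evaluation measures: y_true holds labels (1 = churner),
# y_pred holds hard predictions, likelihood holds churn scores.

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

def top_decile_lift(y_true, likelihood):
    # Order customers on decreasing churn likelihood.
    order = sorted(range(len(y_true)), key=lambda i: -likelihood[i])
    n = max(1, len(order) // 10)                      # top 10% of customers
    rate_top = sum(y_true[i] for i in order[:n]) / n  # churn rate in top decile
    rate_all = sum(y_true) / len(y_true)              # churn rate overall
    return rate_top / rate_all
```

With 20 customers, 4 of them churners, and a model that scores the two highest-scored customers correctly as churners, the top decile (2 customers) has a churn rate of 1.0 against an overall rate of 0.2, giving a lift of 5.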

[Figure-1 (not reproduced): a typical cumulative lift curve, with the percentage of ordered customers (0-100%) on the x-axis and the percentage of churners captured on the y-axis; the model curve rises above the diagonal random curve, and the area between them is shaded.]

Figure-1. Typical Illustration of a Lift Curve

In Fig. 1, customers ordered on churn likelihood are placed on the x-axis and the percentage of churners captured forms the y-axis. The shaded region represents the area between the model's cumulative lift curve and a random curve, which represents a uniform distribution of churners over the entire customer population. The larger this area, the better the performance of the model over the range. This area between the cumulative curve and the random curve is measured by the Gini coefficient, which is defined as follows. The fraction of all customers with churn likelihood above the threshold f(x_T) is:

alpha_T = (1/N) * sum(i = 1 to N) E[f(x_i) > f(x_T)]

Similarly, the fraction of all actual churners with churn likelihood above the threshold f(x_T) is:

beta_T = (1/N_C) * sum(i = 1 to N_C) E[f(x_i) > f(x_T) and y_i = 1]

where N is the total number of customers and N_C is the total number of actual churners in the entire data set. The Gini coefficient is given by:

Gini coefficient = (2/N) * sum(T = 1 to N) (beta_T - alpha_T)

Fig. 1 illustrates a typical lift curve, and the dotted line shows that 60% of actual churners are captured in the top 20% of predicted churners. This means that if a service provider chooses to target 20% of the customer population, it can expect to reach 60% of the actual churners.
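The Gini computation above can be illustrated numerically. This sketch (our own, on toy data) walks down the likelihood ordering, treating each position as a threshold and accumulating the gap between the churner fraction captured (beta) and the customer fraction covered (alpha):

```python
# Sketch of the Gini coefficient over a lift curve: for each threshold
# position T we compare the fraction of all customers scored above it
# (alpha_T) with the fraction of actual churners scored above it (beta_T).
# Assumes at least one churner in y_true.

def gini_coefficient(y_true, likelihood):
    order = sorted(range(len(y_true)), key=lambda i: -likelihood[i])
    N = len(order)
    n_churn = sum(y_true)
    total, churners, g = 0, 0, 0.0
    for i in order:
        total += 1
        churners += y_true[i]
        alpha = total / N          # fraction of customers above threshold
        beta = churners / n_churn  # fraction of churners above threshold
        g += beta - alpha
    return 2 * g / N
```

For a model that ranks both churners in a four-customer population ahead of the non-churners, the running gaps are 0.25, 0.5, 0.25, and 0, giving a Gini coefficient of 0.5; a model no better than random scores near 0.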

Techniques for Modeling Churn Prediction

Several techniques have been applied to derive models for churn prediction. The basic principle underlying all of them is machine learning. Logistic Regression, Decision Trees, and Neural Networks are some of the popular techniques that have been applied to churn prediction. Recently, models based on evolutionary learning of rules have been reported to be highly successful in churn prediction. In this section we briefly describe each of these models.

5.1 Decision Trees

Decision trees are primarily used in classification tasks. Given the dataset, the conditions that hold for churners and non-churners are learned from training data, and these conditions are expressed in the form of a tree. A typical decision tree is illustrated below:

Age > 50?
├── Yes: International calls < 5?
│       ├── Yes: Churner
│       └── No:  Non Churner
└── No:  Monthly Minutes > 2000?
        ├── Yes: Non Churner
        └── No:  Churner

Figure-2. Illustration of a Typical Decision Tree

Each node in a decision tree is a test condition, and the branching from the node is based on the value of the attribute that is tested. The tree represents a collection of multiple rule sets, one rule set holding true for every instance of a customer record. Given a customer record, classification is done by traversing the tree, testing the attribute value at each node and branching accordingly. This process is repeated until a leaf node is reached, and the label of the leaf node (Churner or Non Churner) is assigned to the customer record under evaluation. Decision trees are used primarily because of their simplicity and ease of interpretation, though they are criticized as unsuitable for capturing complex, non-linear relationships between attributes. The accuracy of decision trees is high; however, so are the training data requirements. Recently, a model for churn prediction was developed using decision trees [8]: an ensemble of decision trees was able to capture nearly 50% of the churners in its top decile on a particular test data set. Another comparative study reveals that decision trees perform reasonably well and are highly useful for interpreting the results [5].
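The traversal described above can be sketched directly in code. The tree structure and thresholds below follow the illustration in Figure-2, but the exact branch-to-leaf assignment is our assumption; in practice the tree would be induced from training data rather than hand-written:

```python
# Sketch of classifying one customer record with a hand-written tree
# mirroring Figure-2 (thresholds and branch labels are illustrative).

def classify(record):
    """Traverse the tree from the root to a leaf label."""
    if record["age"] > 50:
        if record["international_calls"] < 5:
            return "Churner"
        return "Non Churner"
    if record["monthly_minutes"] > 2000:
        return "Non Churner"
    return "Churner"

print(classify({"age": 60, "international_calls": 2, "monthly_minutes": 500}))
# → Churner
```

Each root-to-leaf path is one conjunctive rule, which is what makes the model easy to interpret.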

5.2 Logistic Regression

The prediction task involves identifying a customer as a churner or non-churner. Since the predicted attribute takes only two values, logistic regression techniques are suitable for such tasks. Linear regression models are useful for predicting continuous-valued attributes, whereas logistic regression models are suitable for binary attributes. Logistic regression is a modified form of linear regression that yields a bounded value for the dependent variable; the model is simply a non-linear transformation of a linear regression model. The logistic distribution is an S-shaped distribution function similar to the standard normal distribution. This non-linear transformation mainly overcomes the limitation of linear regression, which tends to give continuous probability values often greater than 1 or less than 0. The standard representation of logistic regression is referred to as the logit function. The estimated likelihood of churn is represented by the logit function as:

P = 1 / [1 + exp(-T)], where T = a + BX

where a represents a constant term, X represents the vector of predictor attributes, and B represents the coefficient vector for the predictor attributes. When T equals zero, the likelihood is 0.5: it is equi-probable that the customer is a churner or a non-churner. As T grows large (towards +infinity), the exponential term becomes negligible and the probability approaches 1; as T becomes very small (towards -infinity), the probability of churn tends to 0. Figure-3 compares linear regression and logistic regression. Logistic regression models are good at modeling linear relationships between the predictor attributes and aid in determining predictor attribute values. This technique was reported to yield an accuracy of 92% and was found to perform better than some other models [5].
[Figure-3 (not reproduced): the linear regression line plotted against the S-shaped logistic regression curve, with the dependent variable E bounded between E = 0 and E = 1.]

Figure-3. Linear Regression and Logistic Regression Models
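The logit formula above is easy to evaluate numerically. In this sketch the coefficients a and B are illustrative assumptions, not fitted values; in practice they would be estimated from training data by maximum likelihood:

```python
# Numerical sketch of the logit function: P = 1 / (1 + exp(-T)), T = a + B·X.
# The coefficients below are illustrative, not fitted.
import math

def churn_likelihood(x, a, b):
    t = a + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-t))

# T = 0 gives the equi-probable point:
print(churn_likelihood([0.0, 0.0], a=0.0, b=[1.2, -0.7]))  # → 0.5
```

Because the output is always strictly between 0 and 1, it can be read directly as a churn likelihood, which is exactly what the linear model cannot guarantee.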

5.3 Neural Networks

Neural networks have been applied to various prediction tasks wherein the primary goal was prediction and lesser importance was given to model understanding. Models employing neural networks can learn complex relationships amongst the predictor attributes and accurately predict churn, and they can be applied to a variety of target functions (discrete and continuous). The basic idea is that each attribute is associated with a weight, and combinations of weighted attributes participate in the prediction task. The weights are updated continually during learning to model the correct effect of each attribute. Given a customer record and the set of predictor attributes, a neural network computes a combination of these inputs and outputs either 1 or 0, assigning Churner or Non-churner status to the customer record. Neural networks need a large volume of training data to arrive at reasonable weights for the predictor attributes, and arriving at the correct weights also takes considerable time. Neural networks are categorized as black-box models because of the difficulty of expressing the semantics of the relationships learned; this is why they are hard to interpret and do not help in explaining the reasons for churn. In spite of these drawbacks, neural networks are applied in a number of scenarios where their accuracy outweighs the disadvantages. Neural networks have also been applied to churn prediction. A comparative study [5] reports that neural networks had good recall with high prediction accuracy and the fewest misclassifications among the models evaluated. Another independent evaluation also suggests that neural networks are superior in performance to the other models [3]. Logistic regression and decision trees are good at explaining the reasons for churn, whereas neural networks are superior at churn prediction. A possible way of combining the merits of these techniques is to use logistic regression and decision trees for explaining churn behavior while exploiting neural networks for making the actual churn predictions.
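The forward pass described above, weighted inputs combined through a hidden layer into a thresholded churn score, can be sketched as follows. The architecture and the weights here are illustrative assumptions; in practice the weights would be learned by backpropagation on training data:

```python
# Minimal sketch of a feed-forward network scoring a customer record:
# inputs pass through one sigmoid hidden layer, and the output score is
# thresholded to Churner (1) / Non-churner (0). Weights are illustrative.
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return 1 if score >= 0.5 else 0
```

Note that even in this tiny example the learned relationship lives entirely in the weight matrices, which is precisely the black-box property discussed above.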

5.4 Evolutionary Approach

The necessity of precise interpretation of results has motivated researchers to explore newer models for churn prediction: models that not only have superior prediction accuracy but also provide greater insight into the possible reasons for churn. A recent work has proposed churn models based on evolutionary learning of rules for churn [10]. Rule-based approaches have excellent capability to express relations, and such approaches are quite popular in the context of prediction tasks. The main idea is to iteratively and progressively generate rules that contain m conditions by combining rules that contain m-1 conditions. The number of terms (conditions) in a rule determines its order, and the terms are connected (using connectives) to form a rule. A simple second-order rule is shown below:

Area_code = 91 AND subscription_length <= 1 => Churn = True

To obtain the rules of order m (in the (m-1)th iteration), the algorithm generates an initial rule set of order m from the rule set of order m-1 by randomly combining them. This combination step can be done, for example, using the evolutionary technique of crossover. The rules are then tested for interestingness, and the rules that do not satisfy this test are pruned. With every rule, the algorithm associates a measure that represents the evidence of truth derived for that rule from the training data. This process continues until all interesting rules of order m are identified, and the algorithm terminates when there are no more interesting rules of higher order. The rules that match the interestingness criteria are output, and these form the basis for the prediction task. A sample set of final rules could be:

subscriber_type = individual AND Area_code = 91 AND subscription_length <= 1 => Churn = True

Area = Rural AND Mean_monthly_calls <= 45 AND Mean_monthly_income <= 3000 => Churn = True


The performance of the model was reported to be substantially better when compared with Neural Networks and Decision Trees [10]. Though the execution speed of decision trees could not be matched, the algorithm obtained a slightly better lift when compared with the other techniques. The results indicate the usefulness of the approach in making correct predictions. The model also explains the reasons for churn clearly in the form of rules.
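Applying a final rule set of the kind shown above is straightforward: a record matching any rule's conjunction of conditions is flagged as a likely churner. In this sketch the rules, attribute names, and thresholds are illustrative, not the learned rules from [10]:

```python
# Sketch of prediction with learned rules: each rule is a conjunction of
# attribute conditions; matching any rule flags the record as a churner.
# Rules and attributes here are illustrative assumptions.

rules = [
    {"area_code": lambda v: v == 91, "subscription_length": lambda v: v <= 1},
    {"area": lambda v: v == "rural", "mean_monthly_calls": lambda v: v <= 45},
]

def predict_churn(record):
    return any(all(cond(record[attr]) for attr, cond in rule.items())
               for rule in rules)

print(predict_churn({"area_code": 91, "subscription_length": 1,
                     "area": "urban", "mean_monthly_calls": 200}))  # → True
```

Because each positive prediction can be traced back to the specific rule it matched, the reasons for churn come out directly in a readable form, which is the main attraction of the approach.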

5.5 Other Approaches

Churn prediction being an important and complex task, other approaches have also been applied to it, among them the Bayesian approach and the self-organizing map (SOM) based approach. A naive Bayesian model constructed for churn prediction was reported to have an accuracy of 68% [4] and also helped identify the top few attributes related to churn. In another approach [9], Self-Organizing Maps and U-Matrix techniques were used, with the main focus on understanding the reasons for churn. The approach identified groups amongst customers and summarized the characteristics of the groups using rules. A classifier was constructed from this, and the results suggest that its accuracy was more than 90%. Identification of groups leads to an understanding of group characteristics, which in turn helps in identifying churn characteristics.

5.6 Lifetime Value of a Customer and its Role in Churn Management

Services for customers differ on the customers' needs and the revenues that they generate. Lifetime Value of a customer (LVC) is one way to model the revenues associated with a customer: it is the net revenue a service provider is likely to receive over the period of the customer's stay. In the context of churn management, it is advisable to target for retention those customers who are likely to bring in more revenue, and LVC aids in this decision-making process. It is also interesting to note that the average lifetime value of an existing customer is inversely related to the churn likelihood of that customer: a decision by the customer to stay is coupled with an increment in the customer's lifetime value, so when churn likelihood goes down, lifetime value goes up. We assume that LVC is calculated after considering the various factors that govern it. Consider a scenario in which the churn prediction model generates a likelihood of churn for every customer; the customers can then be ordered and chosen for retention. In the normal approach, we choose the top likely churners for retention and generate a strategic response for those customers. However, it is beneficial to deviate from this procedure in certain situations: instead of merely choosing customers based on churn probability, we can also consider the LVC of the identified customers. If only a fraction of the identified churners can be chosen, it is beneficial to include most of the customers with higher LVCs. This choice is rewarded in the form of the revenues these customers are likely to bring in over a period of time if they decide to stay. This kind of LVC-based selection can be applied as a separate step once the model has made its predictions, as depicted in Fig. 4.
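The two-stage selection just described can be sketched as follows. The field names and the sizes of the shortlist and budget are illustrative assumptions: a shortlist is first drawn by churn likelihood, and the retention budget is then spent on the shortlisted customers with the highest lifetime value:

```python
# Sketch of LVC-based target selection: shortlist by churn likelihood,
# then pick the highest-LVC customers from the shortlist. Field names
# ('churn_prob', 'lvc') are illustrative assumptions.

def select_targets(customers, shortlist_size, budget):
    shortlist = sorted(customers, key=lambda c: -c["churn_prob"])[:shortlist_size]
    targets = sorted(shortlist, key=lambda c: -c["lvc"])[:budget]
    return [c["id"] for c in targets]

customers = [
    {"id": "A", "churn_prob": 0.9, "lvc": 100},
    {"id": "B", "churn_prob": 0.8, "lvc": 900},
    {"id": "C", "churn_prob": 0.7, "lvc": 500},
    {"id": "D", "churn_prob": 0.2, "lvc": 999},
]
print(select_targets(customers, shortlist_size=3, budget=2))  # → ['B', 'C']
```

Note how customer D, despite the highest LVC, is never considered because the churn-likelihood shortlist filters it out first; this is exactly the limitation that motivates the integrated approach discussed below in the text.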

[Figure-4 (not reproduced): customer data feeds the churn prediction model, which produces customers ordered on likelihood of churn; an LVC computation step then feeds target customer selection, yielding the target customers for retention.]
Figure-4. LVC based Target Selection for Retention

This plug-in kind of LVC-based selection, though desirable, may not yield effective results. The results are still heavily dependent on the model, and the selection based on LVC affects them only to a certain extent. This calls for an integrated approach in which Customer Value (CV) is considered while making predictions. The CV for a particular customer is computed from the revenue and profit data available for that customer; unlike LVC, CV does not depend on the churn likelihood of the customer. The main requirement of integrated models is that they should be able to accurately identify the most valuable (high-CV) customers [6]. Considering CVs becomes all the more important in cases where the CVs of different customers vary drastically. The effort should be focused on the customers with higher CVs in order to limit the impact on overall revenue. We usually select a model that maximizes gains. A way of measuring the loss caused by a model is a cost function. This function usually has two associated costs: the cost of making a needless retention effort (false positives) and the cost of losing a customer because the model did not predict the churn (false negatives). The objective is to minimize the combined cost. The cost factors can have different weights. By giving greater weight to the cost associated with losing a high-CV customer, one can develop a model that minimizes the objective function by selecting most of the high-value customers. Though the accuracy of the model may go down as a result, the net benefits outweigh the losses incurred due to inaccurate predictions.
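A cost function of the kind described, one weight per error type, with false negatives weighted by the customer's value, can be sketched as follows. The offer cost and CV figures are illustrative assumptions:

```python
# Sketch of a CV-weighted cost function: false positives cost one
# needless retention offer, false negatives cost the lost customer's
# value (CV). The weights below are illustrative assumptions.

def total_cost(y_true, y_pred, cv, offer_cost=10.0):
    cost = 0.0
    for t, p, v in zip(y_true, y_pred, cv):
        if p == 1 and t == 0:
            cost += offer_cost  # needless retention effort
        elif p == 0 and t == 1:
            cost += v           # customer value lost to churn
    return cost

# Missing a high-CV churner dominates the cost:
print(total_cost([1, 0, 1], [0, 1, 1], cv=[500.0, 50.0, 80.0]))  # → 510.0
```

A model selected to minimize this quantity will tolerate extra false positives among low-value customers in exchange for catching the high-value churners, which is the trade-off the text describes.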

Ongoing work in Churn Management

Fig. 5 illustrates the approach for churn prediction in our ongoing work at Satyam. Our approach addresses the churn prediction problem in two steps:
1. Optimum segmentation of the subscriber population into classes based on their demographic and selected transactional characteristics; and
2. Generation of a specific churn model for each class based on the training data specific to that class.

[Figure-5 (not reproduced): a telecom database (demographics, usage statistics, billing data, service requests, customer value, critical incidence records) feeds subscriber clustering; the resulting class-specific data drives a class-specific churn modeler, whose models are stored in a churn model repository and used for churn scoring, i.e. subscriber-specific churn prediction.]

Figure-5. Multi-Class Churn Prediction



From this we aim to achieve greater prediction accuracy for the following reasons:
1. Instead of one model applicable universally to the entire customer population, we derive a separate model for each subscriber class; the models are less general (and more specific), resulting in greater accuracy; and
2. Each class is somewhat homogeneous in its characteristics, and hence the models derived are likely to be more accurate.
Currently, we are working on the evolutionary-learning-based algorithm for churn prediction. Further, we are integrating customer value into the churn prediction model and also as an attribute in subscriber classification.
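The two-step multi-class scheme above can be sketched as a dispatch from a class assignment to a class-specific model. Here the class boundary (a single rule on monthly spend) and the per-class predictors are illustrative stand-ins; in the actual approach, classes come from clustering and the per-class models are trained on class-specific data:

```python
# Sketch of multi-class churn prediction: assign a subscriber to a class,
# then apply the churn model for that class. Class boundaries and models
# below are illustrative assumptions, not the trained ones.

def subscriber_class(record):
    return "high_spend" if record["monthly_spend"] > 100 else "low_spend"

class_models = {
    # each class gets its own predictor, trained on class-specific data
    "high_spend": lambda r: r["service_complaints"] > 3,
    "low_spend": lambda r: r["months_subscribed"] <= 2,
}

def predict_churn(record):
    return class_models[subscriber_class(record)](record)

print(predict_churn({"monthly_spend": 150, "service_complaints": 5,
                     "months_subscribed": 24}))  # → True
```

The point of the dispatch is that each per-class model only has to fit the behavior of a relatively homogeneous segment, which is the stated rationale for expecting higher accuracy.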

Summary
The problem of churn has assumed enormous significance due to high churn rates and the associated revenue losses. Churn being unavoidable, there have to be effective ways to control it. Managing churn is a complex task and requires strategic effort. Ways of addressing likely churners continue to evolve as churn characteristics continue to vary. It is essential to understand the likely causes of churn and devise solutions accordingly, and to identify the likely churners amongst existing customers in order to avoid the revenue losses due to churn. This is achieved by building a good prediction model. The model, apart from making accurate predictions, should also capture the reasons for churn and express them in a lucid manner. Customer value considerations can be integrated into the prediction model so that valuable customers are retained. There is a definite need for models that provide better insight and higher prediction accuracy, and that can adapt to changing churn characteristics.

References
1. Geppert, C., M. Medlin and T. Aw, Wireless Local Number Portability, White paper, KPMG. Accessed from: http://www.kpmg.ca/en/industries/ice/communications/documents/WirelessLocalNumberPortability.pdf
2. Lemmens, A. and C. Croux, Bagging and boosting classification trees to predict churn, Research Report, Katholieke Universiteit Leuven. Accessed from: http://www.econ.kuleuven.ac.be/tew/cteo/or_reports/0361.pdf
3. Mozer, M.C., et al., Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry, IEEE Transactions on Neural Networks, 11, 690-696.
4. Nath, V.S. and R.S. Behara, Customer Churn Analysis in the Wireless Industry: A Data Mining Approach, Technical Report, Daleen Technologies. Available: http://downloadeast.oracle.com/owsf_2003/40332.pdf
5. Parekkat, A., Operational comparison of Logistic regression, Decision trees & Neural networks in modeling mobile service churn, VIEWS 2003: A SAS user group meeting, 2003. Accessed from: http://www.amadeus.co.uk/events_resources/conferences/Views 2003/Arun Parekkat Operational comparison of Logistic regression.doc
6. Rosset, S. and E. Neumann, Integrating Customer Value Considerations into Predictive Modeling, Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003), 19-22 December 2003, Melbourne, Florida, USA.
7. Neslin, S.A., et al., Defection Detection: Improving Predictive Accuracy of Customer Churn Models, Working Paper. Accessed from: http://mba.tuck.dartmouth.edu/pages/faculty/scott.neslin/working_papers.html
8. Scott, C.N., M. Golovnia, and D. Steinberg, Churn Modeling for Mobile Telecommunications. Accessed from: http://www.salford-systems.com/churn.php
9. Ultsch, A., Emergent Self-Organizing feature maps used for prediction and prevention of Churn in mobile phone markets, Journal of Targeting, Measurement and Analysis for Marketing, 10(4), 314-324, Nov. 2002.
10. Au, W.-H., K. Chan, and X. Yao, A Novel Evolutionary Data Mining Algorithm With Applications to Churn Prediction, IEEE Transactions on Evolutionary Computation, 7(6), 532-545, Dec. 2003.

