-Shubham Malviya
Case Study
AGENDA
Q&A
APPENDIX
For this interview round, I have worked to deliver the case study results
with the objectives of Agility, Transparency, and Analytic Rigor
Agility
I worked to deliver results rapidly
Transparency
Throughout the case study problems, I have been open and honest about my ideas and the hurdles I faced:
• I built supporting evidence for my case study solutions through robust code and models
Analytic Rigor
I have leveraged a range of models and analyses:
• I used multi-level regression at multiple time intervals and advanced algorithms such as
random forest to support my results
• I used various statistical techniques to validate my results
Problem 1
I prepared a dummy dataset of 1,000 records with random user IDs and dates to
validate the solution code
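The deck's actual data-generation code is in R; a minimal Python sketch of the same idea (the ID range and date window here are assumptions) could look like:

```python
# Hypothetical sketch: build a 1,000-record dummy dataset of random
# user IDs and random dates within one year.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
dummy = pd.DataFrame({
    "user_id": rng.integers(10_000, 99_999, size=n),  # random user IDs (assumed range)
    "date": pd.Timestamp("2023-01-01")
            + pd.to_timedelta(rng.integers(0, 365, size=n), unit="D"),  # random dates
})
print(dummy.shape)  # (1000, 2)
```

Fixing the generator seed keeps the dummy data reproducible across runs of the solution code.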
Problem 2
Account Length vs Churn: Maximum churn is observed in the 75-100 and 100-125
week account-length brackets
Account Length vs Customer Calls: The majority of customers who churned received
fewer customer service calls, so one of the reasons for churning could be a lack of attention
Account Length vs Voicemail Plan: The Voicemail Plan does not have as significant an impact
on churn as the International Plan
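The bracketed churn counts behind the first observation can be sketched as follows (the analysis in the deck is in R; the data and column names here are illustrative, not the real churn dataset):

```python
# Illustrative sketch: churn counts by account-length bracket.
import pandas as pd

df = pd.DataFrame({
    "account_length": [30, 80, 90, 110, 120, 140, 60, 95],  # toy values, in weeks
    "churn": [0, 1, 1, 1, 0, 0, 0, 1],                      # 1 = churned
})
bins = [0, 25, 50, 75, 100, 125, 150]
df["bracket"] = pd.cut(df["account_length"], bins=bins)
churn_by_bracket = df.groupby("bracket", observed=True)["churn"].sum()
print(churn_by_bracket)
```

On the real data, plotting `churn_by_bracket` as a bar chart gives the Account Length vs Churn view described above.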
Correlation
Please refer to the R code for the detailed implementation and the correlation values
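The referenced correlation computation is in the R code; for illustration only, a pandas equivalent on made-up columns (the values below are not the real dataset) would be:

```python
# Illustrative sketch: pairwise Pearson correlations between numeric features.
import pandas as pd

df = pd.DataFrame({
    "total_day_minutes": [180.0, 200.5, 150.2, 210.7, 170.3],
    "total_day_charge": [30.6, 34.1, 25.5, 35.8, 29.0],   # roughly minutes * rate
    "customer_service_calls": [1, 4, 0, 5, 2],
})
corr = df.corr(numeric_only=True)  # pairwise Pearson correlation matrix
print(corr.round(2))
```

Highly correlated pairs (e.g. day minutes vs. day charge) are candidates for dropping one column before modeling.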
I split the data into Train and Test datasets such that the churn ratio in both remains
consistent. I used the in-built R function createDataPartition() and verified the split manually in the code
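The deck does this with R's createDataPartition(); a Python sketch of the same stratified-split idea (synthetic labels, scikit-learn assumed available) is:

```python
# Stratified split: churn ratio stays consistent across Train and Test.
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 80 + [1] * 20)      # toy labels: 20% churn overall
X = np.arange(len(y)).reshape(-1, 1)   # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(y_train.mean(), y_test.mean())   # both splits keep ~20% churn
```

The manual verification mentioned above corresponds to checking `y_train.mean()` against `y_test.mean()`.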
2:3 I built predictive Random Forest and XGBoost models and evaluated
each of them
Why Random forest?
• Works best with categorical data and has feature selection capability
• Handles even non-linear relationship between independent and dependent variable
• Random forest creates multiple decision trees hence prevents overfitting
Why XG Boost?
• XGBoost has in-built L1 (lasso regression) and L2 (ridge regression) regularization, which prevents the model
from overfitting
• XGBoost utilizes the power of parallel processing, which is why it is much faster than other algorithms. It uses
multiple CPU cores to execute the model, which will also enhance the model's performance after
deployment
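The two-model setup can be sketched as below. The deck's models are fit in R on the churn Train set; here synthetic data stands in, and scikit-learn's GradientBoostingClassifier stands in for XGBoost (the xgboost package's XGBClassifier exposes the same fit/predict API, so only the import would change):

```python
# Sketch: fit the two tree-ensemble models on a synthetic, imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.8, 0.2], random_state=0)  # ~20% positives
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # XGBoost stand-in
print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```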
2:4 To compare our Random Forest and XGBoost models, we use the
performance metrics below
Comparing on the basis of these metrics, we can say that the XGBoost model performed better than the Random
Forest model
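The kind of metrics used for the comparison can be computed as follows (the labels and probabilities below are hypothetical, not the deck's results):

```python
# Sketch: the usual classification comparison metrics on toy predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                  # actual churn labels
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]                  # predicted labels
y_prob = [0.1, 0.2, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7]  # predicted churn probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("roc_auc  :", roc_auc_score(y_true, y_prob))
```

Computing the same four numbers for each model on the Test set gives the side-by-side comparison.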
Problem 3.2
‘total day minutes’, ‘customer service calls’ and ‘international plan’ are the three most important features
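A ranking like this can be read off a fitted random forest's importances; the sketch below uses synthetic data with the deck's feature names attached for illustration:

```python
# Sketch: rank features by random-forest importance (synthetic data;
# the feature names are borrowed from the churn dataset for illustration).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           random_state=1)
names = ["total_day_minutes", "customer_service_calls", "international_plan",
         "account_length", "voicemail_plan"]
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=names)
print(importances.sort_values(ascending=False).head(3))  # top three features
```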
Problem 2
Expensive: If we are working on real-time training of data and predicting churn in seconds, we might have
to use expensive cloud infrastructure
Static to Dynamic Data: Model fitting is usually done on a static dataset; however, as the model goes into
production, we might have to deal with unstructured live data that has issues like missing values,
undefined values, changed formats, etc.
Consistent Access to Data: In deployment we need to ensure that all data is available in a programmatic and
trusted manner
Performance: Defining the cluster configuration so that the model runs efficiently in the given time
Consistency in Model Deployment: When deploying our model in real life, we assume that the data we
apply the model to is representative of the data we trained the model on; however, there might be issues,
for example: we trained the model on ‘area code’ values 408, 415, and 510, and in the new dataset we receive
entirely different area codes
Vendors may change data definitions: Third-party vendors who provide the data might make minor
changes in how features are defined, which could make our model inconsistent
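The area-code issue above suggests a lightweight guard at scoring time: flag category values the model never saw in training (the code values below come from the example; the incoming data is hypothetical):

```python
# Sketch: flag categorical values in incoming data that were absent at training time.
train_area_codes = {408, 415, 510}        # categories seen during training
incoming = [408, 415, 650, 702, 510]      # hypothetical new scoring data

unseen = sorted({c for c in incoming if c not in train_area_codes})
if unseen:
    print("Unseen area codes:", unseen)   # trigger an alert / retraining review
```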
Problem 3
From our model we observe that ‘customer service calls’ was the second most important feature
From our data visualization we also observed that the majority of customers who churned received fewer
customer service calls, so one of the reasons for churning could be a lack of attention
Considering the above factors, we design an experiment to reduce customer churn by increasing
‘Customer Service Calls’
Primary objective of the experiment: To check if increased customer service calls lead
to reduced churn with statistical significance
[Diagram: experiment design] Test and control groups were selected such that they are similar in
all aspects, except that the test group was exposed to more customer service calls while no changes
were made to the control group; the post-event uplift is attributed to the increased calls.
From our previous data visualization we observed that the customers with maximum churn (75-125
account length) received the minimum number of customer service calls, and hence we design our
experiment around these customers and evaluate whether the increased customer service calls have
reduced customer churn
We take the customer group with account length 75-125 weeks (say N) and divide it into two
samples, a Test group and a Control group
Test Group: Test group was given increased customer service calls over the experiment period T
Control Group: The control group was given the same number of customer service calls as before over the
experiment period T
To make sure that we measure only the churn reduction from the increased calls and not from any other factor,
we control for other variables such that all external promotions/offers remain the same for the test and control
groups, and the only difference lies in the number of customer service calls
If the p-value < 0.1, we reject the null hypothesis and conclude that the reduction in churn is
due to the increased customer calls
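One standard way to get this p-value is a one-sided two-proportion z-test on the churn rates; the sketch below uses hypothetical group sizes and churn counts:

```python
# Sketch: one-sided two-proportion z-test (H0: control churn <= test churn).
from math import erf, sqrt

churned_control, n_control = 120, 500   # hypothetical control-group churn
churned_test, n_test = 90, 500          # hypothetical test-group churn

p_c, p_t = churned_control / n_control, churned_test / n_test
p_pool = (churned_control + churned_test) / (n_control + n_test)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_test))
z = (p_c - p_t) / se
p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # one-sided upper-tail p-value
print(round(z, 3), round(p_value, 4))
if p_value < 0.1:
    print("Reject H0: increased calls reduced churn")
```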
How long should we test to arrive at a result with a significant level of confidence? Additionally, during the
time we are testing our hypothesis, we would be delivering the regular number of customer calls to the control
group, and hence leaving churn unaddressed for the control group (50% of the customers): We should use a
test-duration calculator to find a time period that could give statistically significant results
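A test-duration calculator boils down to a sample-size formula for comparing two proportions; a back-of-the-envelope sketch (the baseline churn rate, target churn rate, and weekly enrollment rate are all assumptions) is:

```python
# Sketch: required sample size per group for a two-proportion test,
# converted into an experiment duration in weeks.
from math import ceil, sqrt

p1, p2 = 0.24, 0.18               # assumed control vs. hoped-for test churn rate
alpha_z, power_z = 1.645, 0.842   # z-values: one-sided alpha=0.05, power=0.80

p_bar = (p1 + p2) / 2
n = ((alpha_z * sqrt(2 * p_bar * (1 - p_bar))
      + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
n_per_group = ceil(n)
customers_per_week = 150          # assumed enrollment rate into the experiment
weeks = ceil(n_per_group / customers_per_week)
print(n_per_group, "per group ->", weeks, "weeks")
```

Smaller expected effects (p1 closer to p2) blow up the required sample, which is exactly why the test duration has to be planned before launching.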
If we are already very certain that the experience we are about to test will outperform the control, then
testing becomes an unnecessary overhead: both because of the fixed costs it incurs and because of the
risk-adjusted losses from delaying the implementation of the better experience. In our case, since we saw from
the historical data that the number of service calls has a direct impact on churn, we might also directly
increase the service calls without testing their impact
Novelty Effect: The novelty effect comes into play when we make an alteration that our typical customer is not
used to seeing. In our case, customers are not used to frequent customer service calls, and hence the increased
calls might lead them not to churn; however, it is unclear whether the reduced churn comes from the increased
number of calls or from the new pattern attracting their attention. To test for this effect, we can include new
customers in our experiment, since new customers will not be accustomed to any particular pattern
Even if we can establish that increased service calls lead to reduced churn, we will need to find the
optimal number of additional service calls to be made: the impact of customer service calls will saturate
after a particular number. This optimal number of calls could be found by running a separate model
Questions?