
 First, every time a customer logs into Bigbasket.com, they have to go through a sea of products to select the ones they need, even though they have only a small set of regular items to purchase.
 Many times, I place orders from my mobile while returning from the office in the evening. Searching for products on a small handset is painful.

 It is common for customers to forget grocery items, and there are apps such as “Out of Milk” that help customers with their shopping lists.
 We, as a technology team, should create a solution that assists customers with their shopping lists and keeps them from placing extra orders because they forgot items.
 I often forget items and end up walking to the nearby store to get the forgotten item.
 E-commerce companies such as Amazon and
Flipkart use product recommendations.
 In fact, I read in a book that Amazon earned 35% of its revenue through its product recommendations.
 I think we need to solve our customers’ problems using predictive analytics.
 What we are trying to do is to predict what a
customer is likely to buy in the future and
whether the customer may have forgotten an
item.
 We estimated that about 30% of our customers place orders through smartphones. Unlike customers of other e-commerce companies such as Amazon, Bigbasket customers order many products at once, sometimes as many as 80 in one order, depending on their purchase frequency.
 Some customers buy all their groceries once a week, while others place an order once a month. When the basket size is large, placing an order on a smartphone is challenging.
 Q1. What is the difference in the recommender system requirements between Bigbasket and other e-commerce companies such as Amazon and Flipkart?
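 Answer: forgetfulness. Bigbasket customers repeatedly buy a large set of regular items, so the recommender must predict what a customer is likely to buy and flag items they may have forgotten.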
 Q2. What are the different types of recommender
systems? Which recommender system is more
appropriate for Bigbasket?
 Knowledge-based and content-based recommendations are more suitable than collaborative filtering, since we would be using an individual’s own purchase history for recommendations, whereas collaborative filtering uses data from other users to generate recommendations.
 Q4. Given the nature of Bigbasket’s business, what basic tools can be used to understand repeat purchases?
 Several models, such as the geometric distribution, association rules, and similarity measures, can be used to solve the problem Bigbasket faces.
 The geometric distribution is a discrete probability distribution in which the random variable counts the number of failures before the first success occurs.
 The success is defined as “customer placing
an order for a specific product (SKU)”.
 A recommendation (Smart Basket) can be created by computing the probability mass function value (or cumulative distribution function value) for each SKU, arranging the SKUs in descending order of probability, and applying a cut-off probability (for example, include an SKU in the Smart Basket if its probability of purchase is greater than 0.2).
 To use the geometric distribution, we have to estimate the probability “p” from the data. The maximum likelihood estimate of “p” is given by:
 X1, X2, …, Xn are the sample observations (the customer placing an order for the SKU on the ith visit).
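A sketch of the standard maximum likelihood derivation, assuming each Xi is the number of visits on which the customer did not order the SKU before the visit on which they did (failures before success, as in the definition of the geometric distribution above):

```latex
\begin{aligned}
L(p) &= \prod_{i=1}^{n} (1-p)^{X_i}\,p = p^{n}(1-p)^{\sum_{i=1}^{n} X_i} \\
\frac{d}{dp}\ln L(p) &= \frac{n}{p} - \frac{\sum_{i=1}^{n} X_i}{1-p} = 0
\;\Longrightarrow\; \hat{p} = \frac{n}{n + \sum_{i=1}^{n} X_i}
\end{aligned}
```

If Xi instead records the visit number on which the SKU is ordered (the failures plus the successful visit), replacing Xi with Xi - 1 gives the equivalent estimate p̂ = n / ΣXi.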
 The “Did you forget?” feature can be built on the probability calculated from the probability mass function.
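A minimal sketch of the Smart Basket and “Did you forget?” logic, assuming a hypothetical data format: for each SKU, a list of gaps (visits without an order for the SKU before each visit with one) and the number of such visits since the SKU was last bought. It scores each SKU with the geometric CDF (the PMF is the alternative the text mentions), sorts in descending order and keeps SKUs above the 0.2 cut-off. The function names and the sample data are illustrative, not Bigbasket’s implementation.

```python
# Sketch: Smart Basket ranking with a geometric model (hypothetical data format).
# Each gap = number of visits WITHOUT an order for the SKU before a visit WITH one,
# i.e. X = failures before success.


def estimate_p(gaps: list[int]) -> float:
    """Maximum likelihood estimate of p: n / (n + total failures)."""
    n = len(gaps)
    return n / (n + sum(gaps))


def prob_ordered_by(p: float, k: int) -> float:
    """Geometric CDF P(X <= k) = 1 - (1 - p) ** (k + 1) for X = failures before success."""
    return 1 - (1 - p) ** (k + 1)


def smart_basket(history: dict[str, list[int]],
                 visits_since_last: dict[str, int],
                 cutoff: float = 0.2) -> list[tuple[str, float]]:
    """Score every SKU, sort in descending order, keep those above the cut-off."""
    scores = {
        sku: prob_ordered_by(estimate_p(gaps), visits_since_last[sku])
        for sku, gaps in history.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(sku, prob) for sku, prob in ranked if prob > cutoff]


def did_you_forget(basket: list[tuple[str, float]], cart: set[str]) -> list[str]:
    """Smart Basket SKUs that are missing from the customer's current cart."""
    return [sku for sku, _ in basket if sku not in cart]


# Hypothetical purchase history of one customer.
HISTORY = {"milk": [0, 0, 1, 0], "bread": [1, 2, 1], "detergent": [5, 7, 6]}
# Hypothetical count of visits since the last purchase on which the SKU was skipped.
VISITS_SINCE_LAST = {"milk": 0, "bread": 1, "detergent": 0}

if __name__ == "__main__":
    basket = smart_basket(HISTORY, VISITS_SINCE_LAST)
    print(basket)
    print(did_you_forget(basket, cart={"bread"}))
```

Here detergent (p̂ = 1/7) falls below the cut-off, while milk and bread enter the Smart Basket; with only bread in the cart, milk is the “Did you forget?” prompt.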
 Support
 Confidence
 Lift
 Suppose the customer base is 1,000 customers
 800 customers buy milk
 600 customers buy orange juice
 400 customers buy milk and orange juice
 P(M) = 0.8, P(O) = 0.6 and P(M and O) = 0.4
 Support: the rules “milk implies orange juice” and “orange juice implies milk” both have a support of 0.4.
 Case 1: Antecedent: Orange Juice, Consequent: Milk
 Confidence: P(M|O) = P(M and O)/P(O) = 0.4/0.6 = 0.667
 Lift: P(M|O)/P(M) = (0.4/0.6)/0.8 = 0.833
 Interpretation: A customer who purchases orange juice is 0.833 times as likely to purchase milk as a randomly chosen customer.
 Case 2: Antecedent: Milk, Consequent: Orange Juice
 Confidence: P(O|M) = P(M and O)/P(M) = 0.4/0.8 = 0.5
 Lift: P(O|M)/P(O) = 0.5/0.6 = 0.833
 Interpretation: A customer who purchases milk is 0.833 times as likely to purchase orange juice as a randomly chosen customer.
 Suppose the customer base is 1,000 customers
 600 customers buy milk
 400 customers buy cereal
 300 customers buy milk and cereal
 P(M) = 0.6, P(C) = 0.4 and P(M and C) = 0.3
 Support: the rules “milk implies cereal” and “cereal implies milk” both have a support of 0.3.

 Case 1: Antecedent: Cereal, Consequent: Milk
 Confidence: P(M|C) = P(M and C)/P(C) = 0.3/0.4 = 0.75
 Lift: P(M|C)/P(M) = 0.75/0.6 = 1.25
 Interpretation: A customer who purchases cereal is 1.25 times as likely to purchase milk as a randomly chosen customer.
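 Case 2: Antecedent: Milk, Consequent: Cereal
 Confidence: P(C|M) = P(M and C)/P(M) = 0.3/0.6 = 0.5
 Lift: P(C|M)/P(C) = 0.5/0.4 = 1.25
 Interpretation: A customer who purchases milk is 1.25 times as likely to purchase cereal as a randomly chosen customer.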
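The four cases above follow one pattern, sketched below from raw customer counts. Because lift equals P(A and C) / (P(A) × P(C)), it is symmetric in antecedent and consequent, which is why both directions of each rule share the same lift; a lift below 1 (milk and orange juice) signals a weaker-than-random association. The function name `rule_metrics` is illustrative.

```python
def rule_metrics(n_total: int, n_antecedent: int, n_consequent: int, n_both: int):
    """Support, confidence and lift of the rule antecedent -> consequent, from counts."""
    support = n_both / n_total                    # P(A and C)
    confidence = n_both / n_antecedent            # P(C | A)
    lift = confidence / (n_consequent / n_total)  # P(C | A) / P(C)
    return support, confidence, lift


# (rule, customers, antecedent buyers, consequent buyers, buyers of both)
EXAMPLES = [
    ("orange juice -> milk", 1000, 600, 800, 400),
    ("milk -> orange juice", 1000, 800, 600, 400),
    ("cereal -> milk", 1000, 400, 600, 300),
    ("milk -> cereal", 1000, 600, 400, 300),
]

if __name__ == "__main__":
    for label, *counts in EXAMPLES:
        support, confidence, lift = rule_metrics(*counts)
        print(f"{label}: support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```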
 The data set contains 100,000 transactions in total.
 The support, confidence, lift, and average profit per transaction for each rule are given in the following table.
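| Rule | Antecedent | Consequent | Support | Confidence | Lift | No. of transactions | Profit per transaction |
|------|------------|------------|---------|------------|------|---------------------|------------------------|
| 1    | AB         | C          | 0.02    | 0.8        | 20   |                     | INR 5                  |
| 2    | AC         | D          | 0.4     | 0.75       | 2    |                     | INR 4                  |
| 3    | CD         | E          | 0.2     | 0.8        | 5    |                     | INR 1                  |
| 4    | DE         | G          | 0.4     | 0.8        | 3    |                     | INR 3                  |
| 5    | HG         | C          | 0.45    | 0.75       | 2.9  |                     | INR 3.5                |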
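One way to fill the “No. of transactions” column is support × 100,000. The sketch below does that and then ranks the rules by total profit, under the assumption (not stated in the table) that a rule’s total profit is its number of transactions times its profit per transaction. Under that assumption, Rule 1 has by far the highest lift yet the lowest total profit, because its support is small.

```python
TOTAL_TRANSACTIONS = 100_000

# (rule, antecedent, consequent, support, confidence, lift, profit per transaction in INR)
RULES = [
    (1, "AB", "C", 0.02, 0.8, 20, 5.0),
    (2, "AC", "D", 0.40, 0.75, 2, 4.0),
    (3, "CD", "E", 0.20, 0.8, 5, 1.0),
    (4, "DE", "G", 0.40, 0.8, 3, 3.0),
    (5, "HG", "C", 0.45, 0.75, 2.9, 3.5),
]


def rank_by_profit(rules, total):
    """Return (rule, no. of transactions, total profit) sorted by total profit, highest first."""
    rows = []
    for rule, _ant, _cons, support, _conf, _lift, profit in rules:
        n_tx = round(support * total)  # transactions in which the rule holds
        rows.append((rule, n_tx, n_tx * profit))  # assumption: profit scales with count
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for rule, n_tx, profit in rank_by_profit(RULES, TOTAL_TRANSACTIONS):
        print(f"Rule {rule}: {n_tx} transactions, total profit INR {profit:,.0f}")
```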
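| Transaction ID | Apple | Orange | Grapes | Strawberry | Plums | Green Apple | Banana |
|----------------|-------|--------|--------|------------|-------|-------------|--------|
| 1              | 1     | 1      | 1      | 0          | 1     | 1           | 1      |
| 2              | 0     | 1      | 0      | 0          | 0     | 1           | 1      |
| 3              | 0     | 0      | 0      | 0          | 0     | 1           | 1      |
| 4              | 1     | 0      | 0      | 0          | 1     | 0           | 0      |
| 5              | 1     | 0      | 0      | 0          | 1     | 1           | 1      |
| 6              | 0     | 1      | 1      | 0          | 0     | 0           | 1      |
| 7              | 0     | 1      | 1      | 0          | 0     | 0           | 1      |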
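| Rule (LHS)    | RHS         | Support | Confidence | Lift |
|---------------|-------------|---------|------------|------|
| Grapes        | Orange      | 0.43    | 1          | 1.8  |
| Apple, Grapes | Orange      | 0.14    | 1          | 1.8  |
| Apple, Grapes | Green Apple | 0.14    | 1          | 1.8  |
| Grapes        | Banana      | 0.43    | 1          | 1.2  |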
 If a customer’s basket contains Apple and Grapes, these rules suggest that the customer might have forgotten Orange, Green Apple, and Banana.
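The rule metrics can be recomputed directly from the seven transactions, as sketched below. The exact lifts are 7/4 = 1.75 and 7/6 ≈ 1.17, which the table shows rounded to 1.8 and 1.2. The `forgotten_items` helper and its 0.8 confidence threshold are hypothetical choices for illustration; with Apple and Grapes in the basket it returns the three items named above.

```python
ITEMS = ["Apple", "Orange", "Grapes", "Strawberry", "Plums", "Green Apple", "Banana"]
ROWS = [  # transactions 1..7; 1 = item bought
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1],
]
TRANSACTIONS = [{item for item, flag in zip(ITEMS, row) if flag} for row in ROWS]


def metrics(lhs: set[str], rhs: set[str], transactions: list[set[str]]):
    """Support, confidence and lift of LHS -> RHS over a list of transactions."""
    n = len(transactions)
    n_lhs = sum(lhs <= t for t in transactions)           # transactions containing LHS
    n_rhs = sum(rhs <= t for t in transactions)           # transactions containing RHS
    n_both = sum((lhs | rhs) <= t for t in transactions)  # containing both
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift


RULES = [
    ({"Grapes"}, {"Orange"}),
    ({"Apple", "Grapes"}, {"Orange"}),
    ({"Apple", "Grapes"}, {"Green Apple"}),
    ({"Grapes"}, {"Banana"}),
]


def forgotten_items(basket, rules, transactions, min_confidence=0.8):
    """RHS items of confident rules whose LHS is in the basket but whose RHS is not."""
    suggestions = set()
    for lhs, rhs in rules:
        if lhs <= basket and not rhs <= basket:
            if metrics(lhs, rhs, transactions)[1] >= min_confidence:
                suggestions |= rhs - basket
    return sorted(suggestions)


if __name__ == "__main__":
    for lhs, rhs in RULES:
        s, c, lift = metrics(lhs, rhs, TRANSACTIONS)
        print(sorted(lhs), "->", sorted(rhs), f"{s:.2f} {c:.2f} {lift:.2f}")
    print(forgotten_items({"Apple", "Grapes"}, RULES, TRANSACTIONS))
```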
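 Dice(X, Y) = 2 × (number of transactions containing both X and Y) / (number of transactions containing X + number of transactions containing Y)
 Dice(Apple, Grapes) = 2 × 1 / (3 + 3) = 0.33, since Apple and Grapes appear together only in transaction 1 and each appears in 3 transactions.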
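|             | Apple | Orange | Grapes | Strawberry | Plums | Green Apple | Banana |
|-------------|-------|--------|--------|------------|-------|-------------|--------|
| Apple       | N/A   | 0.29   | 0.33   | 0.00       | 1.00  | 0.57        | 0.44   |
| Orange      | 0.29  | N/A    | 0.86   | 0.00       | 0.29  | 0.50        | 0.80   |
| Grapes      | 0.33  | 0.86   | N/A    | 0.00       | 0.33  | 0.29        | 0.67   |
| Strawberry  | 0.00  | 0.00   | 0.00   | N/A        | 0.00  | 0.00        | 0.00   |
| Plums       | 1.00  | 0.29   | 0.33   | 0.00       | N/A   | 0.57        | 0.44   |
| Green Apple | 0.57  | 0.50   | 0.29   | 0.00       | 0.57  | N/A         | 0.80   |
| Banana      | 0.44  | 0.80   | 0.67   | 0.00       | 0.44  | 0.80        | N/A    |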
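The pairwise Dice similarities can be computed from the transaction table as sketched below. Strawberry is never bought, so every pair involving it scores 0; the guard against a zero denominator only matters if two never-bought items are compared. Function names are illustrative.

```python
ITEMS = ["Apple", "Orange", "Grapes", "Strawberry", "Plums", "Green Apple", "Banana"]
ROWS = [  # transactions 1..7; 1 = item bought
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1],
]
TRANSACTIONS = [{item for item, flag in zip(ITEMS, row) if flag} for row in ROWS]


def dice(a: str, b: str, transactions: list[set[str]]) -> float:
    """Dice similarity: 2 * n(a and b) / (n(a) + n(b))."""
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    if n_a + n_b == 0:  # neither item was ever bought
        return 0.0
    return 2 * n_ab / (n_a + n_b)


def dice_matrix(items: list[str], transactions: list[set[str]]) -> dict[str, dict[str, float]]:
    """Off-diagonal similarity matrix as a nested dict."""
    return {a: {b: dice(a, b, transactions) for b in items if b != a} for a in items}


if __name__ == "__main__":
    matrix = dice_matrix(ITEMS, TRANSACTIONS)
    for a in ITEMS:
        print(f"{a:<12}", " ".join(f"{matrix[a][b]:.2f}" if b != a else " N/A" for b in ITEMS))
```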

 Suppose there are three items in the basket: Apple, Strawberry, and Green Apple.
 One can use similarity to find out which new items are most similar to the items already in the basket.
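 Similarity of items not in the basket to the items in the current basket:

| Item not in the basket | Apple | Strawberry | Green Apple | Max similarity | Rank |
|------------------------|-------|------------|-------------|----------------|------|
| Orange                 | 0.29  | 0.00       | 0.50        | 0.50           | 3    |
| Grapes                 | 0.33  | 0.00       | 0.29        | 0.33           | 4    |
| Plums                  | 1.00  | 0.00       | 0.57        | 1.00           | 1    |
| Banana                 | 0.44  | 0.00       | 0.80        | 0.80           | 2    |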
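The ranking step can be sketched as: for each candidate item, take its maximum similarity to any item already in the basket, then sort in descending order. The similarity values are copied from the table; using the maximum (rather than, say, the average) follows the table’s “Max similarity” column.

```python
# Dice similarities of candidate items to the basket items (Apple, Strawberry, Green Apple).
SIMILARITY = {
    "Orange": {"Apple": 0.29, "Strawberry": 0.00, "Green Apple": 0.50},
    "Grapes": {"Apple": 0.33, "Strawberry": 0.00, "Green Apple": 0.29},
    "Plums": {"Apple": 1.00, "Strawberry": 0.00, "Green Apple": 0.57},
    "Banana": {"Apple": 0.44, "Strawberry": 0.00, "Green Apple": 0.80},
}


def rank_candidates(similarity: dict[str, dict[str, float]]) -> list[tuple[int, str, float]]:
    """Return (rank, item, max similarity) with the most similar candidate first."""
    max_sim = {item: max(scores.values()) for item, scores in similarity.items()}
    ordered = sorted(max_sim, key=max_sim.get, reverse=True)
    return [(rank, item, max_sim[item]) for rank, item in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, item, score in rank_candidates(SIMILARITY):
        print(rank, item, f"{score:.2f}")
```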



 The PageRank of a page A is given by
 PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
 where
 PR(A) is the PageRank of page A, PR(Ti) is the PageRank of the pages Ti that link to page A, C(Ti) is the number of outbound links on page Ti, and d is a damping factor that can be set between 0 and 1.
 The damping factor d is usually set to 0.85, but to keep the calculation simple we set it to 0.5. The exact value of d does affect the PageRank values, but it does not change the fundamental principles of PageRank.
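 Consider three pages A, B and C, where A links to B and C, B links to C, and C links to A. With d = 0.5:
PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))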
 These equations can easily be solved. We get the following PageRank values for the individual pages:
 PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615
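 How can these equations be solved? Substituting PR(B) = 0.5 + 0.25 PR(A) into the equation for PR(C) gives PR(C) = 0.75 + 0.375 PR(A); substituting this into PR(A) = 0.5 + 0.5 PR(C) gives 0.8125 PR(A) = 0.875, so PR(A) = 14/13, and PR(B) and PR(C) follow.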
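For larger link structures, substitution becomes impractical, and a common alternative is to iterate the PageRank equation until the values stop changing. The sketch below does this for the three-page example (d = 0.5); the link structure is the one implied by the three equations, and the tolerance and iteration cap are arbitrary choices.

```python
# Outbound links implied by the equations: A -> B, A -> C, B -> C, C -> A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def pagerank(links: dict[str, list[str]], d: float = 0.5,
             tol: float = 1e-12, max_iter: int = 1000) -> dict[str, float]:
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over the pages q linking to p."""
    inbound = {p: [q for q in links if p in links[q]] for p in links}
    pr = {p: 1.0 for p in links}  # initial guess
    for _ in range(max_iter):
        new = {
            p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
            for p in links
        }
        if max(abs(new[p] - pr[p]) for p in links) < tol:
            return new
        pr = new
    return pr


if __name__ == "__main__":
    for page, value in pagerank(LINKS).items():
        print(page, round(value, 8))
```

The iteration converges to the same values as the substitution: 14/13, 10/13 and 15/13.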
 Considering the example of items and baskets given in the table above, one can derive how the items are related to each other in the form of a matrix.
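One possible such matrix is a co-occurrence matrix, sketched below under that assumption: entry (a, b) counts the transactions containing both items, and the diagonal holds each item’s own frequency. The Dice similarities follow directly from it as 2·M[a][b] / (M[a][a] + M[b][b]).

```python
ITEMS = ["Apple", "Orange", "Grapes", "Strawberry", "Plums", "Green Apple", "Banana"]
ROWS = [  # transactions 1..7; 1 = item bought
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1],
]


def cooccurrence(items: list[str], rows: list[list[int]]) -> dict[str, dict[str, int]]:
    """Count transactions containing both items; the diagonal is each item's frequency."""
    return {
        a: {b: sum(row[i] * row[j] for row in rows) for j, b in enumerate(items)}
        for i, a in enumerate(items)
    }


if __name__ == "__main__":
    matrix = cooccurrence(ITEMS, ROWS)
    for a in ITEMS:
        print(f"{a:<12}", [matrix[a][b] for b in ITEMS])
```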
 Conditions
 Q8: What testing strategy should be
applied to find out how the model works?
 One option was to cluster and segment the customers according to various criteria and create a test strategy for each customer segment.
 a) Bigbasket did not require users to provide demographic information when they signed up. This meant that, other than the user ID, email and phone number, fields that would help with customer segmentation, such as gender, professional status (working/non-working), and number of people in the household, were not available.
 b) Creating customer segments based on the transactional data was another option. With this approach, one could use the RFM model: recency of orders, frequency of orders, and size of orders placed (i.e., number of items in an order), as well as the monetary value associated with the orders. On analyzing the data, it was determined that this would lead to an unmanageable number of customer segments, and there was no way that models could be built, invoked, and tested separately for each of these segments in real time.
 Test using Monte Carlo cross-validation
 Use the purchasing behavior of a customer on any given day and randomly split the dataset into training and validation sets.
 The testing strategy was based on Monte Carlo cross-validation, tailored to fit the Bigbasket use case.
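Monte Carlo cross-validation repeats a random train/validation split many times and averages the validation score. A minimal sketch, assuming one customer’s order history as a list of baskets, a deliberately simple hypothetical model (`recommend`: suggest SKUs bought in at least half of the training orders), and recall (the share of a validation basket’s items that were recommended) as the metric; none of these choices come from the case.

```python
import random
from collections import Counter


def recommend(train: list[set[str]], cutoff: float) -> set[str]:
    """Hypothetical model: SKUs bought in at least `cutoff` of the training orders."""
    counts = Counter(sku for order in train for sku in order)
    return {sku for sku, c in counts.items() if c / len(train) >= cutoff}


def monte_carlo_cv(orders: list[set[str]], n_splits: int = 20, train_frac: float = 0.7,
                   cutoff: float = 0.5, seed: int = 0) -> float:
    """Average recall of the recommendations over repeated random train/validation splits."""
    if len(orders) < 2:
        raise ValueError("need at least two orders to split")
    rng = random.Random(seed)
    n_train = min(len(orders) - 1, max(1, round(len(orders) * train_frac)))
    recalls = []
    for _ in range(n_splits):
        shuffled = list(orders)
        rng.shuffle(shuffled)  # a fresh random split on every repetition
        train, valid = shuffled[:n_train], shuffled[n_train:]
        recs = recommend(train, cutoff)
        for order in valid:
            if order:
                recalls.append(len(recs & order) / len(order))
    return sum(recalls) / len(recalls) if recalls else 0.0


ORDERS = [  # hypothetical order history of one customer
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "rice", "dal"},
    {"milk", "bread", "eggs", "butter"},
    {"milk", "eggs"},
    {"bread", "rice"},
]

if __name__ == "__main__":
    print(f"mean recall: {monte_carlo_cv(ORDERS):.3f}")
```

Unlike k-fold cross-validation, the random splits may overlap between repetitions; the number of repetitions is chosen independently of the data size.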
 Create your own data set for all the questions of the case and answer them using Excel or any other software.
 The submission date is 11.01.18, by 2400 hours.
