Bigbasket customers regularly purchase a small set of items but have to search through all products each time they log in. The company wants to create a shopping list and recommendations solution to help customers avoid forgetting items and reduce frequent, small orders. A predictive model could predict what a repeat customer is likely to purchase and remind them of any forgotten items using their purchase history and similarity measures between purchased items. The requirements are different from other e-commerce sites due to Bigbasket's higher average order sizes.
... products to select the ones they need to purchase, though they have a small set of regular items to purchase. Many a time, I place orders from my mobile while returning from the office in the evening. Searching for products on a small handset is painful.

It is common for customers to forget grocery items, and there are apps such as “Out of Milk” that help customers with their shopping lists. We as a technology team should create a solution that assists customers with their shopping list and keeps them from placing frequent orders due to forgetfulness. Often, I forget items and end up walking to the nearby store to get them.

E-commerce companies such as Amazon and Flipkart use product recommendations. In fact, I read in a book that Amazon earned 35% of its revenue through product recommendations. I think we need to find a solution to our customers' problems using predictive analytics. What we are trying to do is predict what a customer is likely to buy in the future and whether the customer may have forgotten an item.

We estimated that about 30% of our customers place orders through smartphones. Unlike customers of other e-commerce companies such as Amazon, Bigbasket customers order several products at a time, sometimes as many as 80 in one order, depending on their purchase frequency. A few customers buy all their groceries once a week, and there are customers who place an order once a month. When the basket size is large, using a smartphone to place the order is challenging.

Q1. What is the difference in the recommender system requirements between Bigbasket and other e-commerce companies such as Amazon and Flipkart?

Answer: Forgetfulness. Bigbasket orders are large (up to 80 items), so the system must help customers rebuild their regular basket and remind them of forgotten items, not only suggest new products.

Q2. What are the different types of recommender systems? Which recommender system is more appropriate for Bigbasket?
Answer: Knowledge-based and content-based recommendations are more suitable, since we would be using an individual's purchase history for recommendations rather than collaborative filtering, in which data of other users is used to generate recommendations.

Q4. Given the context of business carried out by Bigbasket, what basic tools can be used for understanding repeat purchases?

Many different models, such as the geometric distribution, association rules, and similarity measures, can be used to solve the problem encountered by Bigbasket.

The geometric distribution is a discrete probability distribution in which the random variable counts the number of failures before the first success occurs. Success is defined as “customer placing an order for a specific product (SKU)”. A recommendation (Smart Basket) can be created by computing the probability mass function value (or cumulative distribution function value) for each SKU, arranging the SKUs in descending order of probability, and applying a cut-off probability (for example, include the SKU in the Smart Basket if its probability of purchase is greater than 0.2).

To use the geometric distribution, we have to estimate the probability p from the data. With Xi counted as the number of failures (visits without an order for the SKU) before the order, the maximum likelihood estimate for p is $\hat{p} = \frac{n}{n + \sum_{i=1}^{n} X_i}$, where X1, X2, ..., Xn are the sample observations. If Xi is instead recorded as the visit number on which the order is placed, the estimate is $\hat{p} = \frac{n}{\sum_{i=1}^{n} X_i}$. The “Did you forget?” feature can be included on the basis of the probability calculated from the probability mass function.

Support, Confidence, Lift

Suppose a customer base of 1000:
800 customers buy milk
600 customers buy orange juice
400 customers buy milk and orange juice

P(M) = 0.8, P(O) = 0.6 and P(M and O) = 0.4

Support: "milk implies orange juice" and "orange juice implies milk" both have a support of 0.4.

Case 1: Antecedent: Orange juice, Consequent: Milk
Confidence: P(M | O) = P(M and O)/P(O) = 0.4/0.6 = 0.667
Lift: P(M | O)/P(M) = 0.667/0.8 = 0.83
Interpretation: A customer who purchases orange juice is 0.83 times as likely to purchase milk as a randomly chosen customer.
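As a quick check of the Case 1 arithmetic, the three measures can be computed directly from customer counts. This is a minimal sketch; the function name `rule_metrics` is hypothetical, and the customer base of 1000 is the value implied by P(M) = 0.8 for 800 milk buyers.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    support = n_both / n_total                    # P(antecedent and consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# Case 1: antecedent = orange juice (600 buyers), consequent = milk (800 buyers),
# 400 customers buy both; a base of 1000 customers is assumed.
support, confidence, lift = rule_metrics(1000, 600, 800, 400)
print(round(support, 2), round(confidence, 3), round(lift, 2))
```

Swapping the antecedent and consequent counts gives the metrics for the reverse rule. Support and lift are symmetric in the two items; only confidence changes.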
Case 2: Antecedent: Milk, Consequent: Orange juice
Confidence: P(O | M) = P(M and O)/P(M) = 0.4/0.8 = 0.5
Lift: P(O | M)/P(O) = 0.5/0.6 = 0.83
Interpretation: A customer who purchases milk is 0.83 times as likely to purchase orange juice as a randomly chosen customer.

Suppose a customer base of 1000:
600 customers buy milk
400 customers buy cereal
300 customers buy milk and cereal

P(M) = 0.6, P(C) = 0.4 and P(M and C) = 0.3

Support: "milk implies cereal" and "cereal implies milk" both have a support of 0.3.

Case 1: Antecedent: Cereal, Consequent: Milk
Confidence: P(M | C) = P(M and C)/P(C) = 0.3/0.4 = 0.75
Lift: P(M | C)/P(M) = 0.75/0.6 = 1.25
Interpretation: A customer who purchases cereal is 1.25 times as likely to purchase milk as a randomly chosen customer.

Case 2: Antecedent: Milk, Consequent: Cereal
Confidence: P(C | M) = P(M and C)/P(M) = 0.3/0.6 = 0.5
Lift: P(C | M)/P(C) = 0.5/0.4 = 1.25

The total number of transactions in the data set is 100,000. Support, confidence, lift and average profit per transaction are given in the following table.

Rule  Antecedent  Consequent  Support  Confidence  Lift  No. of Transactions  Profit per Transaction
1     AB          C           0.02     0.8         20                         INR 5
2     AC          D           0.4      0.75        2                          INR 4
3     CD          E           0.2      0.8         5                          INR 1
4     DE          G           0.4      0.8         3                          INR 3
5     HG          C           0.45     0.75        2.9                        INR 3.5

Transaction ID  Apple  Orange  Grapes  Strawberry  Plums  Green Apple  Banana
1               1      1       1       0           1      1            1
2               0      1       0       0           0      1            1
3               0      0       0       0           0      1            1
4               1      0       0       0           1      0            0
5               1      0       0       0           1      1            1
6               0      1       1       0           0      0            1
7               0      1       1       0           0      0            1

Rule (LHS)     RHS          Support  Confidence  Lift
Grapes         Orange       0.43     1           1.8
Apple, Grapes  Orange       0.14     1           1.8
Apple, Grapes  Green Apple  0.14     1           1.8
Grapes         Banana       0.43     1           1.2

Items the customer might have forgotten: Orange, Green Apple, and Banana.

Similarity between items can be measured on the same transaction table with the Dice coefficient: twice the number of transactions containing both items, divided by the sum of the number of transactions containing each item.

Dice(Apple, Grapes) = 2 × 1/(3 + 3) = 0.33

             Apple  Orange  Grapes  Strawberry  Plums  Green Apple  Banana
Apple        N/A    0.29    0.33    0.00        1.00   0.57         0.44
Orange       0.29   N/A     0.86    0.00        0.29   0.50         0.80
Grapes       0.33   0.86    N/A     0.00        0.33   0.29         0.67
Strawberry   0.00   0.00    0.00    N/A         0.00   0.00         0.00
Plums        1.00   0.29    0.33    0.00        N/A    0.57         0.44
Green Apple  0.57   0.50    0.29    0.00        0.57   N/A          0.80
Banana       0.44   0.80    0.67    0.00        0.44   0.80         N/A

Three items in the basket (Apple, Strawberry, Green Apple)

One can use similarity to find out which new items are most similar to the items already in the basket.

                                 Items in the current basket
Items not present in the basket  Apple  Strawberry  Green Apple  Max Similarity  Rank
Orange                           0.29   0.00        0.50         0.50            3
Grapes                           0.33   0.00        0.29         0.33            4
Plums                            1.00   0.00        0.57         1.00            1
Banana                           0.44   0.00        0.80         0.80            2
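The similarity ranking above can be reproduced from the seven-transaction table. The following is a minimal sketch, not the case's own implementation: the helper names `dice` and `rank_candidates` are hypothetical, and taking the maximum similarity to any basket item as the score follows the "Max Similarity" column of the table.

```python
# Transaction table from the example: one row per transaction, one column per item.
ITEMS = ["Apple", "Orange", "Grapes", "Strawberry", "Plums", "Green Apple", "Banana"]
TRANSACTIONS = [
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1],
]


def dice(item_a, item_b):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|), counted over transactions."""
    a, b = ITEMS.index(item_a), ITEMS.index(item_b)
    count_a = sum(row[a] for row in TRANSACTIONS)
    count_b = sum(row[b] for row in TRANSACTIONS)
    both = sum(1 for row in TRANSACTIONS if row[a] and row[b])
    total = count_a + count_b
    return 2 * both / total if total else 0.0  # guard: neither item ever bought


def rank_candidates(basket):
    """Score each item outside the basket by its maximum similarity to a basket item."""
    candidates = [item for item in ITEMS if item not in basket]
    scored = [(item, max(dice(item, b) for b in basket)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


basket = ["Apple", "Strawberry", "Green Apple"]
for rank, (item, score) in enumerate(rank_candidates(basket), start=1):
    print(rank, item, round(score, 2))
```

The printed order is Plums, Banana, Orange, Grapes, matching the Rank column. Strawberry never appears in a transaction, so its similarity to every other item is 0 and it contributes nothing to the ranking.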
PageRank is given by

PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where PR(A) is the PageRank of page A, PR(Ti) is the PageRank of the pages Ti that link to page A, C(Ti) is the number of outbound links on page Ti, and d is a damping factor that can be set between 0 and 1. The damping factor d is usually set to 0.85, but to keep the calculation simple we set it to 0.5. The exact value of d affects the PageRank values, but it does not change the fundamental principles of PageRank.

For three pages in which A links to B and C, B links to C, and C links to A, the equations are:

PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))

These equations can be solved by substitution. Substituting PR(B) into the third equation gives PR(C) = 0.75 + 0.375 PR(A). Substituting that into the first equation gives PR(A) = 0.875 + 0.1875 PR(A), so PR(A) = 0.875/0.8125 = 14/13. The PageRank values for the single pages are:

PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615

Considering the example of items and baskets given in the table above, one can derive how the items are related to each other in the form of a matrix.

Q8: What testing strategy should be applied to find out how the model works?

Clustering and segmentation of the customers according to various criteria, and creating a test strategy for each customer segment, was one option.

a) Bigbasket did not mandate users to provide demographic information when they signed up. This meant that, other than the user id, email and phone number, fields that would help with customer segmentation, such as gender, professional status (working/non-working), and number of people in the household, were not available.

b) Creating customer segments based on the transactional data was another option. With this approach, one could look at the RFM model: recency of orders, frequency of orders, and size of orders placed (i.e. number of items in an order), as well as the monetary value associated with the orders.
Upon analyzing the data, it was determined that this would lead to an unmanageable number of customer segments, and there was no way that models could be built for each of these segments separately, invoked, and tested in real time.

Test using Monte Carlo cross-validation

Use the purchasing behavior of a customer on any given day and randomly split the data sets for training and validation. The testing strategy was based on Monte Carlo cross-validation and tailored to fit the Bigbasket use case.

Create your own data set for all the questions of the case and answer these questions using Excel or any other software. Date of submission is 11.01.18 by 2400 hours.
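The Monte Carlo cross-validation idea (repeated random train/validation splits) can be sketched as follows. This is an illustrative sketch under stated assumptions, not Bigbasket's tailored procedure, which the case does not describe. It reuses the seven fruit transactions as one customer's order history. The frequency-based `recommend` model, the `hit_rate` metric, and all parameter values are hypothetical.

```python
import random

# Assumption: the seven fruit-example transactions stand in for one customer's orders.
ORDERS = [
    {"Apple", "Orange", "Grapes", "Plums", "Green Apple", "Banana"},
    {"Orange", "Green Apple", "Banana"},
    {"Green Apple", "Banana"},
    {"Apple", "Plums"},
    {"Apple", "Plums", "Green Apple", "Banana"},
    {"Orange", "Grapes", "Banana"},
    {"Orange", "Grapes", "Banana"},
]


def recommend(train_orders, cutoff=0.5):
    """Hypothetical model: recommend items found in at least `cutoff` of training orders."""
    counts = {}
    for order in train_orders:
        for item in order:
            counts[item] = counts.get(item, 0) + 1
    return {item for item, c in counts.items() if c / len(train_orders) >= cutoff}


def hit_rate(recommended, validation_orders):
    """Share of items bought in the validation orders that were recommended."""
    bought = sum(len(order) for order in validation_orders)
    hits = sum(len(order & recommended) for order in validation_orders)
    return hits / bought if bought else 0.0


def monte_carlo_cv(orders, n_splits=100, validation_fraction=0.3, seed=0):
    """Average the validation score over repeated random train/validation splits."""
    rng = random.Random(seed)
    n_val = max(1, round(len(orders) * validation_fraction))
    scores = []
    for _ in range(n_splits):
        shuffled = orders[:]
        rng.shuffle(shuffled)  # a fresh random split on every repetition
        validation, train = shuffled[:n_val], shuffled[n_val:]
        scores.append(hit_rate(recommend(train), validation))
    return sum(scores) / len(scores)


print(round(monte_carlo_cv(ORDERS), 3))
```

Unlike k-fold cross-validation, the splits are drawn independently, so an order can land in the validation set in several repetitions or in none. The number of repetitions and the validation fraction can be chosen separately.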