
Business Process Services

White Paper

Price Elasticity using Distributed Computing for Big Data
About the Author

Rajesh Kavadiki
Rajesh is part of the Analytics and Insights team at Tata Consultancy Services (TCS).
He has over ten years of experience in Big Data analytics and machine learning,
and is currently working on parallelizing machine learning algorithms on the Big
Data platform. He has worked extensively on Java, MapReduce, and Big Data-related
technologies, with a specialization in retail and social media.
Abstract

As retailers deal with intense competition and changing customer preferences
globally, pricing remains a critical factor that can determine their success. In today's
uncertain economy, the consumer decision-making cycle is complex, and price is a
key influencer of buying decisions.

Retailers are compelled to make price changes frequently due to competition,
seasonality, offers and promotions, and so on. This requires them to accurately
estimate and forecast the responsiveness of demand to these price changes. They
need to be able to gauge the price sensitivity, as an increase in price could lead to a
decline in sales. Similarly, a decrease in price could lead to excessive demand
leading to out of stock scenarios. Price elasticity is a common measure of price
sensitivity used by retailers to develop appropriate pricing strategies and enhance
sales revenues.

Predicting sales or volumes in response to price changes is particularly challenging
in the case of large retailers with a global presence and multiple stock keeping
units (SKUs). Traditional analytics tools are limited in their capabilities and unable
to support the analysis of large data sets with agility across multiple SKUs. This
paper proposes the use of log-linear models on distributed computing or Big Data
to measure price elasticity of items across millions of transactions. The aim of this
method is to help retailers assess price elasticity with agility and accuracy across all
the SKUs in their repertoire.
Contents

Is the Price Right?
Interpreting Price Elasticity
Choosing the Recommended Model and Framework
Addressing Price Elasticity with the Log-linear Model
Validating the Log-linear Model
Applying the Log-linear Model to Data
Using Analytical Models to Determine Pricing
Is the Price Right?
Globally, retailers acknowledge that pricing is a vital factor in consumers' decision making process.
Depending on the type of product, consumers can be very resistant to any price changes. Pricing also
has an impact on the positioning of a product. Consumers seek value in a product, which is a function of
the perceived benefits and the product's price. It is commonly assumed that more consumers will
purchase a product when the price is lower, but this notion is negated when consumers perceive a
higher-priced product as offering more quality and value. Hence, pricing is an all-important variable from a
retailer's perspective.

Intense competition, seasonality, and other such factors compel retailers and e-commerce vendors to
amend prices of stock keeping units (SKUs) on a daily basis. There is also great pressure on store
managers to predict the sale quantity before the price changes in order to forecast demand and inform
suppliers or vendors accordingly to prevent out of stock situations.

Price elasticity is a statistical measure that can be built into a software model to predict the sale
quantity for a given percentage change in price. The model is designed to predict sale quantities
based on price changes, promotional offers, seasonality, and changes in competitors' SKU prices.

Determining price elasticity is extremely important for supporting retailer pricing decisions aimed at
reducing price resistance. The extent to which a product is price elastic impacts the volume of sales and
the revenues of a retailer. Therefore, marketing managers need to consider price sensitivity of the market
and the impact of price change on the sales revenue.

We have the example of an e-commerce retailer who offered huge discounts, resulting in most of the
products going out of stock within minutes. This happened because the retailer was not able to judge the
surge in demand caused by the drastic reduction in price. This in turn led to customer dissatisfaction and
negative feedback all over social media. Retailers thus need to consider building price elasticity models
for every SKU across stores to avoid such scenarios.

With multiple SKUs in their portfolio, retailers are dealing with large data sets. Retail stores, particularly
those spread across countries, have millions of transactions every day with data sizes ranging in hundreds
of terabytes. To process this amount of data, traditional statistical software or systems would need to run
for days, and in most cases, would run out of memory while analyzing them. Traditionally, therefore,
retailers use models built only to handle a few sets of SKUs. Alternatively, store managers rely on their
intuition and experience to predict sales volumes. Now retailers can realize better outcomes and
accurately evaluate price elasticity of products by leveraging a statistical model and framework that uses
distributed computing or Big Data.

Interpreting Price Elasticity
The following formula is generally used to find whether a product is price elastic, price inelastic, or unit elastic:

Price elasticity of demand = Percentage change in quantity demanded / Percentage change in price

Price elasticity of demand = [(Q₁ − Q₀) / ((Q₁ + Q₀)/2)] / [(P₁ − P₀) / ((P₁ + P₀)/2)]

where

Q₁ is the quantity sold after the price change

Q₀ is the quantity sold before the price change

P₁ is the price after the price change

P₀ is the price before the price change
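
As a small illustration (a minimal sketch; the method name and sample figures are hypothetical), the midpoint calculation above can be expressed directly in code:

```java
/** Minimal sketch of the midpoint (arc) price elasticity formula above. */
public class PriceElasticity {

    /**
     * @param q0 quantity sold before the price change
     * @param q1 quantity sold after the price change
     * @param p0 price before the change
     * @param p1 price after the change
     */
    static double elasticity(double q0, double q1, double p0, double p1) {
        double pctChangeQuantity = (q1 - q0) / ((q1 + q0) / 2.0);
        double pctChangePrice = (p1 - p0) / ((p1 + p0) / 2.0);
        return pctChangeQuantity / pctChangePrice;
    }

    public static void main(String[] args) {
        // Example: price raised from 10 to 12, quantity sold drops from 100 to 70.
        double e = elasticity(100, 70, 10, 12);
        System.out.println(e); // about -1.94, i.e. less than -1: price elastic (Scenario 1 below)
    }
}
```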

This formula gives us three possible scenarios:

Scenario 1: Product is Price Elastic

If price elasticity is less than -1 it means that the product is price elastic; that is, if there is an increase in
price the total revenue falls, and if there is a decrease in price the total revenue increases.

Scenario 2: Product is Price Inelastic

If price elasticity is between -1 and 0, the product is price inelastic; that is, if
the price increases the total revenue increases, and if the price decreases the total revenue decreases.

Scenario 3: Product is Unit Price Elastic

If price elasticity is equal to -1, the product is unit elastic. This means the total revenue remains
unchanged whether the price increases or decreases.

The products that belong to the price inelastic category are those for which there are very few
competitive alternatives, such as unique irreplaceable spare parts. So, there is an opportunity for a retailer
to increase the price of these commodities and gain more revenue.

The products that belong to the price elastic category are those for which there are competitive
alternatives available and an increase in price could cause the customers to switch to other brands. An
example of this could be consumer goods such as personal grooming products or household products.
The total revenue here is the function of price and quantity.

The products that belong to the unit price elastic category are the ones for which a percentage increase or
decrease in price causes an equal percentage decrease or increase in demand. Hence a price change
would not have any impact on revenues.
Price elasticity is not only applicable to retail, but also to various other industries and sectors such as
government and taxation. Cigarettes as a product category would fall under the price inelastic category,
since there are no alternatives to cigarettes as a product. So if the government tries to increase the price
of the cigarettes, the demand would almost remain the same or would decrease by a negligible margin,
thus increasing revenue. Another example would be an increase in tax by the income tax department,
which affects everyone in the supply chain differently. If there is an increase in tax on the purchase of raw
materials, the supplier might choose to pass on the price change to the manufacturer. Now the
manufacturer or retailer might decide to pass on the tax burden to the customer or may try to find
different ways to absorb the price change, which would be completely dependent on the price elasticity
of the product.

Choosing the Recommended Model and Framework

A distributed computing model using the MapReduce framework can be used to measure the price elasticity of
items across stores and millions of transactions. Since traditional methods and systems (such as SAS, R,
and MATLAB) struggle to process big data sets, this approach uses the Hadoop Distributed File System
(HDFS) along with MapReduce. Hadoop can scale horizontally, enabling retailers to use a cluster of
systems to build models for every SKU in the store. These predictive models can then be used to predict
the sales quantity for a price change, and thus, help store managers ensure stock availability before the
price change occurs.

Furthermore, the recommended solution uses Mahout, an open source distributed machine learning
library, to overcome the limitations of traditional systems. The design and implementation
methodology demonstrates the feasibility of this approach to solving the price elasticity problem using
log-linear regression. The solution is generic and can be used with various predictor variables as per business needs.

In addition, we also suggest a parallel implementation of the log-linear regression model using the
MapReduce programming model. Log-linear regression is a powerful technique that has become popular
for solving price elasticity problems in retail.

Addressing Price Elasticity with the Log-linear Model

Price elasticity generally follows log-linear models rather than linear models, since the change in
quantity does not vary linearly with price. Log-linear models are similar to linear models; however,
both the independent and dependent variables are log-transformed. Since the log is applied to both
dependent and independent variables, some researchers call it the log-log model.
[Figure 1: Comparison of linear and log-linear regression. Two panels plot price against quantity, one with a linear fit and one with a log-linear fit.]
Log-linear models can be depicted as follows:

log(sales) = α + β₀·log(target SKU price) + β₁·I(TPR) + β₂·I(frontPage) + β₃·I(Coupons) + β₄·I(ads) +
β₅·log(competitor SKU1 price) + β₆·log(competitor SKU2 price) + β₇·log(competitor SKU3 price)    (1.1)

where

Total Price Reduction (TPR), front page, coupons, and ads are some of the promotional offers that are
applicable for a store on a particular day

α represents base sales

βₙ indicates the coefficients of the different independent variables

I(X) is an indicator function that takes the value 1 or 0 depending on whether the promotional offer applies

The above equation 1.1 can generally be represented as

Y = α + β₀X₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + βₙXₙ    (1.2)

Here, Y is a single-dimensional matrix (vector) of log-normalized sales, and X₀…Xₙ form an n-dimensional matrix
in which each column holds a unique log-normalized feature. The α and β coefficients can be found by the ordinary
least squares formula (XᵀX)⁻¹XᵀY:

β = (XᵀX)⁻¹XᵀY    (1.3)

The inversion operation in equation 1.3, (XᵀX)⁻¹XᵀY, is highly computation intensive for a very
large matrix. This can be avoided by using a parallelizable singular value decomposition.

Singular value decomposition (SVD) is the factorization of a matrix into three matrices (U, D, and V)
whose product reconstructs the original matrix. Equation 1.3 can be further decomposed using SVD as:

β = (V · D⁻¹ · Uᵀ) · Y    (1.4)

Here, U is an m by m orthogonal matrix, V is an n by n orthogonal matrix, and D is an m by n diagonal
matrix of singular values; U, D, and V are obtained from the singular value decomposition of X.
SVD is a parallelizable algorithm in a Big Data environment.
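
Before distributing these steps, equation 1.4 can be sketched on a single machine. The following minimal illustration uses the Apache Commons Math library (an assumed choice for illustration only); the recommended solution instead parallelizes the equivalent steps as the MapReduce jobs listed next.

```java
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;
import org.apache.commons.math3.linear.SingularValueDecomposition;

/** Single-machine sketch of equation 1.4: beta = V * D^-1 * U^T * Y. */
public class SvdRegressionSketch {

    /**
     * x must already contain the leading column of 1s for the intercept;
     * element 0 of the result is then alpha and the rest are the beta coefficients.
     */
    static RealVector fit(double[][] x, double[] y) {
        RealMatrix X = new Array2DRowRealMatrix(x);
        SingularValueDecomposition svd = new SingularValueDecomposition(X);

        // D^-1: reciprocal of the singular values (negligible values are zeroed out).
        double[] singularValues = svd.getSingularValues();
        double[] reciprocals = new double[singularValues.length];
        for (int i = 0; i < singularValues.length; i++) {
            reciprocals[i] = singularValues[i] > 1e-12 ? 1.0 / singularValues[i] : 0.0;
        }
        RealMatrix dInverse = MatrixUtils.createRealDiagonalMatrix(reciprocals);

        // beta = V * D^-1 * U^T * Y
        return svd.getV().multiply(dInverse).multiply(svd.getUT())
                  .operate(MatrixUtils.createRealVector(y));
    }
}
```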

The algorithm to obtain the α and β coefficients using MapReduce is as follows:

1. MapReduce job to append 1s as the initial column of X

2. MapReduce job to calculate the transpose of X to get Xᵀ

3. MapReduce matrix multiplication to get XᵀX

4. MapReduce job to perform singular value decomposition of XᵀX, in order to obtain its inverse

5. MapReduce job to take the reciprocal of the diagonal elements in matrix D

6. MapReduce job to take the transpose of U

7. MapReduce job to take the transpose of V

8. MapReduce job to perform the multiplication V · D⁻¹ · Uᵀ · Xᵀ · Y
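
As an illustration of one of these jobs, the following is a minimal sketch of step 2 (the transpose of X) written against the Hadoop MapReduce Java API. The "row,col,value" input layout and the class names are assumptions made for this example, not a prescription of the recommended solution.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Transposes a matrix stored as one "row,col,value" triple per line (assumed layout). */
public class MatrixTransposeJob {

    public static class SwapMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",");
            String row = parts[0], col = parts[1], value = parts[2];
            // Emit the element keyed by its column index; that column becomes a row of X^T.
            context.write(new Text(col), new Text(row + "," + value));
        }
    }

    public static class RowReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text col, Iterable<Text> entries, Context context)
                throws IOException, InterruptedException {
            // Each (row, value) pair of the original column is written out as a row of X^T.
            for (Text entry : entries) {
                context.write(col, entry);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "matrix transpose");
        job.setJarByClass(MatrixTransposeJob.class);
        job.setMapperClass(SwapMapper.class);
        job.setReducerClass(RowReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```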

Validating the Log-linear Model


The data set is randomly sampled with a 70 to 30 ratio. 70 percent of the data set is used to train the
model as described in the previous section, and the remaining 30 percent is used to validate the model.
Once the α and β coefficients are computed as described, they are substituted into the test data sample
to check the accuracy of the model. As the solution described in this paper does not rely on sampling to
reduce the data volume and instead uses distributed computing to decrease the computational time, the
accuracy is the same as that of traditional systems.

Equation 1.1 can be rewritten as follows to predict sales for a change in price:

S = e^α
    × (Target SKU Price)^β₀                                                        (SKU price factor)
    × e^(β₁·I(TPR)) × e^(β₂·I(Front Page)) × e^(β₃·I(Coupons)) × e^(β₄·I(Ads))      (promotion offers)
    × (Competitor SKU1 Price)^β₅ × (Competitor SKU2 Price)^β₆ × (Competitor SKU3 Price)^β₇   (competitor prices)

The sum of squared error term (the root mean squared error) is calculated by

SSE = √( Σᵢ (S_true,i − S_predicted,i)² / N )

where

S_true is the actual sales for the day and store

S_predicted is the predicted sales from the equation

N is the total number of records or rows in the test data, over which the sum runs (i = 1 to N).


If the accuracy of the predictions is found to be low, the analyst can try to plug in different independent
variables that contribute to the sales such as weekends, holidays, seasonality, cannibalization, and so on.
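
As a minimal, non-distributed illustration of this scoring step (a sketch only, assuming the α and β coefficients have already been estimated and that features follow the layout of equation 1.1; the class and method names are hypothetical), the prediction and error computation could look like this:

```java
/**
 * Minimal sketch of scoring the 30 percent hold-out sample.
 * Feature order is assumed to follow equation 1.1:
 * [log(target price), I(TPR), I(frontPage), I(coupons), I(ads),
 *  log(comp1 price), log(comp2 price), log(comp3 price)].
 */
public class ModelValidation {

    /** Predicted sales S = exp(alpha + sum of beta_i * x_i), i.e. equation 1.1 exponentiated. */
    static double predictSales(double alpha, double[] beta, double[] features) {
        double logSales = alpha;
        for (int i = 0; i < beta.length; i++) {
            logSales += beta[i] * features[i];
        }
        return Math.exp(logSales);
    }

    /** Root of the mean squared difference between actual and predicted sales (the SSE term above). */
    static double error(double alpha, double[] beta, double[][] testFeatures, double[] actualSales) {
        double sumSquaredError = 0.0;
        for (int i = 0; i < testFeatures.length; i++) {
            double diff = actualSales[i] - predictSales(alpha, beta, testFeatures[i]);
            sumSquaredError += diff * diff;
        }
        return Math.sqrt(sumSquaredError / testFeatures.length);
    }
}
```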

Applying the Log-linear Model to Data


With the objective of predicting sales quantity, the data set would have the log-normalized sales quantity as the
target vector, with three major categories of predictor variables:

1. Price of the targeted SKU

2. Promotion offers for the targeted SKU

3. Prices of competitor SKUs

As price elasticity is solved using log-linear regression, all the continuous variables are log-normalized and
the indicator variables are left intact. The prices of the targeted SKU and the competitor SKUs are log-normalized.
The promotion offers are not log-normalized as these are discrete. Each row of the data describes day-level
characteristics for a store (see Table 1).

Store # | Total Sales | SKU_Price | Date | TPR | Front Page | Coupons | Ad | TR_SKU1 Price | TR_SKU2 Price | TR_SKU3 Price

Table 1: Indicative Dataset Row with Day-Level Characteristics for a Store

The total sales in Table 1 reflect the sales quantity for the SKU in a store on a particular day. Total Price
Reduction (TPR), front page, coupons and ads are some of the promotional offers that are applicable for a
store on a particular day. TR_SKU1 price, TR_SKU2 price, TR_SKU3 price are the competitor SKU prices on
that day. The recommended model uses a Hive SQL script, which in turn runs MapReduce jobs to join
different tables and aggregate sales for a particular day and store from millions of transactions.
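
As a small illustration of this preparation step (a sketch only; the field names are hypothetical and not part of the recommended solution), building one log-normalized regression row from an aggregated record such as a row of Table 1 could look like this:

```java
/** Builds one regression row from an aggregated day-level record (hypothetical field names). */
public class FeatureRowBuilder {

    /**
     * Continuous variables (prices) are log-normalized; promotional
     * indicators are left intact as 0/1 values, as described above.
     */
    static double[] buildFeatures(double skuPrice, int tpr, int frontPage, int coupons, int ad,
                                  double trSku1Price, double trSku2Price, double trSku3Price) {
        return new double[] {
            Math.log(skuPrice),                     // log-normalized target SKU price
            tpr, frontPage, coupons, ad,            // indicator variables kept as-is
            Math.log(trSku1Price),                  // log-normalized competitor prices
            Math.log(trSku2Price),
            Math.log(trSku3Price)
        };
    }

    /** The target is the log-normalized total sales quantity. */
    static double buildTarget(double totalSales) {
        return Math.log(totalSales);
    }
}
```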

Table 1 is for representation purposes only, and real-world scenarios could be different. In the real
world, in the case of large multinational stores, millions of transactions are conducted in each store and there
could be thousands of such stores. In such cases, the model needs to:

1. Aggregate millions of transactions with respect to sales for each store and for each SKU every day

2. Aggregate the total daily sales for every SKU across different stores into one row per store-day (example:
with approximately 1,000 stores and 730 days, the model would generate roughly 1,000 × 730, or about
0.7 million, rows)

3. Perform the above step for every SKU (example: with approximately 10,000 SKUs, the model would
have 10,000 different data sets, each with approximately 0.7 million rows)

4. Perform log-linear regression for each of the approximately 10,000 data sets, which results in the
alpha and beta coefficients

5. Substitute the alpha and beta coefficients in real time to predict the sales quantity

Using Analytical Models to Determine Pricing


With the retail industry witnessing intensifying competition, it has become imperative to gauge the
impact of price on the sales volume. Therefore, formulating accurate pricing strategies with the help of a
scientific approach is critical from a profitability perspective, for both brick and mortar as well as online
retailers. With exponential growth in data, traditional systems are increasingly constrained in processing
huge datasets. This has resulted in the shift to distributed computing for analyzing large data volumes.

The analyses recommended in this paper were carried out using a three-stage analytical framework
comprising data aggregation, model building, and validation. Log-linear regression has been employed to build
models at the SKU-day level for computing sales. The columns or independent variables used in this model
are indicative, and can be extended to include other characteristics such as
seasonality, store level characteristics, and indicators for weekend or holiday sales. Retailers can use these
characteristics and many more depending on their business model to accurately predict sales for a given
price change. This accuracy could help separate the winners from losers in an increasingly competitive
retail environment.

About TCS Business Process Services Unit
Enterprises seek to drive business growth and agility through innovation in an increasingly
regulated, competitive, and global market. TCS helps clients achieve these goals by managing and
executing their business operations effectively and efficiently.

TCS Business Process Services (BPS) include core industry-specific processes, analytics and insights,
and enterprise services such as finance and accounting, HR, and supply chain management. TCS
creates value through its FORE™ simplification and transformation methodology, backed by its deep
domain expertise, extensive technology experience, and TRAPEZE™ governance enablers and
solutions. TCS complements its experience and expertise with innovative delivery models such as
using robotic automation and providing Business Processes as a Service (BPaaS).

TCS' BPS unit has been positioned in the leaders' quadrant for various service lines by many leading
analyst firms. With over four decades of global experience and a delivery footprint spanning six
continents, TCS is one of the largest BPS providers today.

Contact
For more information about TCS' Business Process Services Unit, visit: www.tcs.com/bps
Email: bps.connect@tcs.com

Subscribe to TCS White Papers


TCS.com RSS: http://www.tcs.com/rss_feeds/Pages/feed.aspx?f=w
Feedburner: http://feeds2.feedburner.com/tcswhitepapers

About Tata Consultancy Services (TCS)


Tata Consultancy Services is an IT services, consulting and business solutions organization that
delivers real results to global business, ensuring a level of certainty no other firm can match.
TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and
assurance services. This is delivered through its unique Global Network Delivery Model™,
recognized as the benchmark of excellence in software development. A part of the Tata Group,
India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock
Exchange and Bombay Stock Exchange in India.

For more information, visit us at www.tcs.com


TCS BPS Design Services | 03 | 15

IT Services
Business Solutions
Consulting
All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here
is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or
distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate
copyright, trademark and other applicable laws, and could result in criminal or civil penalties. Copyright 2015 Tata Consultancy Services Limited
