
Module II Data

Warehousing and Data


Session 7

Class Activity
List down some of your daily activities
Analyse what kind of data you are generating for
companies
Visualize the quantum of data you are generating
Understand how this data is being used for CRM by
companies

Learning Objectives
Understand the different types of Customer related data
- Corporate customer data
- Structured and unstructured data
Know good quality or clean data
Get introduced to the concept of Data Warehousing
ABSA case study for understanding the application of
data warehouse in CRM

What is Customer Related Data?


Any data related to the customer, whether it is about
the customer or for the customer
Can take a historical, current or future perspective
Access to customer data is not restricted to functional
areas such as sales, marketing & customer service; it
can also extend to third parties
Foundation for the execution of CRM strategy

Corporate Customer related data


Many customer-related databases exist across different
functional areas
Different databases might record different customer-
related data such as opportunities, campaigns,
enquiries, deliveries, billing etc.
Can be about individual customers, customer cohorts or
segments
May contain product information, competitor information,
regulatory data or any other information needed for the
development & maintenance of customer relationships

Structured and unstructured data


Structured data comes in 3 different types
- Hierarchical: Oldest form and not well suited for CRM.
Allows only one parent for a child
- Network: Better than hierarchical because a child can
have multiple parents, but nowadays relational
databases are used
- Relational: Stores data in 2-dimensional tables
comprised of rows and columns

Relational database
Concept of primary key: for example, in a sales database
each customer is assigned a unique number, which is the
primary key
Tables share a common structure of files, records and
fields (tables, rows and columns)
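
The primary key idea can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table names, column names and values (customers, sales, customer_id, 'Customer One') are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Throwaway in-memory database; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sales ("
    " sale_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Customer One')")
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0)],
)


def insert_duplicate_key():
    """Try to reuse primary key 1; return True only if the insert succeeds."""
    try:
        conn.execute("INSERT INTO customers VALUES (1, 'Someone Else')")
        return True
    except sqlite3.IntegrityError:
        return False


# The shared customer_id column is what connects the two tables.
total = conn.execute(
    "SELECT c.name, SUM(s.amount) FROM customers c"
    " JOIN sales s ON s.customer_id = c.customer_id"
    " GROUP BY c.customer_id"
).fetchone()
```

Because customer_id is unique, the second insert with the same key is rejected, and the join can attach every sale to exactly one customer.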

3/14/16

Amit Kumar

What is data quality


Verifying the data: The data should be entered exactly as found in the
original source
Validation of data: Data should not have misspelt names, incorrect
titles or inappropriate salutations
De-duplication of customer data: Customer data should not be
duplicated, for example Micheal Sethi recorded as both Micheal S and
Micheal Sethi though both are the same customer. De-duplication can
go wrong in two ways:
- Removing a record that should be retained
- Retaining a record that should be removed
Merge and Purge: When 2 or more databases are merged (e.g.
marketing and customer service), duplicate customer records should
be purged
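
A merge-and-purge pass might look like the sketch below. The matching rule (same phone number, and one name is a prefix of the other) and the phone field are assumptions chosen for illustration; real de-duplication rules are usually more careful, precisely to avoid the two errors listed above.

```python
def normalise(name):
    """Lower-case a name and collapse extra whitespace."""
    return " ".join(name.lower().split())


def same_customer(a, b):
    """Assumed rule: same phone number and one name is a prefix of the other."""
    if a["phone"] != b["phone"]:
        return False
    na, nb = normalise(a["name"]), normalise(b["name"])
    return na.startswith(nb) or nb.startswith(na)


def merge_purge(*databases):
    """Merge several record lists, keeping the longest name per customer."""
    merged = []
    for record in (r for db in databases for r in db):
        for kept in merged:
            if same_customer(kept, record):
                if len(record["name"]) > len(kept["name"]):
                    kept["name"] = record["name"]
                break
        else:
            # No match found, so this is a new customer.
            merged.append(dict(record))
    return merged


# Hypothetical records from two departmental databases.
marketing = [{"name": "Micheal S", "phone": "555-0100"}]
service = [
    {"name": "Micheal Sethi", "phone": "555-0100"},
    {"name": "Micheal Sethi", "phone": "555-0199"},
]
customers = merge_purge(marketing, service)
```

The record with a different phone number is kept as a separate customer: matching on the name alone would risk removing a record that should be retained.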

Group Activity Admin groups


Visit any Retail store/Banks or any other relevant
establishments and understand
What kind of customer data are being captured by the
retail store at their point of sales
How do you think that these data are being used by the
retail store
Can you think of any gaps in their data collection
method or the quality of data being captured
Submit your report to me in the next class

Session 8 Learning Objectives


To understand the steps of designing a customer related
database through class activity and class discussion
What are the desirable data attributes : STARTS
Understand Data Integration
Introduction to Data warehousing and Data marts

How to develop a Customer related database
1. Define the database functions
2. Define the information requirements
3. Identify the information sources
4. Select the database technology and operating system
5. Populate the database
6. Maintain the database

Class Activity: Designing a Customer related database
Invite 3 students who will act as Sales Manager,
Customer Service Manager and Direct Marketing
Manager
Ask them to come up with their database function
requirements for a chain of spas
List down the database function requirements
Help them to identify the tables, columns and rows
Revisit the concept of primary key and how the tables
can be connected

STARTS: Desirable data attributes


S: Shareable - Multiple users/departments may need to access the
same data at the same time
T: Transportable - The data needs to be available wherever and
whenever required by the users
A: Accurate - The data should be of good quality and free from
inaccuracies such as duplication and misspellings
R: Relevant - Data should be relevant to making better business
decisions
T: Timely - Users should have access to the data at the right time
for providing customer offers and making business decisions
S: Secure - No compromises should be made on the security of
customer data

Data Integration
Integration of multiple databases in a standardized
manner is data integration
It helps to create a single view of the customer across
different departments
[Diagram: data from Marketing, Sales, Customer service, Supply chain
and Finance is integrated into a single view of the customer]
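
As a rough illustration of a single customer view, the sketch below combines per-department records that share a customer ID. The department names follow the slide; the field names and values are hypothetical.

```python
from collections import defaultdict


def single_customer_view(**departments):
    """Combine per-department records into one dictionary per customer ID."""
    view = defaultdict(dict)
    for department, records in departments.items():
        for customer_id, fields in records.items():
            for field, value in fields.items():
                # Prefix each field with its department so the source stays traceable.
                view[customer_id][f"{department}.{field}"] = value
    return dict(view)


# Hypothetical departmental databases keyed by a shared customer ID.
view = single_customer_view(
    sales={101: {"last_order": "2016-03-01"}},
    marketing={101: {"campaign": "spring"}, 102: {"campaign": "spring"}},
    service={101: {"open_tickets": 0}},
)
```

The sketch only works because every department uses the same customer ID; agreeing on such a standard key is the hard part of integration in practice.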

Fun Quiz
1. What is a data warehouse
a. Name of a store of a company
b. Collection of data
c. Warehouse of data
Answer is C


Fun Quiz
2. Data in the data warehouse can be helpful for better
decision making for a business.
a. True
b. False
Answer is a


Fun Quiz
3. ETL is an abbreviation for:
a. Elevation, Transfer and Loading
b. Extraction, Transformation and Loading
Answer is b


Fun Quiz
4. OLAP is an abbreviation for:
Answer is Online Analytical Processing


Fun Quiz
5. OLTP is an abbreviation for:
Answer is Online Transaction Processing


Fun Quiz
6. ERP is an abbreviation for:
Answer is Enterprise Resource Planning


Session 9 Learning Objectives


Understand Data warehouse and different attributes of
Data warehouse
Data mart
Difference between data warehouse and data mart
Knowledge Management

What is a data warehouse?


Data warehouses are repositories of large amounts of
operational, historical and other customer-related data

Data Warehouse
Subject oriented: Data is organized around the essential
subjects of the business (customers and products)
rather than around applications such as inventory
management or order processing
Integrated: Data from several sources is extracted and
transformed in a consistent way
Time-variant: Data is organized by various time
periods
Non-volatile: The warehouse data is not updated in real
time. There is periodic bulk uploading of transactional
and other data

Data Marts
Scaled down version or
subset of the data warehouse
Customized for use in
particular department
Less complex and less expensive
Volume of data is less

Differences between Data Warehouse and Data Mart

Data Warehouse | Data Mart
Stores all kinds of data | Stores data specific to one function
Bigger size | Smaller size, as the data stored is specific
Multiple sources of data integration | Fewer sources of integration, as the data is specific
More difficult and time consuming | Relatively simpler
Management is more complex | Easier to manage
More time taken to answer a query | Less time taken to answer a query
Expensive | Less expensive

Knowledge management
Practice of consciously gathering, organizing, storing,
interpreting, distributing and judiciously applying
knowledge to fulfill the customer management goals
and objectives of the organization
The STARTS attributes are valid for knowledge also, as
knowledge needs to be shareable, transportable,
accurate, relevant, timely and secure

Session 10 Learning Objectives


Benefits of Data warehouse
Setting up a Data warehouse
Simple Data warehouse architecture
Case Study First Source Corporation Handout

Benefits of Data Warehouse


Data from multiple sources/databases can be stored at
a single place
Enables a single view approach for searching/using data
Historical data is available
Improves data quality by removing duplication and errors
Regular updating of data
Restructures the data for business purposes
Adds value to CRM through customer analytics such as
decision support systems and queries that answer business questions

Setting up a DW
1. Identify the sources of data
2. Locate where the data is stored
3. Extract the data from these systems
4. Transform the data into a standardized and clean format
5. Load the data into the warehouse
6. Update/refresh the data in the warehouse
The extract, transform and load steps together are known as
ETL (Extract, Transform & Load)
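
The ETL steps can be sketched with the Python standard library alone. Everything concrete here is assumed for illustration: the two CSV extracts, their column names, and the fact_amounts warehouse table. A production warehouse would use dedicated ETL tooling rather than a script like this.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems with inconsistent formats.
BILLING_CSV = "cust_id,amount\n 1 ,100.50\n2,75\n"
SALES_CSV = "customer,amount\n1,20\n3, 9.5 \n"


def extract(text, id_column):
    """Extract: read raw (id, amount) strings from a source export."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[id_column], row["amount"]) for row in reader]


def transform(rows, source):
    """Transform: strip stray whitespace and convert to standard types."""
    return [(int(cid.strip()), float(amount.strip()), source) for cid, amount in rows]


def load(conn, rows):
    """Load: bulk-insert the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO fact_amounts VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_amounts (customer_id INTEGER, amount REAL, source TEXT)"
)
for text, id_column, source in [
    (BILLING_CSV, "cust_id", "billing"),
    (SALES_CSV, "customer", "sales"),
]:
    load(warehouse, transform(extract(text, id_column), source))

# One query now answers a question that spans both source systems.
totals = dict(
    warehouse.execute(
        "SELECT customer_id, SUM(amount) FROM fact_amounts GROUP BY customer_id"
    )
)
```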

Simple DW Architecture
[Diagram: source systems (Marketing, Sales, Billing, Supply Chain)
feed an ETL integration layer, which loads the Data Warehouse; the
Data Warehouse in turn feeds the Data Marts (DM)]

Data Configuration for CRM Analytics
[Diagram: the Data Warehouse supports CRM analytics through
Reporting, OLAP Analysis and Data Mining]

Reporting: Generate Analytical Insight
Provides simple lists of information such as key accounts,
annual revenues, product-wise sales etc.
Can be standardized (pre-defined) or query based
Standardized reports are difficult to customize
Query-based reports provide a selection of tools which users
can use to construct the specific report they require
Can be presented to users in an array of visualizations
such as tables, charts, graphs, plots and maps
A standard report can be exported to an Excel sheet for
further analysis

Reporting - Example

OLAP Online Analytical Processing


Allows data stored in a data mart to be used for analysis and
ad hoc enquiry
Uses processes such as slice and dice, drill down and
roll up
Extremely valuable to users from different functions like
sales, marketing and customer service, who can ask
different business questions
For example, a salesperson can analyze their territory
for sales and profit by customer, a customer service
person can analyze call response rates and resolution
time by customer, and a campaign manager can analyze
the campaign effectiveness day wise, product wise and

OLAP Online Analytical Processing


Can support decisions in real time; for example,
propensity-to-buy measures can be delivered to a call
centre agent while the customer is on the phone
Information delivery is improved by making the
information available on the desktop in a web browser
interface with graphical layouts and drill-down
Some of the major vendors for OLAP are Qlik, Microsoft,
IBM, SAS, SAP and Oracle

OLAP: How is data stored?


Data is stored in one or more star schemas
A star schema separates data into facts and dimensions
Facts are quantitative data such as sales revenues and
sales volumes
Dimensions are the ways in which facts can be
disaggregated and analyzed; for example, sales revenue
can be broken down by the dimensions of geography and
time period

OLAP: An example of data storage

[Diagram: star schema with a central fact table linked to four
dimension tables]
Fact Table: Total sales revenue, Total quantity, Freight discount
Time Dimension: Order date, Year, Quarter, Month
Customer Dimension: Name, Address, Age, Income
Product Dimension: Name, Category, Price
Employee Dimension: Name, Supervisor, Department, Region, Territory
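
A star schema and a roll-up over it can be imitated with plain Python dictionaries. This sketch keeps only the time and product dimensions, and every ID, product and figure in it is hypothetical.

```python
from collections import defaultdict

# Dimension tables keyed by ID (all values hypothetical).
TIME = {1: {"year": 2015, "quarter": "Q4"}, 2: {"year": 2016, "quarter": "Q1"}}
PRODUCT = {
    10: {"name": "Massage oil", "category": "Retail"},
    11: {"name": "Spa session", "category": "Service"},
}

# Fact table: keys into the dimensions plus numeric measures.
FACTS = [
    {"time_id": 1, "product_id": 10, "revenue": 400.0, "quantity": 4},
    {"time_id": 2, "product_id": 10, "revenue": 300.0, "quantity": 3},
    {"time_id": 2, "product_id": 11, "revenue": 900.0, "quantity": 2},
]

DIMENSIONS = {"time": (TIME, "time_id"), "product": (PRODUCT, "product_id")}


def roll_up(measure, *levels):
    """Sum a measure grouped by the given (dimension, attribute) levels."""
    totals = defaultdict(float)
    for fact in FACTS:
        key = tuple(
            DIMENSIONS[dim][0][fact[DIMENSIONS[dim][1]]][attr] for dim, attr in levels
        )
        totals[key] += fact[measure]
    return dict(totals)


# Roll up to year, then drill down by product category.
by_year = roll_up("revenue", ("time", "year"))
by_year_category = roll_up("revenue", ("time", "year"), ("product", "category"))
```

Adding a level to the call is a drill-down and removing one is a roll-up; fixing one dimension value before summing would be a slice.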

Data Mining - Introduction


Data Mining is the application of descriptive and
predictive analytics to large datasets to support
different functions like sales, marketing, service, supply
chain, finance etc.
Works in a number of ways, like:
- Classification
- Estimation
- Prediction
- Affinity Grouping
- Clustering
- Description

Module III Data Mining


Learning Objectives
- Understand what is Data Mining through Examples and
Applications
Data Mining Tasks relevant to CRM
- Classification
- Regression
- Link Analysis
- Segmentation
- Deviation Detection

The key in business is to know something that nobody else knows.
- Aristotle Onassis

What is Data Mining ?


Extraction of actionable knowledge from extremely large
datasets where it cannot be done manually
Technology to enable data exploration, data analysis, and data
visualization of very large databases at a high level of
abstraction, without a specific hypothesis in mind
Data search capability that uses statistical algorithms to discover
patterns and correlations in data
Data Mining is the application of descriptive and predictive
analytics to large datasets to support different functions like
sales, marketing, service, supply chain, finance etc

Evolution of Data Mining

Stage | Business Question | Enabling Technology | Characteristics
Data Collection | What is my average total revenue over the last 3 years? | MS Excel | Static data
Data Access | What was the sales volume last January in Mumbai? | RDBMS, SQL | Dynamic data delivery at record level
Data Navigation | What was the sales volume, area wise, last January in Mumbai with a population of greater than 1 million for different products? | OLAP (Online Analytical Processing), multidimensional databases | Dynamic data at multiple levels
Data Mining | What is the likely volume sales to happen in Mumbai | Advanced algorithms, multiprocessor computers, massive | Prospective, proactive information delivery

Why is Data Mining becoming highly relevant?

More demanding customers
Growing business competition
Large sizes of databases: gigabytes and terabytes
Data coming from many channels: online, mobile,
offline
Quick decision making is the need of the hour. Remember:
Right offer to the Right Consumer at the Right time through
the Right Channel
Decision making with maximum knowledge

Application of Data mining

Data Mining is used in almost every industry

Data Mining applications in CRM


Sales Tracking
Customer Retention
Customer Loyalty
Purchase Behaviour
Cost Efficiency
Quality Control
Issue Resolution
Fraud Prevention
Inventory Management

Data Mining Applications - Retail

Application | Relevance
Database Marketing | Develop customer profiles to run focused and cost-effective promotions, e.g. for customers who prefer buying designer shoes from the retail store
Sales Forecasting | Use the time series pattern of customers to predict the next purchase
Merchandise Planning and Allocation | Demographic and psychographic data of consumers can be used to change the retail layout and merchandise planning
Clustering | Creating customer clusters based on their attributes (demographic, psychographic, purchase pattern etc.)
Basket Analysis | Which products customers tend to purchase together, for example Burger and Cola
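
Basket analysis starts from counting which products appear together. The sketch below counts unordered product pairs across a handful of made-up baskets; the data is purely illustrative, and real basket analysis normally adds measures such as confidence and lift on top of these counts.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("burger", "cola") is always the same key.
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


# Hypothetical point-of-sale baskets.
baskets = [
    ["burger", "cola"],
    ["burger", "cola", "fries"],
    ["fries", "cola"],
    ["burger"],
]
counts = pair_counts(baskets)

# Support: the share of all baskets that contain both products.
support = counts[("burger", "cola")] / len(baskets)
```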

Class Exercise - 1
Write 3 data mining applications and their relevance in
CRM for the following industries:
- Telecommunication
- Banking

Two approaches to data mining


1. Directed (supervised, predictive or targeted)

Using input data to predict a specified output


2. Undirected (unsupervised)
Exploration of data sets to see what can be
learned

Levels of Data Mining Operations


1. Aggregate or macro level
- Used when we do not have specific individual data

2. Individual or micro level
- When the customer is tracked and data is mined at the
individual level
- Used to build up a detailed profile of a regular customer
- Might be expensive

Macro level mining


Is useful when
Information about individual customer is not available.
Hence characteristics of the group are extrapolated
Targeting new set of customers
We deal with those aspects of the service which affect
the majority and can not be customized.
Predicting the possibility of an action that the customer
has never undertaken

Micro level mining


Is useful when
The firm wants to customize its offerings
The firm wants to assist the purchase of a new product
based on information it has on the last purchase
The firm wants to take advantage of personal events in
a customer's life
Current patterns go against usually observed
customer behaviour.

Data mining tasks relevant to CRM


Classification: mapping a given data item into one of the
several predefined classes
Regression: Predicting the value of a dependent variable
based on the values of other, independent variables
Link Analysis: Establishing relationship between items or
variables in a database record to expose patterns and trends :
Association rules, Sequential patterns, Time Sequences
Segmentation: Identifying a finite set of naturally occurring
clusters or categories to describe data : Clustering
Deviation Detection: Discovering the most significant change
in the data from previously measured or expected values.

Data mining tools & techniques


Decision Trees
Rule induction
Visualization Techniques
Nearest Neighbour Technique
Clustering Algorithms

Choice of the tool depends upon


Capabilities of the tool
The question we seek to answer
Multiple tools might answer the same question but to
different degrees of satisfaction and completeness!

Decision Tree Tools and Techniques


Hierarchical collection of rules that describe how to
divide a large collection of records into successively
smaller groups of records. With each successive
division, the members of the resulting segments
become more and more similar to one another with
respect to the target

How it works?
Decision trees recursively split data into smaller and
smaller cells which are increasingly "pure" in the sense
of having similar values of the target
A decision tree uses the target variable to determine how
each input should be partitioned
It breaks the data into segments, defined by splitting
rules at each step.
Taken together, the rules for all the segments form the
decision tree model

Decision Tree - Example

Monthly Recharge Frequency | Talk time recharge value (Rs.) | 3G data recharge value | Class
0-2 | 50 | 200 | A
2-5 | 50 | 250 | B
2-5 | 100 | 200 | A
0-2 | 150 | 200 | B
2-5 | 150 | 200 | B
2-5 | 50 | 200 | A
2-5 | 150 | 250 | B
0-2 | 50 | 250 | B

Decision Trees - Example

First split: TT Recharge Value
(each case shown as Recharge Frequency, 3G data recharge value = Class)
TT Recharge Value = 150:
  0-2, 200 = B
  2-5, 200 = B
  2-5, 250 = B
TT Recharge Value = 100:
  2-5, 200 = A
TT Recharge Value = 50:
  0-2, 200 = A
  2-5, 250 = B
  2-5, 200 = A
  0-2, 250 = B

Completely classifies data with 100 and 150 TT recharge
Does not completely classify TT Recharge value of 50.
More iteration needed

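Decision Trees - Example

First split: TT Recharge Value
(each case shown as Recharge Frequency, 3G data recharge value = Class)
TT Recharge Value = 150:
  0-2, 200 = B
  2-5, 200 = B
  2-5, 250 = B
TT Recharge Value = 100:
  2-5, 200 = A
TT Recharge Value = 50, second split: 3G Data Recharge Value
  3G Data Recharge Value = 200: Recharge Frequency 0-2 = A, 2-5 = A
  3G Data Recharge Value = 250: Recharge Frequency 2-5 = B, 0-2 = B

Decision tree is complete because
1. All 8 cases appear at nodes
2. At each node, all cases are in the same class (A or B)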

Decision Tree: Creating Predictive Rules

[Diagram: tree with a first split on TT Recharge Value (150, 50, 100)
and, under the 50 branch, a second split on 3G Data Recharge Value
(200, 250)]
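
The worked example can be reproduced with a small sketch that grows a tree over the eight records from the example table. The splitting criterion used here (pick the input that leaves the fewest records outside the majority class of their group) is an assumption made for illustration; the slides do not name a criterion, and real tools typically use measures such as Gini or entropy. On this data the first split is a tie between talk time and 3G value, which the sketch resolves by taking the input listed first.

```python
from collections import Counter, defaultdict

# The eight records from the example: (frequency, talk_time, data_3g, class).
ROWS = [
    ("0-2", 50, 200, "A"),
    ("2-5", 50, 250, "B"),
    ("2-5", 100, 200, "A"),
    ("0-2", 150, 200, "B"),
    ("2-5", 150, 200, "B"),
    ("2-5", 50, 200, "A"),
    ("2-5", 150, 250, "B"),
    ("0-2", 50, 250, "B"),
]
FEATURES = {"frequency": 0, "talk_time": 1, "data_3g": 2}


def misclassified(rows, col):
    """Records left outside their group's majority class after splitting on col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[3])
    return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())


def build(rows, features):
    """Recursively split until every node is pure or no inputs remain."""
    if len({row[3] for row in rows}) == 1 or not features:
        return Counter(row[3] for row in rows).most_common(1)[0][0]
    best = min(features, key=lambda f: misclassified(rows, features[f]))
    rest = {f: c for f, c in features.items() if f != best}
    groups = defaultdict(list)
    for row in rows:
        groups[row[features[best]]].append(row)
    return (best, {value: build(group, rest) for value, group in groups.items()})


def predict(tree, record):
    """Follow the splitting rules from the root down to a class label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[record[FEATURES[feature]]]
    return tree


TREE = build(ROWS, FEATURES)
```

Each path from the root to a leaf reads as a predictive rule, for example: talk time recharge of 50 and 3G recharge of 250 gives class B.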

Decision Tree: Benefits and Uses


Can be used for classification, estimation and prediction
Useful for data exploration and variable selection even
when you plan to use a different technique to create the
final model

Class Activity I: Case Study on Data Mining
Cross-Selling in Mail Order Business
1. Why do you think the response to the initial mail order campaign was not effective?
2. Which factors were added in the improved predictive model?
3. What could have been the CRM challenges in getting a higher response from the
customers without using data mining?
4. Which data mining tools and techniques were used by the company for predictive
modelling?

Class Activity II: Decision Table and Decision Tree

 | Rule 1 | Rule 2 | Rule 3 | Rule 4
Conditions
Value for money purchase | yes | no | yes | no
Satisfied with customer service | no | yes | yes | no
Actions
Will be retained | no | yes | yes | no
Will churn | yes | no | no | yes
Refer customers | no | no | yes | no

Draw a decision tree for the above table and submit it.

Regression Analysis

Regression Analysis
Predictive modeling technique
Both input and target variables should be numeric

Purpose of regression
Quantify the relationship among two or more variables.
Explain a dependent variable, from a set of predictor
variables, called the independent variables
Uses a linear additive relation between the dependent and
independent variables

Concepts
Variable
Dependent variable
Independent variable
Correlation
Correlation Coefficient
Line of best fit

Regression can
Estimate the value of target variable
Describe the relationship between variables
Residuals (Actual - Predicted)
R² (Coefficient of Determination)
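
The concepts above (line of best fit, residuals, R²) fit into one short sketch of ordinary least squares with a single input. The tenure and revenue figures are hypothetical and are not the newspaper subscriber data from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual value minus the value predicted by the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R squared: share of the variation in y explained by the line.
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical tenure (independent) and revenue (dependent) observations.
tenure = [10, 20, 30, 40, 50]
revenue = [2.0, 1.0, 8.0, 9.0, 20.0]
slope, intercept, r2 = fit_line(tenure, revenue)


def estimate(x):
    """Estimate the target for a new input using the fitted line."""
    return intercept + slope * x
```

An R² close to 1 means the points lie close to the line; a value close to 0 means the line explains little of the variation.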

[Figure: Example scatter plot of the relationship between Tenure and
Revenue for newspaper subscribers]

[Figure: The same scatter plot with the line of best fit]

Class Exercise: What will be the estimated revenue for a tenure of 100?
[Figure: Scatter plot with the line of best fit]

Class Exercise - Answer

Multiply the tenure by $0.56 and subtract $10.14
$0.56 x 100 - $10.14 = $45.86

Demand Analysis
\[ \text{Sales}_t = a + b_1 \, \text{Price}_t + e_t \]

Simple Regression

\[ Y_t = a + b_1 X_{1t} + e_t \]

Class Exercise: Trend of R Square for 4 Different Sets of Data

[Figure: regression plots for the exercise; surviving labels:
Future Prices, Regression]

Multiple Regression
Multiple independent variables

\[ Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i \]

Example
\[ \text{Sales}_t = a + b_1 \, \text{Price}_t + b_2 \, \text{Adv}_t + e_t \]

Link Analysis

Link Analysis
Based on a branch of mathematics called Graph Theory,
which represents relationships between different objects
as edges in a graph.
It can be used for both directed and undirected data
mining

Graph theory
Helps in visualizing relationships
Is not applicable to all types of data
Cannot solve all types of problems
Yields good results in
Analyzing link between web pages
Analyzing telephone call patterns to find influential customers
Understanding physician referral patterns

Assembling links into a useful graph can be a data
processing challenge.
Links between web pages: found in the HTML of the pages
Links between telephones: found in call detail records

Links are implicit: a data mining challenge!

Example Website Link Analysis


Analyse the incoming links to your page
Evaluate the quality of link
Analyse the link building strategy
Improve your page rankings
Used in SEO

A graph consists of
Nodes (vertices): Things in the graph that have
relationships, e.g. people, organizations, objects
Edges: Pairs of nodes connected by
relationships

Examples of Graphs and Nodes

Planar graph: A graph which can be drawn on a piece
of paper without any edges intersecting.
Connected graph: A graph in which a path exists between
every pair of nodes. (Most graphs are tightly connected
islands with few bridges)
Path: An ordered sequence of nodes connected by
edges

Directed graph: Edges are like one-way roads
going from one node to another (outgoing edge,
incoming edge)
Undirected graph: Edges have no direction, so each link
can be followed both ways
Source node: A node with only outgoing edges and no
incoming edges.
Sink node: A node with only incoming edges and no
outgoing edges

Weighted graph: A graph in which every edge
has a weight associated with it
Cycle: A path that starts and ends at the same node.
Cyclic graph: A directed graph that contains at least
one cycle
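
Several of these terms can be checked on a small directed graph stored as an adjacency list. The graph itself is made up for illustration, and the helper names (sources, sinks, has_cycle) are hypothetical rather than taken from any library.

```python
# A small directed graph as an adjacency list: node -> nodes it points to.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}


def sources(graph):
    """Nodes with outgoing edges but no incoming edges."""
    targets = {t for outs in graph.values() for t in outs}
    return sorted(n for n in graph if graph[n] and n not in targets)


def sinks(graph):
    """Nodes with incoming edges but no outgoing edges."""
    targets = {t for outs in graph.values() for t in outs}
    return sorted(n for n in graph if not graph[n] and n in targets)


def has_cycle(graph):
    """Depth-first search that reports whether some path returns to its start."""
    state = {}  # node -> "visiting" (on the current path) or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True  # reached a node already on the current path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        found = any(visit(nxt) for nxt in graph[node])
        state[node] = "done"
        return found

    return any(visit(node) for node in graph)
```

Here A is the only source and D the only sink; adding an edge from D back to A turns the graph into a cyclic graph.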

A common problem in link analysis


What is the shortest path between two nodes?
The definition of shortest depends upon the weights
assigned to the edges
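
One well-known way to answer the shortest path question is Dijkstra's algorithm, sketched below for graphs whose edge weights are not negative. The slides do not name an algorithm, so this choice is an assumption, and the ROADS graph and its weights are made up.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbour: weight}}, weights >= 0."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    best = {}  # cheapest known cost at which each node was expanded
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already expanded this node at least as cheaply
        best[node] = cost
        for neighbour, weight in graph.get(node, {}).items():
            heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None  # goal is not reachable from start


# Hypothetical weighted, directed graph.
ROADS = {"A": {"B": 4, "C": 1}, "C": {"B": 2, "D": 7}, "B": {"D": 3}, "D": {}}

# The same graph with every weight set to 1, so "shortest" means fewest edges.
HOPS = {node: {nbr: 1 for nbr in nbrs} for node, nbrs in ROADS.items()}

by_weight = shortest_path(ROADS, "A", "D")
by_hops = shortest_path(HOPS, "A", "D")
```

With the original weights the cheapest route is A, C, B, D (total 6), while counting edges alone prefers A, B, D: the answer depends on how the edges are weighted.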

Traveling Salesman Problem

Social Network Analysis


Networks or graphs used to represent all kinds of
relationship between people such as kinship, commerce
and even transmission of disease.

LinkedIn
Facebook (predicting the home address)
Dating sites
Six degrees of Separation (Stanley Milgram, 1967)
