
Data Mining & Knowledge Discovery: Personalization
Technologies for One-to-One Marketing
Bhagi Narahari

Outline of Lecture

What and Why of Data Mining and KDD?

How?
Personalization

Importance and applications to e-commerce

Personalized one-to-one business on the Internet

Part I: Overview of Personalization


Part II: The Data Mining Process

Predictive Modelling

A black box that makes predictions about the future based on information from the past and present
[Figure: inputs (age, balance, income) feed a model ("crystal ball?") that answers: How much will the customer spend on the next catalog order?]
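A minimal sketch of such a "black box", assuming a linear model fitted by ordinary least squares on a single input (income). The history below is hypothetical, invented only to show the mechanics; a real model would combine age, balance, and income.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical history: annual income (in $1000s) vs. spend on last catalog order ($).
incomes = [30, 45, 60, 75, 90]
spend = [70, 100, 130, 160, 190]

a, b = fit_line(incomes, spend)

def predict_spend(income):
    """The 'crystal ball': predicted spend on the next catalog order."""
    return a + b * income

print(round(predict_spend(50), 2))  # prints 110.0
```

The model is "learned" once from past data and then queried for new customers; swapping in more inputs changes the fitting step, not the way the prediction is used.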

What is Data Mining?

It is the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.

Why now? (A historical perspective)

Data is now available (it wasn't always)
Distributed sources
Technology evolution
Competition (do what you can to outdo competitors)

Why DM?

CRM (Customer Relationship Management) is an important success factor in e-commerce

Price differentiation is no longer enough
Customer service is more important

Links with suppliers already exist (B2B): JIT, joint forecasting, planning, procurement
Current emphasis is on links with customers: feedback, input into design, etc.

CRM

Identifying profitable customers
Better service for more valued customers
Retaining profitable customers

Getting a new customer costs a lot more than retaining an existing one
Acquiring a new customer costs about 5X as much (Peppers & Rogers)
An increase in retention from 75% to 80% reduces costs by about 10%

Larger share of the customer pool

CRM

Product differentiation based on price and quality is increasingly difficult

Need to differentiate based on relationships

Increasingly sophisticated mass marketing increases the probability of success

The cost of mass marketing is driven down by the Internet (reach)

CRM

Goal: interact positively with your customers and prospects

Define customer segments
Lights-out (fully automated) execution of campaigns against segments
Attribution and evaluation of responses

Personalization in Ecommerce

Positive:

Much better opportunity for personalization
Customer identification
Tracking across visits and within a visit
Ability to run what-if experiments

Negative:

Switching costs for customers are much lower
Is web-based shopping suitable for "touchy-feely" products?
Price differentiation across geographies is not easy

Personalization

[Figure: personalization points along two parallel chains. Customer chain: product discovery, product evaluation, terms negotiation, order placement, order payment, customer service & support. Producer chain: market research, market stimulation/education, terms negotiations, order receipt, order billing and payment management, customer service & support.]

B2C Personalization Objectives

Know the customer

Determine what the customer wants

Profile: registration, cookies
Ask: questionnaires (what is the incentive for truthfulness?)
Deduce: click streams, history, collaborative filtering (Amazon!)

Deliver

Customize the look and feel
Offer special promotions
Offer customized products (the Holy Grail)
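To make "deduce ... collaborative filtering" concrete, here is a minimal user-based sketch, assuming purchase histories are sets of item names and that similarity is the Jaccard overlap between two customers' sets. The customers and items are hypothetical, and this is not a description of Amazon's actual method.

```python
def jaccard(a, b):
    """Overlap between two purchase sets (0 = nothing shared, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target, histories, k=2):
    """Suggest items bought by the k most similar customers but not yet by target."""
    mine = histories[target]
    peers = sorted(
        (c for c in histories if c != target),
        key=lambda c: jaccard(mine, histories[c]),
        reverse=True,
    )[:k]
    scores = {}
    for peer in peers:
        weight = jaccard(mine, histories[peer])  # closer peers count more
        for item in histories[peer] - mine:
            scores[item] = scores.get(item, 0.0) + weight
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical purchase histories.
histories = {
    "ann": {"novel", "cookbook", "atlas"},
    "bob": {"novel", "cookbook", "thriller"},
    "cid": {"novel", "thriller", "poetry"},
    "dee": {"garden guide"},
}
print(recommend("ann", histories))  # prints ['thriller', 'poetry']
```

Ann has never stated a preference, yet her overlap with Bob and Cid is enough to rank "thriller" first; this is the "mapping to peers" idea in its simplest form.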

Use of Personalization

In addition to storing and retrieving information from the individual's profile on the fly,

a site can also use mining software to analyze the information in the database and make recommendations or comments specific to the individual

Impact of Personalization

Customer relationship
Learn more about customers

Learn and understand why and how they prefer to do business with your organization

In tandem with tracking, provides a tool to monitor your website

What works, what doesn't, what makes your audience click

Security and Privacy as Barriers to Personalization

A large number of customers are concerned about personalization (DoubleClick!)
Will they pay more to preserve privacy?
Some falsify information to preserve privacy
Customers give more information to a trusted site
Need a secure site with clear privacy policies stated on the site

Personalization
[Figure: personalization flow. Know the customer (questionnaires, past history, click streams) and identify them (login, credit card #) to build a profile. Predict the wants by mapping to peers (extrapolation from peers, e.g. firefly.com) and by extrapolation from the past. Give the customer his/her wants: look & feel, product selection & promotions, new products.]

Know the customer

Cookies

OPS: Open Profiling Standard

Combined with eTrust certification

Registration

Backlash (users do not trust them)

User certificates: logons

Key question:

How do you know that this customer is the same one who goes to your storefront?
Need standard warehouse techniques such as address resolution, credit card resolution, etc.

Know the Customer:OPS

Two drivers

Users should not have to retype basic information again and again
Data is used in a fashion users can trust (not leaked, other data not seen, etc.)

Two parts

Common data
Demographics (country, zip, age, gender)
Contact (name, address, credit card)
User agent preferences
Per-site sections (can be shared across sites, if the user allows)

What if there is no profile?

Deduce

Predict behaviour

Collect information: history of purchases, time spent on pages
Ask questions (offer rewards)
Combine with database marketing data
Buying probabilities
Build the customer relationship

Mining is key!

Personalization: Actions to Take (Look and Feel)

Personalized pages

Specific data
Specific presentation and design
Sent through various media
Manage customers, not products: 1-1 marketing

Strategy.com

Deliver personalized pages

E.g. stock portfolio, personal info including alarms, travel reservations
Use different media
WAP-enabled phones (e.g. Sprint PCS Web)

Storefront Personalization

Customers visit the store website

Howard buys ties
Rob buys baby products
Ray buys toys
Amy buys clothes

Provide each customer with a view of the store

Present them with what they are likely to buy:
Howard: ties and men's formal wear
Ray: toys and gadgets
Rob: infant and toddler section
Amy: women's clothes section

More Actions: Product Presentations & Promotions
[Figure: basic storefront product hierarchy. Clothes: Mens (Shirts, Pants), Womens (Casuals, Evening), Childrens (Infants, Kids). "John's View" and "Mary's View" mark the parts of the hierarchy shown to each customer.]
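A minimal sketch of serving a per-customer view of the product hierarchy, assuming the hierarchy is a nested dict and each customer's view is a list of top-level sections. Which sections belong to John's and Mary's views is a hypothetical choice here, not something read off the original figure.

```python
# Basic storefront product hierarchy (leaves are empty dicts).
HIERARCHY = {
    "Clothes": {
        "Mens": {"Shirts": {}, "Pants": {}},
        "Womens": {"Casuals": {}, "Evening": {}},
        "Childrens": {"Infants": {}, "Kids": {}},
    }
}

# Hypothetical views: the sections each customer is most likely to buy from.
VIEWS = {"John": ["Mens"], "Mary": ["Womens", "Childrens"]}

def view_for(customer):
    """Return a pruned view of the hierarchy (sharing the original sub-dicts)."""
    sections = VIEWS.get(customer)
    if sections is None:  # unknown visitor: show the whole store
        return HIERARCHY
    clothes = HIERARCHY["Clothes"]
    return {"Clothes": {name: clothes[name] for name in sections if name in clothes}}

print(view_for("John"))  # prints {'Clothes': {'Mens': {'Shirts': {}, 'Pants': {}}}}
```

The catalog itself is stored once; personalization only changes which branches are presented, which keeps the views cheap to compute per visit.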

BroadVision.com

BroadVision One-to-One application

Allows businesses to develop and manage personalized web sites
Interactively profiles each visitor and dynamically matches information based on the visitor's profile and business rules specified by the providers of the site & services
Users do not have to jump through hoops to find relevant data

DM Terminology
[Figure: data mining in relation to neighbouring terms: rule-based systems, OLAP, ROLAP, data marts, data warehouse, data stores, SQL, genetic algorithms, neural networks.]

How?

Determine the probability of buying as a function of customer attributes such as age, income, past buying patterns, ...
Target customers by ranking them from highest to lowest probability
Other techniques: decision trees, neural networks, ...
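A minimal sketch of "determine the probability of buying, then rank", assuming a logistic model whose coefficients have already been fitted. The coefficient values, attribute units, and customer records are all hypothetical.

```python
import math

# Hypothetical fitted logistic-regression coefficients (income in $1000s).
INTERCEPT = -4.0
COEF = {"age": 0.03, "income": 0.02, "past_orders": 0.5}

def buy_probability(customer):
    """P(buy) = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = INTERCEPT + sum(COEF[k] * customer[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-z))

customers = [
    {"id": "c1", "age": 25, "income": 40, "past_orders": 0},
    {"id": "c2", "age": 50, "income": 90, "past_orders": 4},
    {"id": "c3", "age": 35, "income": 60, "past_orders": 2},
]

# Rank from highest to lowest probability; a campaign would target the top of the list.
ranked = sorted(customers, key=buy_probability, reverse=True)
print([c["id"] for c in ranked])  # prints ['c2', 'c3', 'c1']
```

The ranking, not the raw probability, is what drives targeting: with a fixed mailing budget you contact customers from the top until the budget runs out.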

KDD

Knowledge Discovery in Databases

It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
It involves data preparation, pattern extraction, knowledge evaluation, and refinement, applied iteratively

KDD

Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
Steps in the KDD process:
Select Data
Data Cleansing and Pre-processing
Data Mining
Results interpretation
Implementation

Pre-processing in KDD

80-90% of the KDD effort is spent here

Why?
Operational data is incomplete, inconsistent, and in different formats across systems

DM techniques might require data in a specific format
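A minimal cleansing sketch, assuming two hypothetical source systems that encode gender and dates differently and sometimes leave income blank. Filling missing income with the median of known values is one common choice, used here purely for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical records from two operational systems with different conventions.
raw = [
    {"id": 1, "gender": "M", "joined": "2000-03-15", "income": "52000"},
    {"id": 2, "gender": "female", "joined": "15/07/1999", "income": ""},
    {"id": 3, "gender": "F", "joined": "1998-11-02", "income": "61000"},
]

GENDER = {"m": "M", "male": "M", "f": "F", "female": "F"}

def parse_date(text):
    """Accept either ISO (YYYY-MM-DD) or day/month/year."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    return None  # unparseable: leave for manual review

def clean(records):
    incomes = [int(r["income"]) for r in records if r["income"]]
    fill = median(incomes)  # impute missing income with the median of known values
    return [
        {
            "id": r["id"],
            "gender": GENDER.get(r["gender"].strip().lower()),
            "joined": parse_date(r["joined"]),
            "income": int(r["income"]) if r["income"] else fill,
        }
        for r in records
    ]

for row in clean(raw):
    print(row)
```

Even this toy case needs three separate rules (code mapping, format detection, imputation), which is why preparation dominates the KDD effort.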

Data Mining Problems

Classification/Segmentation

Binary (Yes/No)
Multiple Category (Large/Medium/Small)

Forecasting (how much?)
Association rule extraction (market basket analysis)
Sequence detection

balance increase -> missed payment -> default
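A minimal sketch of sequence detection, assuming each account's history is a time-ordered list of event labels. It flags accounts whose history contains the slide's pattern in order, with other events allowed in between; the event names and histories are hypothetical.

```python
PATTERN = ["balance_increase", "missed_payment", "default"]

def contains_in_order(events, pattern):
    """True if pattern occurs as a (not necessarily contiguous) subsequence of events."""
    it = iter(events)
    # Each `in` test consumes the iterator up to the match, so order is enforced.
    return all(step in it for step in pattern)

histories = {
    "acct1": ["purchase", "balance_increase", "purchase", "missed_payment", "default"],
    "acct2": ["missed_payment", "balance_increase", "purchase"],
    "acct3": ["balance_increase", "missed_payment"],
}

flagged = [acct for acct, events in histories.items() if contains_in_order(events, PATTERN)]
print(flagged)  # prints ['acct1']
```

In practice the useful output is the prefix match: an account such as acct3 that has completed the first two steps (`PATTERN[:2]`) is a candidate for intervention before default.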

Typical DM tasks

Prediction and Classification

Directed
Decision trees, neural networks, memory-based reasoning, logistic regression
Examples:
How many units will be sold on a given day?
What will be the stock price on a given day?
Will a customer buy the product or not?

DM tasks

Affinity grouping

Undirected
Which products go together naturally?
The beer-diaper syndrome?
Market basket analysis
Examples:
Which products peak in demand simultaneously?

DM tasks

Clustering task

Undirected
Segmenting into similar clusters
Different from classification (there are no predefined classes)
Examples
Customers with similar buying profiles
Products with similar demand patterns

DM success factors

Integration with data warehouses and DSS

Users should develop a good understanding of techniques
Recognize that these tools cannot automatically find patterns without being told what to do
Most methods now used are extensions of analytical methods that have been around for decades

Legal and Ethical Issues

Privacy concerns

Often data included in the data warehouse cannot legally be used in the decision-making process

Becoming more important

Will impact the way that data can be used and analyzed
Ownership issues
European data laws have implications for the US

Race, gender, age

Data contamination will become critical

Making Decisions
[Figure: multiple data sources feed a data warehouse (?), which supports models, which lead to decisions.]

Data Warehouse

Bill Inmon: A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.
It is managed data that is situated after and outside the operational systems

Data Warehousing

Increasing need to find, summarize, and interpret large amounts of data effectively

Especially when data is distributed across many different databases

Transaction processing (TP) systems are not easily accessible to other systems

Plus, TP systems have time constraints

Enter the Data Warehouse

To deliver decision data to decision makers by integrating data from various TPS into a single store, which can then
feed a range of decision support applications
through an OLAP interface!

Data Complications

Noise
Missing data
Transformation

Numeric data
Text

Need to differentiate between variables you can control and those you cannot

Actionable: size of discount, number of offers, etc.

Non-actionable: age, income, ...

Data Mining Techniques

Market Basket Analysis


Memory Based Reasoning
Cluster Detection
Link Analysis
Decision Trees and Rule Induction
Neural Networks
Genetic Algorithms
OLAP

OLAP: Online Analytical Processing

While a data warehouse brings data together, OLAP lets you look at the data and manipulate it interactively
OLAP allows users to slice and dice data
Allows users to drill down into detailed data

Relational vs. Multidimensional

Consolidations

Multidimensional Terminology

East, West, and Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.

Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.

Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
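A minimal sketch of this terminology, assuming a tiny cube held as a dict keyed by (region, product) cells with one Sales variable. The sales figures are hypothetical. Consolidation computes the output members (Total Region, Total) from the input members, and a slice fixes one dimension.

```python
from collections import defaultdict

REGIONS = ["East", "West", "Central"]              # input members of Region
PRODUCTS = ["Nuts", "Screws", "Bolts", "Washers"]  # input members of Product

# Hypothetical Sales variable: one cell per (region, product).
sales = {
    ("East", "Nuts"): 50, ("East", "Screws"): 40, ("East", "Bolts"): 30, ("East", "Washers"): 20,
    ("West", "Nuts"): 60, ("West", "Screws"): 10, ("West", "Bolts"): 25, ("West", "Washers"): 5,
    ("Central", "Nuts"): 30, ("Central", "Screws"): 20,
    ("Central", "Bolts"): 15, ("Central", "Washers"): 10,
}

def consolidate(cube):
    """Add output members: 'Total Region' per product, 'Total' per region, grand total."""
    out = dict(cube)
    totals = defaultdict(int)
    for (region, product), value in cube.items():
        totals[("Total Region", product)] += value
        totals[(region, "Total")] += value
        totals[("Total Region", "Total")] += value
    out.update(totals)
    return out

def slice_region(cube, region):
    """Slice: fix the Region dimension and return Product -> Sales."""
    return {p: v for (r, p), v in cube.items() if r == region}

full = consolidate(sales)
print(slice_region(full, "East"))
print(full[("Total Region", "Nuts")], full[("Total Region", "Total")])  # prints 140 315
```

Precomputing the consolidated cells is what lets an OLAP tool answer "total for Nuts across all regions" instantly instead of re-scanning detail records.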

Steps in DW and OLAP
[Figure: data from several sources passes through a data loader, data converter, data scrubber, and data transformer into the data warehouse, which feeds an OLAP server and an OLAP interface.]

Cluster Detection

Undirected data mining

Finds records that are similar to each other (clusters)
Clusters are found using geometric methods, statistical methods, and neural networks
Good way to start any analysis
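A minimal sketch of one geometric method, k-means, swapped in here as an example (the slides do not name a specific algorithm). It assumes customers described by two hypothetical numbers (orders per year, average order value), a fixed k, and the first k points as starting centers.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest center, then move centers to cluster means."""
    centers = list(points[:k])  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customers: (orders per year, average order value in $).
customers = [(2, 30), (20, 35), (3, 25), (22, 40), (1, 28), (19, 30)]
centers, clusters = kmeans(customers, k=2)
print(clusters)
```

No labels are supplied: the two groups (occasional vs. frequent buyers) emerge from the data alone, which is what makes clustering a good first look before any directed analysis.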

Market Basket Analysis

Form of clustering used for finding items that occur together (in a transaction or market basket)
Expresses the likelihood of different products being purchased together as rules
Used for planning store layouts, limiting specials to one of the products in a set, ...

Transaction data
Customer  Products
1         Milk, Soda
2         Milk, Beer, Diapers
3         Milk, Cleaner
4         Beer, Diapers, Soda
5         Beer, Soda

Co-occurrence matrix
          Beer  Cleaner  Milk  Soda  Diapers
Beer      3     0        1     2     2
Cleaner   0
Milk      1
Soda      2
Diapers   2
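A short sketch that builds the full co-occurrence matrix from the five sample transactions on this slide. Diagonal cells count how many transactions contain each item; off-diagonal cells count transactions that contain both items.

```python
from itertools import combinations

transactions = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]
ITEMS = ["Beer", "Cleaner", "Milk", "Soda", "Diapers"]

def co_occurrence(baskets, items):
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in baskets:
        for item in basket:
            counts[item][item] += 1  # diagonal: transactions containing the item
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1  # symmetric off-diagonal counts
            counts[b][a] += 1
    return counts

matrix = co_occurrence(transactions, ITEMS)
print("".join(f"{h:>9}" for h in [""] + ITEMS))
for row in ITEMS:
    print(f"{row:>9}" + "".join(f"{matrix[row][col]:>9}" for col in ITEMS))
```

The Beer row it prints (3, 0, 1, 2, 2) matches the row given on the slide; the remaining rows complete the exercise.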

Support and confidence

For a rule that says: If A then B

Support is defined as the ratio of the number of transactions that include both A and B to the total number of transactions
Confidence is defined as the ratio of the number of transactions that include both A and B to the number of transactions that include A
How do you specify significant support and confidence?
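The two definitions translate directly into code. This sketch assumes transactions are Python sets and reuses the five sample baskets from the transaction table; A and B may each be a set of one or more items.

```python
transactions = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]

def support(baskets, a, b):
    """Fraction of all transactions that contain both A and B."""
    return sum(1 for t in baskets if a <= t and b <= t) / len(baskets)

def confidence(baskets, a, b):
    """Among transactions containing A, the fraction that also contain B."""
    with_a = [t for t in baskets if a <= t]
    if not with_a:
        return 0.0  # rule never fires
    return sum(1 for t in with_a if b <= t) / len(with_a)

print(support(transactions, {"Beer"}, {"Diapers"}))     # 2/5 = 0.4
print(confidence(transactions, {"Beer"}, {"Diapers"}))  # 2/3, about 0.667
```

Note that support is symmetric in A and B while confidence is not: "if diapers then beer" has confidence 2/2 = 1.0 on the same data.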

Algorithm for Finding Association Rules

Input is Min-Support and Min-Confidence

Find all sets of items with Min-Support (frequent itemsets)

Frequent itemsets property: every subset of a frequent itemset must also be a frequent itemset
Iterative algorithm: start with frequent itemsets of one item, and construct larger itemsets using only smaller frequent itemsets
Then generate rules from the frequent itemsets, keeping those that meet Min-Confidence
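A compact sketch of the iterative (Apriori-style) search described above, assuming Min-Support is given as a fraction of transactions. It grows candidate itemsets one item at a time and keeps a candidate only if all of its smaller subsets were already frequent. Rule generation with Min-Confidence would follow as a separate step.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(baskets)

    def supp(itemset):
        return sum(1 for t in baskets if itemset <= t) / n

    items = {i for t in baskets for i in t}
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    result = {s: supp(s) for s in level}
    size = 1
    while level:
        size += 1
        # Join step: unions of frequent itemsets that have exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = {c for c in candidates if supp(c) >= min_support}
        result.update((s, supp(s)) for s in level)
    return result

transactions = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]
found = frequent_itemsets(transactions, 0.25)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With 25% support on the sample data, {Beer, Soda, Diapers} is pruned without ever being counted, because its subset {Soda, Diapers} appears only once; that pruning is what keeps the search tractable on large catalogs.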

MBA example

Using the sample data, create a co-occurrence table

Let relevant Support = 25% and Confidence = 50%:

Beer and Diapers appear together in 2/5 = 40% of transactions (Beer alone appears in 3/5 = 60%)

If beer then diapers has confidence of 2/3 = 67%

Thus, "If customer buys beer then customer buys diapers" satisfies 25% support & 50% confidence

Conclusion drawn by mining system:

Customers who buy beer also buy diapers

Applying MBA Results

Is the relationship useful?

Beer and Diapers may not be of use

Victoria's Secret transaction mining led to specific apparel being sent to specific stores (MicroStrategy software)

Who defines usefulness?

Only as good as the rules specified by humans/marketing workforce
NBA mining: designers of the software did not include height mismatches at first; coaches made the correction

Data Mining Algorithms

Four algorithms commonly cited

Association rule (used in over 90% of the cases!)
Nearest neighbor
Quick and easy, but models get large
Decision tree
Neural network
Difficult to interpret and slow to train
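A minimal nearest-neighbor sketch, assuming numeric attributes on comparable scales and Euclidean distance. It also shows why "models get large": the model is simply the stored training records, so it grows with the data. The records and labels are hypothetical.

```python
import math

# Hypothetical training records: (age, income in $1000s) -> responded to offer?
training = [
    ((25, 30), "no"),
    ((32, 45), "no"),
    ((47, 80), "yes"),
    ((52, 95), "yes"),
]

def nearest_neighbor(query, records):
    """Label of the single closest stored record (1-NN)."""
    _point, label = min(records, key=lambda r: math.dist(query, r[0]))
    return label

print(nearest_neighbor((50, 85), training))  # prints yes
```

There is no training step at all, which is the "quick and easy" part; the cost moves to prediction time, when every stored record must be compared with the query.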

Decision Trees

Series of if/then rules

Easy to understand, complexity in implementation

[Figure: example decision tree with branches on Balance (< 10K / > 10K) and Age (< 48 / > 48), ending in yes/no leaves.]
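A decision tree is just nested if/then rules. The sketch below encodes one possible reading of the example tree (Balance < 10K leads directly to "yes"; otherwise Age decides). The exact layout of the original figure is not recoverable, so treat the branch assignment as an assumption; only the thresholds 10K and 48 come from the slide.

```python
def classify(balance, age):
    """Hypothetical reading of the example tree; boundary values go to the '>' branch."""
    if balance < 10_000:
        return "yes"
    if age < 48:
        return "no"
    return "yes"

print(classify(5_000, 30), classify(20_000, 30), classify(20_000, 60))  # prints yes no yes
```

Each root-to-leaf path reads as a rule (e.g. "if balance >= 10K and age < 48 then no"), which is why trees are easy to explain to business users.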

CRM and Data Mining

Recall: customer segmentation is key in CRM

Data mining can help improve understanding of customer behaviour
Helps locate meaningful segments in customer data
Users want to turn that understanding into automated interactions with their customers

Integrating Data Mining & CRM

Data mining application owns the modelling process
CRM application owns the campaign execution process
Goals:

Minimize the pain involved in using models in campaigns
Score records only when and where necessary

Integrating Mining & CRM

Step 1:

Analytic user creates a model using the mining system
Model is then exported into the campaign management system

Step 2:

Marketing user creates a campaign that includes predictive models
When the campaign executes, the data mining engine scores customers dynamically
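A minimal sketch of the two-step integration, assuming the "exported model" is simply a scoring function handed from the mining system to the campaign system. Only customers in the campaign's segment are scored when the campaign runs, rather than the whole database. All names, fields, coefficients, and the cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Customer:
    id: int
    region: str
    past_orders: int
    income: float  # in $1000s

def export_model() -> Callable[[Customer], float]:
    """Step 1 (hypothetical): the mining system exports a scoring function."""
    def score(c: Customer) -> float:
        return min(1.0, 0.1 + 0.05 * c.past_orders + 0.002 * c.income)
    return score

def run_campaign(db: List[Customer], segment: Callable[[Customer], bool],
                 model: Callable[[Customer], float], cutoff: float) -> List[int]:
    """Step 2: score only the segment's members when the campaign executes."""
    # `and` short-circuits, so customers outside the segment are never scored.
    return [c.id for c in db if segment(c) and model(c) >= cutoff]

database = [
    Customer(1, "East", 6, 80),
    Customer(2, "West", 9, 120),
    Customer(3, "East", 1, 40),
]
model = export_model()
targets = run_campaign(database, lambda c: c.region == "East", model, cutoff=0.5)
print(targets)  # prints [1]
```

Keeping the model behind a plain function boundary is the design point: the marketing user can reuse it in any campaign without knowing how the analytic user built it.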

Benefits of Integration

Pre-generated model selection

Score defined segments on the fly

Eliminates the need to score the entire database
Improves efficiency of campaigns

Reduces manual intervention and error

Accelerates the market cycle

Increases likelihood of reaching customers before competitors
Improves campaign results and lowers costs

Summary

"Using the new media of the one-to-one future, you will be able to communicate directly with customers individually..." Don Peppers & Martha Rogers (The One to One Future)
"What are you afraid of? ... Even if you're not afraid of these things, the beauty is, with proper marketing, we can make you afraid." Michael Saylor, CEO, MicroStrategy
