Data Mining for Decision Making and CRM


Yuh-Jzer Joung

Dept. of Information Management
National Taiwan University
Feb, 2009
2009/2/11
1
IS&A
2009
Outline
Motivations
Applications
Privacy concerns
Technology issues
Data warehouse
Case Study---Amazon
Data Mining
The process of discovering meaningful new correlations,
patterns, and trends by sifting through large amounts of data
stored in repositories and by using pattern recognition
technologies as well as statistical and mathematical
techniques (The Gartner Group).
Related term: Knowledge Discovery in Databases (KDD)
Why Data Mining
Business must turn:
Data into Information
Information into Action
Action into Value
(Figure: pyramid with data at the base, then information, action, and value at the top.)
Data rich, information poor?!
Applications: Business and Retail
Customers purchase habits, shopping patterns and trends
Direct-Mail Targeting
market segmentation
sales campaigns
to be elaborated further
What is the best place to display items for kids,
women, and men? What is the shopping route
design?
Applications:
Business Intelligence (BI)
A set of concepts and methodologies to improve
decision making in business through use of facts or
fact-based systems (Gartner Group)
Collect Data: Extract Data From
Legacy Systems
ERPs
e-Business Systems
Archives
External sources
Prepare & Store Data: Data Staging
Quality Control
Integration & enrichment
Historization
Pre-aggregation
Model transformation
Loading & partition
Analyze Data: Data Visualization
OLAP
Data Mining
Management Information Systems (MIS)
Decision Support Systems (DSS)
Executive Information Systems (EIS)
Standard Reporting
Forecasting, Simulations
Applications: Genomic Microarrays
Given microarray data for a number of samples
(patients), can we
Accurately diagnose the disease?
Predict outcome for given treatment?
Recommend best treatment?
Molecular Biology Overview
(Figure labels: Cell, Nucleus, Chromosome, Gene (DNA), Gene (mRNA), single strand, Gene expression, Protein)
4 building blocks (nucleotides): A, C, G, T.
Graphics courtesy of the National Human Genome Research Institute
Applications: Assessing Credit Risk
Situation: Person applies for a loan
Task: Should a bank approve the loan?
Banks develop credit models using a variety of machine learning
methods.
Mortgage and credit card proliferation are the results of being
able to successfully predict if a person is likely to default on a
loan
Standard Life secured 50 million in mortgage revenue through
the use of an accurate propensity model to target offers
Applications: Finance
Consumer credit rating
Money laundering
Financial crimes
Accounting fraud
Enron (2001), AIG (2004), ...
Risk assessment
Applications: Discovering Social
Networks
Mining emails to discover social networks and emergent
communities
Potential applications:
Marketing
Terrorism
Applications: Text/Web Mining
Document classification
Document summary/abstraction
Document search
Patent Mining
Online Ads
Google AdWords
Google Analytics
http://joung.im.ntu.edu.tw
Applications: Recommendation in e-commerce
Collaborative Filtering
produce personal recommendations by computing the
similarity between your preferences and those of other
people
customers who bought X also bought Y.
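The similarity idea above can be sketched in a few lines. This is a minimal user-based sketch, assuming cosine similarity over rating dictionaries; the users, items, ratings, and the function names `cosine` and `recommend` are hypothetical and not taken from any particular recommender system.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two rating dicts (a missing item counts as 0)."""
    items = set(u) | set(v)
    dot = sum(u.get(i, 0) * v.get(i, 0) for i in items)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, ratings):
    """Rank items the target has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, theirs in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], theirs)
        for item, rating in theirs.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "you":   {"X": 5, "W": 1},
    "alice": {"X": 5, "Y": 4},
    "bob":   {"W": 5, "Z": 4},
}
top = recommend("you", ratings)
print(top)  # alice is the closer match, so her item Y ranks first
```

Production recommenders typically work item-to-item and at far larger scale; the sketch only illustrates the similarity-then-rank idea.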
Applications: CRM (Customer
Relationship Management)
Understanding customers
Quickly uncover the attributes that define customer behaviors
Profile customers to understand their needs and desires
Results in more relevant and targeted customer communications
For example: predict that a 31-year-old single male is likely to respond favorably
to a discounted travel offer every 6 months
Develop targeted offers
Identify propensities to purchase certain products
Maximize campaign results through better targeting
Analyze past results to predict future results
For example: predict that a 22-year-old woman who lives in Taipei is very likely
to purchase a specific new book release
CRM (contd.)
Match specific offers to specific individuals
Fine-tune messages by marketing channel
Deliver offers based on customer profile
Results in increased campaign ROI
For example: predict that a 35-year-old woman with two children is
likely to purchase a new toaster every 2.5 years
Execute real-time campaigns
Assign scores based on behavior
Provide an immediate offer based on customer specifics
Results in increased response and long-term customer value
For example: offer the money market customer on the phone a good rate
on a certificate of deposit, based on their profile
Applications: Security and Fraud
Detection
Anti-terrorism?
The United States National Security Agency (NSA) has a database
containing over 1.9 trillion phone-call records in the US (as of May 2006).
Collection of the data began several months before the 9/11 attack.
ECHELON
A global network of computers that automatically search
through millions of intercepted messages for pre-programmed
keywords or fax, telex and e-mail addresses.
Source: http://www.fas.org/irp/program/process/echelon.htm
Eagle Eye ?
Problems Suitable for Data-Mining
require knowledge-based decisions
have a changing environment
have accessible, sufficient, and relevant data
provide high payoff for the right decisions!
Privacy considerations are important if personal data is involved
Demo:
A Poll: Successful Data Mining
Applications (July 2005)
Data mining has been applied in many domains!
Source:
http://www.kdnuggets.com/polls/2005/successful_data_mining_applications.htm
Process Standardization
The data mining process must be reliable and repeatable by people
with little data mining skill
CRISP-DM (CRoss Industry Standard Process for Data Mining)
provides a uniform framework for
guidelines
experience documentation
Initiative launched Sept. 1996
SPSS/ISL, NCR, Daimler-Benz, OHRA
Funding from European Commission
Over 200 members of the CRISP-DM SIG worldwide
DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, Magnify, ...
System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, ...
End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, ...
CRISP-DM is flexible to account for differences
Different business/agency problems
Different data
Life Cycle of CRISP-DM
(Figure: the CRISP-DM cycle arranged around the data: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, Deployment.)
Phases and Tasks
Each phase lists its tasks, with the outputs of each task after the colon.

Business Understanding
Determine Business Objectives: Background; Business Objectives; Business Success Criteria
Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
Collect Initial Data: Initial Data Collection Report
Describe Data: Data Description Report
Explore Data: Data Exploration Report
Verify Data Quality: Data Quality Report

Data Preparation
Data Set: Data Set Description
Select Data: Rationale for Inclusion / Exclusion
Clean Data: Data Cleaning Report
Construct Data: Derived Attributes; Generated Records
Integrate Data: Merged Data
Format Data: Reformatted Data

Modeling
Select Modeling Technique: Modeling Technique; Modeling Assumptions
Generate Test Design: Test Design
Build Model: Parameter Settings; Models; Model Description
Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
Review Process: Review of Process
Determine Next Steps: List of Possible Actions; Decision

Deployment
Plan Deployment: Deployment Plan
Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
Produce Final Report: Final Report; Final Presentation
Review Project: Experience Documentation
Some Techniques
Association Rules
Regression Analysis
Neural Networks
Clustering
Classification
Market Basket Analysis
Retail: each customer purchases a different set of products,
in different quantities, at different times
MBA uses this information to:
Identify who customers are (not by name)
Understand why they make certain purchases
Gain insight about its merchandise (products):
Fast and slow movers
Products which are purchased together
Products which might benefit from promotion
Take action:
Store layouts
Which products to put on specials, promote, coupons
Combining all of this with a customer loyalty card makes it even more
valuable
Association Rule Mining
Association rule mining:
Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in an
information repository.
Application: finding regularities in data
What products were often purchased together? Beer
and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
Support and Confidence
Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

How good is an association rule X → Y?
support (a measure of statistical significance):
probability that a transaction contains X ∪ Y
confidence (goodness of the rule):
conditional probability that a transaction having X also contains Y.
The goal is to find all the rules X → Y with
minimum confidence and support
Let min_support = 50%, min_conf = 50%:
A → C (50%, 66.7%)
C → A (50%, 100%)
A, B → C (25%, 100%) (low support)
(Figure: two overlapping circles, "Customer buys diaper" and "Customer buys beer", with the overlap labeled "Customer buys both".)
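The figures on this slide can be checked with a short script. This is a minimal sketch that recomputes support and confidence for the four transactions listed above; the function names `support` and `confidence` are illustrative choices, not part of any library.

```python
# The four transactions from the slide: transaction-id -> items bought
transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Conditional probability that a transaction containing lhs also contains rhs."""
    return support(lhs | rhs) / support(lhs)

rules = [({"A"}, {"C"}), ({"C"}, {"A"}), ({"A", "B"}, {"C"})]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          f"support={support(lhs | rhs):.1%}",
          f"confidence={confidence(lhs, rhs):.1%}")
```

With min_support = 50%, the rule A, B → C is rejected on support even though its confidence is 100%.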
Regression Analysis
Find a function to model a dataset with the least error;
that is, to determine the function y = F(x1, x2, ...) to see
how variable y depends on variables x1, x2, ...
What is the relation between advertising and sales?
Power demand vs. temperature.
The price of a house, given its size, the number of
bedrooms, the average income in the respective
neighborhood, and a subjective rating of appeal of the
house?
Linear Regression
Simple
$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \ldots, n$$
Multiple
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_m x_{mi} + e_i$$
There are statistics tools (e.g. SAS, SPSS) to help you find out the
parameters (even Excel offers some), but you often need to choose
an appropriate model.
Model Examination
Goodness of fit of the model (e.g., chi-square)
Statistical significance of the estimated parameters.
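For the simple model, the least-squares parameters have a closed form, which the sketch below computes by hand. The advertising and sales numbers are made up for illustration, and R-squared is used here as one common goodness-of-fit measure; it is a substitute for, not the same as, the chi-square test mentioned above.

```python
def fit_simple(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x + e."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: advertising spend vs. sales
ads = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple(ads, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * ads, R^2 = {r_squared(ads, sales, b0, b1):.4f}")
```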
Nonlinear Regression
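Even a single variable already allows numerous possible forms
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 \log x_i + e_i$$
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 2^{x_i} + e_i$$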
Regression Analysis in Short
It has been applied widely in many domains to model the relationships
between different factors, and more often the relationships are
linear.
Since a relationship has to be chosen before analysis, for
nonlinear relationships, it is quite challenging to find an
appropriate one.
Neural network analysis avoids this difficulty; it tells you
what the output might be given an input, without telling you
exactly what the relation is.
Neural Networks
Analogy to Biological Systems (Indeed a great example of a
good learning system)
Massive Parallelism allowing for computational efficiency
The first learning algorithm came in 1959 (Rosenblatt), who
suggested that if a target output value is provided for a single
neuron with fixed inputs, one can incrementally change
weights to learn to produce these outputs using the perceptron
learning rule
A Neuron
(Figure: a neuron with inputs x_0, x_1, ..., x_n, weights w_0, w_1, ..., w_n, a weighted sum, a function f, and output y.)
$$y = f\left(\sum_{i=1}^{n} w_i x_i\right)$$
f is typically an identity function, and in this case,
$$y = \sum_{i=1}^{n} w_i x_i$$
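A single neuron is just a weighted sum passed through f. The sketch below is a minimal illustration with made-up inputs and weights; the function names `neuron` and `sigmoid` are illustrative, and the sigmoid is shown only as one alternative to the identity function.

```python
from math import exp

def neuron(inputs, weights, f=lambda s: s):
    """Output of one neuron: f applied to the weighted sum of its inputs."""
    return f(sum(w * x for w, x in zip(weights, inputs)))

def sigmoid(s):
    """A common non-identity choice of f, squashing the sum into (0, 1)."""
    return 1.0 / (1.0 + exp(-s))

inputs = [1.0, 2.0, 3.0]    # hypothetical x_1..x_3
weights = [0.5, -1.0, 2.0]  # hypothetical w_1..w_3

y_identity = neuron(inputs, weights)          # 0.5 - 2.0 + 6.0
y_sigmoid = neuron(inputs, weights, sigmoid)
print(y_identity, y_sigmoid)
```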
Multi-Layer
Output nodes
Hidden nodes
Input nodes
Example: Loan Appraiser
A neural network is filled with seemingly
meaningless weights
Network Training
The ultimate objective of training
obtain a set of weights that makes almost all the tuples in
the training data classified correctly
Steps
Initialize weights with random values
Feed the input tuples into the network one by one
For each unit
Compute the net input to the unit as a linear combination of all
the inputs to the unit
Compute the output value using the activation function
Compute the error
Update the weights and the bias
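The steps above can be sketched for the simplest case, a single unit trained with the perceptron rule. This is a minimal sketch, not a multi-layer backpropagation implementation: the AND-gate data, the step activation, the learning rate, and the fixed random seed are all assumptions made for illustration.

```python
import random

def step(net):
    """Activation function: fire (1) when the net input is non-negative."""
    return 1 if net >= 0 else 0

def train(samples, epochs=200, rate=0.1, seed=0):
    """Train one unit with the perceptron rule; returns (weights, bias)."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Initialize weights and bias with random values
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        # Feed the input tuples into the unit one by one
        for inputs, target in samples:
            net = sum(w * x for w, x in zip(weights, inputs)) + bias  # net input
            output = step(net)                                        # output value
            error = target - output                                   # error
            # Update the weights and the bias
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training data: the logical AND of two inputs
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
predictions = [step(sum(w * x for w, x in zip(weights, inputs)) + bias)
               for inputs, _ in samples]
print(predictions)
```

AND is linearly separable, so a single unit is enough here; hidden layers are needed once the classes are not separable by a line.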
Network Pruning and Rule
Extraction
Network pruning
Fully connected network will be hard to articulate
N input nodes, h hidden nodes and m output nodes lead to h(m+N)
weights
Pruning: Remove some of the links without affecting classification
accuracy of the network
Extracting rules from a trained network
Discretize activation values; replace individual activation value by the
cluster average maintaining the network accuracy
Enumerate the output from the discretized activation values to find
rules between activation value and output
Find the relationship between the input and activation value
Combine the above two to have rules relating the output to input
Pros and Cons
When applied in well-defined domains, their ability to generalize
and learn from data mimics a human's ability to learn from
experience.
Neural Nets are good for prediction and estimation when:
Inputs are well understood
Output is well understood
Experience is available for examples to use to train the neural net
application (expert system)
Drawback: training a neural network results in internal weights
distributed throughout the network, which in turn makes it
difficult to understand why a solution is valid
Mining Time-Series and Sequence
Data
Time-series database
Consists of sequences of values or events changing with
time
Data is recorded at regular intervals
Characteristic time-series components
Trend, cycle, seasonal, irregular
Applications
Financial: stock price, inflation
Biomedical: blood pressure
Meteorological: precipitation
Evolutionary Computing
Neural Networks Genetic algorithms
Inspired by Darwin's theory of evolution.
The algorithm starts with a set of solutions (represented by
chromosomes) called a population. Solutions from one population
are taken, possibly mutated, to form a new population.
The new population is examined to determine its fitness, and
used for the next iteration if necessary.
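The loop described above can be sketched on a toy problem. This is a minimal sketch under several assumptions not stated on the slide: chromosomes are bit lists, fitness is the number of 1 bits, new solutions are formed by one-point crossover plus mutation, and the best solution is carried over unchanged (elitism). All parameter values are arbitrary.

```python
import random

def fitness(chromosome):
    """Toy fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=1):
    """Run a small genetic algorithm; returns (best chromosome, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Examine the population to determine its fitness
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # the fitter half may reproduce
        children = []
        while len(children) < pop_size - 1:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # one-point crossover (an assumption)
            child = mother[:cut] + father[cut:]
            # Possibly mutate each gene
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = [population[0]] + children  # keep the best solution unchanged
    population.sort(key=fitness, reverse=True)
    best = population[0]
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best chromosome always survives, the best fitness never decreases from one generation to the next.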
Clustering
Cluster: a collection of similar data objects
Clustering: classification with no predefined classes
Typical applications
As a stand-alone tool to get insight into data distribution
Marketing and segmentation
Document classification
Social network analysis
Genomics analysis
Medicine
As a preprocessing step for other algorithms
Principles
The basic principle is to identify the items in a feature space
and measure their relative distances.
e.g., how to represent a customer? a document?...
Then algorithms can be applied to group close/similar items.
k-Means (most commonly used, to be illustrated later)
Locality-sensitive hashing
Graph-theoretic methods
(Figure: David, John, Adam, Ken, and Jane plotted as points in a feature space.)
Cluster Quality Measures
Since there are no predefined classes, there is no absolute
answer as to which clustering is right.
Still, good clustering must exhibit
high intra-class similarity
low inter-class similarity
Which one is better?
Simple Clustering: K-means
1) Pick a number (K) of cluster centers (at random)
2) Assign every item to its nearest cluster center (e.g. using
Euclidean distance)
3) Move each cluster center to the mean of its assigned
items
4) Repeat steps 2,3 until convergence (change in cluster
assignments less than a threshold)
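The four steps above translate almost directly into code. This is a minimal sketch on made-up 2-D points; it assumes squared Euclidean distance, picks the initial centers from the data with a fixed seed, and stops when no assignment changes (the strictest form of the threshold in step 4).

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, assignment), where assignment[i] is the cluster index of points[i]."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: pick K centers at random
    assignment = None
    for _ in range(max_iter):
        # step 2: assign every item to its nearest cluster center
        new_assignment = [min(range(k), key=lambda c: dist2(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:  # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # step 3: move each center to the mean of its assigned items
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment

# Hypothetical data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, assignment = kmeans(points, 2)
print(centers, assignment)
```

The result depends on the initial centers in general; on data this well separated, any starting pair leads to the same two groups.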
Example: K-means Clustering
(Figure: K-means with K=2 on scatter plots with both axes running from 0 to 10. Panels: arbitrarily choose K points as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means again.)
Hierarchical clustering
Bottom up
Start with single-instance clusters
At each step, join the two closest
clusters
Design decision: distance between
clusters
E.g. two closest instances in clusters
vs. distance between means
Top down
Start with one universal cluster
Find two clusters
Proceed recursively on each subset
Can be very fast
Both methods produce a dendrogram
(Figure: dendrogram with leaves g a c i e d k b j f h)
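The bottom-up variant can be sketched as follows. This is a minimal sketch on made-up one-dimensional values, assuming the "two closest instances" rule (single link) as the distance between clusters; the list of merges it returns is the information a dendrogram draws.

```python
def single_link(a, b):
    """Distance between clusters: the two closest instances, one from each."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(values):
    """Bottom-up clustering; returns the merges in the order they happen."""
    clusters = [(v,) for v in values]  # start with single-instance clusters
    merges = []
    while len(clusters) > 1:
        # find the two closest clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges

merges = agglomerate([1, 2, 6, 7, 15])  # hypothetical values
for left, right in merges:
    print(left, "+", right)
```

Swapping `single_link` for a distance between cluster means changes which clusters are joined, which is the design decision noted above.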
Example: Documents Classification
document   Keywords/frequency
d1         Stock(2), IBM(1), HP(2), price(5), ...
d2         Computer(3), IBM(1), HP(2), ...
d3         Computer(3), Network(2), IBM(2), ...
d4         Datamining(1), decision(3), split(2), Information(4)

Document Representation
d_i: [c1, c2, c3, ...] (counts can be normalized, or simply replaced by binary values)
Each document can then be viewed as a point in an N-dimensional space. Clustering
techniques can thus be applied to classify documents.
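The representation above can be sketched by turning keyword counts into vectors and comparing them. This is a minimal sketch: it uses only the keywords that are visible in the table (the rows for d1 to d3 are truncated in the source), and cosine similarity is one assumed choice of similarity measure.

```python
from math import sqrt

# Keyword counts as listed in the table (truncated rows use only the visible keywords)
docs = {
    "d1": {"stock": 2, "ibm": 1, "hp": 2, "price": 5},
    "d2": {"computer": 3, "ibm": 1, "hp": 2},
    "d3": {"computer": 3, "network": 2, "ibm": 2},
    "d4": {"datamining": 1, "decision": 3, "split": 2, "information": 4},
}

# One dimension per distinct keyword
vocabulary = sorted({word for counts in docs.values() for word in counts})
vectors = {name: [counts.get(word, 0) for word in vocabulary]
           for name, counts in docs.items()}

def cosine(u, v):
    """Cosine of the angle between two count vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

for a, b in [("d2", "d3"), ("d1", "d2"), ("d1", "d4")]:
    print(a, b, round(cosine(vectors[a], vectors[b]), 3))
```

On these counts d2 and d3 are the closest pair, while d1 and d4 share no keywords at all.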
Example: Information Retrieval
Documents
Query
Index
database
Mechanism for determining the similarity of
the query to the document.
Set of documents ranked by how
similar they are to the query
So, do you know how customers can be classified?
Classification
Arranges data into predefined groups
Applications
Document classification
Email classification (legitimate emails vs. spam)
Medical Imaging and Medical Image Analysis
Communities establishment
Techniques
Neural Network
Support vector machines
k-nearest neighbor
Decision trees
Bayesian networks
Naive Bayes classifier
Decision Trees
A structure that can be used to divide up a large
collection of records into successively smaller sets of
records by applying a sequence of simple decision rules
An internal node is a test on an attribute.
A branch represents an outcome of the test, e.g., age <= 30
A leaf node represents a class
A decision tree model consists of a set of rules for
dividing a large heterogeneous population into smaller,
more homogeneous groups with respect to a particular
target variable
Decision Trees: Example
age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31..40   high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31..40   low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31..40   medium   no        excellent
31..40   high     yes       fair
>40      medium   no        excellent

(Figure: the resulting decision tree)
age?
  <=30: student? (no: no; yes: yes)
  31..40: yes
  >40: credit rating? (excellent: no; fair: yes)
Tree Construction and Split
At each node, available attributes are evaluated on the basis of
separating the classes of the training examples. A goodness
function is used for this purpose.
Typical goodness functions:
Gini (population diversity)
Entropy (information gain)
Information Gain Ratio
Chi-square Test
(Figure: a good split and a poor split.)
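Two of the goodness functions listed above can be computed from class counts alone. This is a minimal sketch with made-up counts chosen to contrast a perfectly separating split with an uninformative one; the function names are illustrative.

```python
from math import log2

def gini(counts):
    """Gini diversity of a node given its class counts (0 means pure)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits of a node given its class counts (0 means pure)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    remainder = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder

# Hypothetical node with 8 "yes" and 8 "no" examples
parent = [8, 8]
good_split = [[8, 0], [0, 8]]  # each child holds a single class
poor_split = [[4, 4], [4, 4]]  # each child is as mixed as the parent

print(gini(parent), gini(good_split[0]))
print(information_gain(parent, good_split), information_gain(parent, poor_split))
```

A tree builder would evaluate every available attribute this way at each node and split on the one with the best score.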
Data Warehouse
Data Warehouse
Data ... everywhere, but where is the information?
"A data warehouse is a subject-oriented, integrated, time-variant,
and nonvolatile collection of data in support of
management's decision-making process." (W. H. Inmon)
"A copy of transaction data, specifically structured for query
and analysis" (Ralph Kimball)
Data warehousing:
The process of constructing and using data warehouses
Data Warehouse
(Figure: the Enterprise Database (Customers, Orders, Vendors, Transactions, etc.) is copied, organized, and summarized into the Data Warehouse, which feeds Data Mining.)
DW allows an organization (enterprise) to remember what it has noticed
about its data
Data Mining techniques make use of the data in a Data Warehouse
From Tables and Spreadsheets to
Data Cubes
A data warehouse is based on a multidimensional data model which
views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
Dimension tables, such as item (item_name, brand, type), or time (day,
week, month, quarter, year)
Fact table contains measures (such as dollars_sold) and keys to each
of the related dimension tables
In data warehousing literature, an n-D base cube is called a base
cuboid. The topmost 0-D cuboid, which holds the highest level of
summarization, is called the apex cuboid. The lattice of cuboids
forms a data cube.
Cube: A Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier
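The lattice is simply every subset of the dimensions, so it can be enumerated mechanically. This is a minimal sketch using the four dimensions named above; with n dimensions there are 2^n cuboids, here 16.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# Level k of the lattice holds every k-dimensional cuboid
lattice = {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

for k, cuboids in lattice.items():
    names = [",".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D: {names}")

total = sum(len(cuboids) for cuboids in lattice.values())
print(total, "cuboids in total")
```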
Multidimensional Data
Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month or Week > Day
(Figure: data cube with Product and Month among its axes.)
Another Sample Data Cube
(Figure: data cube with three axes, each including a "sum" total. Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr. Product: TV, VCR, PC. Country: U.S.A, Canada, Mexico.)
Data Visualization
-A picture is worth a thousand words
OLAP (On-Line Analytical
Processing)
OLAP: Interactive,
exploratory analysis
of multidimensional
data to discover
patterns
Typical OLAP Operations
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or detailed data, or
introducing new dimensions
Slice and dice:
project and select
Pivot (rotate):
reorient the cube, visualization, 3D to series of 2D planes.
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its back-end
relational tables (using SQL)
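Roll-up by dimension reduction and slicing can be sketched over a tiny in-memory fact table. This is a minimal sketch: the rows, the dimension names, and the helper names `roll_up` and `slice_rows` are hypothetical, and a real OLAP server would do this over precomputed cuboids rather than a Python list.

```python
from collections import defaultdict

DIMS = ("quarter", "product", "country")

# Hypothetical fact rows: (quarter, product, country, dollars_sold)
facts = [
    ("1Qtr", "TV", "U.S.A", 100),
    ("1Qtr", "PC", "U.S.A", 200),
    ("2Qtr", "TV", "Canada", 150),
    ("2Qtr", "PC", "Mexico", 50),
    ("2Qtr", "TV", "U.S.A", 120),
]

def roll_up(rows, keep):
    """Roll up by dimension reduction: sum the measure over all dimensions not in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_rows(rows, dim, value):
    """Slice: select the rows where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]

by_product = roll_up(facts, ["product"])
usa_by_quarter = roll_up(slice_rows(facts, "country", "U.S.A"), ["quarter"])
apex = roll_up(facts, [])  # the 0-D cuboid: one grand total

print(by_product, usa_by_quarter, apex)
```

Drill-down is the reverse direction: calling `roll_up` with more dimensions in `keep` returns the finer-grained totals.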
Data Mining versus OLAP
OLAP
Provides you with a very good view of what is happening, but cannot
predict what will happen in the future or why it is happening
Data Mining
Forecasting what may happen in the future
Classifying people or things into groups by recognizing patterns
Clustering people or things into groups based on their attributes
Associating what events are likely to occur together
Sequencing what events are likely to lead to later events
Multi-Tiered Architecture
(Figure: multi-tiered architecture, left to right.)
Data Sources: Operational DBs, other sources
Extract, Transform, Load, Refresh (with Monitor & Integrator and Metadata)
Data Storage: Data Warehouse, Data Marts
Serve
OLAP Engine: OLAP Server
Front-End Tools: Analysis, Query, Reports, Data mining
Systems
Oracle: "Business Intelligence" and "Data Mining" tools
Oracle Data Mining (ODM) embeds extensive analytical
features into business processes
ODM is an optional component of Oracle 10g Database
Enterprise Edition (EE)
Oracle in Mar. 2007 acquired Hyperion Solutions for $3.3B, a
key company in the Business Intelligence and Business performance
management market
Teradata: Data Warehousing & Business Intelligence Solutions
P.s. NCR spun off Teradata in Oct. 2007 (a hardware and
software vendor specializing in data warehousing and analytic
applications, acquired by NCR in Feb. 1991).
How to Choose a DM/DW System?
System issues
Centralized or distributed? Web-based?
Scalability
System Performance
Visualization tools
Query languages and interfaces
Case Study: Amazon.com
"Earth's Biggest Selection"
Jeff Bezos
Photo captions:
Warehouse in Fernley, Nev., 2006
Warehouse in Milton Keynes, England, Nov. 2006
Amazon.com warehouse in New Castle, Del.
Stocks of the novel "Harry Potter and the Order of the
Phoenix" at an Amazon warehouse in Milton Keynes,
England, in 2003 (over 1.3M copies sold)
Amazon.com headquarters at Seattle's Beacon Hill
Earth's Biggest Selection
Bhutan: A Visual Odyssey
Across the Last Himalayan
Kingdom, $30,000
Cabinet Saw
Harry Potter and the Deathly
Hallows, over 2 million copies
preordered in 2007.
sales ranked #1, 20080214
Segway
And many others
New Business
the iPod of reading
$399
Launched in Aug 2007, and available in
the Seattle area.
Online Penetration Rate by Category
in US
Source: http://www.wikinvest.com/stock/Amazon.com_(AMZN), 2009.02.08
Some Background
Founded in 1994 by Jeff Bezos
Market Cap: $28.52B (2009/02/08)
$16.62B (2007/02/15)
Revenue: $19.17B (2008)
$14.84B (2007)
$10.71B (2006)
$8.49B (2005)
Net income: $645M (2008)
$476M (2007)
$190M (2006)
$359M (2005)
Employees: 20,700 (2009.02)
13,900 (2007)
Main Competitors:
Barnes & Noble Inc.
eBay
Market Performance in the Financial
Tsunami
Amazon
Dow Jones
Nasdaq
Some Background (contd.)
On Nov 21, 2005, Amazon entered the S&P 500 index, replacing
AT&T (merged with SBC Communications).
On Dec 31, 2008, Amazon entered the S&P 100 index, replacing
Merrill Lynch (taken over by Bank of America).
Amazon.com: The King of
E-Tailing
The opportunity
Amazon launched in 1995 by Jeff Bezos as an online
bookstore, but soon diversified its product lines, adding
The company has continually enhanced its business models
and electronic store by:
expanding product selection
DVDs, music CDs, computer software, video games, electronics,
apparel, furniture, food, toys and more.
Digital video downloading service in Sep. 2006
Online storage service in Mar. 2006
Online music store in Sep. 2007 (selling MP3 without DRM)
(currently available only in US)
AmazonFresh in Aug. 2007
improving the customer's experience
adding services and alliances
recognizing the importance of order fulfillment and warehousing
Amazon.com: The King of
E-Tailing (cont.)
Technology used
Amazon.com has expanded in a variety of directions:
offers specialty stores (professional and technical store)
expands its editorial content through partnerships with experts in
certain fields
increases product selection with the (used and out-of-print titles)
expands its offerings beyond books (June 2002 became an
authorized dealer of Sony Corp. selling Sony products online)
Books and other media still account for about 59% of its revenue
[2008.01.31].
software development centers across the globe.
offers web services [Launched in July 2002] for access to its catalog
as well as for integration with retailers like Target and Marks &
Spencer.
Amazon.com: The King of
E-Tailing (cont.)
Key features of the Amazon.com superstore are:
easy browsing, searching, and ordering
useful product information, reviews, recommendations
broad selection
low prices
secure payment systems
efficient order fulfillment
personalization
Amazon.com: The King of
E-Tailing (cont.)
Enjoyable features:
Gift Ideas section features seasonally appropriate gift ideas and
services
Community section provides product information and
recommendations shared by customers
E-Cards section, free animated electronic greeting
ability for users to submit reviews to the web page of each
product
Amazon.com: The King of
E-Tailing (cont.)
Marketplace services (launched in 2001):
allows sellers to offer their goods alongside Amazon's offerings,
used or new
Replaced its zShops (auction) service launched in Aug. 1999.
the strategy creates a one-stop shopping destination with a consistent
experience for the customer.
helps dramatically increase Amazon's product selection and availability
30% of items sold on Amazon are sold by third parties (as of 2008)!
main rival:
eBay's Half.com service
Products limited to CDs, DVDs, books, and video games.
Amazon.com: The King of
E-Tailing (cont.)
Amazon.com is recognized as an online leader in CRM
informative marketing front ends
one-to-one advertisements
free posting of restaurant menus from thousands of restaurants
"Welcome back, Sarah Shopper" with recommendations of
new books from the customer's preferred genre based on
previous purchases
Sends purchase recommendations via e-mail to cultivate
repeat buyers
Efficient search engine and other shopping aids
Customers can personalize their accounts and manage orders
online with the patented One-Click order feature including
an electronic wallet
Amazon.com: The King of
E-Tailing (cont.)
In 1997, Amazon.com started an extensive affiliates program
by 2003, the company had more than 1 million associates that refer
customers to Amazon.com, accounting for 40% of the sales (that
include third party sellers who list and sell products on the Amazon
websites.)
1.3 million sellers sold products through Amazon's worldwide web sites
in 2007
Amazon pays a 3 to 5% commission on any resulting sale
Associates can access the Amazon catalog directly on their websites
by using the Amazon Web Services (AWS) XML service.
Carsdirect.com allows it to sell cars online
Drugstore.com connects to health and beauty aids
AT&T, Nextel and others suggest service plans for wireless phones
Amazon.com: The King of
E-Tailing (cont.)
The Results
$15.7 million revenue in 1996, to $600 million in 1998, to about $4 billion in
2002, to about $8.5 billion in 2005.
In January 2002, Amazon.com declared its first ever profit for the 2001 fourth
quarter, to 1.6 billion in 2005.
In 2003 the site offered over 17 million book, music, and DVD/video titles to some
20 million customers
Offers several features for international customers.
Recent News:
"Amazon makes IT spending cool" [ZDNet, 20070725]
"In a Well-Worked Pattern, Amazon's Revenue Rises and Its Profit Drops" [NY
Times, 20070202]
"Strong Holiday Season Lifts Amazon's Revenue" [NY Times, 20080131]
Battle in China
Joyo was acquired by Amazon.com in 2004 for $75M, and
became its 7th regional website (after US, Canada, France,
Germany, Japan, and UK).
Market share: 12% (Joyo) vs 18% (Dangdang) (as of 2007).
Subsidiaries
p.s. Amazon spent more than half a billion each year on technology (as of 2007).
Discussion 1
Visit Amazon.com, explore its features in CRM, and
summarize them. Then, select an e-commerce Web site
(Amazon's competitors preferred), examine its features
for CRM, and make a comparison with Amazon.com.
Discussion 2
Amazon has spent a lot of money on technology,
particularly on Web services (over $2B to build the
infrastructure, technical knowledge, and operational
excellence to operate a world class web-scale computing
platform.)
First, study the technology Amazon has emphasized.
Then, discuss why Amazon needs to aim in this direction.
Which companies are its competitors?
And is this promising?