
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”


The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and timing
of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Copyright © 2007 Oracle Corporation
Oracle Data Mining
(Graphic: Oracle 11g DB with Data Warehousing, ETL, OLAP, Statistics, and Data Mining)

Charlie Berger
Sr. Director Product Management,
Data Mining Technologies & Life Sciences & Healthcare Industries
Oracle Corporation
charlie.berger@oracle.com





What is Data Mining?
• Automatically sifts through data to
find hidden patterns, discover new insights,
and make predictions
• Data Mining can provide valuable results:
• Identify factors more associated with a business
problem (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Determine important relationships and “market baskets”
within the population (Associations)
• Find fraudulent or “rare events” (Anomaly Detection)



Business Intelligence & Analytics

Query and Reporting: “Information”
• Extraction of detailed and roll-up data
• Example: Who purchased mutual funds in the last 3 years?

OLAP: “Analysis”
• Summaries, trends and forecasts
• Example: What is the average income of mutual fund buyers, by region, by year?

Data Mining: “Insight & Prediction”
• Knowledge discovery of hidden patterns
• Example: Who will buy a mutual fund in the next 6 months and why?



Example Data Mining Applications

Financial Services
– Combat attrition (churn)
– Fraud detection
– Loan default (Basel II)
– Identify selling opportunities

Database Marketing
– Buy product x
– More targeted & successful campaigns
– Identify cross-sell & up-sell opportunities

Telecommunications
– Identify customers likely to leave
– Target highest lifetime value customers
– Identify cross-sell opportunities

Insurance, Government
– Flag accounting anomalies (Sarbanes-Oxley)
– Reduce cost of investigating suspicious activity or false claims

Retail
– Loyalty programs
– Cross-sell
– Market-basket analysis
– Fraud detection

Life Sciences
– Find factors associated with healthy or unhealthy patients
– Discover gene and protein targets
– Identify leads for new drugs



Data Mining
Overview (Classification)

Model: a functional relationship Y = F(X1, X2, …, Xm) between the input attributes (X) and the target (Y)

Historic Data (cases)
Name     Income   Age   . . .   Buy Product? (1 = Yes, 0 = No)
Jones    30,000   30            1
Smith    55,000   67            1
Lee      25,000   23            0
Rogers   50,000   44            0

New Data
Name      Income   Age   . . .   Buy Product?   Prediction   Confidence
Campos    40,500   52            ?              1            .85
Hornick   37,000   73            ?              0            .74
Habers    57,200   32            ?              0            .93
Berger    95,600   34            ?              1            .65
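
The build-then-score flow in this overview might look like the following sketch in Oracle SQL and PL/SQL. The table names CUSTOMERS_HISTORIC and CUSTOMERS_NEW, the columns CUST_ID, NAME and BUY_PRODUCT, and the model name BUY_PRODUCT_MODEL are hypothetical and not taken from the slides. No settings table is supplied, so the default classification algorithm would be used.

```sql
-- Build a classification model from the historic cases (hypothetical names).
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_PRODUCT_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMERS_HISTORIC',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_PRODUCT');
END;
/

-- Score the new cases: predicted class and the probability of that class.
SELECT n.name,
       PREDICTION(BUY_PRODUCT_MODEL USING n.*)             AS prediction,
       PREDICTION_PROBABILITY(BUY_PRODUCT_MODEL USING n.*) AS confidence
FROM   customers_new n;
```

The second statement corresponds to the Prediction and Confidence columns of the New Data rows.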



Data Mining Provides
Better Information, Valuable Insights and Predictions
Cell Phone Churners vs. Loyal Customers

(Figure: scatter plot of Income against Customer Months, with churner segments highlighted as “Insight”)

Segment #1:
IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39

Segment #3:
IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39



Example
Simple, Predictive SQL

• Select customers who are more than 60% likely to purchase a 6-month CD and display their marital status



In-Database Data Mining
Advantages
• ODM architecture provides greater performance, scalability, and data security
• Data remains in the database at all times, with appropriate access security control mechanisms and fewer moving parts
• Straightforward inclusion within interesting and arbitrarily complex queries
• Real-world scalability: available for mission-critical applications
• Enables pipelining of results without costly materialization
• Performant and scalable:
– Fast scoring: 2.5 million records scored in 6 seconds on a single CPU system
– Real-time scoring: 100 models on a single CPU: 0.085 seconds



Oracle Data Mining 11g
Oracle in-Database Mining Engine
• Data Mining Functions (Server)
– PL/SQL & Java APIs
– Develop & deploy predictive analytics applications
• Wide range of DM algorithms
– Classification & regression
– Clustering
– Anomaly detection
– Attribute importance
– Feature extraction (NMF)
– Association rules (Market Basket analysis)
– Structured & unstructured data (text mining)
• Oracle Data Miner (GUI)
– Simplified, guided data mining using wizards
• Predictive Analytics
– “1-click data mining” from a spreadsheet



11g Statistics & SQL Analytics
FREE (Included in Oracle SE & EE)

• Ranking functions
– rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving and cumulative)
– avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions
– Direct inter-row reference using offsets
• Reporting Aggregate functions
– sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical Aggregates
– Correlation, linear regression family, covariance
• Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number pairs
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
• Descriptive Statistics
– average, standard deviation, variance, min, max, median (via percentile_cont), mode, group-by & roll-up
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations
– Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
• Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing
– Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential

Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
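
As an illustration of how several of these function families combine in ordinary queries, here is a minimal sketch. The table MONTHLY_SALES and its columns (region, sales_month, revenue, ad_spend) are hypothetical and used only for this example.

```sql
-- Ranking, LAG, moving-window and reporting aggregates in one pass.
SELECT region,
       sales_month,
       revenue,
       RANK() OVER (PARTITION BY sales_month ORDER BY revenue DESC)    AS revenue_rank,
       LAG(revenue, 1) OVER (PARTITION BY region ORDER BY sales_month) AS prev_revenue,
       AVG(revenue) OVER (PARTITION BY region ORDER BY sales_month
                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)    AS moving_avg_3m,
       RATIO_TO_REPORT(revenue) OVER (PARTITION BY sales_month)        AS share_of_month
FROM   monthly_sales;

-- Statistical aggregates: correlation and an ordinary-least-squares fit
-- of revenue (dependent) on ad_spend (independent), per region.
SELECT region,
       CORR(ad_spend, revenue)           AS pearson_r,
       REGR_SLOPE(revenue, ad_spend)     AS slope,
       REGR_INTERCEPT(revenue, ad_spend) AS intercept,
       MEDIAN(revenue)                   AS median_revenue
FROM   monthly_sales
GROUP  BY region;
```

Note that the REGR_ functions take the dependent variable first and the independent variable second.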



Industry Analysts
Oracle data mining: not only good but affordable too!
Published: 30th November, 2006, Bloor Research 2007
“…the Oracle data mining option is one of the great bargains
available today because it is affordable and … is a real Rolls Royce
of capability and features. …redesigned from scratch and put the
algorithms into the database to ensure that, not only is the execution
of the algorithms efficient, but the vast amounts of data handling
that typifies traditional datamining is minimised. … Oracle are
leaving the database in situ and mining it there, which saves a lot of
effort and will greatly increase productivity.
This is a fully featured, highly sophisticated data mining capability to
enable professionals to operate against Oracle data sets with
productivity and precision. Oracle data mining has a broad range of
available algorithms, which enable it to undertake virtually every
kind of business and scientific analysis that one can think of. …
…Oracle are giving the data mining professional a real alternative to
SAS and SPSS with an offering that is equally as well featured, but
which promises to outperform any standalone offering.”



Industry Analysts
PREDICTIVE ANALYTICS: Extending the Value of Your
Data Warehousing Investment, By Wayne W. Eckerson

“…According to our survey, most organizations plan to significantly
increase the analytic processing within a data warehouse
database in the next three years, particularly for model building
and scoring, which show 88% climbs. The amount of data
preparation done in databases will only climb 36% in that time, but it will
be done by almost two-thirds of all organizations (60%)—double the
rate of companies planning to use the database to create or score
analytical models.”
“…it’s surprising that about one-third of organizations plan to build
analytical models in databases within three years.”
“‘We leverage the data warehouse database when possible,’ says
one analytics manager. He says most analysts download a data sample
to their desktop and then upload it to the data warehouse once it’s
completed. ‘Ultimately, however, everything will run in the data
warehouse,’ the manager says.”

http://download.101com.com/pub/tdwi/Files/PA_Report_Q107_F.pdf

(Figure: METAspectrum 60.1. Copyright © 2004 META Group, Inc. All rights reserved. metagroup.com)



Customer References
"...Because data mining algorithms and the data are
housed together in the Oracle database, we don't have to
move huge data sets to external programs to run the
algorithms and learn something about our data…The fact
that it cost about 75 percent less than the leading
competitor didn't hurt either… "
--Tracy E. Thieret, Ph.D. Principal Scientist Xerox Innovation Group Imaging and
Solutions Technology Center

Walter Reed Medical Center


“…Using … Oracle Data Mining, medical researchers are
discovering trends and patterns that will improve the health
care for millions of people around the globe.”
--Dr. Carolyn Hamm, Director of Decision Support, Walter Reed Medical Center.
“Saving Lives with Oracle”

IRS
– Detecting taxpayer noncompliance



• With 200 million retail customers including 146 million credit card
accounts, Citigroup's Global Consumer Group (GCG) is among the
very largest consumer franchises in the world. Like its financial
service peers, Citigroup is seeking to better understand and serve
customers across portfolios of products and services.
Built For A Purpose, From BI Review Magazine January 2006 Issue, By
Jim Ericson



Customer References

"With over 3,500 submissions per month for payment (or... about
$200 million of legal invoices per month..), we at Stuart Maue, (a St.
Louis based legal services firm and winner of the 2006 DM Review
Business Intelligence award), sought to automate and improve our
review, categorization and investigation of possibly non-compliant
legal submissions process. Besides this being a very labor
intensive process, spotting potential fraudulent or erroneous
submissions for payment can mean millions of dollars of savings to
Stuart Maue and our clients. Oracle Data Mining allows us to mine
our structured and unstructured Oracle-based data, automate the
process, respect security schemes, scale to large volumes, and
most importantly, saves us time and money."

-- Bradley Maue, CIO Stuart Maue, Inc.



Customer References

"Keeping one's data in the warehouse/mart, and not having to


extract data for modeling can greatly simplify model development,
model scoring and general security issues. This capability is
relatively new for most data mining tools. I've been impressed with
the data mining functionality that Oracle has recently built into 10g.
They even have a nice GUI that can be used to access all the
algorithms, as well as to manipulate the output (all the usual stuff
you'd expect from a data mining tool: interactive lift curves, etc.).
The integration of data mining the database doesn't get any easier
than doing the mining right in the database. And in terms of
functionality and total cost of ownership, Oracle Data Miner is a
great option, especially for companies that are already storing their
data in Oracle warehouses/marts.”
-- Karl Rexer, Ph.D., Rexer Analytics www.RexerAnalytics.com



Customer References

"The Oracle Data Mining and Oracle's SQL statistical functions have
enabled us to take patient segmentation studies to a new, more
granular level. Healthcare research involves notoriously complex
data. The number of possible variables is essentially unlimited. The
number of encounters of our population with healthcare providers is
voluminous. Oracle Data Mining working with an Oracle database
handles both with aplomb. A rich feature set, the ability to control
whether false positives or false negatives are to be avoided, and the
easy ability to apply models to new datasets make ODM an important
tool in our arsenal. Given that many companies manage their data in
Oracle, it is now easy to also analyze the data there too.”
-- K.C. Cerny, Managing Partner. Management Information Analysis,
www.mia-consulting.com



In-Database Analytics vs. External Analytics

In-Database Analytics
1. In-Database Analytics Engine
– Basic Statistics (Free)
– Data Mining
– Text Mining
2. Development Platform
– Java (standard)
– SQL (standard)
– J2EE (standard)
3. Costs (ODM: $20K cpu)
– Simplified environment
– Single server
– Security

External Analytics
1. External Analytical Engine
– Basic Statistics
– Data Mining
– Text Mining (separate: SAS EM for Text)
– Advanced Statistics
2. Development Platform
– SAS Code (proprietary)
3. Costs (SAS EM: $150K/5 users)
– Annual Renewal Fee (~40% each year)



Oracle Data Mining
Release 11g
New Features



Oracle Data Mining
11g Server New Features
• Generalized Linear Models (GLM)
– Logistic Regression
– Multiple Regression
• Simplify and automate data mining
– Automatic Data Preparation for each model
– Embedded + Automatic Data Prep + ODM Model → “Super Models”
– Predictive Analytics (fully automated data mining)
– PROFILE PL/SQL procedure
• Easier administration
– Data Mining models attain additional first-class database object characteristics: privileges, catalog views, etc.
– Improved security for model access, use and tracking
• Java API (JSR-73) for Oracle Data Mining 11g
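
A minimal sketch of how the GLM and Automatic Data Preparation features might be combined through the PL/SQL API follows. It reuses the CD_BUYERS table, CUST_ID case id and CD_BUYER target that appear elsewhere in this deck; the settings table GLM_SETTINGS and the model name CD_BUYER_GLM are hypothetical, and the settings column sizes are an assumption that may differ by release.

```sql
-- Settings table (hypothetical name) that selects GLM and turns on
-- Automatic Data Preparation for the model.
CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO glm_settings VALUES
    (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);
  INSERT INTO glm_settings VALUES
    (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);
  COMMIT;

  -- With the CLASSIFICATION function, GLM builds a logistic regression model.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CD_BUYER_GLM',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CD_BUYERS',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'CD_BUYER',
    settings_table_name => 'GLM_SETTINGS');
END;
/
```

Because the preparation steps are stored with the model, scoring queries can pass raw columns without repeating the transformations.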



Oracle Data Miner
11g GUI New Features
• Improved ease of use
– Project folders to organize data + models
– More graphs (x-y scatter, histograms with group-by) for data exploration
– More & improved model results viewers, e.g. Tree display
• Generalized Linear Models (GLM)
– Logistic regression
– Multiple regression
• Simplify and automate data mining
– Automatic data preparation
– Ability to rapidly build multiple models
– Support for multi-model comparison
– PROFILE Spreadsheet Add-in
• Accelerate model deployment and application development
– Automatically generate PL/SQL model code
– Embedded + Automatic Data Prep + ODM Model → “Super Models”



PROFILE

• Find the different customer profiles that explain why a customer is likely (or not) to respond to an affinity card promotion…

BEGIN
  -- PROFILE names its target with target_column_name;
  -- explain_column_name is the parameter of EXPLAIN.
  DBMS_PREDICTIVE_ANALYTICS.PROFILE(
    data_table_name    => 'customers',
    target_column_name => 'affinity_card',
    result_table_name  => 'explain_res');
END;
/



PROFILE

• PROFILE finds targeted customer segments, size and confidence and “rules”



• PROFILE alternative pie chart display



Oracle Data Mining
Algorithms &
SQL Statistical Functions



Oracle Data Mining
Algorithm Summary 11g

Problem                Algorithm                          Applicability
Classification         Logistic Regression (GLM)          Classical statistical technique
                       Decision Trees                     Popular / Rules / transparency
                       Naïve Bayes                        Embedded app
                       Support Vector Machine             Wide / narrow data / text
Regression             Multiple Regression (GLM)          Classical statistical technique
                       Support Vector Machine             Wide / narrow data / text
Anomaly Detection      One Class SVM                      Lack examples
Attribute Importance   Minimum Description Length (MDL)   Attribute reduction; identify useful data; reduce data noise
Association Rules      Apriori                            Market basket analysis; link analysis
Clustering             Hierarchical K-Means               Product grouping; text mining; gene and protein analysis
                       Hierarchical O-Cluster
Feature Extraction     NMF                                Text analysis; feature reduction



Oracle Data Mining 10gR2
Decision Trees
• Classification, Prediction, Patient “profiling”

Simple DM model: Can include unstructured data (e.g. text comments), transactions data (e.g. purchases), etc.

(Figure: decision tree. The root splits on Age (>45 / <45); the next level splits on Status (No Infection / Infection) and Age (>35 / <=35); the lowest level splits on Temp (<100 / >100), Gender (F / M), and Days ICU (>4 / <=4); leaves are Risk = 0 or Risk = 1.)

IF (Age > 45 AND Status = Infection AND Temp > 100)
THEN P(High Risk=1) = .77 Support = 250



Oracle Data Mining 11g
Anomaly Detection
Problem: Detect rare cases

• “One-Class” SVM Models


• Fraud, noncompliance
• Outlier detection
• Network intrusion detection
• Disease outbreaks
• Rare events, true novelty
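
Scoring with a one-class SVM model might look like the sketch below. It assumes a model named CLAIMS_ANOMALY_MODEL has already been built and that a CLAIMS table with a CLAIM_ID column holds the records to score; both names are hypothetical. In Oracle's one-class SVM models a prediction of 1 marks a typical case and 0 an anomalous one, so the query asks for the probability of class 0.

```sql
-- List the ten records the model considers most likely to be anomalous.
SELECT *
FROM  (SELECT c.claim_id,
              PREDICTION_PROBABILITY(CLAIMS_ANOMALY_MODEL, 0 USING c.*) AS prob_anomaly
       FROM   claims c
       ORDER  BY prob_anomaly DESC)
WHERE  ROWNUM <= 10;
```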



Oracle Data Mining
Algorithms & Example Applications

Attribute Importance
• Identify most influential attributes for a target attribute
• Factors associated with high costs, responding to an offer, etc.
(Figure: bar chart ranking attributes A1 to A7)

Classification and Prediction
• Predict customers most likely to:
– Respond to a campaign or offer
– Incur the highest costs
• Target your best customers
• Develop customer profiles
(Figure: decision tree splitting on Income (>$50K / <=$50K), Gender, Age (>35 / <=35), Status (Married / Single) and HH Size (>4 / <=4), with leaves Buy = 0 or Buy = 1)

Regression
• Predict a numeric value
– Predict a purchase amount or cost
– Predict the value of a home



Oracle Data Mining
Algorithms & Example Applications
Clustering
• Find naturally occurring groups
• Market segmentation
• Find disease subgroups
• Distinguish normal from non-normal behavior

Association Rules
• Find co-occurring items in a market basket
• Suggest product combinations
• Design better item placement on shelves

Feature Extraction
• Reduce a large dataset into representative new attributes
• Useful for clustering and text mining
(Figure: extracted features F1 to F4)
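
For the clustering use case on this slide, segment sizes can be read straight from SQL once a clustering model exists. The sketch below assumes a clustering model named CUST_SEGMENTS_KM and a CUSTOMERS table; both names are hypothetical.

```sql
-- Assign each customer to its most likely cluster, then count per segment.
SELECT segment, COUNT(*) AS cust_count
FROM  (SELECT CLUSTER_ID(CUST_SEGMENTS_KM USING c.*) AS segment
       FROM   customers c)
GROUP  BY segment
ORDER  BY cust_count DESC;
```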



Oracle Data Mining
Algorithms & Example Applications

Text Mining
• Combine data and text for better models
• Add unstructured text e.g. physician’s notes to
structured data e.g. age, weight, height, etc., to
predict outcomes
• Classify and cluster documents
• Combined with Oracle Text to develop
advanced text mining applications e.g. Medline

Descriptive Statistics
• MEDIAN & MODE
– Median: takes numeric or datetime values and returns the middle value
– Mode: returns the most common value

A. SELECT STATS_MODE(EDUCATION) from CD_BUYERS;

B. SELECT MEDIAN(ANNUAL_INCOME) from CD_BUYERS;

C. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION;

D. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION ORDER BY MEDIAN(ANNUAL_INCOME) ASC;



One-Sample T-Test
STATS_T_TEST_*
The t-test functions are:
STATS_T_TEST_ONE: A one-sample t-test
STATS_T_TEST_PAIRED: A two-sample, paired t-test (also known as
a crossed t-test)
STATS_T_TEST_INDEP: A t-test of two independent groups with the
same variance (pooled variances)
STATS_T_TEST_INDEPU: A t-test of two independent groups with
unequal variance (unpooled variances)

http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm



Independent Samples T-Test
Query compares the mean of AMOUNT_SOLD between MEN and WOMEN within CUST_INCOME_LEVEL ranges
Comparison results grouped by gender with statistical level of significance
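
The query shown on this slide did not survive in the text. A query of the kind it describes could look like the following sketch, which assumes Oracle's SH sample schema (sh.customers and sh.sales) and uses the pooled-variance STATS_T_TEST_INDEP function; the schema choice and the column aliases are assumptions, not the presenter's original query.

```sql
-- Compare mean AMOUNT_SOLD for men and women within each income level.
-- The third argument selects the output ('STATISTIC' = observed t value);
-- the default output is the two-sided significance (p-value).
SELECT SUBSTR(cust_income_level, 1, 22)                               AS income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, NULL))               AS sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, NULL))               AS sold_to_women,
       STATS_T_TEST_INDEP(cust_gender, amount_sold, 'STATISTIC', 'F') AS t_observed,
       STATS_T_TEST_INDEP(cust_gender, amount_sold)                   AS two_sided_p_value
FROM   sh.customers c, sh.sales s
WHERE  c.cust_id = s.cust_id
GROUP  BY ROLLUP(cust_income_level)
ORDER  BY income_level, sold_to_men, sold_to_women, t_observed;
```

Swapping in STATS_T_TEST_INDEPU would run the same comparison without assuming equal variances.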



Customer Example

"..Our experience suggests that Oracle 10g Statistics and Data Mining
features can reduce development effort of analytical systems by an
order of magnitude."
Sumeet Muju
Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics
development projects)



Oracle Data Mining
DEMONSTRATION



Oracle Data Mining
Oracle Data Mining provides summary statistical information prior to data mining



Oracle Data Mining
Oracle Data Mining provides model performance and evaluation viewers

Oracle Data Mining’s Activity Guides simplify & automate data mining for business users



Oracle Data Mining

Apply model
viewers

Additional model
evaluation viewers



Integration with Oracle BI EE

Oracle Data Mining results available to Oracle BI EE administrators
Oracle BI EE defines results for end user presentation



Integration with Oracle BI EE

Likelihood to buy

Oracle

Create Customers Categories



Integration with Oracle BI EE
ODM provides likelihood of fraud and other important questions.



Spreadsheet Add-In for Predictive Analytics

• Enables Excel users to “mine” Oracle or Excel data using “one click” Predict and Explain predictive analytics features
• Users select a table or view, or point to data in Excel, and select a target attribute



Oracle Data Mining & Oracle Text
• Oracle Data Mining mines “text” to build classification and clustering models
• Oracle Text (included in Oracle Database Standard Edition) preprocesses unstructured text
• Handles large volumes of “documents” or text



Oracle Data Miner
Code Generation



Oracle Data Miner (GUI)
Code Generation

• PL/SQL code generation for Mining Activities



In-Database Analytics
SQL Examples



Example #1:
Simple, Predictive SQL

• Select customers who are more than 60% likely to purchase a 6-month CD and display their marital status

SELECT * FROM (
  SELECT A.CUST_ID, A.MARITAL_STATUS,
         PREDICTION_PROBABILITY(CD_BUYERS19644_DT, 1 USING A.*) prob
  FROM CBERGER.CD_BUYERS A)
WHERE prob > 0.6;



Example #2
Better Insights & Information
• Select all customers who have a high propensity to attrite (> 80%
chance) and have a customer value rating of more than 90 and
have had a recent conversation with customer service regarding a
Checking Plus account.

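-- Typographic quotes replaced with SQL string quotes; dates written as
-- DATE literals so the query does not depend on the session date format.
SELECT A.cust_name, A.contact_info
FROM customers A
WHERE PREDICTION_PROBABILITY(tree_model, 'attrite' USING A.*) > 0.8
  AND A.cust_value > 90
  AND A.cust_id IN
      (SELECT B.cust_id
       FROM call_center@HQ_DB B
       WHERE B.call_date BETWEEN DATE '2005-01-01' AND DATE '2005-06-30'
         AND CONTAINS(B.notes, 'Checking Plus', 1) > 0);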



Real-time Prediction
On-the-fly, single record apply with new data (e.g. from call center)

with records as (select
    178255 ANNUAL_INCOME,
    0 CAPITAL_GAIN,
    83 SAVINGS_BALANCE,
    246 AVE_CHECKING_BALANCE,
    30 AGE,
    'Bach.' EDUCATION,
    'SelfENI' WORKCLASS,
    'Married' MARITAL_STATUS,
    'Sales' OCCUPATION,
    'Husband' RELATIONSHIP,
    'White' RACE,
    'Male' SEX,
    70 HOURS_PER_WEEK,
    '?' NATIVE_COUNTRY,
    98 PAYROLL_DEDUCTION from dual)
select s.prediction prediction, s.probability probability
from (
  select PREDICTION_SET(CD_BUYERS76485_DT, 1 USING *) pset
  from records) t, TABLE(t.pset) s;



Real-time Prediction Multiple Models
On-the-fly, single record apply with multiple models; sort by expected revenues

with records as (select
    178255 ANNUAL_INCOME,
    0 CAPITAL_GAIN,
    83 SAVINGS_BALANCE,
    246 AVE_CHECKING_BALANCE,
    30 AGE,
    'Bach.' EDUCATION,
    'SelfENI' WORKCLASS,
    'Married' MARITAL_STATUS,
    'Sales' OCCUPATION,
    'Husband' RELATIONSHIP,
    'White' RACE,
    'Male' SEX,
    70 HOURS_PER_WEEK,
    '?' NATIVE_COUNTRY,
    98 PAYROLL_DEDUCTION from dual)
-- Each branch scores the same record with a different model and weights
-- that model's own probability (s1..s4) by the revenue per sale.
select t.*
from (
  select 'CAR_MODEL' MODEL, s1.prediction prediction, s1.probability probability,
         s1.probability*25000 as expected_revenue
  from (select PREDICTION_SET(NBMODEL_JDM, 1 USING *) pset
        from records) t1, TABLE(t1.pset) s1
  UNION
  select 'MOTORCYCLE_MODEL' MODEL, s2.prediction prediction, s2.probability probability,
         s2.probability*2000 as expected_revenue
  from (select PREDICTION_SET(ABNMODEL_JDM, 1 USING *) pset
        from records) t2, TABLE(t2.pset) s2
  UNION
  select 'TRICYCLE_MODEL' MODEL, s3.prediction prediction, s3.probability probability,
         s3.probability*50 as expected_revenue
  from (select PREDICTION_SET(TREEMODEL_JDM, 1 USING *) pset
        from records) t3, TABLE(t3.pset) s3
  UNION
  select 'BICYCLE_MODEL' MODEL, s4.prediction prediction, s4.probability probability,
         s4.probability*200 as expected_revenue
  from (select PREDICTION_SET(SVMCMODEL_JDM, 1 USING *) pset
        from records) t4, TABLE(t4.pset) s4
) t
order by t.expected_revenue desc;



Predictive Analytics: Explain
PL/SQL Package

BEGIN
DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
data_table_name => 'CD_BUYERS',
explain_column_name => 'CD_BUYER',
result_table_name => 'explain_result41');
END;
/
SELECT * FROM explain_result41;



Predictive Analytics: Predict
PL/SQL Package
SET serveroutput ON

DECLARE
v_accuracy NUMBER(10,9);
BEGIN
DBMS_PREDICTIVE_ANALYTICS.PREDICT (
ACCURACY => v_accuracy,
DATA_TABLE_NAME => 'CD_BUYERS',
CASE_ID_COLUMN_NAME => 'CUST_ID',
TARGET_COLUMN_NAME => 'CD_BUYER',
RESULT_TABLE_NAME => 'predict_result28');

DBMS_OUTPUT.PUT_LINE('Accuracy = ' || v_accuracy);


END;
/

SELECT * FROM predict_result28 WHERE rownum <= 100;



Example #3
Test a Marketing Campaign

• Given a previously built response model (classification), …predict who will respond to the campaign, …and why



Example #3
Predict Responders

select cust_name,
prediction(campaign_model using *)
as responder,
prediction_details(campaign_model using *)
as reason
from customers;



Example #3
Combine with Relational Data

• In addition to predicting responders, …find out how much each customer has spent for a period of 3 months before and after the start of the campaign



Example #3
Combine with Relational Data

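-- Dates written as DATE literals; cust_id qualified and cust_name added to
-- GROUP BY so the statement parses; the model is scored on customer columns.
select cust_name,
       prediction(campaign_model using customers.*) as responder,
       sum(case when purchase_date < DATE '2005-04-15'
                then purchase_amt else 0 end) as pre_purch,
       sum(case when purchase_date >= DATE '2005-04-15'
                then purchase_amt else 0 end) as post_purch
from customers, sales
where sales.cust_id = customers.cust_id
  and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
group by customers.cust_id, cust_name,
         prediction(campaign_model using customers.*);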



Example #3
Multi-Domain, Multi-DB data

• In addition to predicting responders, …find out how much each customer has spent on DVDs for a period of 3 months before and after the start of the campaign



Example #3
Multi-Domain, Multi-DB data

-- Dates written as DATE literals; typographic quotes replaced; the remote
-- table is given the alias "products" so products.prod_id resolves.
select cust_name,
       prediction(campaign_model using customers.*) as responder,
       sum(case when purchase_date < DATE '2005-04-15'
                then purchase_amt else 0 end) as pre_purch,
       sum(case when purchase_date >= DATE '2005-04-15'
                then purchase_amt else 0 end) as post_purch
from customers, sales, products@PRODDB products
where sales.cust_id = customers.cust_id
  and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
  and sales.prod_id = products.prod_id
  and contains(prod_description, 'DVD') > 0
group by customers.cust_id, cust_name,
         prediction(campaign_model using customers.*);



Example #3
Test Effectiveness / Significance

• In addition to predicting responders, find out how much each customer has spent on DVDs for a period of 3 months before and after the start of the campaign, and…
• …Compare the success rate of predicted responders and non-responders within different regions and across the company
• Is the success statistically significant?



Example #3
Test Effectiveness / Significance
-- cust_region is assumed to be a customers column; it is carried through the
-- inner query so the outer ROLLUP can group on it.
select responder, cust_region, count(*) as cnt,
       sum(post_purch - pre_purch) as tot_increase,
       avg(post_purch - pre_purch) as avg_increase,
       stats_t_test_paired(pre_purch, post_purch) as significance
from (
  select cust_name, cust_region,
         prediction(campaign_model using customers.*) as responder,
         sum(case when purchase_date < DATE '2005-04-15'
                  then purchase_amt else 0 end) as pre_purch,
         sum(case when purchase_date >= DATE '2005-04-15'
                  then purchase_amt else 0 end) as post_purch
  from customers, sales, products@PRODDB products
  where sales.cust_id = customers.cust_id
    and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
    and sales.prod_id = products.prod_id
    and contains(prod_description, 'DVD') > 0
  group by customers.cust_id, cust_name, cust_region,
           prediction(campaign_model using customers.*) )
group by rollup(responder, cust_region)
order by 4 desc;



Example #3
Launch & Evaluate a Marketing Campaign

1. Given a previously built response model, …predict who will respond to a campaign, …and why
2. …find out how much each customer spent 3 months before and after the campaign
3. …how much for just DVDs?
4. Is the success statistically significant?

-- cust_region is assumed to be a customers column; it is carried through the
-- inner query so the outer ROLLUP can group on it.
select responder, cust_region, count(*) as cnt,
       sum(post_purch - pre_purch) as tot_increase,
       avg(post_purch - pre_purch) as avg_increase,
       stats_t_test_paired(pre_purch, post_purch) as significance
from (
  select cust_name, cust_region,
         prediction(campaign_model using customers.*) as responder,
         sum(case when purchase_date < DATE '2005-04-15'
                  then purchase_amt else 0 end) as pre_purch,
         sum(case when purchase_date >= DATE '2005-04-15'
                  then purchase_amt else 0 end) as post_purch
  from customers, sales, products@PRODDB products
  where sales.cust_id = customers.cust_id
    and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
    and sales.prod_id = products.prod_id
    and contains(prod_description, 'DVD') > 0
  group by customers.cust_id, cust_name, cust_region,
           prediction(campaign_model using customers.*) )
group by rollup(responder, cust_region)
order by 4 desc;



Partners



SPSS Clementine
• NASDAQ-listed, top 25 software company
• 35+ year heritage in analytic technologies
• Operations in over 60 countries
• 95+% of FORTUNE 1000 are SPSS customers
• Combine SPSS Clementine ease of use with ODM in-Database functionality & scalability
• Build, store, browse and score models in the Database for optimal performance
• For more information:
– SPSS: Mike Bittner, Strategic Alliance Manager, 770.329.3870 or mbittner@spss.com
– Oracle: Alan Manewitz, TBU, (925) 984-9910 or alan.manewitz@oracle.com
– Oracle: Charlie Berger, Product Management, (781) 744-0324 or charlie.berger@oracle.com



InforSense -- A Single Optimized Environment for
Real Time Business Analytics within the Database

(Figure: InforSense workflow connecting Oracle Data Sources and an Oracle Decision Tree Model to InforSense Service Deployment)

• Deploy the analytic workflow as a service embedding to BPEL, SFA, CRM
• Interact with (visualize) data at any step in the workflow
• Deploy the analytic workflow as an Oracle Portal
• Oracle Functionalities: Data Mining, Preprocess, Statistics, Text, OLAP, Scheduler

• SAS free analytics: leverage Oracle analytics
• SQL free analytics: drag-drop application build
• Visual analytics: interactive visualisation
• Integrative analytics: unified analytical environment
• Automated analytics: deploy to Oracle Portal and BPEL



Benefits of Oracle’s Approach

In-Database Analytics and the corresponding Benefit:
• Platform for Analytical Applications
– Eliminates data movement and security exposure
– Fastest: Data → Information
• Wide range of data mining algorithms & statistical functions
– Supports most analytical problems
• Runs on multiple platforms
– Applications may be developed and deployed
• Built on Oracle Technology
– Grid, RAC, integrated BI, …
– SQL & PL/SQL available
– Leverage existing skills



QUESTIONS
ANSWERS

More Information:

Oracle Data Mining 11g
• oracle.com/technology/products/bi/odm/index.html
Oracle Statistical Functions
• http://www.oracle.com/technology/products/bi/stats_fns/index.html
Oracle Business Intelligence Solutions
• oracle.com/bi

http://search.oracle.com
oracle data mining

Contact Information: Email: Charlie.berger@oracle.com
