The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and timing
of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Copyright © 2007 Oracle Corporation
Oracle Data Mining
[Figure: Oracle 11g DB, containing Data Warehousing, ETL, OLAP, Statistics, and Data Mining]
Charlie Berger
Sr. Director Product Management,
Data Mining Technologies & Life Sciences & Healthcare Industries
Oracle Corporation
charlie.berger@oracle.com
Historic Data (cases)
          Input Attributes              Target
Name      Income    Age    . . .        Buy Product? (1 = Yes, 0 = No)
Jones     30,000    30                  1
Smith     55,000    67                  1
Lee       25,000    23                  0
Rogers    50,000    44                  0

Model (functional relationship): Y = F(X1, X2, …, Xm)

New Data
Name      Income    Age    Target    Prediction    Confidence
Campos    40,500    52     ?         1             .85
Hornick   37,000    73     ?         0             .74
Habers    57,200    32     ?         0             .93
Berger    95,600    34     ?         1             .65
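To make the Y = F(X1, X2, …, Xm) idea concrete, here is a minimal Python sketch of applying a fitted model to new cases and reporting a prediction with a confidence. The logistic form and its coefficients are hypothetical, chosen only for illustration. They are not the model behind the slide, so the outputs will not reproduce the slide's figures.

```python
import math

# Hypothetical coefficients for illustration only; a real model would be
# learned from the historic data rather than typed in by hand.
INTERCEPT = -4.0
W_INCOME = 0.00005
W_AGE = 0.03

def buy_probability(income, age):
    """F(X): estimated probability that the customer buys the product."""
    z = INTERCEPT + W_INCOME * income + W_AGE * age
    return 1.0 / (1.0 + math.exp(-z))

def predict(income, age):
    """Return (prediction, confidence) where confidence is the probability
    of the predicted class, so it always lies between 0.5 and 1."""
    p = buy_probability(income, age)
    label = 1 if p >= 0.5 else 0
    confidence = p if label == 1 else 1.0 - p
    return label, confidence

new_cases = {"Campos": (40_500, 52), "Berger": (95_600, 34)}
for name, (income, age) in new_cases.items():
    label, confidence = predict(income, age)
    print(f"{name}: prediction={label}, confidence={confidence:.2f}")
```

The point of the sketch is the shape of the output: each new case gets a predicted class plus a number saying how strongly the model favors that class.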
Insight
[Figure: customer segments plotted on the axes Income and Customer Months]
Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39
Segment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39
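The support and confidence figures quoted for each segment can be computed as in the following Python sketch. It assumes that support is the share of all cases matching the IF condition and that confidence is the share of those matching cases that are churners. This reading is consistent with "Confidence = 83%, Support = 6/39" (5 of 6 cases), but the slides do not state it. The customer rows are invented and are not the 39 cases from the slide.

```python
# Invented records: (months as customer, income in dollars, churned?)
customers = [
    (16, 60_000, True),
    (20, 85_000, True),
    (15, 40_000, True),
    (3, 50_000, False),
    (10, 120_000, False),
    (18, 200_000, False),
    (9, 70_000, True),
    (2, 30_000, False),
]

def rule_stats(rows, condition):
    """Return (support, confidence) for the rule 'IF condition THEN churner'.

    support    = cases matching the condition / all cases
    confidence = churners among matching cases / matching cases
    """
    matched = [row for row in rows if condition(row)]
    support = len(matched) / len(rows)
    churners = sum(1 for row in matched if row[2])
    confidence = churners / len(matched) if matched else 0.0
    return support, confidence

# Same thresholds as Segment #1, applied to the toy rows.
support, confidence = rule_stats(
    customers, lambda r: r[0] > 14 and r[1] < 90_000
)
print(f"support = {support:.3f}, confidence = {confidence:.0%}")
```

A narrow rule tends to have high confidence but low support. Loosening the thresholds, as in Segment #3, usually changes both numbers.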
Data Mining
http://download.101com.com/pub/tdwi/Files/PA_Report_Q107_F.pdf
METAspectrum 60.1. Copyright © 2004 META Group, Inc. All rights reserved. metagroup.com
IRS
– Detecting taxpayer noncompliance
"With over 3,500 submissions per month for payment (or... about
$200 million of legal invoices per month..), we at Stuart Maue, (a St.
Louis based legal services firm and winner of the 2006 DM Review
Business Intelligence award), sought to automate and improve our
review, categorization and investigation of possibly non-compliant
legal submissions process. Besides this being a very labor
intensive process, spotting potential fraudulent or erroneous
submissions for payment can mean millions of dollars of savings to
Stuart Maue and our clients. Oracle Data Mining allows us to mine
our structured and unstructured Oracle-based data, automate the
process, respect security schemes, scale to large volumes, and
most importantly, saves us time and money."
"The Oracle Data Mining and Oracle's SQL statistical functions have
enabled us to take patient segmentation studies to a new, more
granular level. Healthcare research involves notoriously complex
data. The number of possible variables is essentially unlimited. The
number of encounters of our population with healthcare providers is
voluminous. Oracle Data Mining working with an Oracle database
handles both with aplomb. A rich feature set, the ability to control
whether false positives or false negatives are to be avoided, and the
easy ability to apply models to new datasets make ODM an important
tool in our arsenal. Given that many companies manage their data in
Oracle, it is now easy to also analyze the data there too."
-- K.C. Cerny, Managing Partner, Management Information Analysis,
www.mia-consulting.com
Data Mining
• Logistic Regression
• Multiple Regression
• Simplify and automate data mining
• Automatic Data Preparation for each model
• Embedded + Automatic Data Prep + ODM Model → “Super Models”
• Predictive Analytics (fully automated data mining)
• PROFILE PL/SQL procedure
• Easier administration
• Data Mining models attain additional 1st class Database object
characteristics – privileges, catalog views, etc.
• Improved security for model access, use and tracking
• Java API (JSR-73) for Oracle Data Mining 11g
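-- PROFILE takes the target via target_column_name
-- (explain_column_name belongs to the EXPLAIN procedure).
BEGIN
  DBMS_PREDICTIVE_ANALYTICS.PROFILE(
    data_table_name    => 'customers',
    target_column_name => 'affinity_card',
    result_table_name  => 'explain_res');
END;
/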
• PROFILE finds targeted customer segments, size and confidence and “rules”
Statistics
SQL Statistical
Data Mining Functions
Attribute Importance
• Identify most influential attributes
for a target attribute
• Factors associated with high costs,
responding to an offer, etc.
Classification and Prediction
Association Rules
• Find co-occurring items in a market basket
• Suggest product combinations
• Design better item placement on shelves
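As a rough illustration of what "co-occurring items in a market basket" means, the Python sketch below counts how often item pairs appear together and computes the confidence of a simple one-item rule. It is a brute-force pair count on invented baskets, not the algorithm Oracle Data Mining uses, and all names in it are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter", "milk"},
    {"milk", "chips"},
]

def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical (a, b) key with a < b.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def rule_confidence(transactions, antecedent, consequent):
    """Confidence of 'IF antecedent THEN consequent': the share of baskets
    containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
print(rule_confidence(baskets, "butter", "bread"))
```

Pairs with high counts are candidates for product bundles or adjacent shelf placement. The confidence shows which direction of the rule is more reliable.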
Feature Extraction
• Reduce a large dataset into representative
new attributes
• Useful for clustering and text mining
Text Mining
• Combine data and text for better models
• Add unstructured text e.g. physician’s notes to
structured data e.g. age, weight, height, etc., to
predict outcomes
• Classify and cluster documents
• Combined with Oracle Text to develop
advanced text mining applications e.g. Medline
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm
"..Our experience suggests that Oracle 10g Statistics and Data Mining
features can reduce development effort of analytical systems by an
order of magnitude."
Sumeet Muju
Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics
development projects)
Oracle Data Mining
Demonstration
Oracle Data Mining’s Activity Guides simplify & automate data mining for business users
Apply model viewers
Additional model evaluation viewers
Likelihood to buy
Oracle
• Enables Excel users to “mine” Oracle or Excel data using “one click” Predict and Explain predictive analytics features
• Users select a table or view, or point to data in Excel, and select a target attribute
Oracle Data Miner
Code Generation
• PL/SQL code generation for Mining Activities
In-Database Analytics
SQL Examples
SELECT *
FROM (SELECT A.CUST_ID, A.MARITAL_STATUS,
             PREDICTION_PROBABILITY(CD_BUYERS19644_DT, 1 USING A.*) prob
      FROM CBERGER.CD_BUYERS A)
WHERE prob > 0.6;
BEGIN
DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
data_table_name => 'CD_BUYERS',
explain_column_name => 'CD_BUYER',
result_table_name => 'explain_result41');
END;
/
SELECT * FROM explain_result41;
DECLARE
  v_accuracy NUMBER(10,9);
BEGIN
  DBMS_PREDICTIVE_ANALYTICS.PREDICT(
    accuracy            => v_accuracy,
    data_table_name     => 'CD_BUYERS',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'CD_BUYER',
    result_table_name   => 'predict_result28');
  -- v_accuracy is the OUT value returned by PREDICT
  DBMS_OUTPUT.PUT_LINE('Accuracy: ' || v_accuracy);
END;
/
select cust_name,
prediction(campaign_model using *)
as responder,
prediction_details(campaign_model using *)
as reason
from customers;
select cust_name,
       prediction(campaign_model using *) as responder,
       sum(case when purchase_date < DATE '2005-04-15'
                then purchase_amt else 0 end) as pre_purch,
       sum(case when purchase_date >= DATE '2005-04-15'
                then purchase_amt else 0 end) as post_purch
from customers, sales
where sales.cust_id = customers.cust_id
  and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
group by customers.cust_id, cust_name, prediction(campaign_model using *);
select cust_name,
       prediction(campaign_model using *) as responder,
       sum(case when purchase_date < DATE '2005-04-15'
                then purchase_amt else 0 end) as pre_purch,
       sum(case when purchase_date >= DATE '2005-04-15'
                then purchase_amt else 0 end) as post_purch
from customers, sales, products@PRODDB
where sales.cust_id = customers.cust_id
  and purchase_date between DATE '2005-01-15' and DATE '2005-07-14'
  and sales.prod_id = products.prod_id
  and contains(prod_description, 'DVD') > 0
group by customers.cust_id, cust_name, prediction(campaign_model using *);
Partners
InforSense Service Deployment
SAS free analytics: leverage Oracle analytics
SQL free analytics: drag-drop application build
Visual analytics: interactive visualisation
Integrative analytics: unified analytical environment
Automated analytics: deploy to Oracle Portal and BPEL
http://search.oracle.com
oracle data mining