Recommendation Engine at
Oracle OpenWorld Conference
2008
2009
Recommend conference sessions to attendees
Deduction
Query refinement
Users specify what they want to retrieve
Induction
Model-based recommendation engine
Recommend sessions most relevant to attendee profile
Improve likelihood of finding sessions of interest
Build: 2008 data (sessions, attendees, attendance) is used to build the model
Apply: the model is applied to 2009 data (sessions, attendees) to produce ranked session recommendations for each attendee
Approach – 30,000 ft.
New attendee registers and completes survey
Ranked session recommendations generated for attendees
Attendee logs into Schedule Builder
Ranked sessions retrieved
2009 session recommendations filtered by user criteria
Success Metrics
Conversion rate
% attendees who used at least 1 recommendation
Enrollment vs. actual attendance
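As a rough illustration of the conversion-rate metric, the sketch below counts attendees who enrolled in at least one session that was recommended to them. The slides do not say how "used" was measured, so treating enrollment in a recommended session as use is an assumption, and the function name and sample data are hypothetical.

```python
def conversion_rate(recommended, enrolled):
    """Percentage of attendees who enrolled in at least one recommended session.

    recommended, enrolled: dicts mapping attendee id -> set of session ids.
    Assumption: "used a recommendation" means "enrolled in a recommended session".
    """
    if not recommended:
        return 0.0
    converted = sum(
        1
        for attendee, recs in recommended.items()
        if recs & enrolled.get(attendee, set())
    )
    return 100.0 * converted / len(recommended)


# Hypothetical sample data
recommended = {"a1": {"S1", "S2"}, "a2": {"S3"}, "a3": {"S4"}, "a4": {"S5"}}
enrolled = {"a1": {"S2", "S9"}, "a2": {"S8"}, "a3": {"S4"}}
rate = conversion_rate(recommended, enrolled)
```

The same set intersection, applied to enrollment and badge-scan data, could compare enrollment against actual attendance.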
Test Metrics
Enrichment curve
Global measure of merit
Sessions (1850+)
Title, abstract, track(s)
Attendees (41700+)
Survey questions, position, product usage
Attendance (206700+)
Who attended which sessions
Sessions
Concatenate relevant columns to facilitate text mining
Attendance
Remove duplicates
Attendees
Synonyms in attribute values, e.g., state = OH and Ohio
Incomplete data, e.g., region = null
Multi-valued attributes requiring parsing,
e.g., member of user groups separated by ';' or '/'
Map data columns between 2008 and 2009
e.g., Advanced customer services split between Apps and Tech
Free form columns, e.g., job title = Vice President, V.P., VP
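The attendee clean-up steps above can be sketched as a few small normalization helpers. This is a minimal illustration only: the lookup tables hold just the examples named on the slide, and the function names are hypothetical rather than taken from the actual project.

```python
import re

# Hypothetical lookup tables; only the slide's own examples are included.
STATE_SYNONYMS = {"OH": "Ohio"}
TITLE_SYNONYMS = {"V.P.": "Vice President", "VP": "Vice President"}


def normalize_state(value):
    """Map synonymous state values to one spelling; keep nulls as None."""
    if value is None:
        return None
    value = value.strip()
    return STATE_SYNONYMS.get(value.upper(), value)


def normalize_title(value):
    """Collapse free-form job title variants to one canonical title."""
    value = value.strip()
    return TITLE_SYNONYMS.get(value.upper(), value)


def split_multi_valued(value):
    """Parse a multi-valued attribute separated by ';' or '/'."""
    if not value:
        return []
    return [part.strip() for part in re.split(r"[;/]", value) if part.strip()]
```

Each parsed value would then typically become its own indicator column, in the style of the UG_MEM_* and JOB_TITLE_* attributes shown later in the deck.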
Cluster sessions
Build classification model to predict clusters for attendees, then score attendees for each cluster
Vector multiply each attendee's cluster scores against each session's cluster scores for total order ranking of recommendations
New 2009 attendee cluster scores vector x new 2009 sessions cluster scores vectors = ranked scores (.86, .73, .66, …)
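The vector multiplication described here amounts to a dot product between one attendee's cluster scores and each session's cluster scores, followed by a descending sort. The sketch below shows that idea with three clusters and made-up scores; the function name and all values are hypothetical.

```python
def rank_sessions(attendee_scores, session_scores):
    """Rank sessions for one attendee by the dot product of cluster scores.

    attendee_scores: list of per-cluster scores for the attendee.
    session_scores: dict mapping session id -> list of per-cluster scores.
    Returns (session id, score) pairs, highest score first.
    """
    ranked = [
        (session_id, sum(a * s for a, s in zip(attendee_scores, scores)))
        for session_id, scores in session_scores.items()
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical scores over three clusters
attendee = [0.7, 0.2, 0.1]
sessions = {
    "S1": [0.9, 0.1, 0.0],
    "S2": [0.1, 0.8, 0.1],
    "S3": [0.3, 0.3, 0.4],
}
ranking = rank_sessions(attendee, sessions)
```

A session scores highly when it is strong in the same clusters as the attendee, which is why the product yields a total ordering of sessions per attendee.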
Model Building and Scoring Details
Cluster sessions
Concatenate all session-related text
Text Mining data preparation – create text index
Lexer with stemming
Custom “stopword” list
Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.
Stems: integrate (integrate), Accounts (account), utilizing (utilize), developed (develop), processing (process), invoices (invoice)
[Figure: the same abstract with stopwords crossed out (X) and the remaining words annotated with their stems: integrate, account, utilize, develop, process, invoice]
-- Oracle Text preferences: lexer with English stemming plus a custom stoplist
BEGIN
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer', 'index_stems', 'ENGLISH');  -- index stemmed word forms
  ctx_ddl.set_attribute('oow_lexer', 'index_text', 'true');      -- index word information as well
  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your'); /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
END;
/
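To make the effect of the lexer and stoplist concrete, here is a toy Python sketch of stopword removal followed by stemming. It does not reproduce Oracle Text's behavior: the stem table contains only the word pairs shown in the example abstract, and apart from 'your' and 'oracle' the stopwords are assumed for illustration.

```python
import re

# 'your' and 'oracle' come from the slide; the remaining stopwords are assumed.
STOPWORDS = {"your", "oracle", "in", "this", "how", "to", "and", "with",
             "by", "a", "the", "of", "was", "see"}

# Toy stem table limited to the stems shown for the example abstract.
STEMS = {
    "accounts": "account",
    "utilizing": "utilize",
    "developed": "develop",
    "processing": "process",
    "invoices": "invoice",
}


def tokenize(text):
    """Lowercase, split into alphabetic words, drop stopwords, apply stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return [STEMS.get(word, word) for word in words if word not in STOPWORDS]


sentence = ("See how a paperless, Web-based solution was developed "
            "to automate the processing of invoices.")
tokens = tokenize(sentence)
```

A real stemmer derives stems by rule or dictionary rather than from a fixed table; the table here only keeps the example self-contained.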
Integrate .23
Account .04
Payable .26
Imaging .62
Process .09
Management .05
Technology .17
Content .08
Collaboration .43
…
Consider
A session, S1, title and abstract containing 100 words
Word 'mining' appears 6 times in S1
Term frequency (TF) for 'mining' in S1 is 6/100, or 0.06
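The term-frequency arithmetic above can be reproduced in a few lines. The sketch below builds a hypothetical 100-word token list in which 'mining' occurs 6 times; the function name and the filler token are illustrative assumptions.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return each term's count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical session S1: 100 words, 'mining' appears 6 times
tokens = ["mining"] * 6 + ["other"] * 94
tf = term_frequencies(tokens)
```

Here `tf["mining"]` evaluates to 6/100, matching the slide's worked example.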
Specify the maximum number of terms
- to represent the entire corpus
- to represent the document
Integrate .23
Account .04
Payable .26
Imaging .62
Process .09
Management .05
Technology .17
Content .08
Collaboration .43
…
Cluster sessions
Concatenate all session-related text
Text Mining data prep – create text index
Lexer with stemming
Custom stop word list
1000 max terms in corpus
30 max terms per document
Build k-Means model with 20 clusters (themes)
Score 2008 and 2009 sessions to identify theme probabilities
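The two term limits (1000 max terms in the corpus, 30 max terms per document) can be pictured as a pruning step applied to each document's weighted terms before clustering. The sketch below is only one plausible reading: keeping the highest-weighted terms per document and then the terms found in the most documents is an assumption, not the documented behavior of the tool, and the function name is hypothetical.

```python
from collections import Counter


def limit_terms(doc_weights, max_corpus_terms, max_doc_terms):
    """Prune term weights to a per-document cap and a corpus-wide cap.

    doc_weights: list of dicts mapping term -> weight, one dict per document.
    """
    # Keep each document's highest-weighted terms (ties broken alphabetically).
    trimmed = [
        dict(sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:max_doc_terms])
        for w in doc_weights
    ]
    # Assumption: the corpus vocabulary keeps terms found in the most documents.
    doc_freq = Counter(term for w in trimmed for term in w)
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {term for term, _ in ordered[:max_corpus_terms]}
    return [{t: v for t, v in w.items() if t in vocab} for w in trimmed]


# Hypothetical documents using weights like those shown on the slides
docs = [
    {"imaging": 0.62, "payable": 0.26, "integrate": 0.23, "process": 0.09},
    {"collaboration": 0.43, "technology": 0.17, "process": 0.09, "content": 0.08},
]
pruned = limit_terms(docs, max_corpus_terms=3, max_doc_terms=2)
```

With the deck's settings the call would use `max_corpus_terms=1000` and `max_doc_terms=30`; the resulting sparse vectors are what a k-Means build would cluster into themes.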
Attendee attributes
ORA_JDE
ORA_PS
ORA_SIEBEL
PROFIT_MAGAZINE_SUBSCRIPTION
UG_MEM_APOUC
UG_MEM_EOUC
UG_MEM_HEUG
UG_MEM_IOUG
UG_MEM_OAUG
UG_MEM_ODTUG
UG_MEM_OHUG
UG_MEM_QIUG
UG_INFO_APOUC
UG_INFO_EOUC
UG_INFO_HEUG
UG_INFO_IOUG
UG_INFO_OAUG
UG_INFO_ODTUG
UG_INFO_OHUG
UG_INFO_QIUG
UG_INFO_DO_NOT_SEND_ORA_INFO
JOB_TITLE_MANAGER
JOB_TITLE_PARTNER
JOB_TITLE_PROJECT_LEAD
JOB_TITLE_MARKETING
JOB_TITLE_PRESIDENT
JOB_TITLE_VICE
JOB_TITLE_DIRECTOR
JOB_TITLE_ARCHITECT
JOB_TITLE_ANALYST
JOB_TITLE_DBA
JOB_TITLE_DEVELOPER
JOB_TITLE_SALES
JOB_TITLE_PROD_MGR
JOB_TITLE_CHIEF_OFFICER
JOB_TITLE_CONSULTANT
JOB_TITLE_SENIOR
JOB_TITLE_STUDENT
Predict themes (clusters) for “Joe”
DEV_EN_TEXT_EDITOR 1
DEV_EN_VI 1
GEOGRAPHIC_REGION Americas
INDUSTRY Aerospace
ORACLE_PARTNER Yes
JOB_TITLE_DBA 1
JOB_TITLE_SENIOR 1
How Does This Session Rank for Joe?
Probability
)
order by attend_id, score desc
Session 1
Session N
…
to same attendees
Build
Build the models using these datasets
[Figure: enrichment curves plotting Enrichment Score and Model score against Model-ranked sessions. Annotations: linear behavior of recommendations; point of maximum recommendation enrichment. Series: PM Model, Random Model. Also shown: Normalized Enrichment (NE) and P(NE).]
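The slides do not define the enrichment score, so the following is only a sketch of one common interpretation: a cumulative gains curve for the model's ranking, compared against the straight line expected from a random ranking. The function names, the sample data, and the use of the difference between the two curves are all assumptions.

```python
def cumulative_gain(ranked_sessions, attended):
    """Fraction of attended sessions found within the top k ranked sessions."""
    if not attended:
        raise ValueError("attended must contain at least one session")
    hits = 0
    curve = []
    for session_id in ranked_sessions:
        if session_id in attended:
            hits += 1
        curve.append(hits / len(attended))
    return curve


def random_baseline(n):
    """Expected gain of a random ranking: a straight line from 1/n to 1."""
    return [(k + 1) / n for k in range(n)]


# Hypothetical example: four ranked sessions, two of which were attended
ranked = ["S1", "S2", "S3", "S4"]
attended = {"S1", "S3"}
model_curve = cumulative_gain(ranked, attended)
baseline = random_baseline(len(ranked))
enrichment = [m - b for m, b in zip(model_curve, baseline)]
best_k = enrichment.index(max(enrichment)) + 1
```

Under this reading, `best_k` marks the list length at which the model's ranking is furthest ahead of a random ordering.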
Agenda
Data preparation
Focus on tracks, tags, categories
Tokenize targeted terms from title and abstract fields
E.g., “Oracle Data Mining” becomes “OracleDataMining”
…
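Fusing a targeted multi-word term into a single token, as in the “Oracle Data Mining” example, might look like the sketch below. The function name and the phrase list are hypothetical; the actual list of targeted terms is not given in the slides.

```python
import re


def fuse_phrases(text, phrases):
    """Replace each targeted phrase with a single token without spaces."""
    # Longest phrases first so a longer term wins over a shorter prefix.
    for phrase in sorted(phrases, key=len, reverse=True):
        fused = phrase.replace(" ", "")
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        text = re.sub(pattern, lambda match, token=fused: token, text,
                      flags=re.IGNORECASE)
    return text


title = "Intro to Oracle Data Mining and oracle  data mining tips"
fused_title = fuse_phrases(title, ["Oracle Data Mining"])
```

Treating the phrase as one token keeps a product name from being split into common words such as 'data' that carry little meaning on their own.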
Cluster sessions
Score each session against each theme (cluster)
Vector multiply each session's cluster scores against all other sessions' cluster scores for total order ranking of related sessions
2009 session cluster scores vector x other 2009 sessions cluster scores vectors = ranked related sessions (.95, .81, .67, …)
2009 Sessions
2009 Themes (200 clusters)
Conversion rate
percentage of attendees who used at least 1 recommendation
Circa 2004
OOW'08 Recommendation Engine Results
Detail
www.oracle.com/technology/products/bi/odm/index.html
The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle's
products remains at the sole discretion of Oracle.