
2012 IBM Corporation

Pushing the Frontiers of Analytics


Brenda Dietrich, IBM Fellow & VP
CTO, Business Analytics
Global Technology Outlook Objectives
GTO identifies significant technology trends
early. It looks for high impact disruptive
technologies leading to game changing
products and services over a 3-10 year
horizon.
Technology thresholds identified in a GTO
demonstrate their influence on clients,
enterprises, & industries and have high
potential to create new businesses.
Global Technology Outlook 2012
Uncertain data and analytics are major themes
Systems of People
The Future of Analytics
The Future Watson
Managing Uncertain Data at Scale
Measuring, Modeling and Managing Risk
Analytics is broadly defined as the use of data and computation to make
smart decisions
Data: Historical, Simulated
Text, Video, Images, Audio
Data instances
Reports and queries on data aggregates
Predictive models
Answers and confidence
Feedback and learning
Decision point
Possible outcomes: Option 1, Option 2, Option 3
The value of analytics grows by incorporating new sources of data,
composing a variety of analytic techniques, spanning organizational silos,
and enabling iterative, user-driven interaction
[Chart] Vertical axis: Sources and types of data (from structured or standardized to new format or usage of data). Horizontal axis: Scope of decision (Low to High).
Examples plotted: Sales-based demand forecasting; Price-based demand forecasting (own & competitors); Segmentation-based market impact estimates; Intent-to-buy trends; Multi-modal demand forecasting
Analytic solutions will apply multiple methods to multiple forms of data
Example: Utility Vegetation Management
Effective Right of Way vegetation management is critical to streamlined utility operations
Traditional Right of Way programs are mainly static-scenario driven on a six-year cycle
Static and rigid models lead to predominantly reactive operations, which are expensive
Focus on narrow corridor widths fails to address severe weather impact
A multimodal analytics approach can overcome these shortcomings
Structured data (e.g. transmission line maps) and unstructured data (e.g. LIDAR sensor)
Advanced modeling to perform a dynamic scenario-driven analysis
[Diagram] Solution Framework
Inputs, each through a Preprocessor: SENSORS, UTILITY DATA, MAPS, WEATHER
Components: 3-Dimensional Model; Recovery; Right-of-Way; Dynamic Forecasting Model; Schedule Generator; Visualization
Industries: ELECTRIC, TELECOMMUNICATIONS, RAIL, ROAD, OIL
IBM Applies Analytics Internally for Growth, Operational
Effectiveness and Agility
HR Resource Planning: Matching engine analyzes unstructured job and resume info to identify best staff for open positions on projects
Manufacturing: Next-gen 300M semiconductor facility uses real-time analysis of manufacturing process data to improve yields
Development: Simulating future performance of development portfolio and investment mix
Risk Management: Consolidating and analyzing data from disparate sources to automate compliance issues and pinpoint risks
Finance: Optimizing trade-offs between interest rate expense and liquidity through modeling and simulation
Sales Process: Applying science to the art of sales prospecting, planning and opportunity identification
Financier: Value and risk-based project portfolio analysis
Problem Statement
Understand relationships between Value (NPV, ROI) and Risk.
Current valuation techniques treat predicted value as a discrete entity. To understand risks, we need to use random variables.
Approach
Use Monte Carlo techniques to reason about value and risk.
Adapted for software development project analysts
Why is it hard?
Capturing and presenting appropriate data: significant usability and automation issues.
Consumability of analytics: especially for project funding options.
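The idea of treating project value as a random variable and reasoning about it with Monte Carlo techniques can be sketched as follows. This is only an illustration, not the Financier tool: the cash-flow figures, the triangular distributions, the discount rate and the function names (`npv`, `simulate_npv`) are all assumptions made for the example.

```python
import random
import statistics

def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs at time 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def simulate_npv(n_trials=10_000, rate=0.10, seed=42):
    """Draw an uncertain cost and yearly benefits, return the simulated NPVs.

    All numbers below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Up-front cost: triangular(low, high, mode), stored as a negative flow.
        cost = -rng.triangular(80.0, 150.0, 100.0)
        # Three years of benefits, each an independent uncertain draw.
        benefits = [rng.triangular(20.0, 90.0, 50.0) for _ in range(3)]
        results.append(npv(rate, [cost] + benefits))
    return results

if __name__ == "__main__":
    npvs = simulate_npv()
    prob_loss = sum(1 for v in npvs if v < 0) / len(npvs)
    print(f"expected NPV: {statistics.mean(npvs):.1f}")
    print(f"probability of negative NPV: {prob_loss:.1%}")
```

The output is a distribution rather than a single number, so value (the mean NPV) and risk (for instance the probability of a negative NPV) can be read from the same simulation.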
Analytics solution development requires several interacting design steps
Data Acquisition
Streaming data, Text data, Multi-dimensional, Time series, Geo spatial, Relational, Social network, Video & image
Filtering and Extraction, Validation
Core Analytics
Data mining & statistics, Optimization & simulation, Fuzzy matching, Network algorithms, Semantic analysis, Business Rules Engine, New algorithms
Composition and Packaging
Data Evaluation and Fusion
Algorithm Composition and Invention
Testing and Execution Optimization
Deployment
Analytics toolkits will be expanded to support ingestion and interpretation of
unstructured data, and enable adaptation and learning
Extended from: Competing on Analytics, Davenport and Harris, 2007
Standard Reporting
Ad hoc Reporting
Query/Drill Down
Alerts
Forecasting
Simulation
Predictive Modeling
In memory data, fuzzy search, geo spatial
Causality, probabilistic, confidence levels
High fidelity, games, data farming
Larger data sets, nonlinear regression
Rules/triggers, context sensitive, complex events
Query by example, user defined reports
Real time, visualizations, user interaction
Report
Decide and Act
Understand and Predict
Collect and Ingest/Interpret
Learn
Traditional
New Data
New Methods
Optimization
Optimization under Uncertainty
Decision complexity, solution speed
Quantifying or mitigating risk
Adaptive Analysis
Continual Analysis Responding to local change/feedback
Responding to context
Entity Resolution
Annotation and Tokenization
Relationship, Feature Extraction
People, roles, locations, things
Rules, semantic inferencing, matching
Automated, crowd sourced
Decide what to count; enable accurate counting
In the context of the decision process
IT enabling people-centric processes generates large amounts of data
CRM Claims
Delivery Records
Patents & Publications
Clients served
Products sold
Sales patterns
Productivity
"In the last couple of weeks, I've talked to ABC bank, XYZ and at a security conference."
Status: Working
Expert: Security
Engagements worked
Team info
Work specs
Tasks accomplished
Productivity
Innovation
Products
Technical leadership
"Status updates alone on Facebook amount to more than ten times more words than on all blogs worldwide" - David Kirkpatrick, The Facebook Effect
Status: At conference
Influencer
Rich information (e.g. expertise, work patterns, response to incentives, digital reputation)
is flowing through on-line collaboration and enterprise systems
Capturing this information enables analytics to be applied to people-centric processes
Strength of Sales Force Index is an example of what is possible with a rich
representation of people
TODAY
Years selling
Job change
Salary band
PBC
FUTURE
True skills and expertise
Disciplines
Clients served
Products sold
Team experiences
Connections
Incentives and responses
Career path

SSFI mines sales force data to understand
which attributes of a seller (e.g. skills,
experiences), sales team (e.g. team
composition, territories) or sales process
(e.g. incentives, coverage model) are
driving sales performance (quota
attainment, win rates, productivity)
SSFI identifies:
Reasons for performance disparities (at
individual or group level), and the best set of
actions to drive performance
Why is our sales force in Region X not
performing at par with other regions or
competition?
What actions can we take to improve sales
performance?
What are the incentives that truly drive
performance?
Executing on the Systems of People (SoP) vision depends on three key capabilities
PEOPLE CONTENT
Develop capabilities to create a representation of a person's skills, experiences, preferences, digital reputation
In a structured and organized way, so it can be used for the purpose of running a business
PEOPLE ANALYTICS
Implement capabilities for people-centric process optimization within an analytics platform for rapid, on-demand deployment
matching, talent cloud
crowdsourcing, predictive markets
simulation of workforce trends
performance analytics
behavior modeling
PEOPLE ENABLEMENT
Incorporate capabilities that adapt content for situations and needs, and enhance communication over many devices, across diverse pools of talent
context-aware
cognitive load management
translation, transcription
text-to-speech, voice
The future of Watson: Efficient decision support over unstructured (and
structured) content
Unstructured Data
Broad, rich in context
Rapidly growing, current
Invaluable yet under utilized
SQL/XQuery
Existing BI
Inference/Rules
Structured Data
Precise, explicit
Narrow, expensive
Jeopardy! Challenge
Deeper Understanding but Brittle
High Precision at High Cost
Narrow Limited Coverage
Shallow Understanding
Low Precision
Broad Coverage
Deeper Understanding,
Higher Precision and Broader,
Timely Coverage at lower costs
Key Word Search
Relevance Ranking
Open-Domain Question-Answering
Taking Watson beyond Jeopardy!
Understanding: From specific questions to rich, incomplete problem scenarios (e.g. EHR)
Specific Questions: "The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva"
Rich Problem Scenarios: Entire Medical Record
Interacting: Evidence analysis and look-ahead, drive interactive dialog to refine answers and evidence
Question-In/Answer-Out
Interactive Dialog: Teach Watson; Refined Answers, Follow-up Questions; Input, Responses; Dialog
Learning: Scale domain learning and adaptation rate and efficiency
Batch Training Process
Continuous Training & Learning Process: Answers, Corrections, Judgements; Responses, Learning Questions
Explaining: Move from quality answers to quality answers and evidence
Precise Answers & Accurate Confidences
Comparative Evidence Profiles
Massive Scale Analytics
[Chart] Data Scale versus Decision Frequency (Occasional, Frequent, Real-time; time axis: yr, mo, wk, day, hr, min, sec, ms, µs)
Data at Rest: Traditional Data warehouse & Business Intelligence
Data in Motion
Up to 10,000 times larger
Up to 10,000 times faster
Aspect | Traditional Data | Big Data
Amount of data | Hundreds of Tera Bytes in large cases | Internet Scale, PetaBytes
Affordability | Only very big Enterprises | Pay-as-you-go through cloud expansion
Deep Analytics | Statistic modeling, Sampling | Predictive, Fine Grained Analytics
Type of data | Mostly Relational & structured | Unstructured, Human and Sensor Generated
Insight gained | Customer segmentation | Service that knows you
A new market is emerging around Big Data
Technology for handling such Big Data is now entering the enterprise
Applications for Big Data Analytics are Endless
Companies: Corporate Knowledge (Q&A, Search); 100s GB data; Deep Analytics; Sub-second response
Cities: Traffic, Water; 250K probes/sec; 630K segments/sec; 2 ms/decision
Pharmas: Drug, Treatment; Millions of SNPs; 1000s patients; From weeks to days
Telco: Churn; 100K records/sec; 9B/day; 10 ms/decision
Government: Cyber Sec.; 600,000 docs/sec; 50B/day; 1-2 ms/decision
Wall Street: Risk, Stability; PTBs of data; Deeper analysis; Nightly to hourly
Credit Card Vendors: Fraud Detection; 15TB per year; 1 week -> 3 hours
Consumer Products: Consumer Insight; 100Ms documents; Millions of Influencers; Daily re-analysis
Media & Ent.: Digital Rights; 500B photos/year; 70K TB/year media; Low-latency filtering
The fourth dimension of Big Data: Veracity, handling data in doubt
Volume (Data at Rest): Terabytes to exabytes of existing data to process
Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
Variety (Data in Many Forms): Structured, unstructured, text, multimedia
Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness
Uncertainty arises from many sources
Process Uncertainty: Processes contain randomness
Examples: Uncertain travel times; Semiconductor yield
Data Uncertainty: Data input is uncertain
Examples: Text Entry (Intended Spelling vs. Actual Spelling); GPS Uncertainty; Ambiguity ({Paris Airport}); Conflicting Data ({John Smith, Dallas} vs. {John Smith, Kansas}); Rumors; Testimony; Contaminated?
Model Uncertainty: All modeling is approximate
Examples: Forecasting a hurricane (www.noaa.gov); Fitting a curve to data
By 2015, 80% of all available data will be uncertain
[Chart] Global Data Volume in Exabytes (0 to 9000) and Aggregate Uncertainty % (10 to 100), 2005 to 2015, for Enterprise Data, VoIP, Social Media (video, audio and text) and Sensors (Internet of Things). Multiple sources: IDC, Cisco
Enterprise Data: Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
Sensors (Internet of Things): By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
Social Media (video, audio and text): The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Requires specific business process and industry context
How to reduce uncertainty in processes, models, and data
Constructing context for better understanding
Extract as much information as feasible from each source
Combine (fuse) data from multiple sources
More data from more sources is better
Gathers more evidence for statistical methods
Using statistical methods scaled for Big Data
Stochastic techniques efficiently reason about uncertainty
Monte Carlo techniques explore many possible scenarios
in order to gain insight
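One standard way to combine (fuse) readings of the same quantity from several sources is inverse-variance weighting, sketched below. The slide does not name a specific fusion method, so this choice is an assumption, as are the function name `fuse` and the sample readings. The sketch also assumes the sources are independent, unbiased, and report a positive variance.

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of independent (value, variance) pairs.

    Returns (fused_value, fused_variance). Assumes independent, unbiased
    sources with variance > 0; real data may not satisfy this.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    # More certain sources (smaller variance) get larger weights.
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total

if __name__ == "__main__":
    # Hypothetical readings of one quantity from three sources.
    sources = [(10.2, 4.0), (9.6, 1.0), (10.9, 9.0)]
    value, variance = fuse(sources)
    print(f"fused value {value:.2f}, variance {variance:.2f}")
```

Under these assumptions the fused variance is never larger than that of the best single source, which is one concrete sense in which more data from more sources is better.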
Required: tight integration to maximize
context discovery
Required: common practices followed
by multiple standards for representing
uncertain data and uncertainty of all
types, provenance, and lineage and
other metadata
Required: common APIs to enable
sharing across the uncertainty
management pipeline
No such common practices,
standards or APIs exist today
Condensing data reduces uncertainty by constructing context
[Diagram] CONDENSE: Maximum Context For Minimum Uncertainty
Techniques: Data finds Data; Sense Making; Fact Discovery; Correlation; Spatial Reasoning & Temporal Reasoning & Corroboration (Evidence Combination), etc.
Example: Michael, San Jose, CA; Customer at Mall; Customer in Store #42; Son, Mother, Birthday, Date; Credit; Loyalty; Influencers; NY
Intent: "Buying a DSLR today!"
In-Store Pricing And Discounts: $999 or $560
Enterprise Risk Management Cycle
A well operated Enterprise
Risk Management practice
follows a general cycle of
identifying risks,
assessing and planning
for the risk events,
deploying and monitoring
the solutions and reporting
and reviewing the events
as they occur. A feedback
process then drives
adaptation as warranted by
going back to the identify &
assess steps by validating,
correlating, and prioritizing
events, plans, and
experiences.
Governance Across Risks
Managing Individual Risks
Managing Risk Events
Identify
Assess
Deploy
Monitor
Assert
3rd party attestation
Report
Crisis Mgt
Review
Business Recovery
Expected Event occurred
Training / Exercise
Plan & Develop
Unexpected Event occurred
ERM Governance
External Regulators, Auditors
Validate, Correlate, Prioritize
each cycle starts here
IBM's Risk Management Capabilities
Organized around a Holistic View of Both Financial and Non-financial Risk
Risk Area | Imperative
Financial Risk | Understand risk exposure across silos in order to make rapid risk decisions consistent with your firm's risk appetite
Fraud Risk | Instrument organizations to detect and interrupt crime patterns (Medicare claims, Social Security payments, etc.)
IT & Operational Risk | Anticipate and mitigate potential risk from failed internal processes, people or systems
Governance & Compliance | Comply with voluntary and mandated regulations while differentiating your competitive position
Integrated Risk Management: Financial Risk, Fraud Risk, Operational & IT Risk, Governance & Compliance
Compliance Analytics Tool (CAT)
Compliance Analytics Tool is used to identify opportunities for improvement, select the optimal actions and track the results
Load Normalized Data
Sources: PO & invoice history
Refresh with most current time period of history
(Planned: past forecast of compliance vs. actual)
Compliance Visibility: Define compliance objectives; Identify focus areas; Run advanced visibility analytics
Investigate overall compliance levels & trends
Focus on biggest non-compliance groupings
Identify Corrective Actions and Costs
Compliance Optimization: Formulate compliance model; Run compliance optimizer; Prescribe actions for optimal returns
Select best options
Compliance Dashboard: Visualization of impacts; Scenario manager; Analytical report service
Client review & agreement
Assign ownership
Secure commitment
Track Results
Drive actions to secure desired outcomes
Compliance Visibility uncovers opportunities for improved compliance
The Compliance Analytics Tool (CAT) uses user-defined parameters to mine client transactional data to find clusters of non-compliant transactions
Clusters of opportunity are identified automatically using advanced statistical techniques, saving time and manual effort
Illustration of Clustering Technique
Panels: Supplier / Business unit; Business unit / Size of spend; Geography / commodity; Business unit / Requestor
Legend: Compliant, Non-compliant
Compliance Visibility: Define compliance objectives; Identify focus areas; Run advanced visibility analytics
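The slide does not say which statistical techniques CAT uses. As a much simpler stand-in, the sketch below groups transactions by a pair of attributes (for example supplier and business unit, as in the illustration panels) and ranks the groups by non-compliant spend. The record fields, the `min_count` threshold and the function name `rank_noncompliance` are hypothetical.

```python
from collections import defaultdict

def rank_noncompliance(transactions, key_a, key_b, min_count=2):
    """Group transactions by (key_a, key_b) and rank groups by non-compliant spend.

    Returns a list of (group_key, non_compliant_spend, non_compliance_rate),
    largest spend first. Groups with fewer than min_count transactions or no
    non-compliant transactions are skipped.
    """
    groups = defaultdict(lambda: {"count": 0, "bad_count": 0, "bad_spend": 0.0})
    for tx in transactions:
        g = groups[(tx[key_a], tx[key_b])]
        g["count"] += 1
        if not tx["compliant"]:
            g["bad_count"] += 1
            g["bad_spend"] += tx["amount"]
    ranked = [
        (key, g["bad_spend"], g["bad_count"] / g["count"])
        for key, g in groups.items()
        if g["count"] >= min_count and g["bad_count"] > 0
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

if __name__ == "__main__":
    # Hypothetical purchase-order records.
    txs = [
        {"supplier": "S1", "unit": "A", "amount": 100.0, "compliant": False},
        {"supplier": "S1", "unit": "A", "amount": 250.0, "compliant": False},
        {"supplier": "S2", "unit": "A", "amount": 40.0, "compliant": False},
        {"supplier": "S2", "unit": "A", "amount": 60.0, "compliant": True},
    ]
    for key, spend, rate in rank_noncompliance(txs, "supplier", "unit"):
        print(key, spend, f"{rate:.0%}")
```

Running the same function over different attribute pairs gives one ranked view per panel of the illustration.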
CAT uses advanced IBM software assets in an innovative way to
perform compliance analytics on client transactional spend data
CAT Architecture
Business Analytics for ERM: an organizational game-changer
The challenge:
Many organizations talk about risks, make lists, define mitigation strategies, use expert judgment (ALL IMPORTANT ACTIVITIES)
A smaller number use hard data to quantify risk or support decision-making
Analytical tools can be used to:
Quantify risks based on historical data, information from similar organizations, or data from third party organizations
Use information about patterns of historical risks to predict future risks
Build models to support investment decisions which take cost and risk probabilities into consideration
Develop risk models that help when there is little or no historical information and while your organization begins the process of obtaining the data for the future
Alternatively, provide more structured and objective approaches to elicit expert judgment when there is no data
Use behavioral models to study emergent behaviors to help understand future risk modalities
IBM Research has created such analytics for our own business
Analytics for Risk: Integrated Enterprise Planning
IBM's Approach
Embed statistical forecasts in planning tools to assist users in setting baselines
Incorporate capability to understand risks associated with the plan, given uncertainties such as
market conditions and competition
Provide business leaders with ability to make risk-aware decisions in the face of uncertainty using
probabilistic range forecasting
Provide ability to identify key sensitivities to operating levers in order to drive insightful actions
Uncertainty Analysis for Enterprise Planning
Reflect probabilistic nature of uncertainty in enterprise planning by
enabling range and/or distributional projections for key
assumptions
Use Monte-Carlo method to simulate thousands of experiments to
explore and understand the universe of possible outcomes and
their likelihood of occurrence
Quantify variability in the planned outcomes
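A minimal sketch of this kind of uncertainty analysis, using a toy plan model: key assumptions are given as ranges or distributions, thousands of experiments are simulated, and the variability of the outcome is summarized by percentiles. The profit formula, every number in it, and the names `simulate_plan` and `summarize` are assumptions for illustration only.

```python
import random
import statistics

def simulate_plan(n_trials=5000, seed=7):
    """Simulate a toy plan: profit = volume * (price - unit_cost) - fixed_cost."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        volume = rng.triangular(8_000, 14_000, 10_000)  # range with a most likely value
        price = rng.uniform(9.0, 11.0)                   # plain range projection
        unit_cost = rng.gauss(6.0, 0.4)                  # distributional projection
        outcomes.append(volume * (price - unit_cost) - 30_000)
    return outcomes

def summarize(outcomes):
    """Return the 5th, 50th and 95th percentiles of the simulated outcomes."""
    cuts = statistics.quantiles(outcomes, n=20)  # 19 cut points at 5% steps
    return {"p5": cuts[0], "p50": cuts[9], "p95": cuts[18]}

if __name__ == "__main__":
    summary = summarize(simulate_plan())
    for name, value in summary.items():
        print(f"{name}: {value:,.0f}")
```

The gap between `p5` and `p95` is one simple way to quantify variability in the planned outcome.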
Sensitivity Analysis for Enterprise Planning
Enable specification of best and worst case values for key
assumptions (i.e. the expected range of possible values)
For one assumption at a time, use Monte-Carlo method to sample
values spanning the range to explore the corresponding range of
impacts on key performance metrics
Identify sensitivities of business outcomes to key assumptions
through, e.g., a tornado diagram showing business outcomes
prioritized by their variation in response to the range of assumption
values
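The one-assumption-at-a-time procedure above can be sketched as follows. Each assumption is sampled across its worst-to-best range while the others stay at their base values, and the rows are sorted by the width of the resulting outcome range, which is the ordering a tornado diagram shows. The toy `profit` model, the base values and the ranges are hypothetical.

```python
import random

def profit(a):
    """Toy outcome model; the formula is an assumption for illustration."""
    return a["volume"] * (a["price"] - a["unit_cost"]) - a["fixed_cost"]

def tornado(model, base, ranges, n_samples=500, seed=11):
    """One-at-a-time sensitivity analysis.

    Returns (name, low_outcome, high_outcome) rows, widest swing first.
    """
    rng = random.Random(seed)
    rows = []
    for name, (lo, hi) in ranges.items():
        # Include both endpoints so the full range is always covered.
        values = [lo, hi] + [rng.uniform(lo, hi) for _ in range(n_samples)]
        outcomes = [model(dict(base, **{name: v})) for v in values]
        rows.append((name, min(outcomes), max(outcomes)))
    rows.sort(key=lambda r: r[2] - r[1], reverse=True)
    return rows

if __name__ == "__main__":
    base = {"volume": 10_000, "price": 10.0, "unit_cost": 6.0, "fixed_cost": 30_000}
    ranges = {"unit_cost": (5.5, 7.0), "price": (9.0, 11.0), "volume": (8_000, 14_000)}
    for name, low, high in tornado(profit, base, ranges):
        print(f"{name:10s} {low:>10,.0f} {high:>10,.0f}")
```

For a model that is monotone in each assumption, the two endpoints already give the extremes; the random samples matter when the response is not monotone.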
Business Optimization for Enterprise Planning
Input target values for business outcomes and appropriate
constraints
Use Monte-Carlo method to simulate sets of scenarios designed
to narrow down the space of assumption values that meet the
outcome criteria
Identify optimum values of key assumptions to satisfy business
outcomes
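A rough sketch of this goal-seeking step, using plain random search rather than any particular optimizer: scenarios are sampled across the allowed assumption ranges, those that meet the target outcome are kept, and the surviving range per assumption plus the best scenario are reported. The search strategy, the function name `narrow_assumptions` and the example model are assumptions, not the method used in IBM's planning tools.

```python
import random

def narrow_assumptions(model, ranges, target, n_scenarios=20_000, seed=3):
    """Keep sampled scenarios whose outcome meets the target.

    Returns (best_scenario, narrowed_ranges), or (None, {}) if none qualify.
    """
    rng = random.Random(seed)
    feasible = []
    for _ in range(n_scenarios):
        scenario = {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
        outcome = model(scenario)
        if outcome >= target:
            feasible.append((outcome, scenario))
    if not feasible:
        return None, {}
    # Smallest and largest value of each assumption among feasible scenarios.
    narrowed = {
        k: (min(s[k] for _, s in feasible), max(s[k] for _, s in feasible))
        for k in ranges
    }
    best = max(feasible, key=lambda pair: pair[0])[1]
    return best, narrowed

if __name__ == "__main__":
    # Hypothetical model: profit with a fixed unit cost of 6 and fixed cost of 30,000.
    model = lambda s: s["volume"] * (s["price"] - 6.0) - 30_000
    ranges = {"volume": (8_000, 14_000), "price": (9.0, 11.0)}
    best, narrowed = narrow_assumptions(model, ranges, target=20_000)
    print(best)
    print(narrowed)
```

The constraints here are just the sampling ranges; richer constraints could be added as extra filters on each scenario.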
Statistical Hierarchical Forecasting for
Enterprise Planning
Exploit the hierarchical nature of the data.
Generate consistent multilevel forecasts for planning. Optimized algorithms determine appropriate levels for forecasting: top down, bottom up, middle out.
Spread the results to the entire cube. Aggregate up and allocate down.
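"Aggregate up and allocate down" can be illustrated with a small hierarchy. In this sketch, bottom-up sums child forecasts into their parents, and top-down splits a parent forecast among children in proportion to historical totals. The proportional allocation rule, the function names and the sample hierarchy are assumptions; the slide does not specify how its optimized algorithms choose levels or shares.

```python
def bottom_up(leaf_values, hierarchy):
    """Aggregate up: each parent equals the sum of its children."""
    totals = dict(leaf_values)

    def total(node):
        if node not in totals:
            totals[node] = sum(total(child) for child in hierarchy[node])
        return totals[node]

    for node in hierarchy:
        total(node)
    return totals

def top_down(node, forecast, hierarchy, history):
    """Allocate down: split a forecast by each child's share of history.

    Assumes history holds a positive value for every node in the hierarchy.
    """
    result = {node: forecast}
    children = hierarchy.get(node, [])
    base = sum(history[c] for c in children)
    for child in children:
        share = history[child] / base
        result.update(top_down(child, forecast * share, hierarchy, history))
    return result

if __name__ == "__main__":
    # Hypothetical two-level hierarchy and historical leaf sales.
    hierarchy = {"Total": ["East", "West"], "East": ["E1", "E2"], "West": ["W1"]}
    history = bottom_up({"E1": 30.0, "E2": 10.0, "W1": 60.0}, hierarchy)
    plan = top_down("Total", 120.0, hierarchy, history)
    for node, value in plan.items():
        print(f"{node}: {value:.1f}")
```

Because the shares at each level sum to one, aggregating the allocated leaves back up reproduces the top-level forecast, which is what makes the multilevel forecasts consistent.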
Next generation Strategic Risk toolkit: Risk Modeler will develop a
structured, consistent and repeatable process for quantifying
strategic enterprise risks and linking them to the enterprise plan
Common risk framework
Library of risk factors
Leverage of OpenPages, where possible
Templates for risk scenarios based on Bayesian Networks
Distributed expert elicitation of probabilistic risk input
Specification of mitigation actions
Complex scenario analysis / optimization
Efficient simulation-based optimization over high dimensional space
Novel visualizations to communicate about risks
Risk maps linked to key enterprise business processes
[Diagram] Components: Risk Assessment Analytics (BBN); Risk Elicitation; Enterprise Plan; Risk Repository; Uncertainty Simulation; Sensitivity (Tornado); Goal Seek / Optimization; Risk Templates
Repository contents: Business Entity; Process / Sub-Process; Risk; Control / Control Eval.; Risk Self Assessment; KRI; KPI; Questionnaire(s); Assumption Probabilities
Legend: Existing (2011); New Risk Modeler (2012); New Other SEE Streams (2012)
We are Entering a New Era
[Chart] Computer Intelligence over Time: Tabulating Era, Computing Era, Smart Systems Era