Beruflich Dokumente
Kultur Dokumente
Please Note:
IBMs statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBMs sole discretion.
Information regarding potential future products is intended to outline our general product
direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information
about potential future products may not be incorporated into any contract. The
development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Analytic Applications
BI /
Exploration / Functional Industry Predictive Content
BI /
Reporting Visualization
App
App
Analytics Analytics
Reporting
Application
Development
Systems
Management
Accelerators
Hadoop
System
Stream
Computing
Data
Warehouse
Development tooling
Analytic Accelerators
Application and industry accelerators
Visualization
Hadoop
System
Better management
Full POSIX compliance, supports multiple storage technologies
Better security
Kernel-level file system that can exploits OS-level security
Workload Optimization
Optimized performance for big data analytic workloads
Adaptive MapReduce
Task
Map
Adaptive Map
Reduce
(optimization
order small units of work)
(many results to a
single result set)
height:
640
width:
480
data:
height:
1280
width:
1024
data:
height:
640
width:
480
data:
Video analytics
Example Contour Detection
Original Picture
Contour Detection
Capabilities
Massive parallel processing engine
High performance OLAP
Mixed operational and analytic workloads
Data
Warehouse
Speed:
Simplicity:
Minimal administration
and tuning
Scalability:
Smart:
High-performance
advanced analytics
Dedicated High
Performance
Disk Storage
Blades With
Custom FPGA
Accelerators
Eclipse
Client
nzAdaptors
nzAnalytics
Partner
ADE or IDE
Host
Hosts
nzAdaptors
nzAnalytics
nzAdaptors
nzAnalytics
Partner
Visualization
nzAdaptors
Disk
Enclosures
S-Blades
Partner
ADE or IDE
Network
Fabric
Clients
Model
LARGE
DATA SET
Analytics
Model
Analytic
Workbench
Building
Model
Building
Host
Model
LARGE
DATA SET
Analytics
Model
LARGE
DATA SET
Analytics
S-Blades
Disk Enclosures
Structured
Unstructured
Streaming
Big Data
Platform
Benefits
Transform and aggregate
any volume of information
Deliver data in batch or real
time through visually
designed logic
Hundreds of built-in
transformation functions
Metadata-driven
productivity, enabling
collaboration
16
Dataflows can be
arbitrary DAGs
Target
Clean 1
Import
Analyze
Merge
Clean 2
Configuration File
Deployment and Execution
Analyze
Inter-node communications
Parallelization of operations
We connect to EVERYTHING
RDBMS
General Access
Legacy
Sequential File
Complex Flat File
File / Data Sets
Named Pipe
FTP
External Command Call
Parallel/wrapped 3rd party apps
WebSphere MQ
Java Messaging Services (JMS)
Java
Distributed Transactions
XML & XSL-T
Web Services (SOAP)
Enterprise Java Beans (EJB)
EDI
EBXML
FIX
SWIFT
HIPAA
ADABAS
VSAM
IMS
IDMS
Datacom/DB
Enterprise Applications
CDC / Replication
JDE/PeopleSoft
EnterpriseOne
Oracle eBusiness Suite
PeopleSoft Enterprise
SAS
SAP R/3 & BI
SAP XI
Siebel
Salesforce.com
Hyperion Essbase
And more
Data Lineage
Traceability across business and IT domains
Show relationships between business terms, data model
entities, and technical and report fields
Allows business term relationships to be understood
Shows stewardship relationships on business terms
Lineage for DataStage Jobs is always displayed initially at a
summary Job level
20
Extended
Data Source
21
Extended
Mapping
InfoSphere DataStage
Job Design
Visualization
& Discovery
Application
Development
Systems
Management
Extract
Explore
Iterate
Document-level info
Cleanse, normalize
Object-oriented APIs
for implementing dataparallel algorithms
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
Mapper
Initialize
Serialize
Start
Map
Reduce
Incorporate
Post
Begin
Phase
Data-Scan
Hadoop
Another
Iteration?
Phase
Algorithm
Algorithm
Data
Job
Iteration?
Processing
Scan
Object
Object
Results
the
parallel
file
system
would be
but
the
algorithm
serialization
NIMBLE
closest
data
aggregating
return
partitions
true
center
mappers
if the
not
and
new-center
converged
update objects
the would
remain
the data
samepartitions
between
corresponding
accumulators
across
accumulators
Hadoop-based and Netezzabased implementations
Mapper
HDFS / GPFS
class MyAlgorithm {
initializeTask()
initializeTask()
class MyAlgorithm {
initializeTask()
beginIteration()
beginIteration()
beginIteration()
processRecord()
processRecord()
mergeTasks()
endIteration()
mergeTasks()
mergeTasks()
}
endIteration()
endIteration()
}
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
Mapper
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
Mapper
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
Reducer
and reducers
would be replaced
with UDAPs (UserDefined Analytic
Processes),
class MyAlgorithm {
initializeTask()
beginIteration()
processRecord()
mergeTasks()
endIteration()
}
Reducer
Transpose
BasicOnePassTask
Can execute in Mapper or Reducer
Transpose
MM (matrix multiply)
BasicOnePassMergeTask
Has Map and Reduce components
B
MAP
MM
MM
B*A
A*C
REDUCE
Add
Multiply
Binary hop
Multiply
Binary hop
Divide
Binary lop
Multiply
R1
Group lop
Binary lop
B
M1
Divide
MR Job
Group lop
C
Language
HOP Component
LOP Component
Runtime
Multiple low-level
operators combined in a
MapReduce job
Clustering
K-Means Clustering
Kernel K-Means
Fuzzy K-Means
Iclust
Dimensionality Reduction
Graph Algorithms
Regression Modeling
Linear Regression
Regularized Linear Models
Logistic Regression
Transform Regression
Conjugate Gradient Solver
Conjugate Gradient Lanczos Solver
Miscellaneous
k-Nearest Neighbors
Outlier Detection
In response to a simple
processing request
MARIO automatically assembles analytics
into a variety of real-time situational
applications
29
In response to a simple
processing request
MARIO automatically assembles analytics
into a variety of real-time situational
applications
Deploys application components across
multiple platforms, establishes interplatform dataflow connections
Initiates continuous processing of flowing
data
Manages crossplatform operation
Accelerators
Retail Customer
Intelligence
Customer Behavior and
Lifetime Value Analysis
Finance
Public transportation
Data mining
Streaming statistical
analysis
Analytic Applications
BI /
Exploration / Functional Industry Predictive Content
Reporting Visualization
App
App
Analytics Analytics