You are on page 1of 5

rittmanmead

SERVICES
TRAINING
ACCELERATORS
ABOUT
BLOG
MENU
Blog
SUBSCRIBE
Got it!
This site creates and uses cookies. Click here to Learn More
ODI11g in the Enterprise Part 2 : Data Integration using Essbase, Messaging, and
Big Data Sources and Targets
30 DECEMBER 2012 on Technical, Oracle Data Integrator, Oracle EPM, Big Data
In this five part series over the Christmas break, I'm taking a look at some of
the lesser-known features in Oracle Data Integrator 11g, focusing on "enterprise
" features such as build automation, scripting, data quality and middleware inte
gration. If you've arrived at this post via a Google search, here's the links to
the other posts in this short series:
ODI11g in the Enterprise Part 1: Beyond Data Warehouse Table Loading
ODI11g in the Enterprise Part 2 : Data Integration using Essbase, Messaging, and
Big Data Sources and Targets
ODI 11g in the Enterprise Part 3: Data Quality and Data Profiling using Oracle E
DQ
ODI 11g in the Enterprise Part 4: Build Automation and Devops using the ODI SDK,
Groovy and ODI Tools
ODI 11g in the Enterprise Part 5: ETL Resilience and High-Availability
So the first post in this series was titled "Beyond Data Warehouse Table Loading
", and that's where we'll start this week's look into ODI 11g in the Enterprise.
If you've had any exposure to ODI you'll probably notice that it's primarily a
relational database data integration tool, using concepts such as joins, filters
and expressions to transform data together using table-like data objects called
"data stores". But ODI can extract from, transform and load into much more than
relational databases, supporting sources and targets such as flat file and XML
files, web services, JMS queues and pretty-much anything that provides a JDBC dr
iver. In the context of Oracle projects though, there are three types of non-rel
ational sources and targets that are particularly relevant; Oracle Essbase and H
yperion Planning multidimensional databases; message and service bus integration
in a Service-Orientated Architecture (SOA) environment; and big data sources an
d targets such as Hadoop and NoSQL database types.
Prior to the Oracle acquisition, there were a number of different routes availab
le to Essbase and Hyperion Planning customers looking to load one of their datab
ases or applications. At the most basic level, "rules files" can still be used t
o define SQL queries to load data into Essbase outlines and databases, either wr
itten manually or auto-generated using tools such as Essbase Studio. But rules f
iles aren't the best place to embed ETL logic and transformations (see this pres
entation by Cameron Lackpour for some good background on the topic), and over th
e past few years Oracle have transitioned from an Informatica-based solution for
Essbase ETL towards Oracle Data Integrator, which is now the "strategic" ETL op
tion for loading Essbase databases and planning applications. As shown in the di
agram below, ODI 11g can be used to load Essbase databases, Hyperion Planning ap
plications along with HFM applications through their own individual APIs, with O
DI providing a set of code template "knowledge modules" that allows developers t
o write to, and read from, abstracted data objects and with ODI under the covers
issuing the correct API calls to load the various applications.
ODI and Hyperion

Loading an Essbase database using ODI isn't the most intuitive process for Essba
se developers as ODI represents the Essbase database as a set of table/column vi
ews, but some good groundwork has been done in the developer community to establ
ish best practices and standard workflows, and it'll be interesting to see how E
ssbase support evolves over time within ODI. In the screenshots below, you can s
ee and Essbase database outline translated into a generic ODI "Model", and an Es
sbase account dimension transformed into a generic datastore and columns. At the
bottom is a set of target loading properties setting out how rules files and lo
gs are used, whilst on the right you can see a typical Hyperion Planning interfa
ce "flow" moving data from a staging data into the Planning Essbase database and
metadata tables.
Essbase and Planning within ODI
Some good sources of information on Essbase and Planning ETL using ODI are, for
example:
"Slay the Evil of Bad Data in Essbase with ODI" - Oracle Openworld 2012 presenta
tion by Cameron Lackpour on data quality and ETL in the context of Essbase, usin
g ODI 11g
"ODI and Essbase - There and Back Again" - Links to a series of blog posts by Jo
hn Goodwin on ODI and Essbase - essential reading, very comprehensive
"ETL Loads on Essbase & Planning with Oracle Data Integrator 11g" - Oracle Openw
orld 2010 presentation by Venkatakrishnan J on the various ETL options for Essba
se products
Another area we're seeing lots of uptake of ODI within is Fusion Middleware proj
ects, particularly ones that use messaging and other SOA-type features to move d
ata around the enterprise. Most SOA projects only have a requirement to move sma
ll, message-size bits of information around from service to service, and would t
ypically use products such as BPEL and Oracle Enterprise Service Bus to provide
communication between services, but when large sets of data need to be moved the
n Oracle Data Integrator comes in handy.
To take a typical example, in the diagram below data from an Orders database (1)
can take two routes through a BPEL process depending on the size of the message
payload. For regular, small messages the standard approach of handling the mess
age through BPEL steps is used (2), whereas when the payload is large, it's pass
ed by reference to ODI (3), which then loads it (4), transforms (5) and sends it
where it's required (6), notifying the calling BPEL/ESB routine when the job is
complete (7), allowing the wider BPEL routine to complete (8). ODI has a full w
eb service interface that allows packages and other ETL processes to be called f
rom BPEL invoke activities, and can read from queues, web services and other sou
rces as part of a wider SOA/Fusion Middleware deployment.
ODI for Transforming Large Payloads
There's actually a couple of other data integration products within Fusion Middl
eware 11g that you might come across when looking at SOA environments; ODI 11g i
s mainly for bulk-data movements as we've seen already, whereas another product
called "Oracle Data Service Integrator", brought across from the BEA acquisition
, provides a means to perform data service federation within a SOA project. For
anyone familiar with OBIEE, data federation is a familiar concept and is all abo
ut integrating data "in-place", in real-time, in the case of OSDI providing call
ing applications with a federated, transformed and combined view of data without
actually staging it into a database en-route. OSDI is quite an interesting prod
uct and is due for an update shortly, and is typically called from products like
BPEL or ESB (or as in the diagram below, potentially from a BI tool like OBIEE)
to provide a current, real-time, updatable view of the enterprise's data, typic
ally based on object-orientated data sources that don't readily translate into t
he table-and-column structured favoured by ODI.

OSDI and SOA Architectures


What this then leads to in more complex Fusion Middleware 11g / SOA deployments
is ODI being used for bulk-data movement, called and communicated with using web
service calls, and Oracle Data Service Integrator then used to create abstracte
d, updatable services over live data for more general messaging and data integra
tion. Note also the use of Oracle Enterprise Data Quality, which we'll cover in
tomorrow's post in this series.
OSDI and ODI in a SOA Environment
The other area that has seen a lot of innovation around ODI is around big data;
specifically, the ability to work with Hadoop clusters to leverage their ability
to process huge unstructured datasets in parallel, as a kind of supercharged "s
taging area" for a more conventional Oracle-based data warehouse. I'll cover Ora
cle and big data in more detail early in 2013, but at a high-level Oracle has ha
d a lot of activity in the big data space over the past couple of years, partner
ing with Cloudera to resell their Cloudera Hadoop distribution and management to
ols, creating a new engineered system called Oracle Big Data Appliance, and crea
ting new tools and products such as their own NoSQL database, connectors from Ha
doop and HDFS to the Oracle database, extending open-source R to create Oracle R
Enterprise, and investing in new products such as Oracle Endeca Information Dis
covery that major on unstructured, textual sources.
The role that ODI plays in this big data architecture is as a means to get data
and insights out of Hadoop and NoSQL databases into a regular Oracle data wareho
use, and as a way to load Hadoop clusters with additional data from an Oracle sy
stem to complement big data sources such as log files, meter readings, tracking
data and other high-volume, low-granularity non-traditional data sets. The graph
ic below from Oracle sets out where ODI sits in this type of arrangement, with O
DI typically using a set of Hadoop-specific knowledge modules along with the Ora
cle Connector for Hadoop to gain access to these new sources and targets.
ODI and Big Data
So when most people talk about "big data", what they are generally referring to
is Hadoop, an open-source project for co-ordinating data processing jobs across
large clusters of commodity servers, and MapReduce, a programming approach that
provides a framework and API for creating massively-parallel data analysis routi
nes. We'll cover Hadoop and MapReduce in a lot more detail in some postings earl
y in 2013, but suffice to say that whilst Hadoop and MapReduce have greatly simp
lified the process of creating parallel-processing Java applications, creating M
apReduce routines is still typically beyond most end-users and developers and so
over the past few years, sub-projects within Apache Hadoop have come about to p
rovide a SQL-style interface over Hadoop/MapReduce (Apache Hive), data loading r
outines (Apache Sqoop), scripting languages to create MapReduce programs (Apache
Pig) and so forth. Oracle also partner with Cloudera to provide the Hadoop elem
ents of their Big Data Applicance, and there are other vendors such as Hortonwor
ks and MapR who are trying to make Hadoop more accessible to end-users, and more
suited to deployment in the enterprise - see this blog post by Curt Monash on t
he different "enterprise Hadoop" vendors in the market earlier in 2012 for more
details on who's who and what products you should look at.
The way that ODI 11g accesses Hadoop clusters and MapReduce routines then, is to
use the Apache Hive SQL (HiveQL) and metadata layer to provide a relational-lik
e view over data in the Hadoop cluster, with Hive in turn generating MapReduce r
outines to transform its data and return data back to ODI for further processing
and integration. Together with a set of Hadoop/Hive-specific knowledge modules,
a JDBC driver for Hive and Oracle Loader for Hadoop, a utility for getting data

out of Hadoop and into and Oracle database, ODI can generate SQL-like data extr
action and transformation code, and have Hive do the work of creating and runnin
g the MapReduce routines used to filter, aggregate and transform data in the Had
oop cluster, as shown in the diagram below.
ODI and Hive/MapReduceTo take the canonical example of web log analysis (the fir
st major Hadoop use case, developed by Yahoo and Google to analyze thousands of
web search logs in parallel to determine most popular sites and search terms at
a particular point in time), ODI11g along with the Oracle Data Integration Adapt
or for Hadoop can work with Hadoop sources in four main ways. After creating a c
onnection to the Hadoop cluster and the Hive server using the Hive JDBC driver a
nd reverse-engineering the Hive table data model into the ODI repository, you ca
n use the IKM File to Hive knowledge module to load data into a previously-defin
ed Hive table, the first step that you need to perform before analyzing data usi
ng Hive's SQL-like interface.
NewImage
Similar features exist for transforming data within Hive using either regular SQ
L expressions, or custom transformation code, with ODI generating the HiveQL que
ry language understood by Hive, and Hive then turning this into MapReduce routin
es that sort, sift, aggregate and process data across servers in the Hadoop clus
ter, typically prior to loading the results into the main part of your data ware
house.
ODI Hive transformation
ODI also makes use of the Oracle Loader for Hadoop, which uses Hadoop and MapRed
uce to do the "heavy lifting" and then loads the results into an Oracle database
, using direct path loads and other optimisations. Other connectors and integrat
ors also exist including Oracle Direct Connector for Hadoop Distributed File Sys
tem (HDFS), typically used to provide file sources for Oracle external tables, a
nd Oracle R Connector for Hadoop, for linking R statistical and modelling routin
es to data stored in Hadoop, to allow larger sets of data to be process and mode
lled within R routines. We'll cover all of these new products early in 2013, so
keep an eye on the blog for more details and use cases, and in the meantime Orac
le's David Allan has been putting together a series of articles on Hive and ODI
over on the Oracle Data Integration blog:
ODI - Reverse Engineering Hive Tables
ODI - Basic Hive Queries
ODI - Hive External Tables, reverse engineering
ODI - Hive and Complex JSON
So there's definitely more to ODI than loading and transforming relational data.
But what if there's issues with the quality of the data you're working with? We
'll look at how ODI integrates with Oracle's newest data integration product, Or
acle Enterprise Data Quality, in the next post in this series.
Subscribe to Rittman Mead
Get the latest posts delivered right to your inbox.
Your email address
Subscribe
or subscribe via RSS with Feedly!
Mark Rittman
Read more posts by this author.
Share this Post

TECHNICAL INSIGHTS BUSINESS INSIGHTS RITTMAN MEAD LIFE


Search the Blog
Sign Up for Our Newsletter
email address
SUBSCRIBE
Recent Posts
Rittman Mead at Oracle OpenWorld 2016
Using logdump to Troubleshoot the Oracle GoldenGate for Big Data Kafka Handler
Oracle Data Integrator: Upgrade to ODI 12c - A Case Study
Becky's BI Apps Corner: OBIA Back-to-Beginnings - Naming Conventions and Jargon
How To Poke Around OBIEE on Linux with strace (Working with Unsupported ODBC Sou
rces in OBIEE 12c)
ODI 11g in the Enterprise Part 3: Data Quality and Data Profiling using Oracle E
DQ
In this special Christmas Break series of postings on Oracle Data Integrator 11g
, we've been looking at how ODI
ODI11g in the Enterprise Part 1: Beyond Data Warehouse Table Loading
Most developers who've used Oracle Data Integrator in the past have used it to l
oad data warehouse tables, typically
About Us
Rittman Mead consults, trains, and innovates within the world of Oracle Business
Intelligence, data integration, and analytics. We work globally with clients, p
artners and Oracle Corporation to deliver world-class solutions based around Ora
cle and related BI technologies.

Get in Touch
Rittman Mead Consulting Ltd.
Suite B, First Floor,
Moore House, 11-13 Black Lion Street,
Brighton, BN1 1ND, United Kingdom
P: +44 (0) 1273 911 268
Rittman Mead America Inc.
5280 Avalon Boulevard
Alpharetta, Ga 30009
P: (888) 631-1410
2010 - 2016 Rittman Mead. All rights reserved.
Privacy Policy | Manage Your Cookie Settings
Subscribe to Our
Monthly Newsletter!
* indicates required
Email Address *
First Name
Last Name
SUBSCRIBE
Don't Show This Again