
Data Integration:

Moving Beyond ETL

A DataFlux White Paper

With large amounts of business data generated every day, it's not surprising that organizations can become overwhelmed by the task of managing it. Whether they've created their own disparate data stores over the years or inherited information through mergers or acquisitions, businesses frequently struggle to find reliable, consistent ways to move and manipulate data. As a result, their data becomes too disorganized and unwieldy to be of any business use.

That's why, in an enterprise marketplace that's as technology-driven as it is today, effective data integration tools are not just a convenience; they're a necessity. Yet too many organizations gravitate toward data integration processes that aren't right for them. Whether it's because they're hesitant to let go of their old methods, or they're simply unaware of the alternatives, many businesses settle for generic, one-size-fits-all solutions. They spend too much time and money on data integration efforts that don't deliver the best outcome for their goals.

Fortunately, there are better options available: solutions that are not only flexible and cost-effective, but can manage data across the enterprise. Most important, the right solution can perform the precise tasks an organization requires to increase business, grow revenue and mitigate risk.

What's the first step to finding a better data integration system? Organizations should start by analyzing their needs and asking themselves the following questions:

Why do we need to integrate our data?

What do we need our data to do?

Once a purpose and goal have been defined, it's easier to find the right data integration tool. Perhaps an organization needs to achieve a virtual view of its data for operational reporting purposes, or to make real-time queries to improve customer service efforts. Data federation, in-database integration or real-time integration might work best in those circumstances, instead of staid approaches like extract, transform and load (ETL) alone. Therefore, it's important to realize that traditional methods of data integration may no longer be the optimal choice for every business. Finding the right method will do more than just get the job done; it can take an organization to the next level of business success.

Every Business Size, in Every Industry, Needs an Effective Data Integration Solution
Integrating and leveraging data can be challenging for businesses that need to respond to rapidly evolving markets and changing customer demands. But it's not just an issue limited to large organizations. A solid data strategy is important for every business size, in every industry, regardless of the myriad ways each business uses its data.

Major organizations, for example, may have several data stores and massive amounts of legacy data, or they've completed a merger or acquisition and now need to process inherited information for their own business purposes. Smaller organizations may not have as much information to manage, but must use their existing information to achieve transparency and boost efficiency. Simply put, almost every business organization faces some type of data challenge, and none are immune to the necessity of data integration.

Figure 1: Top Data Integration Challenges

Source: Data Integration: Using ETL, EAI and EII Tools to Create an Integrated Enterprise, by Colin White, TDWI Report Series, November 2005.

Figure 1 illustrates the changes taking place in the deployment of data integration initiatives, driving an ever-increasing need for better solutions.

Finding the Right Approach
Data integration can no longer be an afterthought; it's a key component of data governance. But there's no clear answer when it comes to choosing the right approach. Enterprise organizations have several options for data integration processes and solutions, from integration methods (consolidation, propagation and federation) to web-based and software-as-a-service (SaaS) models. But before choosing a data integration model, an organization must first analyze its needs.

Determining How Data Will Be Used
Prior to implementing a method, organizations need to consider their business objectives, IT requirements or activities that drive their data integration needs. This requires broadening the scope beyond the traditional query, database and reporting tools to include operational and analytic needs. For example, organizations may need the ability to:

Match, link and consolidate multiple data sources to create the best possible view of all required data assets across both IT and business

Deliver high-quality information to new target locations

Apply business rules to ensure all data is fit for business requirements

It's also important to determine data latency thresholds for individual integration efforts. While some projects can meet their goals integrating batch data, others must have real-time (or near-time) data integration. Likewise, the level of access an organization requires might vary; information may need to be fit for purpose for development, for consumption by analytic applications, or both.

Choosing a Strategy
Data integration can be as straightforward or as complex as an organization demands. It can be defined as moving data from source to target (single scope), or as a multi-layered process incorporating a variety of techniques, approaches and methodologies, all in an effort to deliver customized solutions for varying enterprise requirements. Once the process is complete, data can be accessed, consolidated, profiled, improved and made available to a variety of users as a single view for operational needs and analysis.

One of the most important steps to successful data integration is choosing a strategy. Many organizations rarely look beyond ETL, historically the most common method, but not necessarily the best choice for every type of business. Other options for data integration include:

Data federation

ELT (in-database/SQL pushdown)

Real-time data integration employing Enterprise Application Integration (EAI)

Here's an example: Generic National Bank realizes the data warehouse it's been using for the past decade is no longer sufficient; it has too many disparate silos of data to manage. Since it would be too labor-intensive (and costly) to manually sort through the data, not to mention maintain it, GNB needs a more advanced, modernized system to reduce its number of silos and untangle the mess.

The bank has several end-goals in mind. First, it wants to achieve a unified view of customers on a single platform so it can refine marketing efforts and improve customer service. On an IT level, it wants to integrate the same rules across different applications to drive better business intelligence analytics. It also needs to remain compliant with increasing government regulations, so it needs an easy, accurate way to manage data without violating privacy or security laws.

Which data integration method best hits the mark? It's fair to say that each one would technically work, but the goal here is to find a system that can accomplish exactly what the bank needs quickly, efficiently and accurately. The following sections will explore how each data integration method might work for the bank so it can realize its data management goals, both now and in the future.

Figure 2: Preferred Technologies for Data Integration

Source: TDWI Operational Data Integration Report, by Philip Russom, 2009.

Many business organizations have traditionally used ETL as their go-to data integration solution.

ETL: A Common Choice for Data Integration
Extract, transform and load is a technology architecture that gathers data, usually in batch mode, from various data sources into a single data store (data warehouse, data mart, repository) by integrating the data and providing it with a common structure. And since it typically involves IT experts doing their own custom coding, ETL is one of the most common data integration methods used in the marketplace, yet also the most limited.

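
To make the extract, transform and load sequence concrete, the following is a minimal sketch of a hand-coded batch ETL job. It is an illustration only: the source records, the `accounts` table, the field names and the use of Python with SQLite are all assumptions, not part of any DataFlux product or of GNB's systems.

```python
import sqlite3

def extract(source_rows):
    """Extract: read raw records from a source system (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: give every record a common structure BEFORE it is loaded."""
    cleaned = []
    for row in rows:
        cleaned.append({
            # Collapse stray whitespace and standardize capitalization.
            "customer": " ".join(row["name"].split()).title(),
            # Store money as integer cents so all sources share one format.
            "balance_cents": round(float(row["balance"]) * 100),
        })
    return cleaned

def load(conn, rows):
    """Load: write the already-transformed records into the single target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (customer TEXT, balance_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (:customer, :balance_cents)", rows
    )
    conn.commit()

def run_etl(sources, conn):
    """Run one batch: each source is extracted, transformed, then loaded."""
    for source in sources:
        load(conn, transform(extract(source)))

# Hypothetical source systems with inconsistent formatting.
mortgage_system = [{"name": "  mary anne   jones ", "balance": "1500.25"}]
checking_system = [{"name": "JOHN SMITH", "balance": "20"}]

warehouse = sqlite3.connect(":memory:")
run_etl([mortgage_system, checking_system], warehouse)
result = warehouse.execute(
    "SELECT customer, balance_cents FROM accounts ORDER BY customer"
).fetchall()
print(result)
```

Note that all of the transformation logic lives in custom application code outside the target store, which is the hand-coding burden this section describes.
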
Granted, ETL has its benefits. It can handle large quantities of complex data as well as data transformations that require multiple passes. Plus, it's useful when an organization requires database synchronization, data transformation, frequent access, analytical processing or historical reporting. Organizations might employ an ETL approach when they need to integrate non-transactional data that involves high latency and heavy transformations for large volumes of information.

For many enterprise organizations, however, ETL's drawbacks outweigh its benefits. Some of the more significant issues include:

It's a poor fit for synchronization due to its inability to address high-concurrency, low-latency data needs

Hand-coding greatly reduces efficiencies of scope and scale; since there's no rigidly defined process, data integration can be inaccurate or incomplete

In addition to the risk of human error, the hours and effort required to integrate data manually drive up the cost of labor

Its large processing overhead carries a heavy infrastructure footprint that can affect the entire enterprise architecture

With these limitations in mind, ETL may not be an ideal data integration solution for Generic National Bank, which needs to streamline its integration processes to drive data management efficiency and improve customer service. In fact, effective customer service requires real-time data that's regularly updated. Accurate, timely data not only improves internal analysis and reporting, it is a competitive differentiator when it comes to retaining business, since it enables organizations to better understand customers. To achieve a single customer view, the bank's master data management hub needs to be able to pull data from many source systems. Its data integration approach needs to have the ability to update the hub with any changes in the data. Since these tasks, and others, are outside the scope of ETL, it may be a poor fit for Generic National Bank.

In-Database/ELT: Quality, Fit-for-Use Data Delivered Faster and More Efficiently
As an alternative to traditional ETL, some organizations might choose the extract, load and transform (ELT) method, also known as in-database transformation. With this process, most of the data transformation work occurs after the data has been loaded into a target database or repository; while data is still in its raw format, it is transformed and moved to tables before it's made available to users.

On the surface, the main difference between in-database integration and the more traditional ETL is the transposed order of transforming the data, but it's much more than that. Transforming the data after it's reached its destination helps optimize performance and minimize cost. For example, since in-database transformation functions occur at the infrastructure level (as opposed to ETL, in which transformation functions are performed at the integration server level), its loading performance is enhanced. Additionally, the in-database method takes advantage of features built into the DBMS infrastructure, helping to speed processes and keep costs in check.

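
The change in ordering can be sketched as follows. This is a hedged illustration that uses SQLite as a stand-in for the target DBMS; the table names `staging_accounts` and `accounts`, the sample rows and the cleanup rules are assumptions made for the example. Raw data is loaded first, and the transformation is then expressed in SQL so that the database engine performs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records, untouched, in a staging table.
conn.execute("CREATE TABLE staging_accounts (name TEXT, balance TEXT)")
raw_rows = [("  mary anne jones ", "1500.25"), ("JOHN SMITH", "20")]
conn.executemany("INSERT INTO staging_accounts VALUES (?, ?)", raw_rows)

# Transform: done inside the database, using the engine's own functions,
# rather than in application code on a separate integration server.
conn.execute(
    """
    CREATE TABLE accounts AS
    SELECT UPPER(TRIM(name)) AS customer,
           CAST(ROUND(CAST(balance AS REAL) * 100) AS INTEGER) AS balance_cents
    FROM staging_accounts
    """
)

rows = conn.execute(
    "SELECT customer, balance_cents FROM accounts ORDER BY customer"
).fetchall()
print(rows)
```

Because the transformation is a single SQL statement executed where the data already resides, no rows travel back and forth between the database and an external transformation tier.
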
With ELT, an organization can:

Reduce network traffic

Increase the reporting capability and flexibility

Make better use of embedded service-oriented architecture (SOA) business intelligence and analytics in many data warehouses

Control costs as a result of the system's central development practice and reduction of core integration expenses

As Generic National Bank maintains information on a single customer in a master data hub, the data it relies on to create this view is located in multiple sources. For example, a customer might have a mortgage, a checking account and a savings bond at the same bank, yet her name is spelled in slightly different ways for each account (e.g., Mary Anne Jones, Mary Ann Jones, Mary A. Jones). To avoid sending this customer duplicate correspondence, GNB needs a way to correct data both at the source and in the data hub so it has one view of the customer. The in-database option leverages ELT's inherent processing capabilities to make it a more efficient, cost-effective choice for master data management.
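
As a rough illustration of the name-matching problem in the example above, the sketch below groups spelling variants of one customer name. It is not DataFlux's matching method: the use of Python's standard `difflib.SequenceMatcher`, the normalization rules and the 0.8 similarity threshold are all assumptions chosen for the example, and a production system would also compare addresses, account numbers and other attributes.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop periods and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(".", "").split())

def similarity(a, b):
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cluster_names(names, threshold=0.8):
    """Greedily group names whose similarity to a cluster's first member
    meets the (assumed) threshold; otherwise start a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

records = ["Mary Anne Jones", "Mary Ann Jones", "Mary A. Jones", "Robert Smith"]
clusters = cluster_names(records)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster is a candidate for a single master record in the hub; the three "Mary" variants fall into one group, while the unrelated name stays separate.
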

Data Federation: Providing A Single Virtual View of Enterprise Information
Data federation allows for a virtual view of data across multiple data stores, without the requirements of actually moving or copying the data. While ETL moves data into a single repository, a federated view enables the data to remain in the source repository where it can remain physically unchanged. When an organization needs to access the data for business use, it uses a query processing capability to create a snapshot of the information required. In other words, a user simply specifies the information he or she needs to know, and it's delivered immediately in one neat, clean package without ever moving the original data.

Figure 3: Data Federation

Data federation is an integration method that gives users query access to current data without needing to physically consolidate it into another data store.
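
The idea of a virtual view can be sketched in a few lines. In this hedged example, two independent SQLite databases stand in for separate source systems; the database names, table names and sample balances are hypothetical. A query function reads from each source at request time and merges the results in memory, so nothing is copied into a new data store.

```python
import sqlite3

def make_source(table, rows):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    # Table names here are fixed constants in this sketch, not user input.
    conn.execute(f"CREATE TABLE {table} (customer TEXT, balance REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

# Hypothetical source systems that keep their own data.
mortgage_db = make_source("mortgages", [("Mary Jones", -250000.0)])
checking_db = make_source("checking", [("Mary Jones", 1200.0), ("John Smith", 300.0)])

SOURCES = (
    ("mortgage", mortgage_db, "mortgages"),
    ("checking", checking_db, "checking"),
)

def federated_customer_view(customer):
    """Query every source when asked and combine the answers in memory.
    The view is read-only: it never writes back to, or copies, the sources."""
    view = {}
    for label, conn, table in SOURCES:
        row = conn.execute(
            f"SELECT balance FROM {table} WHERE customer = ?", (customer,)
        ).fetchone()
        if row is not None:
            view[label] = row[0]
    return view

view = federated_customer_view("Mary Jones")
print(view)
```

The snapshot exists only for the duration of the request, which reflects both the appeal of federation (no data movement) and its limit (no updates through the view).
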

The advantage to this approach is that it delivers an intermediate layer between the data query and the source. It's useful for light-duty and read-only applications, where a user needs quick, ad hoc reporting to extract only certain elements for a specific glimpse into the company. It's also more cost-effective for organizations that need to use data primarily for these impromptu, up-to-date tasks.

Despite these benefits, a purely federated approach cannot take the place of ETL or ELT in order to populate a data warehouse. Transforming data that comes from multiple systems presents a major performance challenge, plus federation prevents data from being altered (it's read-only, not write-to). While organizations are able to get a clear view of combined data elements from a variety of sources, they wouldn't be able to change or update the information.

Generic National Bank might benefit from a data federation approach for several reasons, including:

It helps them remain compliant in the face of increasing government regulation. If a law prevents a financial institution from co-mingling certain types of data, for example, GNB could still get a snapshot of the information it needs for compliance purposes without moving or creating a copy of the data

It provides a more cost-effective method of consolidating data views; federated queries are less expensive than ETL network, storage and maintenance costs incurred to move or copy data

It offers a more efficient and faster option than other integration methods that require a copy of heterogeneous data and content, both structured and unstructured

As long as Generic National Bank doesn't need to alter data, or doesn't always need up-to-the-minute information on customers, data federation remains a viable option for data integration. Having the flexibility to attain quick, specific snapshots of customer data enables the bank to make informed decisions about marketing campaigns, for example, or even help determine how (or where) it should plan for future expansion.

Real-Time (or Right-Time) Data Integration: Fast, Accurate Information for Decision-Making
When it comes to transactional or event-driven applications, real-time data integration is the only clear choice. Because of its distinct benefits, this method is often required in operational applications like call center operations or supply chain systems used in manufacturing. Real-time data is also necessary for tactical and strategic applications, when users need up-to-date data that's available as soon as it's generated.

Real-time data integration is about timeliness and accuracy; it's where data integration meets business and operations applications. Whether an organization needs to monitor up-to-the-minute purchase orders, oversee compliance regulations or comply with its customer service model by catching data mistakes before they reach the customer, real-time data integration is a key requirement of the data management solution.

The banking industry has practically built its customer service business around access to timely, accurate information. As Generic National Bank works to maintain existing customer relationships (and bring in new ones), it needs access to integrated, low-latency data to feed solutions that provide a single customer view.

A customer data hub identifies unique data elements that provide descriptive detail. While this data resides in the source systems, real-time data integration makes a single view of the customer possible. For example, if a customer opens a new savings account on top of her checking account and calls a few hours later to ask about her balances, the GNB customer service rep needs real-time access to her information in order to see a complete picture of her account. Real-time data integration enables Generic National Bank to remain accurate and informed about its customers, even with late-breaking transactions that occur in multiple systems.
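
The savings-account scenario above can be sketched as an event-driven update of the customer hub. This is a simplified, assumption-laden illustration: the `EventBus` and `CustomerHub` classes, the event fields and the in-process publish/subscribe mechanism are hypothetical stand-ins for whatever messaging or EAI infrastructure an organization actually uses.

```python
class EventBus:
    """A minimal in-process publish/subscribe channel (stand-in for real middleware)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        # Deliver each event to every subscriber as soon as it is published.
        for handler in self.subscribers:
            handler(event)


class CustomerHub:
    """Keeps a single, continuously updated view per customer."""

    def __init__(self):
        self.accounts = {}

    def on_event(self, event):
        # Apply the change immediately instead of waiting for a nightly batch.
        customer = self.accounts.setdefault(event["customer_id"], {})
        customer[event["account"]] = event["balance"]

    def customer_view(self, customer_id):
        # Return a copy so callers cannot modify the hub's state by accident.
        return dict(self.accounts.get(customer_id, {}))


bus = EventBus()
hub = CustomerHub()
bus.subscribe(hub.on_event)

# The checking system and the savings system each publish their own changes.
bus.publish({"customer_id": "C1", "account": "checking", "balance": 1200.0})
bus.publish({"customer_id": "C1", "account": "savings", "balance": 500.0})

view = hub.customer_view("C1")
print(view)
```

When the customer calls a few hours after opening the savings account, the service rep's view already includes it, because the hub was updated at the moment the source system published the change.
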

Practical Solutions for Real-World Business Needs
As with most things, there's rarely a universal solution that solves every type of problem, all the time. The same can be said for data integration. Organizations have different data integration needs, but few understand how to choose the right system to resolve their specific challenges. They end up turning to traditional solutions like ETL to get the job done, even if it's not done particularly well.

That's where a provider of data management solutions like DataFlux comes in. DataFlux not only helps an organization analyze its business needs, it matches those needs with the right data integration solution. By delivering alternatives like data federation, in-database integration and real-time integration, DataFlux enables organizations to create an enterprisewide data management platform that helps them achieve their business goals, work more efficiently and stay ahead of the competition.

To learn more about data integration, visit:
dataflux.com/knowledgecenter/di

www.dataflux.com

Corporate Headquarters

DataFlux Corporation
940 NW Cary Parkway
Suite 201
Cary, NC 27513-2792
USA
877 846 3589 (USA & Canada)
919 447 3000 (Direct)
info.us@dataflux.com

DataFlux United Kingdom


Enterprise House
1-2 Hatfields
London
SE1 9PG
+44 (0)20 3176 0025
info.uk@dataflux.com

DataFlux Germany
In der Neckarhelle 162
69118 Heidelberg
Germany
+49 (0) 69 66 55 42 04
info.de@dataflux.com

DataFlux France

Immeuble Danica B
21, avenue Georges Pompidou
Lyon Cedex 03
69486 Lyon
France
+33 (0) 4 72 91 31 42
info.fr@dataflux.com

DataFlux Australia
300 Burns Bay Road
Lane Cove, NSW 2066
Australia
+61 2 9428 0553
info.au@dataflux.com

DataFlux and all other DataFlux Corporation LLC product or service names are registered trademarks or trademarks of, or licensed to, DataFlux Corporation LLC in the USA and
other countries. Copyright 2010 DataFlux Corporation LLC, Cary NC, USA. All Rights Reserved. Other brand and product names are trademarks of their respective companies.
