Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?
25 August 2011 | ID:G00213081
Mark A. Beyer | Donald Feinberg
The ideal enterprise data warehouse has been envisaged as a centralized repository for 25
years, but the time has come for a new type of warehouse to handle "big data." This
"logical data warehouse" demands radical realignment of practices and a hybrid
architecture of repositories and services.
Overview
The new data warehouse needed for the information management demands of the 21st
century is not a replacement for existing practices. Rather, it involves a fundamental
realignment of almost every existing practice in order to provide specific functionality
within a restyled architecture that capitalizes on the greatest strength of every technique,
approach and strategy. At the same time, it introduces fresh techniques and architectural
capabilities to meet the demand, created by "big data," cloud utilization, operational
technology and social media, for delivery of data to traditional, readily available and
consumer-style analytics tools. The focus is on the data-processing or information
management logic, not the physical infrastructure: this is a "logical data
warehouse" (LDW).
Key Findings
• The vast majority of organizations (judging from over 75% of the data warehouse
inquiries received from Gartner clients) select a single deployment style for what
they term an enterprise data warehouse (EDW). In doing so they create a
compromised environment that fails to deliver on some aspect of the associated
SLA.
• Organizations that deploy an EDW almost all create second and third data
warehouses or marts to support additional user needs (judging from up to 90% of
the data warehouse inquiries received from Gartner clients), despite strict
instructions to use the EDW.
• The architectural style of a data warehouse is usually determined by the available
skills and tools, and secondarily by time-to-delivery, in preference to the anticipated
future flexibility or extensibility of the solution.
Recommendations
• Start your evolution toward an LDW by identifying data assets that are not easily
addressed by traditional data integration approaches and/or easily supported by a
"single version of the truth." Consider all technology options for data access and do
not focus only on consolidated repositories. This is especially relevant to "big data"
issues.
• Identify pilot projects in which to use LDW concepts by focusing on highly volatile
and significantly interdependent business processes.
• Use an LDW to create a single, logically consistent information resource independent
of any semantic layer that is specific to an analytic platform. The LDW should
manage reused semantics and reused data.
Table of Contents
Analysis
  Ending the Era of Deficient Compromise
  Service Level and Benefit Expectations - Revisited
  A Combined Services and Information Asset Management Platform
  The Logical Data Warehouse Architecture
  Evolving Toward the Logical Data Warehouse
  How Existing Technology Can Fit In
Page 1 of 10 Print Document
15/01/2014 http://my.gartner.com/portal/server.pt/gateway/PT!"#$0$2%%&5&'$'5'$25&$2'50...
Recommended Reading
List of Tables
  Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations
List of Figures
  Figure 1. Summary of Standard Data Warehouse Service Contracts
  Figure 2. Information Capabilities Framework Management and Semantic Services Categories
  Figure 3. Traditional Data Warehouse and Business Intelligence Infrastructure
  Figure 4. Services-Oriented Analytics Information Management
Analysis
This document was revised on 5 September 2011. For more information, see the
Corrections page on gartner.com.
Data warehouse architecture is undergoing an important evolution, as compared with the
relative stasis of the previous 25 years. While the term "data warehouse" was coined
around 1989, the architectural style predated the term (at American Airlines, Frito-Lay and
Coca-Cola).
At its core, a data warehouse is a negotiated, consistent logical model that is populated
using predefined transformation processes. Over the years, the various options
(centralized EDW, federated marts, hub-and-spoke array of central warehouse with
dependent marts, and virtual warehouse) have all served to emphasize certain aspects of
the service expectations for a data warehouse. The common thread running through all
styles is that they were repository-oriented. This, however, is changing: the data
warehouse is evolving from competing repository concepts to include a fully enabled data
management and information-processing platform. This new warehouse forces a complete
rethink of how data is manipulated, and where in the architecture each type of processing
occurs to support transformation and integration. It also introduces a governance model
that is only loosely coupled with data models and file structures, as opposed to the very
tight, physical orientation previously used.
This new type of warehouse, the LDW, is an information management and access
engine that takes an architectural approach which de-emphasizes repositories in favor of
new guidelines:
• The LDW follows a semantic directive to orchestrate the consolidation and sharing of
information assets, as opposed to one that focuses exclusively on storing integrated
datasets.
• The semantics are described by governance rules from data creation and use case
business processes in a data management layer, instead of via a negotiated, static
transformation process located within individual tools or platforms.
• Integration leverages both steady-state data assets in repositories and services in a
flexible, audited model via the best optimization and comprehension solution
available.
Ending the Era of Deficient Compromise
Some would say that the result of compromise is that everyone is equally unhappy. The
new data warehouse is expected to meet all previous data warehouse service-level
expectations and to deliver all the originally intended benefits of a warehouse or
integration platform, but without any artificial limitations based on use cases or deficient
technology. At the same time, the new warehouse must integrate very non-traditional
information assets.
Every data warehouse is deployed essentially to meet specific service-level expectations
for the delivery and management of data. These expectations have been met using a wide
variety of architectures and approaches. The basic premise behind the new data
warehouse is that it will combine the strengths of every engineering approach previously
used to create a variety of architectural styles into a new model that supports easy
switching between styles or a hybrid of diverse delivery approaches. Existing architectures
must be altered radically to meet these new demands.
There are many components and expectations associated with each of the traditional
warehouse approaches. But for each of the traditional approaches, there is a principal
service expectation, a primary design driver and some predominant limitation (otherwise
alternatives would not have been necessary). Table 1 compares traditional data warehouse
architectures and the LDW.
Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations

Warehouse Architecture: Centralized repository
  Normalized or slightly denormalized data in a single database; the traditional
  "enterprise data warehouse."
Principal Service-Level Expectation:
  Integrate and abstract data for reuse in analytics, or serve as a data-sharing platform
  for transactional systems.
Primary Design Driver:
  Need to resolve similar or the same data that was designed and deployed in different
  applications and systems, because in those systems the data was designed specifically
  to support transactional roles.
Primary Limitations:
  • Performance optimization is often difficult due to the more normalized nature of the
    data.
  • Comprehension and usage barriers arise due to users' lack of familiarity with third
    normal form approaches.
  • Inherited data governance from authoring applications makes ongoing rationalization
    and extensions difficult.

Warehouse Architecture: Federated marts
  Multiple individual data models with join tables or views of selected information
  deployed in one or more databases.
Principal Service-Level Expectation:
  Isolate cost and deployment for rapid deployment, while producing more comprehensible
  reports in a short time-to-delivery model.
Primary Design Driver:
  The demand for dynamic reporting within a well-described business process, based on
  one business process or business unit's specific information governance demands.
  Enables analysis by drilling down into well-organized reports.
Primary Limitations:
  • Perpetuates parochial definitions and data design, merely deferring costs for
    rationalization of sometimes incompatible data models.
  • Forces multiple maintenance points without actually integrating disparate data.

Warehouse Architecture: Virtual warehouse
  A view or semantic layer over the top of transactional systems data, usually without a
  dedicated repository but sometimes using cache technology.
Principal Service-Level Expectation:
  Permit the abstraction of disparate models from disparate locations without actually
  moving the data.
Primary Design Driver:
  Allow for consolidated reporting across multiple systems without having to add to the
  storage environment, while also avoiding significant additions to the
  compute/processing environment or server demands.
Primary Limitations:
  • Dependent on external limitations for data volume, network capacity and source
    availability.
  • Pressured by desired end-user and application connections. Often disrupted by
    downtime from these issues.
  • Even the best-designed virtual warehouse often has to resort to some form of
    physically stored cache.

Warehouse Architecture: Hub-and-spoke array
  Summaries, aggregates and even variants of similar dimensions, all derived from a
  central repository of transformed, remodeled and relocated transactional data. A second
  variant of the
Principal Service-Level Expectation:
  Provide for integration of designated subsets of data, while delivering high-performance
  and comprehension-optimized data access.
Primary Design Driver:
  The desire for multiple renderings of the same data for different use cases, each
  optimized for performance.
Primary Limitations:
  • Time-to-deployment requires phased rollout, and poor planning of initial rollouts
    often forces a radical redesign two to five iterations later.
  • Same issues as for the centralized repository.
Source: Gartner (August 2011)
Service Level and Benefit Expectations - Revisited
Every data warehouse is expected to meet well-established and persistent service-level
expectations as part of industry best practices in order to deliver the desired benefits of
deploying that warehouse. In the past, many of the architectural, design and engineering
approaches used to deploy warehouses equated to a series of compromises that favored
some of these "service contracts" to the detriment of others, or even sacrificed some
requirements due to time-to-market pressures. Figure 1 summarizes the service contracts
of a data warehouse (see Note 1).
Figure 1. Summary of Standard Data Warehouse Service Contracts
Source: Gartner (August 2011)
The new warehouse has the same service expectations, but is not specifically a repository
and it now includes a series of information management services. So, what precisely is the
new architectural form?
A Combined Services and Information Asset Management Platform
The LDW incorporates best practices for service-oriented architecture, information
governance, data warehouses and information management. It shifts the debate and the
focus of data warehouse design from choosing between fixed implementation and
architectural styles to applying best practices in multiple IT delivery areas for the most
appropriate use.
Table 1 (continued)

Warehouse Architecture: Hub-and-spoke array (description, continued)
  "enterprise data warehouse."

Warehouse Architecture: Logical data warehouse
Principal Service-Level Expectation:
  Combine the benefits of previous approaches in a "best fit" architecture. Add support
  for distributed data assets and parallel distribution of processing requirements with
  predictable and repeatable results, while continuing to support data centralization
  when appropriate. Support all previous forms of data warehouse architecture with easy
  switching between data management and delivery styles.
Primary Design Driver:
  The need to account for the reuse of information transformation, quality and access
  services, regardless of information/data formats or locations, to support data
  redistribution or analytics. Also, the need to support new and diverse data types at
  the same time.
Primary Limitations:
  • More of a barrier than a limitation, given that existing warehouse platforms and
    architectures were designed with centralized processing as an underlying assumption
    (even for federated approaches) and that existing semantics and data processing code
    will be difficult to adapt and reuse.

The "old" data warehouse usually favored one specified engineering approach, often using
procedural processing to extract data from designated repositories, validating the
transformations against somewhat static business rules, and then loading the data. For
example, traditional extraction, transformation and loading (ETL) identifies the table and
column where the source data is and moves it in some type of processing stream to a
target. The format and content are known at both source and target, such as when
moving "first name" in a customer table to "given name" in a warehouse. The quality rules
might even be coded directly into the ETL system. By contrast, the LDW takes a data
services approach to managing these various requirements.
A data services approach separates data access from processing, processing from
transformation, and transformation from delivery. In a data services approach, the pieces
are written separately to enable flexible job flows and easily coupled processing. Let's
assume, for example, that there are seven sources for "given name." One level of services
would manage the connection string. Another level would read the metadata table
indicating that in three of the systems the column desired is named "fname_24," in two of
the systems it is listed as "cus_firstname" and in another two systems "name_given" and
"namenerstmal." All of these are equivalent to "given_name" and therefore subject to the
same data quality rule. So the service that accesses the data passes each of them to one
common quality service. Then, after the completion of quality operations, the data is
passed on to a delivery service. Say, however, that one of the targets is a data warehouse
that needs an insert service to a database management system (DBMS), that another
delivery location is an XML message which needs XML structure around it, and that a third
delivery location is an application which needs to write data to an array or cursor, etc. It
would then be possible to write code so that the XML is always created and additional
services render the insert and the array build, or the three delivery functions (insert, XML
and array) could be written as three services. The reuse rate of a function would partly
(or nominally) determine whether to code that function in a loosely coupled fashion or as
a tightly coupled procedure.
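The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names are the ones from the example, but the alias dictionary, the function names, the "customer_dim" table and the title-casing quality rule are all hypothetical and stand in for whatever metadata table and services a real implementation would provide.

```python
from xml.sax.saxutils import escape

# Hypothetical metadata table: source column name -> canonical attribute.
# The column names come from the example in the text; the dict is an assumption.
COLUMN_ALIASES = {
    "fname_24": "given_name",
    "cus_firstname": "given_name",
    "name_given": "given_name",
    "namenerstmal": "given_name",
}


def access_service(rows, aliases=COLUMN_ALIASES):
    """Access: rename source columns to canonical attributes and nothing else."""
    for row in rows:
        yield {aliases.get(column, column): value for column, value in row.items()}


def quality_service(records):
    """Quality: one shared (illustrative) rule for every given_name, whatever its source."""
    for record in records:
        name = (record.get("given_name") or "").strip()
        if name:  # records with an empty name fail the rule and are dropped
            yield {**record, "given_name": name.title()}


def deliver_insert(records, table="customer_dim"):
    """Delivery: parameterized INSERT statements for a DBMS target."""
    sql = f"INSERT INTO {table} (given_name) VALUES (?)"
    return [(sql, (record["given_name"],)) for record in records]


def deliver_xml(records):
    """Delivery: wrap each value in XML structure for a message target."""
    items = "".join(
        f"<given_name>{escape(record['given_name'])}</given_name>" for record in records
    )
    return f"<customers>{items}</customers>"


def deliver_array(records):
    """Delivery: a plain list for an application that expects an array."""
    return [record["given_name"] for record in records]


def run(rows, deliver):
    """Compose the free-standing services; only the delivery step varies."""
    return deliver(quality_service(access_service(rows)))


rows = [{"fname_24": " ada "}, {"cus_firstname": "GRACE"}, {"namenerstmal": ""}]
print(run(rows, deliver_array))
print(run(rows, deliver_xml))
```

Here the three delivery functions are written as three separate services. The alternative mentioned in the text, always building the XML and deriving the insert and the array from it, would couple the delivery steps more tightly in exchange for a single rendering path.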
The LDW participates in, and is a beneficiary of, a wider information capabilities
services-style approach (see "Information Management in the 21st Century"). In a 21st-century
information management architecture, the new warehouse participates in an information
capabilities framework (ICF): see "The Information Capabilities Framework: An Aligned
Vision For Information Infrastructure."
Since a data warehouse serves primarily as a rationalization and integration engine, it is
expected to perform most of its information management duties using data management
verbs that integrate and organize data. Additionally, the warehouse is expected to deliver
integrated information in an optimized fashion, supporting both comprehension and
performance. The new warehouse, therefore, must "decide" when a consolidated
repository or a transient (virtual) style of delivery is appropriate. Organization will take
place at two levels: first putting information assets together and then determining
whether a summary/aggregated dataset is the best organization approach for an end-use
case.
An ICF specifies that, regardless of how an application or repository is designed, the
information management approach used is expected to perform duties and services from
well-established categories of information management functions (see Figure 2). Further,
information itself is treated as an object with value, integrity and rules of behavior. Some
of these rules can be deployed as logical policies that are enacted against any asset type,
as long as the actual content is the same. For example, a person's name is his or her name
regardless of whether it is in a database, a document or spoken in an audio clip. The LDW
architect simply determines where each of these verb classes will be provided: in a
database, on a services bus, in a view layer and so on. Importantly, an EDW uses a
"dedicated" semantic style only, but an LDW uses all semantic styles based on which is
most appropriate for the applicable SLA.
Figure 2. Information Capabilities Framework Management and Semantic Services Categories
Source: Gartner (August 2011)
The Logical Data Warehouse Architecture
In a services-oriented approach to data management it is imperative to understand that
nothing is required to execute in the same order every time. Orchestration of services can
be declared or dynamic. Declared orchestration executes almost as modularized procedural
code, the difference being that the services are free-standing operations and can also be
called by other composite processes to occur in a different order. Dynamic orchestration
reacts to metadata instructions that are often received as audits of the environment. For
example, an analytic query that anticipates putting multiple sources together could get the
information from an integrated repository or from a federated view; it might decide which
is best by comparing the latency of the integrated repository data with the intention of the
querying user to capture newer or older data.
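The dynamic decision in that example can be sketched as follows. This is a minimal illustration, assuming the audit metadata boils down to a last-refresh timestamp per source and that the user's intention is expressed as a maximum tolerated staleness; the SourceAudit record, the choose_source function and the source names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SourceAudit:
    """Hypothetical audit record for one way of answering the query."""
    name: str
    last_refreshed: datetime  # point in time up to which the data is current


def choose_source(repository, federated_view, max_staleness, now):
    """Use the integrated repository unless it is staler than the user tolerates."""
    repository_age = now - repository.last_refreshed
    if repository_age <= max_staleness:
        return repository.name
    return federated_view.name


now = datetime(2011, 8, 25, 15, 0)
repository = SourceAudit("integrated_repository", datetime(2011, 8, 25, 2, 0))
view = SourceAudit("federated_view", now)

# A trend report tolerates day-old data; an operational check does not.
print(choose_source(repository, view, timedelta(hours=24), now))
print(choose_source(repository, view, timedelta(hours=1), now))
```

A declared orchestration would hard-code one of the two branches; the dynamic variant leaves the choice to metadata that is read at query time.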
Within an ICF, the data warehouse, like any other use case, must determine a primary
semantic "entry point" to begin using a services architecture. For the data warehouse this
primary entry point is defined by the primary service contract, to deliver a consolidated
view of disparate data in optimal fashion. A warehouse needs to access sources and deliver
that consolidated view.
Therefore, the LDW is designed, first and foremost, using a combination of services and
physical data repositories. Secondly, it can be designed with a focus on declared or
dynamic orchestration. Finally, it is possible to design some of the LDW using any
combination of physical repositories, virtual data objects, declared orchestration or
dynamic orchestration. It is also possible to begin with a physical repository approach with
highly dedicated, declared access, and then evolve slowly toward more dynamic and mixed
data delivery approaches.
Evolving Toward the Logical Data Warehouse
Traditional data warehouses and BI environments have a fairly consistent architecture (see
Figure 3). Some of the capabilities are on different platforms, but there is primarily a
unidirectional flow of data toward one set of new models and data governance rules.
Figure 3. Traditional Data Warehouse and Business Intelligence
Infrastructure
BI = business intelligence; DBMS = database management system; DW = data warehouse; ETL =
extraction, transformation and loading; LDAP = Lightweight Directory Access Protocol; ODS =
operational data store; OLAP = online analytical processing; RDBMS = relational database
management system
Source: Gartner (August 2011)
If we assume an initial state with a traditional data warehouse, the following points most
likely apply:
• You already have a data integration process with "describe" and "organize" functions
that specify both the source and target states of the data. They may or may not be
deployed as modular code or metadata that drives the process.
• You already have functions that resolve differences between the governance rules of
sources and your warehouse target. You also have integration processes that
resolve formatting issues.
• You have some implementation rules, sometimes embedded at design time,
sometimes deployed when ready for runtime.
• You may or may not have auditing capabilities built into your processing (such as for
profiling, record counts of completed versus dropped transformations and data
quality "outs"). However, they are probably designed for permanent use of a
combined "consolidation" and "dedicated" semantic layer. In other words, they are
probably procedural and not dynamic (at least not without returning to the design
tools and redeploying).
• Your existing orchestration is most likely not dynamic and, unless you are using a
virtual warehouse strategy, the concept of using registries for data sources and
target objects is most likely nonexistent.
• There is probably little or no ability to use other repositories as information assets in
query responses: such external assets are either loaded directly during a
transformation-and-load process or loaded from one of your source systems (as
with postal data added to an ERP system and then relayed to the warehouse).
Let's assume that, instead of accepting this unidirectional, static orchestration, you want to
develop an LDW. To do this, you start by introducing the LDW concepts used for dynamic
consolidation, integration and implementation, as depicted in Figure 4 (note that the
diagram uses today's terminology, such as transformation using "ETL/ELT," "federation" and
so on, but that these concepts are deconstructed, for evolution toward a modern
architecture, in "Information Management in the 21st Century"):
• The data integration process can be broken into sourcing, collation, data quality,
formatting and domain governance segments, based on information availability and
governance rules. For example, the sourcing/extraction process can be a registry
semantic layer using "describe" verbs that tell the service "where" the data is. If
data for "person" is located in documents, clickstreams and enterprise systems, one
service can use textual analysis and search for documents, another service can use
MapReduce to read massive volumes of tags in "clicks," and a traditional native
driver access approach can pull data from the enterprise system database. A data
quality process can then verify the work done by each service and undertake an
enrichment and/or value substitution process, before prepping the data for delivery.
If the data is dynamic and constantly changing, the data integration process can
deliver a virtual data object, but if the data is already validated by a master data
management process and fairly static, it can be loaded into a table or file. A final
service can determine the appropriate load or access format and put the data into
that format.
• In relation to latency issues, you are no longer bound by load restrictions. It is
possible to indicate in a metadata layer that there are different requirements for
different analytic end-use cases. For example, one department may require
higher-quality data but tolerate higher-latency delivery (it would get data from fully
validated tables), while another department might be prepared to risk
inconsistencies in data but require low latency (it would get a combined-registry
delivery of yesterday's data in the tables with today's data from the OLTP system:
"dirty but fast"). Or, instead of this fixed approach, you could have a service that
negotiates whether the quality SLA is being met for each of the departments and
switches between strategies dynamically. For example, the department requiring
low latency might receive data from the warehouse repository in the morning, after
the previous night's load had brought everything up to date, but in the afternoon it
might receive a composite view. And, instead of switching at a predetermined time
of day, the switch would be based on how far out of synchronization the two sources
are, based on record counts and data quality ratings.
• A dynamic service that determines when to write summary or aggregate data is
generally faster than one that performs a query-time summary of detailed rows. It
could even switch on the basis of CPU and storage utilization/performance audits,
and change its approach throughout the day. It could also switch dynamically
between approaches on the basis of system audits that determine whether more
memory is added for caching, or even if it is worthwhile to perform caching.
• Adding external data based on services written to read and analyze those data
sources also becomes easier. For example, adding operational-technology data such
as the millions of records generated each day by RFID-enabled supply chain
management tracking systems or utility smart grid meters requires massive data
integration processing in a procedural manner when using traditional warehouses.
But by developing two or three variants of the same MapReduce function, the LDW
can orchestrate the preferred approach for different analyst audiences and leave the
data in the source or the historian software (see "Historian Software and the Smart
Grid: Questions and Misconceptions").
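The SLA-driven switch described in the latency bullet above can be sketched as a small decision function. This is an illustration only: the SyncAudit and DepartmentSla records, the drift formula, the thresholds and the strategy names are assumptions made for the example, not something the note prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncAudit:
    """Hypothetical audit comparing the warehouse with its OLTP source."""
    warehouse_rows: int
    source_rows: int
    quality_rating: float  # 0.0 to 1.0, score for the not-yet-validated source rows


@dataclass(frozen=True)
class DepartmentSla:
    """Hypothetical per-department requirements held in the metadata layer."""
    max_row_drift: float  # tolerated fraction of source rows missing from the warehouse
    min_quality: float    # lowest quality rating acceptable in a composite view


def row_drift(audit):
    """Fraction of source rows that the warehouse has not loaded yet."""
    if audit.source_rows == 0:
        return 0.0
    missing = max(audit.source_rows - audit.warehouse_rows, 0)
    return missing / audit.source_rows


def pick_strategy(audit, sla):
    """Switch on how far the two sources are out of sync, not on time of day."""
    if row_drift(audit) <= sla.max_row_drift:
        return "warehouse_tables"
    if audit.quality_rating >= sla.min_quality:
        return "composite_view"
    return "warehouse_tables"  # fresh rows are too dirty; stay on validated data


low_latency = DepartmentSla(max_row_drift=0.05, min_quality=0.6)
high_quality = DepartmentSla(max_row_drift=0.50, min_quality=0.95)

morning = SyncAudit(warehouse_rows=1000, source_rows=1000, quality_rating=0.8)
afternoon = SyncAudit(warehouse_rows=1000, source_rows=1250, quality_rating=0.8)

print(pick_strategy(morning, low_latency))
print(pick_strategy(afternoon, low_latency))
print(pick_strategy(afternoon, high_quality))
```

The same shape would fit the summary/aggregate bullet: replace the row counts with CPU, storage or cache audits and the returned strategy with "materialize" versus "summarize at query time".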
Note that with the LDW approach, the differing styles of support, such as federation, data
repositories, messaging and reductions, are not mutually exclusive. They are just
manifestations of data delivery. The focus is on getting the data first, then figuring out the
delivery approach that best achieves the SLA with the querying application. The
transformations occur in separate services.
Figure 4. Services-Oriented Analytics Information Management
ETL = extraction, transformation and loading; ELT = extraction, loading and transformation
Source: Gartner (August 2011)
How Existing Technology Can Fit In
Note that Figure 4 does not argue for specific technologies to perform each approach. This
is because multiple engineered solutions can be used to deliver the same architecture and
design, as noted in the two different design scenarios:
1. Use a BI platform and DBMS stack. While tending toward a more dedicated
semantic, a BI platform deployed in tandem with a dedicated DBMS can deliver the
entire approach. For example, the BI platform could negotiate with the DBMS when
to use a table as opposed to a federated view of data. But any form of dynamic
approach to using federation, materialized views or tables would have to be
leveraged by the DBMS optimizer, and all the options would have to be
maintained in the database. Of course, some semantic layers in BI platforms fail to
properly combine platform optimization with DBMS optimization, while others can
accomplish this task, and still others are improving. This is one disadvantage of an
engineering approach to "use what is available," instead of "designing to purpose."
2. Use an enterprise service bus (ESB), data integration tools and DBMS. An
ESB can define discrete services or register services provided by the data
integration tool (which becomes a development workbench with orchestration
occurring in the ESB). The DBMS works in its usual fashion, optimizing for view
and table use to respond to queries (by maintaining cubes, views, indices, etc.). In
addition, the DBMS or ESB could manage external calls to nondatabase types of
information as service calls to other application services, or by orchestrating calls to
the functions of other tools or repositories. This could even include calls to content
management systems, sentiment analysis tools and text analysis tools.
In addition, many data integration tool vendors support variations of this infrastructure to
some degree. Database vendors support capabilities to deploy access to external
information assets and even to externally managed parallel distributed processes.
The main shortcoming of an approach that uses existing technologies is the inability to
integrate data management with business process management. A business process
management tool could add the ability to link analytics data sources with analytics
processing, and then provide the results to an operational application for use in
on-demand analytics. The ability to link process management with analytics is the first step in
a Pattern-Based Strategy.
4eco**ended 4eading
Some documents may not be available as part of your current Gartner subscription.
"The State of Data Warehousing in 2011"
"Magic Quadrant for Data Warehouse Database Management Systems"
"Analytics and Learning Technology: CIOs, CTOs Should Rethink Art of the Possible"
"Magic Quadrant for Data Integration Tools"
"Data Architectures to Support Performance Management Applications"
"Magic Quadrant for Data Quality Tools"
"Hype Cycle for Data Management, 2011"
"Applying Gartner's Pace Layer Model to Human Capital Management"
"Cool Vendors in Data Management and Integration, 2011"
Strategic Planning Assumptions
• By 2014, 85% of organizations will fail to deploy new strategies to address data
complexity and volume in their analytics.
• Organizations which fail to deploy strategies to address data complexity and volume
issues for their analytics by 2012 will experience more than doubling costs of ownership
for their data warehouse and mart environments in disorganized attempts to meet this
new demand.
• By 2014, organizations which have deployed analytics to support new complex data
types and large volumes of data in analytics will outperform their market peers by more
than 20% in revenue, margins, penetration and retention.
Note 1
How "Able" Is Your Data Warehouse?
Many organizations recognize that best practices demand a data warehouse that provides
for subject-oriented, integrated, consistent and time-variant data for critical corporate data.
The overall architecture of the warehouse can achieve these objectives by adhering to six
basic architectural principles.
Data warehouses should be:
• Extensible. It should be easy to add more data sources or to change data sources
during the life of the data warehouse.
• Flexible. The data warehouse should be modeled to a level of abstraction that supports
modifications to the data model as more data subject areas are added.
• Repeatable. Data warehouses should provide consistent, predictable query response
times; as a result, they may themselves introduce redundancy as needed.
• Reusable. Data in the warehouse should be fully qualified to allow multiple departments
to use it in a variety of contexts. This relates to the abstraction rules in the data model,
and to the data integration transformation rules that consolidate and collate data to
support the introduction of commonly held data enrichment and cleansing rules.
• Scalable. The data warehouse must be able to support more rows of data, and the data
architecture must account for storage of and access to data, as well as its archive and
retirement.
• Available. The data warehouse must be able to operate in virtually nonstop mode, with
provisions for reconfiguration, migration, backup, data insertion and performance
optimization.
These "-ables," which were originally conceived as a group by other analysts, have existed
for years. However, many organizations have attempted to achieve all six in a single data
architecture tier, an approach that has proved untenable in the end-user market. It is best
to think of these six expectations as clauses in a service contract which the warehouse is
expected to fulfill.
Note 2
Gartner's ICF Definitions
Gartner's information capabilities framework (ICF) is the collection of technical
capabilities required to create business value from information assets. It is a conceptual
model that is people, process and technology independent and allows IT leaders to think
holistically about the capabilities required to describe, organize, integrate, share and govern
information in an application-independent manner. It is independent of use case and
information source and does not rely on, nor advocate, any technology or architectural style.
However, it does take into account the specifics of use cases.
An "information capability" is a representation of the actions needed for the information
to be used, treated, organized or developed for the general management of, and for specific
purposes throughout, the organization.
An "information use case" represents the usage of information throughout the
organization to create business value.
The ICF's common capabilities layer provides the range of functionalities used to describe,
organize, integrate, share and govern the information, and the capabilities required to
interact with physical data stores (operate), to prepare the information for consumption
(provision) and to increase the value of the information by making it more easily used and
found, and by providing context (enrich).
The ICF's information semantic styles layer provides the specific entry or "gate" into
information management functions or capabilities. These services follow styles or
approaches that support specific assumptions on how an application interacts with the data
it uses.
The ICF's specialized capabilities layer deals with the range of functionalities used to
support use-case-specific requirements.
© 2011 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication
in any form without prior written permission is forbidden. The information contained herein has been obtained
from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or
adequacy of such information. Although Gartner's research may discuss legal issues related to the information
technology business, Gartner does not provide legal advice or services and its research should not be
construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the
information contained herein or for interpretations thereof. The opinions expressed herein are subject to
change without notice.