112 DATA WAREHOUSING, OLAP AND DATA MINING

Data to be backed up. Identify the data that must be backed up on a regular basis. This gives an indication of the regular backup size. Aside from warehouse data and metadata, the team might also want to back up the contents of the staging or de-duplication areas of the warehouse.
Batch window of the warehouse. Backup mechanisms are now available to support the backup of data even when the system is online, although these are expensive. If the warehouse does not need to be online 24 hours a day, 7 days a week, determine the maximum allowable down time for the warehouse (i.e., determine its batch window). Part of that batch window is allocated to the regular warehouse load and, possibly, to report generation and other similar batch jobs. Determine the maximum time period available for regular backups and backup verification.
Maximum acceptable time for recovery. In case of disasters that result in the loss of warehouse data, the backups will have to be restored in the quickest way possible. Different backup mechanisms imply different time frames for recovery. Determine the maximum acceptable length of time for the warehouse data and metadata to be restored, quality assured, and brought online.
Acceptable costs for backup and recovery. Different backup mechanisms imply different costs. The enterprise may have budgetary constraints that limit its backup and recovery options.
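The batch-window reasoning above reduces to simple arithmetic: whatever remains of the window after the regular load and other batch jobs is the time available for backup and verification. The following is a minimal sketch of that calculation. The function name and all durations are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def backup_time_available(batch_window, regular_load, other_batch_jobs):
    """Return the time left in the batch window for backup and verification.

    All arguments are timedelta values. The parameter names are
    illustrative assumptions, not terms defined by the text.
    """
    remaining = batch_window - regular_load - other_batch_jobs
    # A negative remainder means the window is already oversubscribed.
    return max(remaining, timedelta(0))


# Hypothetical figures: the warehouse may be offline for 8 hours a night.
window = timedelta(hours=8)
load = timedelta(hours=3, minutes=30)
reports = timedelta(hours=1)

available = backup_time_available(window, load, reports)
print(available)  # 3:30:00
```

If the result is shorter than the time a candidate backup mechanism needs, either the mechanism or the batch window has to change.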
Also consider the following when selecting the backup mechanism:
Archive format. Use a standard archiving format to eliminate potential recovery problems.
Automatic backup devices. Without these, the backup media (e.g., tapes) will have to be changed by hand each time the warehouse is backed up.
Parallel data streams. Commercially available backup and recovery systems now support the backup and recovery of databases through parallel streams of data into and from multiple removable storage devices. This technology is especially helpful for the large databases typically found in data warehouse implementations.
Incremental backups. Some backup and recovery systems also support incremental backups to reduce the time required to back up daily. Incremental backups archive only new and updated data.
Offsite backups. Remember to maintain offsite backups to prevent the loss of data due to site disasters such as fires.
Backup and recovery procedures. Formally define and document the backup and recovery procedures. Perform recovery practice runs to ensure that the procedures are clearly understood.
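To illustrate the idea behind incremental backups (archive only what is new or updated since the last backup), here is a minimal sketch that selects files by modification time. This is an assumption made for illustration: commercial backup systems usually track changes at the database block or log level rather than by file timestamps, and the file names below are invented.

```python
import os
import tempfile
import time


def files_changed_since(root, last_backup_time):
    """Yield paths under root modified after last_backup_time (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path


# Demonstration with two hypothetical data files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.dat")
    new_file = os.path.join(root, "new.dat")
    for path in (old_file, new_file):
        with open(path, "w") as handle:
            handle.write("x")

    now = time.time()
    os.utime(old_file, (now - 7200, now - 7200))  # last touched two hours ago
    os.utime(new_file, (now, now))                # touched just now
    last_backup = now - 3600                      # last backup one hour ago

    changed = sorted(
        os.path.basename(p) for p in files_changed_since(root, last_backup)
    )
    print(changed)  # ['new.dat']
```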
7.6 SET UP COLLECTION OF WAREHOUSE USAGE STATISTICS
Warehouse usage statistics are collected to provide the data warehouse designer with inputs for further refining the data warehouse design and to track general usage and acceptance of the warehouse.
WAREHOUSE MANAGEMENT AND SUPPORT PROCESSES 113
Define the mechanism for collecting these statistics, and assign resources to monitor and review these regularly.
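One possible collection mechanism is a query log that is summarized periodically. The sketch below assumes a hypothetical log of (user, table, elapsed seconds) entries; the log layout, table names, and user names are illustrative and do not come from any particular warehouse product.

```python
from collections import Counter, defaultdict

# Hypothetical log entries: (user, table queried, elapsed seconds).
query_log = [
    ("ana", "sales_fact", 12.0),
    ("ben", "sales_fact", 30.0),
    ("ana", "inventory_fact", 4.0),
    ("ana", "sales_fact", 18.0),
]


def summarize_usage(log):
    """Summarize query counts, distinct users, and average run time per table."""
    queries_per_table = Counter()
    users_per_table = defaultdict(set)
    total_seconds = defaultdict(float)
    for user, table, seconds in log:
        queries_per_table[table] += 1
        users_per_table[table].add(user)
        total_seconds[table] += seconds
    return {
        table: {
            "queries": count,
            "distinct_users": len(users_per_table[table]),
            "avg_seconds": total_seconds[table] / count,
        }
        for table, count in queries_per_table.items()
    }


summary = summarize_usage(query_log)
print(summary["sales_fact"])
# {'queries': 3, 'distinct_users': 2, 'avg_seconds': 20.0}
```

Tables that are rarely queried, or queries that run consistently slowly, are the kind of input the designer can use to refine the schema.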
In Summary
The capacity planning process and the issue tracking and resolution process are critical to the successful development and deployment of data warehouses, especially during early implementations.
The other management and support processes become increasingly important as the warehousing initiative progresses further.
CHAPTER 8
DATA WAREHOUSE PLANNING
The data warehouse planning approach presented in this chapter describes the activities related to planning one rollout of the data warehouse. The activities discussed below build on the results of the warehouse strategy formulation described in Chapter 6.
Data warehouse planning further details the preliminary scope of one warehouse rollout by obtaining detailed user requirements for queries and reports, creating a preliminary warehouse schema design to meet the user requirements, and mapping source system fields to the warehouse schema fields. By so doing, the team gains a thorough understanding of the effort required to implement that one rollout.
A planning project typically lasts between five and eight weeks, depending on the scope of the rollout. The progress of the team varies, depending (among other things) on the participation of enterprise resource persons, the availability and quality of source system documentation, and the rate at which project issues are resolved.
Upon completion of the planning effort, the team moves into data warehouse implementation for the planned rollout. The activities for data warehouse implementation are discussed in Chapter 9.
8.1 ASSEMBLE AND ORIENT TEAM
Identify all parties who will be involved in the data warehouse implementation and brief them about the project. Distribute copies of the warehouse strategy as background material for the planning activity.
Define the team setup if a formal project team structure is required. Take the time and effort to orient the team members on the rollout scope, and explain the role of each member of the team. This approach allows the project team members to set realistic expectations about skill sets, project workload, and project scope.
Assign project team members to specific roles, taking care to match skill sets to role responsibilities. When all assignments have been completed, check for unavoidable training requirements due to skill-role mismatches (i.e., the team member does not possess the appropriate skill sets to properly fulfill his or her assigned role).
If required, conduct training for the team members to ensure a common understanding of data warehousing concepts. It is easier for everyone to work together if all have a common goal and an agreed approach for attaining it. Describe the schedule of the planning project to the team. Identify milestones or checkpoints along the planning project timeline. Clearly explain dependencies between the various planning tasks.
Considering the short time frame for most planning projects, conduct status meetings at least once a week with the team and with the project sponsor. Clearly set objectives for each week. Use the status meeting as the venue for raising and resolving issues.
8.2 CONDUCT DECISIONAL REQUIREMENTS ANALYSIS
Decisional Requirements Analysis is one of two activities that can be conducted in parallel during Data Warehouse Planning; the other activity is the Decisional Source System Audit (described in the next section). The object of Decisional Requirements Analysis is to gain a thorough understanding of the information needs of decision-makers.
[Figure: Top-Down: User Requirements]
Decisional Requirements Analysis is Working Top-Down
Decisional requirements analysis represents the top-down aspect of data warehousing. Use the warehouse strategy results as the starting point of the decisional requirements analysis; a preliminary analysis should have been conducted as part of the warehouse strategy formulation.
Review the intended scope of this warehouse rollout as documented in the warehouse strategy document. Finalize this scope by further detailing the preliminary decisional requirements analysis. It will be necessary to revisit the user representatives. The rollout scope is typically expressed in terms of the queries or reports that are to be supported by the warehouse by the end of this rollout. The project sponsor must review and approve the scope to ensure that management expectations are set properly.
Document any known limitations about the source systems (e.g., poor data quality, missing data items). Provide this information to source system auditors for their confirmation. Verified limitations in source system data are used as inputs to finalizing the scope of the rollout: if the data are not available, they cannot be loaded into the warehouse.
Take note that the scope strongly influences the implementation time frame for this rollout. Too large a scope will make the project unmanageable. As a general rule, limit the scope of each project or rollout so that it can be delivered in three to six months by a full-time team of 6 to 12 team members.
Conducting Warehouse Planning Without a Warehouse Strategy
It is not unusual for enterprises to go directly into warehouse planning without previously formulating a warehouse strategy. This typically happens when a group of users is clearly driving the warehouse initiative and is more than ready to participate in the initial rollout as user representatives. More often than not, these users have already taken the initiative to list and prioritize their information requirements.
In this type of situation, a number of tasks from the strategy formulation will have to be conducted as part of the planning for the first warehouse rollout. These tasks are as follows:
Determine organizational context. An understanding of the organization is always helpful in any warehousing project, especially since organizational issues may completely derail the warehouse initiative.
Define data warehouse rollouts. Although business users may have already predefined the scope of the first rollout, it helps the warehouse architect to know what lies ahead in subsequent rollouts.
Define data warehouse architecture. Define the data warehouse architecture for the current rollout (and if possible, for subsequent rollouts).
Evaluate development and production environment and tools. The strategy formulation was expected to produce a short-list of tools and computing environments for the warehouse. This evaluation will be finalized during planning by the actual selection of both environments and tools.
8.3 CONDUCT DECISIONAL SOURCE SYSTEM AUDIT
The decisional source system audit is a survey of all information systems that are current or potential sources of data for the data warehouse.
A preliminary source system audit during warehouse strategy formulation should provide a complete inventory of data sources. Identify all possible source systems for the warehouse if this information is currently unavailable.
[Figure: Bottom-Up: Source Systems, External Data]
Data Sources can be Internal or External
Data sources are primarily internal. The most obvious candidates are the operational systems that automate the day-to-day business transactions of the enterprise. Note that aside from transactional or operational processing systems, one often-used data source is the enterprise general ledger, especially if the reports or queries focus on profitability measurements.
If external data sources are also available, these may be integrated into the warehouse.
DBAs and IT Support Staff are the Best Resource Persons
The best resource persons for a decisional source system audit of internal systems are the database administrators (DBAs), system administrators and other IT staff who support each internal system that is a potential source of data. With their intimate knowledge of the systems, they are in the best position to gauge the suitability of each system as a warehouse data source.
These individuals are also more likely to be familiar with any data quality problems that exist in the source systems. Clearly document any known data quality problems, as these have a bearing on the data extraction and cleansing processes that the warehouse must support. Known data quality problems also provide some indication of the magnitude of the data cleanup task.
In organizations where the production of managerial reports has already been automated (but not through an architected data warehouse), the DBAs and IT support staff can provide very valuable insight about the data that are presently collected. These staff members can also provide the team with a good idea of the business rules that are used to transform the raw data into management reports.
Conduct individual and group interviews with the IT organization to understand the data sources that are currently available. Review all available documentation on the candidate source systems. This is without doubt one of the most time-consuming and detailed tasks in data warehouse planning, especially if up-to-date documentation of the existing systems is not readily available.
As a consequence, the whole-hearted support of the IT organization greatly facilitates this entire activity.
Obtain the following documents and information if these have not yet been collected as part of the data warehouse strategy definition:
Enterprise IT architecture documentation. This refers to all documentation that provides a bird's-eye view of the IT architecture of the enterprise, including but not limited to:
    System architecture diagrams and documentation: a model of all the information systems in the enterprise and their relationships to one another.
    Enterprise data model: a model of all data that are currently stored or maintained by the enterprise. This may also indicate which systems support which data item.
    Network architecture: a diagram showing the layout and bandwidth of the enterprise network, especially for the locations of the project team and the user representatives participating in this rollout.
User and technical manuals of each source system. This refers to data models and schemas for all existing information systems that are candidate data sources. If extraction programs are used for ad hoc reporting, obtain documentation of these extraction programs as well. Obtain copies of all other available system documentation, whenever possible.
Database sizing. For each source system, identify the type of database used, the typical backup size, as well as the backup format and medium. It is helpful also to know what data are actually backed up on a regular basis. This is particularly important if historical data are required in the warehouse and such data are available only in backups.
Batch window. Determine the batch windows for each of the operational systems. Identify all batch jobs that are already performed during the batch window. Any data extraction jobs required to feed the data warehouse must be completed within the batch windows of each source system without affecting any of the existing batch jobs already scheduled. Under no circumstances will the team want to disrupt normal operations on the source systems.
Future enhancements. What application development projects, enhancements, or acquisition plans have been defined or approved for implementation in the next 6 to 12 months, for each of the source systems? Changes to the data structure will affect the mapping of source system fields to data warehouse fields. Changes to the operational systems may also result in the availability of new data items or the loss of existing ones.
Data scope. Identify the most important tables of each source system. This information is ideally available in the system documentation. However, if definitions of these tables are not documented, the DBAs are in the best position to provide that information. Also required are business descriptions or definitions of each field in each important table, for all source systems.
System codes and keys. Each of the source systems no doubt uses a set of codes, and will be implementing key generation routines as well. If these are not documented, ask the DBAs to provide a list of all valid codes and a textual description for each of the system codes that are used. If the system codes have changed over time, ask the DBAs to provide all system code definitions for the relevant time frame. All key generation routines should likewise be documented. These include rules for assigning customer numbers, product numbers, order numbers, invoice numbers, etc. Check whether the keys are reused (or recycled) for new records over the years. Reused keys may cause errors during de-duplication and must therefore be thoroughly understood.
Extraction mechanisms. Check if data can be extracted or read directly from the production databases. Relational databases such as Oracle or Sybase are open and should be readily accessible. Application packages with proprietary database management software, however, may present problems, especially if the data structures are not documented. Determine how changes made to the database are tracked, perhaps through an audit log. Determine also if there is a way to identify data that have been changed or updated. These are important inputs to the data extraction process.
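The check for reused keys mentioned under "System codes and keys" can be sketched as a scan over historical extracts. The example below is a minimal sketch under stated assumptions: it supposes that each extracted row carries the key, a creation date, and a descriptive attribute, and it flags any key associated with more than one distinct description. The row layout and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical extract rows: (customer number, date created, customer name).
rows = [
    ("C001", "1996-03-01", "Acme Trading"),
    ("C002", "1996-05-12", "Borealis Ltd"),
    ("C001", "1998-01-20", "Zenith Foods"),  # same number, different customer
    ("C002", "1997-02-02", "Borealis Ltd"),  # same number, same customer
]


def find_reused_keys(rows):
    """Return keys that appear with more than one distinct customer name."""
    names_by_key = defaultdict(set)
    for key, _created, name in rows:
        # Normalize lightly so trivial case differences are not flagged.
        names_by_key[key].add(name.strip().lower())
    return sorted(key for key, names in names_by_key.items() if len(names) > 1)


reused = find_reused_keys(rows)
print(reused)  # ['C001']
```

A flagged key is only a candidate: a renamed customer would look the same as a recycled key, so the result still needs review by someone who knows the source system.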
8.4 DESIGN LOGICAL AND PHYSICAL WAREHOUSE SCHEMA
Design the data warehouse schema that can best meet the information requirements of this rollout. Two main schema design techniques are available:
Normalization. The database schema is designed using the normalization techniques traditionally used for OLTP applications.
Dimensional modeling. This technique produces denormalized, star schema designs consisting of fact and dimension tables. A variation of the dimensional star schema also exists (i.e., snowflake schema).
There are ongoing debates regarding the applicability or suitability of both these modeling techniques for data warehouse projects, although dimensional modeling has certainly been gaining popularity in recent years. Dimensional modeling has been used successfully in larger data warehousing implementations across multiple industries. The popularity of this modeling technique is also evident from the number of databases and front-end tools that now support optimized performance with star schema designs (e.g., Oracle RDBMS 8, R/olap XL).
A discussion of dimensional modeling techniques is provided in Chapter 12.
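As a small illustration of what a star schema looks like in practice, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and runs a typical aggregate query against them. The table names, column names, and data are hypothetical, and SQLite is used only because it ships with Python; it is not one of the products named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_month TEXT);

-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (19990101, '1999-01'), (19990201, '1999-02');
INSERT INTO fact_sales VALUES
    (1, 19990101, 100.0), (1, 19990201, 50.0), (2, 19990101, 75.0);
""")

# A typical decisional query: total sales by product.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 75.0), ('Widget', 150.0)]
```

A snowflake variation would further normalize a dimension, for example by moving product categories into their own table referenced from dim_product.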
8.5 PRODUCE SOURCE-TO-TARGET FIELD MAPPING
The Source-To-Target Field Mapping documents how fields in the operational (source) systems are transformed into data warehouse fields. Under no circumstances should this mapping be left vague or open to misinterpretation, especially for financial data. The mapping allows non-team members to audit the data transformations implemented by the warehouse.
[Figure: Back-End: Extraction, Integration, QA, DW Load, Aggregates, Metadata]
Many-to-Many Mappings
A single field in the data warehouse may be populated by data from more than one source system. This is a natural consequence of the data warehouse's role of integrating data from multiple sources.
The classic examples are customer name and product name. Each operational system will typically have its own customer and product records. A data warehouse field called customer name or product name will therefore be populated by data from more than one system.
Conversely, a single field in the operational systems may need to be split into several fields in the warehouse. There are operational systems that still record addresses as lines of text, with field names like address line 1, address line 2, etc. These can be split into multiple address fields such as street name, city, country and Mail/Zip code. Other examples are numeric figures or balances that have to be allocated correctly to two or more different fields.
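The allocation case mentioned above (one source balance split across two or more warehouse fields) is easy to get subtly wrong because of rounding. The following is a minimal sketch, assuming the business rule is a proportional split and that the last target field absorbs any rounding remainder so that the parts always sum to the source amount. The field names and proportions are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def allocate(amount, shares):
    """Split amount across target fields in proportion to shares.

    shares maps a target field name to its relative weight. The last
    target receives the remainder so that no cents are lost or created.
    """
    total_weight = sum(shares.values())
    cent = Decimal("0.01")
    targets = list(shares)
    parts = {}
    allocated = Decimal("0")
    for name in targets[:-1]:
        part = (amount * shares[name] / total_weight).quantize(
            cent, rounding=ROUND_HALF_UP
        )
        parts[name] = part
        allocated += part
    parts[targets[-1]] = amount - allocated
    return parts


# Hypothetical rule: one third of the balance is domestic, two thirds export.
parts = allocate(
    Decimal("100.00"),
    {"domestic_balance": Decimal(1), "export_balance": Decimal(2)},
)
print(parts["domestic_balance"], parts["export_balance"])  # 33.33 66.67
```

Whatever rule is actually chosen, it belongs in the documented business rules for that field so that auditors can reproduce the split.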
To eliminate any confusion as to how data are transformed as the data items are moved from the source systems to the warehouse database, create a source-to-target field mapping that maps each source field in each source system to the appropriate target field in the data warehouse schema. Also, clearly document all business rules that govern how data values are integrated or split up. This is required for each field in the source-to-target field mapping.
The source-to-target field mapping is critical to the successful development and maintenance of the data warehouse. This mapping serves as the basis for the data extraction and transformation subsystems. Figure 8.1 shows an example of this mapping.
                               TARGET
                               No.      1     2     3     4     5     6     7
                               Schema   R1    R1    R1    R1    R1    R1    R1
                               Table    TT1   TT1   TT1   TT2   TT2   TT2   TT2
                               Field    TF1   TF2   TF3   TF4   TF5   TF6   TF7
SOURCE
No.   System   Table   Field
1     SS1      ST1     SF1
2     SS1      ST1     SF2
3     SS1      ST1     SF3
4     SS1      ST1     SF4
5     SS1      ST2     SF5
6     SS1      ST2     SF6
7     SS2      ST2     SF7
8     SS2      ST3     SF8
9     SS2      ST3     SF9
10    SS2      ST3     SF10
...   ...      ...     ...              ...   ...   ...   ...   ...   ...   ...
SOURCE: SS1 = Source System 1. ST1 = Source Table 1. SF1 = Source Field 1.
TARGET: R1 = Rollout 1. TT1 = Target Table 1. TF1 = Target Field 1.
Figure 8.1. Sample Source-to-Target Field Mapping.
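A mapping of this kind can also be held as structured data, which makes it possible to check it mechanically. The sketch below is one possible representation, not a prescribed format: it borrows the SS/ST/SF and TT/TF naming of Figure 8.1, but the individual source-to-target pairings and the rule texts are hypothetical, since they are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    source_system: str
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    rule: str  # business rule governing how values are integrated or split


# Hypothetical pairings; only the naming scheme follows Figure 8.1.
mappings = [
    FieldMapping("SS1", "ST1", "SF1", "TT1", "TF1", "copy as-is"),
    FieldMapping("SS2", "ST3", "SF8", "TT1", "TF1", "copy; SS1 wins on conflict"),
    FieldMapping("SS1", "ST2", "SF5", "TT2", "TF4", "first part of split"),
    FieldMapping("SS1", "ST2", "SF5", "TT2", "TF5", "second part of split"),
]


def sources_per_target(mappings):
    """Group source fields by the target field they populate."""
    result = defaultdict(list)
    for m in mappings:
        result[(m.target_table, m.target_field)].append(
            (m.source_system, m.source_table, m.source_field)
        )
    return dict(result)


def unmapped_targets(mappings, schema_fields):
    """Return warehouse fields that no source field populates."""
    mapped = {(m.target_table, m.target_field) for m in mappings}
    return sorted(set(schema_fields) - mapped)


schema_fields = [("TT1", "TF1"), ("TT1", "TF2"), ("TT2", "TF4"), ("TT2", "TF5")]
missing = unmapped_targets(mappings, schema_fields)
print(missing)  # [('TT1', 'TF2')]
```

Here TF1 is fed by two source systems (a many-to-one case), SF5 feeds two target fields (a split), and TF2 has no source at all, which is exactly the kind of gap that should be raised as a scope issue.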
Revise the data warehouse schema on an as-needed basis if the field-to-field mapping yields missing data items in the source systems. These missing data items may prevent the warehouse from producing one or more of the requested queries or reports. Raise these types of scope issues as quickly as possible to the project sponsors.
Historical Data and Evolving Data Structures
If users require the loading of historical data into the data warehouse, two things must be determined quickly:
Changes in schema. Determine if the schemas of all source systems have changed over the relevant time period. For example, if the retention period of the data warehouse is two years and data from the past two years have to be loaded into the warehouse, the team must check for possible changes in source system schemas over the past two years. If the schemas have changed over time, the task of extracting the data immediately becomes more complicated. Each different schema may require a different source-to-target field mapping.
Availability of historical data. Determine also if historical data are available for loading into the warehouse. Backups during the relevant time period may not contain the required data item. Verify assumptions about the availability and suitability of backups for historical data loads.
These two tedious tasks will be more difficult to complete if documentation is out of date or insufficient and if none of the IT professionals in the enterprise today are familiar with the old schemas.
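Where source schemas have changed during the retention period, one way to keep the historical load manageable is to record each schema version with its validity dates and pick the mapping that applies to the date of each backup or extract. This is a minimal sketch of that idea; the field names, dates, and the renaming scenario are hypothetical.

```python
from datetime import date

# Hypothetical schema versions of one source table. In this invented
# scenario the source field was renamed at the start of 1998.
schema_versions = [
    (date(1997, 1, 1), date(1997, 12, 31), {"CUSTNAME": "customer_name"}),
    (date(1998, 1, 1), date(1998, 12, 31), {"CUST_FULL_NAME": "customer_name"}),
]


def mapping_for(extract_date, versions):
    """Return the source-to-target mapping valid on extract_date."""
    for valid_from, valid_to, mapping in versions:
        if valid_from <= extract_date <= valid_to:
            return mapping
    # No version on record: the historical load cannot proceed blindly.
    raise LookupError(f"no schema version covers {extract_date}")


old_mapping = mapping_for(date(1997, 6, 30), schema_versions)
print(old_mapping)  # {'CUSTNAME': 'customer_name'}
```

A LookupError for a given date is a useful early signal that either the documentation is incomplete or the backup predates the systems the team knows about.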
8.6 SELECT DEVELOPMENT AND PRODUCTION ENVIRONMENT AND TOOLS
Finalize the computing environment and tool set for this rollout based on the results of the development and production environment and tools study during the data warehouse strategy definition. If an exhaustive study and selection had been performed during the strategy definition stage, this activity becomes optional.
If, on the other hand, the warehouse strategy was not formulated, the enterprise must now evaluate and select the computing environment and tools that will be purchased for the warehousing initiative. This activity may take some time, especially if the evaluation process requires extensive vendor presentations and demonstrations, as well as site visits. This activity is therefore best performed early on to allow for sufficient time to study and select the tools. Sufficient lead times are also required for the delivery (especially if importation is required) of the selected equipment and tools.
8.7 CREATE PROTOTYPE FOR THIS ROLLOUT
Using the short-listed or final tools and production environment, create a prototype of the data warehouse.
A prototype is typically created and presented for one or more of the following reasons:
To assist in the selection of front-end tools. It is sometimes possible to ask warehousing vendors to present a prototype to the evaluators as part of the selection process. However, such prototypes will naturally not be very specific to the actual data and reporting requirements of the rollout.
To verify the correctness of the schema design. The team is better served by creating a prototype using the logical and physical warehouse schema for this rollout. If possible, use actual data from the operational systems for the prototype queries and reports. If the user requirements (in terms of queries and reports) can be created using the schema, then the team has concretely verified the correctness of the schema design.
To verify the usability of the selected front-end tools. The warehousing team can invite representatives from the user community to actually use the prototype to verify the usability of the selected front-end tools.
To obtain feedback from user representatives. The prototype is often the first concrete output of the planning effort. It provides users with something that they can see and touch. It allows users to experience for the first time the kind of computing environment they will have when the warehouse is up. Such an experience typically triggers a lot of feedback (both positive and negative) from users. It may even cause users to articulate previously unstated requirements. Regardless of the type of feedback, however, it is always good to hear what the users have to say as early as possible. This provides the team more time to adjust the approach or the design accordingly.
During the prototype presentation meeting, the following should be made clear to the business users who will be viewing or using the prototype:
Objective of the prototype meeting. State the objectives of the meeting clearly to properly orient all participants. If the objective is to select a tool set, then the attention and focus of users should be directed accordingly.
Nature of data used. If actual data from the operational systems are used with the prototype, make clear to all business users that the data have not yet been quality assured. If dummy or test data are used, then this should be clearly communicated as well. Users who are concerned with the correctness of the prototype data have unfortunately sidetracked many prototype presentations.
Prototype scope. If the prototype does not yet mimic all the requirements identified for this rollout, then say so. Don't wait for the users to explicitly ask whether the team has considered (or forgotten!) the requirements they had specified in earlier meetings or interviews.
8.8 CREATE IMPLEMENTATION PLAN OF THIS ROLLOUT
With the scope now fully defined and the source-to-target field mapping fully specified, it is now possible to draft an implementation plan for this rollout. Consider the following factors when creating the implementation plan:
Number of source systems, and their related extraction mechanisms and logistics. The more source systems there are, the more complex the extraction and integration processes will be. Also, source systems with open computing environments present fewer complications with the extraction process than do proprietary systems.
Number of decisional business processes supported. The larger the number of decisional business processes supported by this rollout, the more users there are who will want to have a say about the data warehouse contents, the definition of terms, and the business rules that must be respected.
Number of subject areas involved. This is a strong indicator of the rollout size. The more subject areas there are, the more fact tables will be required. This implies more warehouse fields to map to source systems and, of course, a larger rollout scope.
Estimated database size. The estimated warehouse size provides an early indication of the loading, indexing, and capacity challenges of the warehousing effort. The database size allows the team to estimate the length of time it takes to load the warehouse regularly (given the number of records and the average length of time it takes to load and index each record).
Availability and quality of source system documentation. A lot of the team's time will be wasted on searching for or misunderstanding the data that are available in the source systems. The availability of good-quality documentation will significantly improve the productivity of source system auditors and technical analysts.
Data quality issues and their impact on the schedule. Unfortunately, there is no direct way to estimate the impact of data quality problems on the project schedule. Any attempts to estimate the delays often produce unrealistically low figures, much to the consternation of warehouse project managers. Early knowledge and documentation of data quality issues will help the team to anticipate problems. Also, data quality is very much a user responsibility that cannot be left to IT to solve. Without sufficient user support, data quality problems will continually be a thorn in the side of the warehouse team.
Required warehouse load rate. A number of factors external to the warehousing team (particularly batch windows of the operational systems and the average size of each warehouse load) will affect the design and approach used by the warehouse implementation team.
Required warehouse availability. The warehouse itself will also have batch windows. The maximum allowed down time for the warehouse also influences the design and approach of the warehousing team. A fully available warehouse (24 hours a day, 7 days a week) requires an architecture that is completely different from that required by a warehouse that is available only 12 hours a day, 5 days a week. These different architectural requirements naturally result in differences in cost and implementation time frame.
Lead time for delivery and setup of selected tools, development, and production environment. Project schedules sometimes fail to consider the length of time required to set up the development and production environments of the warehousing project. While some warehouse implementation tasks can proceed while the computing environments and tools are on their way, significant progress cannot be made until the correct environment and tool sets are available.
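The load-time estimate described under "Estimated database size" is a straightforward product of record count and average time per record, which can then be compared against the warehouse batch window. The sketch below shows that arithmetic. The record count, per-record time, and window lengths are hypothetical, and a real estimate would also have to account for index rebuilds, parallelism, and hardware.

```python
MS_PER_HOUR = 3_600_000


def estimated_load_hours(record_count, ms_per_record):
    """Estimate load time in hours from record count and average
    milliseconds needed to load and index one record."""
    return record_count * ms_per_record / MS_PER_HOUR


def fits_batch_window(record_count, ms_per_record, window_hours):
    """Return True if the estimated load fits within the batch window."""
    return estimated_load_hours(record_count, ms_per_record) <= window_hours


# Hypothetical regular load: 3.6 million records at 4 ms per record.
hours = estimated_load_hours(3_600_000, 4)
print(hours)                                   # 4.0
print(fits_batch_window(3_600_000, 4, 6))      # True
print(fits_batch_window(3_600_000, 4, 3))      # False
```

Running the same calculation for the expected record volumes a year or two out gives an early warning of when the load will outgrow the window.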
