
Proceedings of the 37th Hawaii International Conference on System Sciences - 2004

Best Practices in Data Warehousing to Support Business Initiatives and Needs

Jeff Lawyer
and
Shamsul Chowdhury

Walter E. Heller College of Business Administration, Roosevelt University


Albert A. Robin Campus, 1400 North Roosevelt Boulevard, Schaumburg, IL 60173
schowdhu@roosevelt.edu

Abstract

The paper presents the data warehousing architecture and practices used at a major U. S. retailing company. Many considerations were assessed when deciding which data warehousing architecture to adopt. The paper discusses the two predominant styles in data warehousing, namely the “Bill Inmon Style”, or top-down approach, and the “Ralph Kimball Style”, or bottom-up approach. The company chose the Inmon style due to a unique combination of circumstances in their business and technical environments, which are discussed in detail.

Much of the information presented in this paper is based upon the direct experiences of the lead data architect assigned to the projects under which this U. S. retailing company’s customer data warehouse evolved.

The architecture has evolved over time and is currently accepted at the company as a best practice. It is interesting to mention that both the hardware platform (CPU and disk drives) and the Relational Database Management System (RDBMS) software employed today at this company for data warehousing are not the same as those selected for the first instantiation. The implication is that the best plan or practice is a flexible one.

There were many challenges (organizational, technical, data sourcing, and data naming) that needed to be solved during the pre-project and initial stages, throughout the project, and beyond. The initial data warehouse, implemented in 1996, was termed an overall success and approved for expansion. The current data warehouse data are being used by over six hundred registered users to fine-tune customer marketing and to leverage and share data in an enterprise manner. The data warehouse has allowed the company to strengthen customer relationship management (CRM) core capabilities and business partnerships. Today, there are many departments benefiting from queries and requests for data warehouse data, many anticipated, some not. Although not planned, the data warehouse has been a valuable source of purchase and customer data in case of a manufacturer recall of merchandise. Above all, the company has been able to leverage and share enterprise customer data to the benefit of the entire company.

Keywords: Data Warehouse, Business Intelligence, CRM

1. Introduction

A diverse U. S. retailing company was experiencing the usual growing pains of the middle 1990’s. The diversity of businesses supported by multiple business units and the company’s Information Technology organization had resulted in “stove-pipes” of data, along with corresponding computer applications, which were built over several years. The data in these legacy systems were not easily accessed, causing difficulty in making information out of the data, discerning knowledge from the information, and implementing sound business decisions based upon this knowledge. Also, the legacy operational data were not integrated with other operational data, were organized along process or functional orientations, and were predominantly current-valued, containing little or no history. Because the data as such could yield very little business intelligence, the company decided in 1995 that data warehousing could be used to release their data from its “data jailhouse”.

0-7695-2056-1/04 $17.00 (C) 2004 IEEE 1



2. Data Warehousing Architecture

Many decisions must be made when implementing a data-warehousing environment. As if the technology decisions were not difficult enough in and of themselves, deciding which data warehousing architecture approach to use is sometimes even more difficult. There are two general styles from which to choose: one termed herein the “Bill Inmon Style” [14] and the other the “Ralph Kimball Style” [8]. Both Bill Inmon and Ralph Kimball are acknowledged experts in the data warehousing field, with Bill Inmon being credited with inventing (or first formalizing) the concept. While both styles obtain source data from legacy batch and online operational systems and specialized Operational Data Stores (ODS), they differ in the arrangement of this data in the data warehouse itself. The Inmon style calls for an atomic-level, third-normal form (3NF) relational format in which to store extracted and transformed data, while the Kimball style calls for a multidimensional style “dimension and fact” arrangement in which to store extracted and transformed data. Kimball’s multidimensional style design is often referred to as a “star schema”, due to its typical arrangement of dimension entities around a central facts entity [3, 4, 9, 11].

The Inmon style is considered application neutral, while the Kimball style has data prearranged by certain dimensions according to desired output [6, 7, 10, 11, 12]. If the Inmon style data warehouse has data covering most or all data subjects for the company, it can be termed an “enterprise” data warehouse. With the Kimball style, the sum of all individual multidimensional data structures is considered the “enterprise” data warehouse. Although highly debated in some data warehousing data architecture communities, the detailed advantages and disadvantages, as well as the recommended analysis and selection process of each style, are beyond the scope of this paper. Companies usually pick one style over the other based upon a combination of employee expertise, assumed preference, consultant or vendor recommendation, budget, existing technologies, or perceived net advantages.

The U. S. retailing company chose the Inmon style due to a unique combination of circumstances in their business and technical environments:

Business users had an overwhelming desire for detail, transaction-level data. Under the Kimball approach, data are typically summarized by higher-level dimensions [8]. In other words, it would be rare to employ “Transaction ID” as the lowest-level dimension, but rather “Product Type” or some other higher-level dimensional measure. Under the Inmon approach, data are typically kept at the lowest level of detail [13]. In other words, each transaction would be stored in its 3NF form and could be summarized by “Product Type” or other dimensional measure upon reporting to the business user.

Due to traditional business “stove-pipes” of data, potential cross-business use of data was unknown. Under the Kimball approach, data are arranged in an application- or data-view-specific manner [8]. Under the Inmon approach, data are arranged according to the rules of normalization and remain application- and data-view-independent [13].

Legacy system data were predominantly non-database and, therefore, neither integrated nor standardized. Under the Kimball approach, sourcing the legacy system data would require a two-phase approach, one for standardizing and one for summarizing and arranging facts by their dimensions [8]. Under the Inmon approach, only one phase of data sourcing is required for standardizing the data [13].

Sufficient expertise existed in the business community to support user self-sufficiency incorporating native SQL against an atomic-level data warehouse. There was also a general absence of Business Intelligence tools for accessing data warehouse data. Under the Kimball approach, authoring SQL to access data arranged in a multidimensional database would be a very complex task. In order to handle the capabilities of drilling up, down, and sideways within a multidimensional structure, the business user typically requires a Business Intelligence tool, such as those used in On-Line Analytical Processing (OLAP) [8]. Under the Inmon approach, while the SQL can get quite complex, it still will not be as complicated as that needed to access a multidimensional structure and perform drilling navigation [13].

There were other factors involved in the decision to use the Inmon style, but the above list is sufficient to illustrate the need for the flexibility of an application-neutral, 3NF database to house and maintain a “single source of truth” of company data. An additional advantage to the Inmon approach is the ability to create dependent data marts from the atomic data warehouse for those situations where a repetitive reporting requirement or application-specific need exists [13].

The architecture adopted as the best practice, as shown in Figure 1, consists of four distinct, interacting components. As depicted, the legacy operational systems and ODS are used as sources for data warehouse data. Outside sources, such as household demographics or ZIP code geo-demographics, may also be used as sources for data warehouse data. Any needed data marts are built with data from the data


warehouse, thus being “dependent” data marts. “Independent” data marts should be discouraged, as the data used to build them would not be of the same assured quality as the data warehouse “single source of truth”.

Legacy Operational Systems
• Process orientation
• Not integrated
• Current-valued with little or no history
• Volatile
• Detailed
• Not easily accessed

Operational Data Stores
• Subject orientation
• Integrated
• Current-valued with light history
• Volatile
• Detailed
• Not arranged / tuned for mass retrieval

Data Warehouse
• Subject orientation
• Integrated
• Time variant
• Non-volatile
• Detail with much history
• Unpredictable requests / processing
• Enterprise “single source of truth”

Data Marts
• Data subset: Summary; Sample; Multidimensional
• Time variant
• Non-volatile
• Much history
• Customized for local / application use
• User-friendly presentation
• Repetitive requests / processing

Figure 1. Chosen Data Warehousing Architecture

3. Best Practices

As a reminder, the thrust of this paper is not technological, so the various hardware and software selection decisions will not be covered. Those are best left up to the company technicians and consultants who are charged with selecting the best set of technological solutions to match the business problem. This analysis, research, testing, and selection process can be a project in itself, and was handled as such at the U. S. retailing company. What may be interesting is that both the hardware platform (CPU and disk drives) and the Relational Database Management System (RDBMS) software employed today at this company for data warehousing are not the same as those selected for the first instantiation in 1996. The implication here is that the best plan is a flexible one.

3.1 Data Warehouse Sponsorship

One of the basic best practices you can employ for data warehousing is to ensure that a high-level business champion exists, not just during building of the data warehouse, but continually after the data warehouse is built [1, 2, 15]. It is extremely important for the business champion to engage data warehouse business partners in an “enterprise” manner, not as individual vertical business units (“stove pipes”). In conjunction with the business champions, fully utilize or establish a data ownership / stewardship function and process [2]. Data owners and data stewards are indispensable when negotiating and standardizing differing views of the same data held by multiple users. To aid in accurately building the first instantiation of the data warehouse, fully leverage your Information Architecture organization’s enterprise data model [13]. If one does not exist, this is an opportune time to commence building and documenting one.

3.2 Data Warehouse Growth

Most data warehousing initiatives have found that there is a continuous need for incremental additions to the data warehouse [2]. Treat the data warehouse as an ongoing system and spawn specific projects when appreciable expansion is needed. Keeping your data warehouse team intact after the initial build is very important in order to sustain the capability to react to this need. To paraphrase a popular saying, “Data warehousing is not a destination – it is a journey”.

3.3 Data Warehouse Expertise

In addition to the high-level business champion, your organization should use data warehousing industry experts for both validation and expertise deficiencies [14]. Be sure to interview, hire, and contract with individuals and firms according to the data warehousing style, and perhaps even the technologies, you choose. Also, there are a number of industry trade shows and conferences from which beginner to experienced practitioners can benefit greatly. Again, select and use these opportunities based upon the style of data warehousing you choose.

3.4 Data Warehouse Scope

For the initial release of your data warehouse, limit the number of data subjects implemented and the extent of their content, perhaps employing an evolutionary prototype or proof-of-concept development methodology [15]. This will minimize initial investment, help gain expertise with a smaller set of data (and, thus, a smaller set of technical challenges), and deliver business value sooner. This is an excellent way of demonstrating the informational and monetary benefits of data warehousing to the company's top-level management, increasing their overall commitment and support of the concept.

3.5 Data Warehouse Data Modeling

It is important, once a data warehousing architecture is chosen, to adhere to it from beginning to end. This may seem rhetorical, but there can be many opportunities and much pressure to short-cut the process necessary to create a quality data warehouse. Using a robust data modeling tool, follow a typical conceptual to logical to physical data model progression, maintaining all data models in as close to third-normal form (3NF) as possible [14]. However, although the goal is a 3NF data model for the atomic data warehouse, employ practical denormalization without compromising the basic entity-relationship structure. For example, one allowable type of denormalization is where a parent entity instance includes a total attribute computed by adding together the attribute values from child entity instances. Another allowable type is where child entity instances include a redundant attribute, such as “transaction date”, which has been replicated from a parent entity instance. Denormalization actions which combine or split entities should generally be avoided unless necessary in the physical database environment to address a demonstrated performance issue.

3.6 Data Warehouse Attribution and Keys

When defining attributes for the entities of the data warehouse data model, do not define intelligent, compound fields, especially for attributes making up the key of the entity [3]. Each component of a compound legacy field should be broken out into detail attributes in the data warehouse, as each component is a unit of business data which could have significance on its own. As a best practice, use native keys for primary keys; do not use token keys, which are made-up "serial" type numbers with no meaning that represent a unique set of multiple native key values. With multidimensional data warehouse structures, however, it is often recommended to use token keys because, with the multiple dimension entities surrounding a central facts entity, the primary key list of the central facts entity would be the unwieldy list of all primary keys of its dimension entities [8]. This used to be primarily an RDBMS performance issue, but most RDBMS vendors have provided technical enhancements or indexing capabilities that alleviate the concern for non-numeric, multiple keys. In practice, getting rid of the multiplicity of keys has more to do with minimizing SQL keying by power users than with maximizing database performance. A potential compromise would be to carry both the native keys and the token keys, trading additional database space for ease of use.

3.7 Data Warehouse Loading

When populating the data warehouse from the legacy, external, and ODS files and databases, you should use purchased Extract / Transformation / Load (ETL) utility software. Similarly, build necessary dependent data marts from the data warehouse using an ETL tool [14]. These tools are somewhat costly, but provide necessary structure and efficiency in ensuring data quality, in transforming and standardizing data values, and in building and delivering the data stream necessary to load the data warehouse.

3.8 Data Warehouse Data Marts

Independent ("end-run") data marts built directly from legacy, external, and/or ODS data files and databases should be avoided. It is best to first source the data into the data warehouse, thus becoming part of the "single source of truth", and then into a data mart, if necessary [14]. In addition, the data warehouse should not feed any ODS or legacy systems directly, as that makes the 24 hours per day x 7 days per week operational systems dependent upon the data warehouse, which is rarely set up in a 24 hours per day x 7 days per week operational format. For example, you may wish to compute a "relative customer score" ("poor", "good", "better", or "best") to


be used in an operational system such as customer service for customer treatment workflow. If you need extensive history from the data warehouse to compute this score, it is acceptable to "reverse-source" this data from the data warehouse. However, you should not make the operational system dependent upon that action -- the operational system should be able to function with the most recent score available. Other business and system requirements which are pushing the architecture towards what is referred to as "real-time data warehousing" are probably best implemented as some form of legacy / ODS combination.

3.9 Data Warehouse Loading

Another consideration for loading the data warehouse is load frequency. It is likely that no single frequency (weekly, biweekly, monthly, etc.) will be used for loading the data warehouse. The frequency and volume of data associated with outside, legacy, and ODS systems will likely determine when to invoke loading cycles. At the U. S. retailing company, daily transaction data are collected, staged daily, and loaded weekly due to the tremendous volume of transactions (500 million per year). Customer data for about 190 million individuals are loaded every two weeks, in conjunction with a customer management ODS which operates on a similar schedule. The credit account data for the retailer's credit card portfolio are loaded on a monthly basis. (Note the anomaly the business user must be aware of -- for a customer who opens a credit account during a transaction, the transaction data will arrive in the data warehouse weeks before the customer's credit account detail. Such is the double-edged sword of data warehousing.)

3.10 Data Warehouse Metadata

Data worth warehousing are data worth documenting. This brings up the importance of ensuring a good metadata thread exists throughout all environments [14]. There is little one can do regarding legacy metadata, other than dedicating resources to retrofitting any discrepancies uncovered. For the data warehouse, however, the importance of good metadata cannot be emphasized enough. ETL teams are going to rely heavily on the accuracy and completeness of metadata. Data warehouse power users and casual users are going to access metadata frequently in order to formulate their data warehouse queries. The data warehouse metadata must consist of good business names and definitions, as well as standardized technical names and database formats. Data warehouse data columns defined as "code" type columns should have all potential values and their meanings encoded in metadata extensions or, if many values exist, in special data warehouse code / decode tables. If you have a metadata repository that supports import and export of metadata, use it. If not, strongly consider purchasing a metadata repository package that not only supports import and export of metadata, but also has Internet or Intranet deployment capabilities. Since the thirst for metadata will be great, it is important to have robust metadata direct access and reporting capability.

At this U. S. retailing company, there was no formal, centralized metadata repository that could be used for data warehousing. A cursory review of metadata repositories available for purchase did not produce any candidate repositories that matched the requirements for use in their data warehousing environment. Therefore, the development team initially collected the all-important metadata and entered it into an MS-Access database, from which rudimentary reports were used to communicate the metadata. Later, the MS-Access database was converted to Informix, and Java-based Intranet applications were written to maintain, retrieve, and report on the metadata. Also available was bulk-loading of metadata from MS-Excel spreadsheets.

3.11 Data Warehouse Education / Support

In addition to good metadata describing the data warehousing environment, you need to provide for regular and targeted education regarding the data warehouse and data mart structure and content, SQL coding techniques, access tools, data privacy, and any other requests you need to field from the business users. Consider setting up an official data warehouse Intranet web site as a clearinghouse for detailed information, education requests, questions, forms, and links to related web sites. Some companies have set up a "decision support center" and allocated personnel to it to handle or route questions and assist in getting data warehouse information to business groups who are not users of the data warehouse, but know the data warehouse may contain the detailed answer to their business question.

3.12 Data Warehouse Modification

Finally, you should maintain a detailed and controlled data warehouse change management process that involves the business sponsor, data / information architecture, data / metadata administration, DBA, analyst, programmer, and any other group involved in the data warehousing community [14]. In the change


management process, allow for error corrections, relatively small change requests, larger work requests or enhancements, and major projects.

4. Challenges

A number of challenges were encountered during creation of the U. S. retailing company's data warehouse. Some of these challenges were anticipated and others were not. Interestingly, throughout the project to build the data warehouse, organizational and project process challenges overshadowed technical challenges.

4.1 Organizational Challenges

Although a business sponsor was selected to represent the data warehouse, this sponsor was a member of one of four vertical business units participating in the data warehouse project. Interdepartmental cooperative promises were made, but annual incentives for members of these departments were tied to their business performance alone. As well, the annual incentives for the information technology associates selected to participate were also tied to their assigned business unit's performance. No incentive dollars were tied to the data warehouse project directly. The difficulty here is that maximum attention was paid to departmental systems, with secondary consideration given to the data warehouse project. This affected work priorities, and project time-line adjustments had to be made on a regular basis.

4.2 Data Sourcing Challenges

The data sourcing challenges have their roots in the legacy system environment. Many companies have no real system of record for critical enterprise data, their legacy systems and data being aligned more on critical process boundaries instead. Since ODS systems and ODS databases are a relatively recent spin-off of data warehousing, there is typically little or no integrated ODS data from which to source the data warehouse, as well. In an environment with weak or informal data management practices, data subjects are either not clearly established or not used at all. Saddle this situation with weak or nonexistent metadata, and the data sourcing effort will seem mammoth, and it usually is. This is why the phrase "data warehouse sourcing" has been equated with the phrase "data archaeology" -- you may know where to dig, but you simply do not have any idea what the next shovel-full will give you. Even with this realization, most data warehousing projects, even with the use of an ETL tool, woefully underestimate the data warehouse sourcing requirement itself, sometimes by as much as 100%.

A special data warehouse sourcing challenge is what data to select to put in it. Legacy systems contain many terabytes of data. How can anyone select what subset of data to copy to the data warehouse? The technique that represents perhaps the best way to do this involves selecting key master files and databases in the legacy environment and having the intended business users rank each and every element and column as to its significance from a business intelligence standpoint. This could range from a simple "yes" / "no" designation to a scoring system whereby the resultant list could be sorted by score and evaluated at many points of cost versus benefit. In addition, some business data that would score low now may score higher in the future due to business, marketing, societal, compliance, legal, or other factors. Regardless of the method of selection, there is a risk that not all needed detailed data will be captured. However, as long as the data warehousing project is never "closed", there will be future opportunity to include that data, but only to the extent that these data are kept from a historical standpoint.

4.3 Technological Pressures

As the data warehouse and its environment mature, certain technological pressures surface that seem natural and creative to middle- and upper-level management, but violate standard data warehousing architecture precepts. The first stems from the fact that a data warehouse built with excellent cleansing and integration techniques yields data that can be of higher quality than the legacy data from which it came. Top-level executives start referring to their "Customer Data Warehouse" as their "Customer Database". In turn, they will exert pressure to update legacy customer data from the data warehouse, not realizing the difference in data currency between the warehouse (up to two-week-old customer data) and the legacy system (near-real-time).

Secondly, there is pressure to push the data warehouse into a real-time or near-real-time environment. This idea often comes from on-site vendors that would dearly love to see the data warehouse pushed into a 24 hours per day x 7 days per week environment -- they would be the most likely recipients of the money spent on the equipment and expertise to support a multi-terabyte operational database in a real-time environment and connect it to the operational legacy systems, as well. After seven years of existence, the data warehouse at the U. S. retailing company satisfies business user requirements by operating on a service level agreement of 10 hours per


day and 5 days per week. The cost to operate this and a fourth needs view ABD, you could build a data
database on a 24 hours per day x 7 days per week mart supporting view ABCDE to satisfy all four users
basis would be several times higher. (and other users requiring the same views or even
Thirdly, there is pressure to build independent new views of interest BCD, BCE, BCDE, etc.).
data marts from non-data warehouse data directly
from legacy or ODS files and databases. Some argue 4.4 Customer Data Challenges
that this is nothing more than implementing a hybrid
Inmon / Kimball data warehousing environment, Customer name and address data is perhaps the
however, there is no such thing. There are advan- most difficult to establish and maintain, particularly
tages to either over the other, but only one can be if you have multiple stove-pipe systems each main-
selected as the architecture for any particular data taining their own customer master files or databases.
warehousing implementation. In the case where the Do not underestimate the effort needed to overcome
Inmon style has been selected, all data marts should this challenge. You may need to purchase software
be created from the data warehouse, where the data to assist you in standardizing name and address for-
has already been scrubbed, integrated, and stored mats. You will have to write some form of a cus-
with a high degree of quality [14]. Creating data tomer management system to collect the customer
marts from different sources for the same data allows data you have and determine if one vertical business
for error and variable tolerances such that different unit's "John Smith" is the same as another vertical
answers may be given to the exact same business business unit's "John Smith". You may even need to
intelligence question. Always create your data marts from the "single source of truth" -- the data warehouse.

A fourth technological pressure is to proliferate application-specific data marts in an uncontrolled fashion. It takes time, education, and experience to become a skilled query writer using native SQL. Many power users have no problem accessing the native data warehouse directly using SQL. However, new users of the data warehouse could easily become discouraged if not enough time, education, and opportunity to practice are allowed to attain SQL self-sufficiency. A typical solution would be to create a data mart containing only the subset of data of interest, perhaps pre-summarized and pared down to meet a specific area of business interest. This, in itself, is not a bad thing to do if there are a multitude of business users that could benefit from this data. However, if only this one user will benefit, then this user will request additional data marts. Multiply this by several business users of the same skill, and you will soon have uncontrolled proliferation of data marts. The solution is not to prevent all creation of data marts, but to analyze requests for data marts (views) against other requests for data marts (views) and install actual data marts that satisfy more than one view. For example, if one user needs view ABDE, a second needs view ACE, a third needs view ABE,

rely on an external vendor to assist you in keeping up with name, address, and telephone number changes. At the U. S. retailing company, no less than sixteen customer master files had been identified across the multiple businesses. Long term, their goal is to install a customer management system as a robust ODS containing multiple detailed source customer data by vertical business unit. A ranking process "survives" the best individual name, address, and telephone number data and prepares it for transmission to an external vendor. The vendor matches the name and address data to its files and appends an individual key to the individual, as well as an address key to the address. The data is then transmitted back from the vendor and loaded into the data warehouse, after which sophisticated SQL households individuals according to those having the same address or other shared detail data. History is kept on individuals migrating from household to household, as well as individuals and households migrating from address to address. Currently, eight of the company's sixteen customer master files are included, with the remaining eight targeted for future incorporation, at which time the customer management system ODS will be upgraded and integrated with the sixteen legacy systems for two-way customer data communication.
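
The householding step described in this section, in which individuals returned from the vendor are grouped by a shared address key, can be illustrated with a minimal sketch. It is written in Python rather than the SQL the company used, and the record layout, the field names (`individual_key`, `address_key`), and the `H-` household identifier format are all assumptions made for illustration, not details from the company's system.

```python
from collections import defaultdict

# Hypothetical records as they might come back from the external vendor:
# each individual carries a vendor-assigned individual key and address key.
individuals = [
    {"individual_key": "I1", "address_key": "A1", "name": "Pat Smith"},
    {"individual_key": "I2", "address_key": "A1", "name": "Lee Smith"},
    {"individual_key": "I3", "address_key": "A2", "name": "Sam Jones"},
]


def assign_households(records):
    """Group individuals that share an address key into one household."""
    members_by_address = defaultdict(list)
    for record in records:
        members_by_address[record["address_key"]].append(record["individual_key"])
    # Deriving the household id from the address key is an assumption;
    # a production system would more likely issue its own surrogate key.
    return {
        f"H-{address_key}": sorted(members)
        for address_key, members in members_by_address.items()
    }


households = assign_households(individuals)
```

This sketch covers only the "same address" rule. Householding on "other shared detail data" and keeping history of migrations between households would need additional keys and effective-dated records.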


Figure 2. Sample List of Customer Interactions

Credit Application              Credit Billing
Survey Mailing                  Credit Account Payment
Survey Response                 Catalog Request
Telemarketing Contact           Catalog Mailing
POS Sale Transaction            Other Purchase
POS Sale / Return               Catalog Order
Home Improvement Item           Insurance Coverage Payment
Parts Purchase Order            Credit Protection Payment
Service Contract Agreement      Club Membership / Fee

5. Successes

At the U. S. retailing company, the first success came in 1995 with the initial, eleven-table, “credit-focused” proof-of-concept data warehouse. The eighty-gigabyte Inmon-style data warehouse was used to select customers for a targeted credit-stimulation marketing program. The additional merchandising and credit profit generated by this proof-of-concept data warehouse exceeded the estimated cost to build the first full-blown, sixty-five-table, six-hundred-gigabyte credit customer data warehouse in 1996. The data warehouse has since grown to seven terabytes with two hundred tables and two thousand seven hundred columns. The combined customer and prospect list on the data warehouse includes everyone in the U. S. who is eighteen years old or older. There are over six hundred registered users of the data warehouse.

In addition to robust customer information, the U. S. retailing company has captured many “customer interactions” (see Figure 2) and associated them with individuals. A “customer interaction” represents activity between the company and an individual, such as a store sale transaction, a return or exchange, an Internet sale, a catalog order, a service contract agreement, a credit insurance coverage payment, and other activities. Capturing these customer interactions has enabled the company to gain a customer intimacy not attainable without the critical mass of data they have amassed regarding their customers and their customers’ preferences. Company plans call for incorporation of even more customer interactions in the areas of inbound customer requests and/or complaints, outbound customer telemarketing, survey requests and responses, and other customer interactions.

By segmenting their customer portfolio, the company has been able to provide both enterprise (cross-business selling) and vertical business unit marketing opportunities, along with enhanced access to enterprise customer data. By establishing these high-relevancy marketing programs, the company has been able to participate in and respond to customers’ life-style (credit heavy revolver, high tech consumer, jewelry purchaser, etc.), life-stage (young married, mid-life spenders, empty-nesters, etc.), and life-event (newlyweds, birth of a baby, new home buyer, etc.) marketing opportunities.

In addition, enhanced contact management has been engaged for proactive management of high-value customer segments, alignment of vertical business unit and enterprise needs, and incorporation of customer preferences. The data warehouse has allowed the company to strengthen customer relationship management (CRM) core capabilities and business partnerships. Today, there are many departments benefiting from queries and requests for data warehouse data, many anticipated, some not. Although not planned, the data warehouse has been a valuable source of purchase and customer data in case of a manufacturer recall of merchandise. In addition, the company has been able to investigate and alleviate some inventory shrinkage challenges.

Above all, however, the company has been able to leverage and share enterprise customer data to the benefit of the entire company.
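
As an illustration of how customer interactions tied to individuals can support an unplanned use such as a merchandise recall, the following minimal Python sketch selects the individuals who purchased a recalled item. The row layout, the field names, the item codes, and the choice of which interaction types count as purchases are hypothetical and are not taken from the company's data model.

```python
# Hypothetical customer-interaction rows, each already associated with an
# individual key. Interaction names follow the sample list in Figure 2.
interactions = [
    {"individual_key": "I1", "interaction": "POS Sale Transaction", "item": "SKU-100"},
    {"individual_key": "I2", "interaction": "Catalog Order", "item": "SKU-100"},
    {"individual_key": "I2", "interaction": "Survey Response", "item": None},
    {"individual_key": "I3", "interaction": "POS Sale Transaction", "item": "SKU-200"},
]

# Assumption: only these interaction types represent a purchase of merchandise.
PURCHASE_INTERACTIONS = {"POS Sale Transaction", "Catalog Order", "Other Purchase"}


def individuals_to_notify(rows, recalled_item):
    """Return the sorted, de-duplicated individuals who bought the recalled item."""
    return sorted(
        {
            row["individual_key"]
            for row in rows
            if row["interaction"] in PURCHASE_INTERACTIONS
            and row["item"] == recalled_item
        }
    )


to_notify = individuals_to_notify(interactions, "SKU-100")
```

The point of the sketch is that no recall-specific structure is needed: once purchases are stored as interactions linked to individuals, the recall list is a simple selection over existing data.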


6. Discussions

The dilemma of choosing one style or the other is a considerable issue in real-life data warehousing that aims to solve organizational problems and bring benefits to the organization. The selection depends on many factors and considerations. There are also considerable philosophical debates, obstacles, and pros and cons as to the selection of a data warehousing methodology. However, whatever methodology is chosen, it must meet the business requirements of the organization and be flexible and scalable. In other words, the methodology must minimize the gaps between the business processes and the technology that is being used to run the business in the organization [5, 12]. The best practices (Inmon style) that have been adopted by the company enabled them to create and deliver the necessary analytical environment to meet the changing needs of business.

7. References

[1] Agosta, L. (2000). The Essential Guide to Data Warehousing. Prentice Hall PTR, Upper Saddle River, NJ 07458.
[2] Anahory, S. and Murray, D. (1997). Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems. Addison Wesley Longman Limited, England.
[3] Armstrong, R. (2002). White paper by NCR Corporation - The Fallacy of Data Mart Centric Strategies (Short Term Gain, Long Term Pain).
[4] Chen, P. (1976). The Entity-Relationship Model – Toward a Unified View of Data, ACM Transactions on Database Systems, pp 9-36.
[5] Chowdhury, S. (2002). Lecture notes on Data Warehouse and Data Mining. The College of Business Administration, Roosevelt University, IL.
[6] Inmon, B. (1999). “The Problem with Dimensional Modeling” DM Review Magazine Archived Article.
[7] Ismail, W. and Chowdhury, S. (2003). Database Applications in Business. In Proceedings of the MBAA, Chicago, IL.
[8] Kimball, R. (1996). The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley.
[9] Kroenke, D. (2002). Database Processing – Fundamentals, Design and Implementation (8th ed.). Pearson Education, Inc.
[10] Letowski, B., Parzatka, H. and Woods, N. (2002). NorthWind Star Schema – a project work presented and submitted in Seminar on Data Warehouse and Data Mining at the College of Business Administration, Roosevelt University, IL.
[11] McFadden, F., Hoffer, J. and Prescott, M. (1999). Modern Database Management (5th ed.). Addison-Wesley Educational Publishers, Inc.
[12] Sperley, E. (1999). “Planning, Building, and Implementation”. The Enterprise Data Warehouse. Prentice Hall.
[13] Utley, C. (2002). “Designing the Star Schema Database” Data Warehousing Resources, November 2002.
[14] Inmon, W. H., Imhoff, C. and Sousa, R. (1998). Corporate Information Factory. John Wiley & Sons, Inc., New York, NY.
[15] Data Management Association (2002). Guidelines to Implementing Data Resource Management. DAMA International, Bellevue, WA.
