
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 38, NO. 2, MARCH/APRIL 2012

A Model of Data Warehousing Process Maturity


Arun Sen, K. (Ram) Ramamurthy, and Atish P. Sinha
Abstract: Even though data warehousing (DW) requires huge investments, the data warehouse market is experiencing incredible
growth. However, a large number of DW initiatives end up as failures. In this paper, we argue that the maturity of a data warehousing
process (DWP) could significantly mitigate such large-scale failures and ensure the timely delivery of consistent, high-quality,
single-version-of-the-truth data. However, unlike software development, the assessment of DWP maturity has not yet been tackled in
a systematic way. In light of the critical importance of data as a corporate resource, we believe that the need for a maturity model for
DWP could not be greater. In this paper, we describe the design and development of a five-level DWP maturity model (DWP-M) over a
period of three years. A unique aspect of this model is that it covers processes in both data warehouse development and operations.
Over 20 key DW executives from 13 different corporations were involved in the model development process. The final model was
evaluated by a panel of experts; the results strongly validate the functionality, productivity, and usability of the model. We present the
initial and final DWP-M model versions, along with illustrations of several key process areas at different levels of maturity.
Index Terms: Data warehousing process, design-science research, model validation, software maturity models.

1 INTRODUCTION

Data warehousing (DW) has experienced tremendous growth in the last decade. It has become so popular in
industry that more than half of IT executives cited it as their highest-priority post-millennium project [64]. A data
warehouse is a subject-oriented, integrated, time-variant,
and nonvolatile collection of data that supports managerial
decision making [28]. The data in a data warehouse are
typically extracted and loaded from multiple online transaction processing (OLTP) systems and other data sources using
an extract, transform, and load (ETL) process.
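To make the extract, transform, and load steps concrete, the following minimal Python sketch takes rows from two hypothetical OLTP sources that describe the same orders with different field names and formats, maps them onto one conformed record shape, and appends them to an in-memory list that stands in for a fact table. The source names (`SALES_SYSTEM`, `WEB_SYSTEM`), all field names, and the list-based "warehouse" are illustrative assumptions. They do not come from any cited system.

```python
from datetime import date

# Hypothetical OLTP extracts: two source systems describe the same business
# entity (orders) with different field names, key formats, and units.
SALES_SYSTEM = [
    {"order_id": 1, "cust": "C01", "amount": "120.50", "date": "2011-01-03"},
    {"order_id": 2, "cust": "C02", "amount": "75.00", "date": "2011-01-04"},
]
WEB_SYSTEM = [
    {"id": "W-9", "customer_code": "c01", "total_cents": 4999, "ts": "2011-01-04"},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for row in SALES_SYSTEM:
        yield "sales", row
    for row in WEB_SYSTEM:
        yield "web", row


def transform(source, row):
    """Transform: map each source layout onto one conformed record shape."""
    if source == "sales":
        return {
            "source_key": f"sales:{row['order_id']}",
            "customer": row["cust"].upper(),
            "amount": float(row["amount"]),
            "order_date": date.fromisoformat(row["date"]),
        }
    return {
        "source_key": f"web:{row['id']}",
        "customer": row["customer_code"].upper(),  # normalize key casing
        "amount": row["total_cents"] / 100,         # cents -> currency units
        "order_date": date.fromisoformat(row["ts"]),
    }


def load(records, warehouse):
    """Load: append conformed records to the (in-memory) fact table."""
    warehouse.extend(records)
    return len(records)


warehouse = []
loaded = load([transform(src, row) for src, row in extract()], warehouse)
print(loaded, sorted({r["customer"] for r in warehouse}))
```

Most of the effort sits in `transform`. That step is where inconsistent source definitions are reconciled into the warehouse's single version of the truth.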
Data warehouse projects tend to be costly [71]. Despite
the fact that the projects require large investments, both in
terms of money and effort [30], the data warehouse market
is continuing to experience incredible growth, primarily
because of the role of the data warehouse as a powerful
decision support tool [61]. If the growth trend continues,
the real data in the data warehouse could easily reach
1,000 terabytes [73]. This growth is not only in sheer size,
but also in the number of end users, query volumes, data
complexity, and the demand for right-time information. Data warehouses are now being incorporated into mission-critical
systems that demand high availability, right-time refresh
rates, and high data quality [4].
Despite this booming market, a large number of DW
initiatives end up as failures. Friedman [14] expected that
over 50 percent of data warehouse projects would experience
limited acceptance, if not outright failure. It is therefore
critical for the DW community to devote more thought to
understanding what afflicts DW design, development,
implementation, and management.

A. Sen is with the Department of Information and Operations Management, Mays Business School, Texas A&M University, College Station,
TX 77843. E-mail: asen@mays.tamu.edu.
K. Ramamurthy and A.P. Sinha are with the Sheldon B. Lubar School of
Business, University of Wisconsin-Milwaukee, PO Box 742, Milwaukee,
WI 53201-0742. E-mail: {ramurthy, sinha}@uwm.edu.
Manuscript received 28 Apr. 2009; revised 24 May 2010; accepted 23 Oct.
2010; published online 3 Jan. 2011.
Recommended for acceptance by H. Müller.
For information on obtaining reprints of this article, please send e-mail to:
tse@computer.org, and reference IEEECS Log Number TSE-2009-04-0091.
Digital Object Identifier no. 10.1109/TSE.2011.2.
0098-5589/12/$31.00 © 2012 IEEE

DW initiatives often
end up as failures because of factors such as slipped
schedules, unacceptable performance, expandability problems, poor availability, complicated tools, poor data
quality, and unhappy users [2], [31]. Data quality is a very
important issue [74] because it caters to a variety of
stakeholders, encompasses diverse aspects (e.g., coherency,
freshness, accuracy, accessibility, availability, etc.), and
requires complex assessment techniques [15], [67].
In response to similar types of problems in the software
engineering domain, researchers advocated the need to
study the software process and its management. Humphrey
[21], [22] broadly defines a software process as a set of tools,
methods, and practices used to produce a software product.
The objectives of software process management are to
generate products according to plan, while concurrently
improving the capability to produce better products [21].
The Capability Maturity Model (CMM) [49] and ISO 9001
[48] were developed in an effort to promote and assess
software process quality standards in organizations.
Although data warehousing has been around since the
early 1990s, unlike software development, the assessment of
its process maturity has not received much attention. A data
warehousing process (DWP) can be viewed as a data
production process that includes subprocesses such as
business requirements analysis, data design, architecture
design, data mapping, ETL design, end-user application
design, data quality management, business continuity
management, implementation, and deployment.
Just as CMM has been useful in reducing defects in a
software development process [1], [19], we expect a mature
DWP to address many issues surrounding the development
and management of a data warehouse. While many DW
development methodologies are currently available (see, for
example, [58], [59]), they have not been extended to
incorporate CMM-like maturity concepts. Most firms do
not appear to follow a set of engineering practices and
standards for data warehousing and, as a result, have not
attained the high levels of maturity that software development has, resulting in failed DW implementations, poor
data quality, and other associated problems.

Published by the IEEE Computer Society
Most companies embark on data quality initiatives to
address concerns such as spiraling direct-mail costs, poor
customer service, and faulty reports [54]. Poor data quality
costs money in terms of lost productivity, faulty business
decisions, and an inability to achieve results from expensive
investments in enterprise applications. One of the major
reasons for data quality problems is inconsistent data
definition. A data warehouse, which reflects the single
version of the truth [25] for an organization, is a prime touch
point for addressing data quality problems. Subject matter
experts who are knowledgeable about business as well as
data are often employed to define data cleansing rules and
data quality metrics, as well as recommend whether to fix the
data at the source, the staging area, or the warehouse [10].
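As a rough illustration of how such cleansing rules and quality metrics might be expressed, the sketch below treats each rule as a named predicate over a staged record. It reports each metric as the fraction of records that pass the rule and lists the records that fail at least one rule, so that they can be routed for fixing. The rule names, the reference country list, the audit date, and the 30-day freshness window are all hypothetical assumptions made for this example.

```python
from datetime import date

# Hypothetical customer records arriving in the staging area.
records = [
    {"id": 1, "email": "a@example.com", "country": "US", "updated": date(2011, 1, 2)},
    {"id": 2, "email": None, "country": "US", "updated": date(2010, 6, 1)},
    {"id": 3, "email": "c@example", "country": "XX", "updated": date(2011, 1, 1)},
    {"id": 4, "email": "d@example.com", "country": "DE", "updated": date(2011, 1, 3)},
]

VALID_COUNTRIES = {"US", "DE", "FR"}  # assumed reference list
AS_OF = date(2011, 1, 5)              # assumed audit date

# Each rule is a named predicate; a record "passes" when it returns True.
RULES = {
    "completeness:email": lambda r: r["email"] is not None,
    "validity:email": lambda r: (
        r["email"] is not None
        and "@" in r["email"]
        and "." in r["email"].split("@")[-1]
    ),
    "validity:country": lambda r: r["country"] in VALID_COUNTRIES,
    "freshness:30_days": lambda r: (AS_OF - r["updated"]).days <= 30,
}


def quality_metrics(rows, rules):
    """Return, per rule, the fraction of rows that satisfy it."""
    return {name: sum(rule(r) for r in rows) / len(rows) for name, rule in rules.items()}


def failing_ids(rows, rules):
    """List the ids that violate at least one rule (candidates for fixing)."""
    return [r["id"] for r in rows if not all(rule(r) for rule in rules.values())]


metrics = quality_metrics(records, RULES)
print(metrics)
print(failing_ids(records, RULES))
```

Keeping rules as data rather than hard-coded checks lets subject matter experts review and extend them. The same metrics can be computed at the source, in the staging area, or in the warehouse.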
In addition to addressing data quality concerns, a mature
DWP can be expected to provide several other benefits. A
mature DWP would help the organization to define and
deliver projects with predictable durations. It would force
the organization to develop data quality and data governance strategies that enhance the trust of sponsors and users
in the data. By building trust and providing the ability to
perform sophisticated business analytics, a mature DWP
would also keep the user base satisfied, thereby addressing
one of the main causes of data warehouse failures.
The need for a CMM-like maturity model for DWP has
been mooted in the literature [33], [34], [35], [60], [69]. But
none of the prior studies has gone beyond presenting a
sketchy or preliminary model. In this paper, we describe the
design and development of a comprehensive, detailed, and
robust DWP maturity model (based on design-science
research guidelines) over a period of three years.
It is important to note that while there are many issues
common to software development and data warehousing,
there are a number of factors that render DWP unique. In
particular, any discussion on DWP maturity revolves
around data quality management, ETL design, metadata
management, data change management, data warehouse
governance, end-user cube design, and other activities that do
not fall under the purview of traditional software development. The main contribution of our work is in designing a
DWP maturity model by identifying, defining, and accommodating those aspects that pertain specifically to DW.
The paper is organized as follows: Section 2 provides
the motivation for developing the DWP maturity (DWP-M)
model. Section 3 reviews the extant literature on maturity
models. Section 4 first presents the framework that has
been recently proposed for conducting design-science
research in IS, and then describes how we employed this
framework to design, develop, and evaluate the DWP-M
model. Section 5 discusses the contributions of our study
and Section 6 concludes the paper and identifies directions
for future research.

2 MOTIVATION FOR A DWP MATURITY MODEL

A data warehousing process is a set of activities that begins
with the identification of a need and concludes with
delivering a product that satisfies the need [60]. More
specifically, it is a set of activities, methods, practices, and
transformations that people use to develop, maintain, and
operate a data warehouse and its associated products.
Many DWP tasks can be categorized as development tasks,
which revolve around the design, development, and
implementation of the data warehouse. Development tasks
in DWP include business requirements analysis, data design,
architecture design, data mapping, ETL design, end-user application design, end-user cube design, implementation, and deployment [25], [28], [60]. Data warehouses are geared toward
addressing the analytic questions of business managers and
executives, as opposed to processing routine transactions in
OLTP systems. Given the complexity inherent in such
analytics, special attention has to be devoted to designing
the right type of data marts, aggregates, and cubes in order
to promote ease of access and support efficient processing
of business queries. The firm also needs to be aware of end-user applications such as business intelligence (BI), data
mining, and customer relationship management (CRM),
which rely heavily on warehouse data.
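Pre-computed aggregates are one way to make analytic queries cheap to answer. The sketch below computes every group-by combination of a set of dimensions over some fact rows, which is what a simple data cube materializes. The fact rows, the dimension names (`region`, `product`, `quarter`), and the `build_cube` helper are hypothetical. Production OLAP engines compute and store such aggregates far more selectively and efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales fact rows: three dimensions and one additive measure.
FACTS = [
    {"region": "East", "product": "A", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "B", "quarter": "Q1", "sales": 50},
    {"region": "West", "product": "A", "quarter": "Q1", "sales": 70},
    {"region": "West", "product": "A", "quarter": "Q2", "sales": 30},
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts, dims, measure):
    """Aggregate the measure for every subset of dims (2**len(dims) cuboids)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in facts:
                totals[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(totals)
    return cube


cube = build_cube(FACTS, DIMENSIONS, "sales")
print(cube[()])           # grand total over all dimensions
print(cube[("region",)])  # roll-up by region
```

The number of cuboids doubles with each added dimension. This is one reason the choice of which aggregates and cubes to materialize needs the "special attention" described above.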
As discussed later, it became clear from our interactions
with industry experts that a DW process also needs to focus
on the operations of the data warehouse. Operations tasks are
generally responsible for making sure that the data warehouse keeps functioning as designed. Operations tasks in a
DWP include metadata management, recovery management,
financial services management, data warehouse governance, data
governance, and service level management. Other operations
tasks in a DWP provide consistent, timely customer service/support to end users by supplying high-quality,
valuable data. These tasks include supporting
business users, training business users, managing the technical
infrastructure, information delivery management, tuning for
database performance, and service level agreement [28].
We believe that a DWP is quite complex because it
includes many activities, a variety of tools (such as ETL,
metadata management, and end-user tools), and the
coordination of resources across these activities. Such activities are critical
in any DW implementation. For instance, a successful DW
venture entails having the right plan for managing metadata.
Metadata in a DWP can be broadly classified into three types:
operational metadata, extraction and transformation metadata, and end-user metadata [28]. Operational metadata
describes the operational data sources, while extraction and
transformation metadata contains information on the extraction
of data from source systems and its subsequent transformation in the staging area. End-user metadata provides a
navigational map for users to browse and find the information that they are interested in. The metadata in a warehouse,
therefore, is not only used for building the warehouse, but
also for using and administering the warehouse.
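The three metadata categories can be pictured as a small catalog. The following sketch shows the kind of information each category might record and how end-user metadata can be traced back through extraction and transformation metadata to a source table. Every class, field, and table name here is a hypothetical illustration. None of it is taken from Kimball et al. [28].

```python
from dataclasses import dataclass, field


@dataclass
class OperationalMetadata:
    """Describes an operational (source) data source."""
    source_system: str
    table: str
    columns: list


@dataclass
class ExtractionTransformationMetadata:
    """Records how data moves from a source into the staging area/warehouse."""
    source_table: str
    target_table: str
    transformation: str
    schedule: str


@dataclass
class EndUserMetadata:
    """Navigational map: business terms users browse, tied to warehouse columns."""
    business_term: str
    description: str
    warehouse_column: str  # "table.column"


@dataclass
class MetadataCatalog:
    operational: list = field(default_factory=list)
    etl: list = field(default_factory=list)
    end_user: list = field(default_factory=list)

    def lineage(self, business_term):
        """Trace a business term to its warehouse table and feeding source tables."""
        for entry in self.end_user:
            if entry.business_term == business_term:
                table = entry.warehouse_column.split(".")[0]
                sources = [m.source_table for m in self.etl if m.target_table == table]
                return table, sources
        return None


catalog = MetadataCatalog(
    operational=[OperationalMetadata("ERP", "orders", ["order_id", "amount"])],
    etl=[ExtractionTransformationMetadata(
        "ERP.orders", "fact_sales", "convert currency to USD", "nightly")],
    end_user=[EndUserMetadata("Net Sales", "Order amount in USD", "fact_sales.amount")],
)
print(catalog.lineage("Net Sales"))
```

A lineage query like this one shows why metadata serves people who use and administer the warehouse, not only those who build it.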
Data warehouses are time dependent, i.e., they can track
history. This is one of the major differences from operational databases, which store transient data and typically do not
maintain any history. From an operations standpoint, it is important to ensure that the right
procedures are in place to execute change
management strategies effectively.
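History tracking is one place where change management becomes concrete. A widely used dimensional-modeling technique for this is the Type 2 slowly changing dimension; the paper itself does not prescribe it. Instead of overwriting a changed attribute, the technique closes the current row and appends a new, versioned row. The sketch below assumes a hypothetical customer dimension stored as a list of dicts keyed by `customer_id`, with a conventional far-future end date marking the current version.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date


def apply_scd2(dimension, customer_id, new_attrs, effective):
    """Close the current row for customer_id and append a new version.

    Rows are never overwritten, so earlier versions (history) are preserved.
    """
    current = next(
        (r for r in dimension
         if r["customer_id"] == customer_id and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return dimension  # nothing changed; keep the current version
        current["valid_to"] = effective  # close off the old version
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "customer_id": customer_id,
        "attrs": dict(new_attrs),
        "valid_from": effective,
        "valid_to": OPEN_END,
    })
    return dimension


def as_of(dimension, customer_id, day):
    """Return the attributes that were valid for customer_id on a given day."""
    for r in dimension:
        if r["customer_id"] == customer_id and r["valid_from"] <= day < r["valid_to"]:
            return r["attrs"]
    return None


dim = []
apply_scd2(dim, "C01", {"city": "Chicago"}, date(2009, 1, 1))
apply_scd2(dim, "C01", {"city": "Denver"}, date(2010, 7, 1))
print(as_of(dim, "C01", date(2009, 12, 31)))  # {'city': 'Chicago'}
print(as_of(dim, "C01", date(2011, 1, 1)))    # {'city': 'Denver'}
```

The operational procedures the text calls for would decide, per attribute, whether a change creates a new version like this or simply overwrites the old value.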
These complexities lead us to believe that the management
of a data warehousing process should follow the tenets of
the management of a software process. Good software
processes help produce better, cheaper software faster. In
fact, when a defined process is either not available or is
defined but not used, the chances are that a
software project will fail [32]. Software process improvement has become an important challenge for modern
organizations. Maturity models such as CMM [49], Capability Maturity Model Integration (CMMI) [55], and ISO 9001
[48] play an extremely important role in helping software
companies achieve higher levels of maturity, in terms of
their development process and product quality. To be
considered mature, an organization needs not only to define
and use a software process but also to evolve it
continuously [21], [50]. Using this as a basic principle, we
envision a continuously evolving data warehousing process
in an organization as one that promotes quality and timely
delivery of information.

TABLE 1
Comparison of Maturity Models from Different Domains

3 EXTANT LITERATURE ON MATURITY MODELS

Maturity models have their roots in the field of quality
management [6], [8], [13]. The concept of maturity implies
progress from some initial state to a more advanced state.
The notion of evolution is implicit in the stages of growth,
suggesting that progress passes through a number
of intermediate states on the way to higher maturity levels.
In his Quality Management Maturity Grid (QMMG),
Crosby [8] describes the typical behavior exhibited by firms
at five levels of maturity with respect to various aspects of
quality management. The QMMG has a strong evolutionary
theme, suggesting that firms typically evolve through five
phases (uncertainty, awakening, enlightenment, wisdom, and
certainty) in their ascent to quality management excellence.
Crosby's framework has been adopted by many disciplines, including software development, project management, open source systems, web services, data governance,
service-oriented architectures, enterprise architectures, information quality management, database administration,
and IT services. To determine the inherent characteristics of
a maturity model, we analyze a representative sample of
maturity models (see Table 1) based on a set of attributes,
which include inherent maturity abstraction, focus of
maturity support, model benefits, model scope, related
technologies, and stakeholders. The inherent maturity
abstraction attribute illustrates the intrinsic maturity paradigm that is used to develop the model. The model can
focus on either process maturity or product maturity.
Process maturity concentrates on how a process evolves
toward maturity. Product maturity, on the other hand,
explains the evolution to better products. The model benefits
attribute describes the benefits obtained from a maturity
model. The model scope attribute describes the focus of the
maturity model. The related technologies attribute portrays
the technologies that are covered by the maturity model.
Finally, the stakeholders attribute describes the people or
groups affected by the maturity exercise.
Our analysis of the extant models reveals that maturity
models focus on process or product evolution. All models,
directly or indirectly, use Crosby's framework to abstract
their inherent processes, even though the stages of evolution
can differ. The process maturity models, such as IT Service
CMM [42], [43], [57], IQ CMM [12], and NASCIO's EMM
[41], borrow heavily from CMM. Some, such as the project
management maturity model [24], [39], have their own
specific models since CMM is geared toward IS companies.
Others, such as OSMM [16] and DW Maturity Model [11],
follow the Chasm Model espoused by Moore [38]. Also, note
that Eckerson's DW maturity model [10], [11] focuses on
product (data warehouse) maturity, not on process maturity.
Even though many types of maturity models have been
proposed for different contexts (see Table 1), they all share a
set of common features. We find that a maturity model
typically supports the three key features described below:
Feature-1: Maturity levels. The idea of levels originated
from Crosby's work. The number of levels in a model
typically ranges from three to six. Each level usually has a
descriptor that serves as a name for the level. The Capability
Maturity Model developed by the Software Engineering
Institute (SEI) for the software development process reflects the
best practices in software development and emphasizes the
need to conduct periodic software process assessments and
introduce improvements. CMM advocates that continuous
process improvement be based on small, evolutionary steps
and provides a framework for organizing those steps into
five maturity levels [50]. The five levels are initial, repeatable,
defined, managed, and optimizing.
Many organizations provide IT services [51], [70] either
internally or externally. These services include software
maintenance, operating information systems/data centers,
running networks, and providing technical support. Customers of these services at times may not be able to express
their real service requirements and may not know their
performance needs. Quite often, service providers also do
not know how to assess their own capabilities with respect
to the delivery of IT services. According to Niessink et al.
[43, p. 12], "Regardless of the exact circumstances in which
an IT service provider operates, sufficient emphasis should
be on processes...to be able to deliver quality IT service." To
address these problems, Niessink and van Vliet [42] and
Niessink et al. [43] proposed the IT Service CMM model
based on the CMM Version 1.1 framework. The scope of
this model covers all of the service delivery activities and
focuses on the maturity of the service organization. Like
CMM, this model also has five levels and does not measure
the maturity of individual services, but only that of the
entire service organization.
With respect to IT services, the Information Technology
Infrastructure Library (ITIL) provides a comprehensive
framework that is built around a process model-based view
of controlling and managing operations [7]. Its goal is to
help an IT organization understand how to deliver value
to its customers, and to help the parent/client organization
better realize value from IT services. The ITIL framework
provides a set of best practices to help organizations achieve
enhanced efficiency and effectiveness in their IT service
management and realize the following objectives: 1) align IT
services with both current and future needs of business,
2) improve the quality of IT services, and 3) reduce the cost
of providing these IT services [65].
The primary focus of most organizations practicing ITIL
has been on two process groups: 1) service support and
2) service delivery [44], [45]. Service support consists of six
categories:
1. service request management,
2. incident management,
3. problem management,
4. change management,
5. release management, and
6. configuration management.
Service delivery consists of five categories:
1. service level management,
2. capacity management,
3. IT service continuity management,
4. availability management, and
5. financial management.
Feature-2: Key process areas. Each level in the maturity
model indicates a level of process capability. A level is
decomposed into a set of key process areas (KPAs) that an
organization should focus on to improve its process. The
levels and KPAs form a grid for a maturity model. All
maturity models invoke this grid approach [8] and provide
textual descriptions for the performance characteristics/
traits at each level. Each KPA includes a cluster of related
activities that, when performed, collectively achieve a set of
goals considered important for enhancing process capability. Processes at each level provide the foundation for the
higher level processes. In CMM and other maturity models,
KPAs differ from level to level. For example, requirements
management, software project planning, etc., form the
KPAs for the repeatable level (or level 2) in CMM. For higher
levels, organizational process definition, intergroup coordination, integrated software management, software quality
management, defect prevention, etc., form the different key
process areas.
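The level-by-KPA grid maps naturally onto a dictionary from level number to KPA names. The sketch below fills in only the CMM KPAs named in the paragraph above, placed at the levels CMM v1.1 assigns them: level 3 for organizational process definition, intergroup coordination, and integrated software management; level 4 for software quality management; level 5 for defect prevention. The grid is therefore deliberately incomplete. The `kpas_through` helper is a hypothetical illustration of the idea that processes at each level provide the foundation for higher-level ones.

```python
# Partial CMM grid: only the KPAs mentioned in the text, at their CMM v1.1 levels.
LEVEL_NAMES = {1: "initial", 2: "repeatable", 3: "defined", 4: "managed", 5: "optimizing"}
CMM_GRID = {
    1: [],  # the initial level has no KPAs
    2: ["requirements management", "software project planning"],
    3: ["organizational process definition", "intergroup coordination",
        "integrated software management"],
    4: ["software quality management"],
    5: ["defect prevention"],
}


def kpas_through(level, grid=CMM_GRID):
    """KPAs an organization must address to sit at `level`: the KPAs of
    that level plus those of every lower level (the foundation)."""
    return [kpa for lvl in sorted(grid) if lvl <= level for kpa in grid[lvl]]


print(LEVEL_NAMES[2], kpas_through(2))
print(len(kpas_through(5)))
```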
Feature-3: Activities in key process areas. The objective
of each process area can be summarized by its key practices,
also known as activities. Each activity must have goals and
commitments. Key practices, according to Paulk et al. [50,
p. 39], "describe what is to be done, but they should not
be interpreted as mandating how the process should be
implemented." Alternative practices may accomplish the
goals of the key process area. For example, in CMM,
activities for the requirements management KPA are requirements review, change requirements review, requirements induced
planning, etc.


4 DESIGNING A DWP MATURITY MODEL

Even though the concept of data warehousing process
maturity has been mooted in the literature [9], [30], [34],
[35], [60], [69], work in this area has been limited to a simple
specification of the levels and mapping of some of the
activities from CMM, without serious theory-based
development. In this section, we develop a conceptual model
of DWP maturity by grounding our work in design-science
research [17], [18], [20] and in the literature on quality,
maturity models, and IT services.

4.1 Design-Science Research


According to Hevner et al. [20], the goal of the design-science
research paradigm is to "extend the boundaries of human
and organizational capabilities by creating new and innovative artifacts" (p. 75). Their framework for design-science
research provides a set of guidelines. The first guideline is
that design-science research should result in an artifact,
which could be a construct, model, method, or instantiation.
But design is also a process [20], [68], composed of a set of
activities that produces the artifact. Constructs provide the
symbols and vocabulary for defining and solving problems
[20]; they are the representations of the entities of interest
[18]. A model uses the constructs to represent the design
problem and its solution space [62]. A method defines the
process for searching through the solution space. Finally, an
instantiation is the implementation of the constructs, models,
or methods in a working system.
The second guideline Hevner et al. [20] present relates to
problem relevance. Design science efforts should be relevant
to "the practitioners who plan, manage, design, implement,
operate, and evaluate information systems and those who
plan, manage, design, implement, operate, and evaluate the
technologies that enable their development and implementation" (p. 85).
The third guideline focuses on design evaluation. The
designed artifact could be evaluated based on metrics such
as functionality, completeness, consistency, accuracy, etc.
Because design is inherently an iterative activity, the
evaluation phase provides the necessary feedback to the
construction phase on the quality of the design process and
the design artifact being developed [20].
The fourth guideline deals with research contributions. To
be effective, design-science research must make clear
contributions with respect to the design artifact, design
foundations, or design methodologies. The most frequent
type of contribution is the artifact itself, which helps to
address unresolved problems and provides significant
value to the target community.
The fifth guideline is research rigor. Design-science
research should apply rigorous methods for developing
and evaluating the artifact. But, as Hevner et al. [20] argue,
research rigor should not be emphasized at the expense of
relevance. Rigor is introduced by basing the research on
theoretical foundations and through the effective use of
research methodologies.
The last two guidelines address search and communication
issues. Design involves searching a very large space for a
satisficing solution [62]. Heuristic strategies are usually
employed to make the search process manageable. Finally,
the results of design-science research must be presented
effectively to both technology-oriented and management-oriented audiences.
In addition, a design theory should have a purpose and
scope, stating what the artifact is for [18]. Our aim, in this
research, is to design and develop a data warehousing
process maturity model. But the question that might arise
is why we need a separate DWP maturity model when we
already have CMM and its variants for software development. A software development process is defined as a set
of activities, methods, transformations, and practices that
people employ to develop and maintain software and its
associated products [21], [22], [23]. It includes activities
such as requirements analysis and definition, software design,
implementation, system testing, and maintenance. In addition
to these, a data warehousing process includes tasks that
are quite different, such as data staging, metadata change
management, metadata quality management, data warehouse governance, end-user cube design, information
delivery management, etc. Even the requirements analysis
process for DW development is quite different. As
Kimball et al. [28] note:
The approach used to gather knowledge workers' analytic
requirements differs significantly from more traditional,
data-driven requirements analysis. Data warehouse designers must understand the key factors driving the business
to effectively determine business requirements and translate
them into design considerations (p. 34).

In their business dimensional life cycle approach, the focus
is on the analytic requirements elicited from business
managers and executives for designing dimensional data
marts. A DW process is thus quite different from a
traditional software development process, thereby necessitating a separate DWP-M model, focusing on both data
warehouse development and operations.
Recognizing the distinct aspects of a DW process, we
also adapted Humphrey and Kellner's [23] prescriptions for
a process model to the DW context, in addition to following
the guidelines for design-science research. While developing the model, we addressed the following questions:
1. What are the benefits of a mature DWP?
2. What do DW managers and practitioners perceive DWP maturity levels to be?
3. What are the key process areas for each of those maturity levels?
4. What are the activities in each of those process areas?

4.2 Initial Model


As discussed in Section 4.1, the designed artifact could be in
the form of a construct. Constructs are the representations of
the entities of interest. The ultimate artifact we are
interested in designing is the DWP maturity model. Given
the complexity of a DWP, the development of this model
necessitates the design of intermediate constructs. First, we
need to design the maturity construct. Second, because
maturity models typically define a number of levels, we
also need to define the maturity level construct, which
represents how mature a DWP is. Third, we need to specify
a set of KPAs at each maturity level, so we also have to
design the KPA construct. Finally, each KPA includes a
set of activities that need to be performed as part of a DWP,
so we will have to design the activity construct as well.
There are, therefore, four constructs for the DWP-M model:
maturity, maturity level, KPA, and activity. But, as noted
before, design is also a process. In this section, we describe
the development of the DWP-M model in terms of the four
constructs and the process.
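The containment among the four constructs (a maturity model is made up of levels, levels hold KPAs, and KPAs hold activities) can be sketched as nested dataclasses. The class names follow the constructs named above. The example content is a hypothetical paraphrase of the level descriptions in this section, is truncated to three levels for brevity, and is not the set of KPAs in the final DWP-M model.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Activity construct: a practice performed within a KPA."""
    name: str


@dataclass
class KPA:
    """KPA construct: a cluster of related activities."""
    name: str
    activities: list = field(default_factory=list)


@dataclass
class MaturityLevel:
    """Maturity-level construct: a numbered, named stage holding KPAs."""
    number: int
    name: str
    kpas: list = field(default_factory=list)


@dataclass
class MaturityModel:
    """Maturity construct: the ordered set of levels that make up the model."""
    name: str
    levels: list = field(default_factory=list)

    def activity_count(self):
        """Number of activities per level, a quick check on the model's shape."""
        return {lvl.number: sum(len(k.activities) for k in lvl.kpas)
                for lvl in self.levels}


# Hypothetical content (NOT the final DWP-M KPAs), truncated to three levels.
model = MaturityModel("DWP-M (illustrative)", [
    MaturityLevel(1, "Initial"),
    MaturityLevel(2, "Repeatable", [
        KPA("database administration", [Activity("managed schema changes"),
                                        Activity("performance monitoring")]),
    ]),
    MaturityLevel(3, "Defined", [
        KPA("data administration", [Activity("document best practices")]),
    ]),
])
print(model.activity_count())
```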
Mullins [40] mapped the levels in CMM to levels based
on how data are managed within organizations; others too
have echoed similar ideas [9], [30]. Marco in a series of
articles [33], [34], [35] and Watson et al. [69] also
emphasized the need for a CMM-like maturity model for
a DWP. Marco applied CMM to data warehousing and
developed six levels of DW maturity, from Level 0 (not
performed) to Level 5 (continuously improving). Based
on the extant literature on maturity models in different
fields (discussed in Section 3) and the work in [9], [30], [33],
[34], [35], and [40], Sen et al. [60] provided an initial set of
specifications for a maturity model. The five levels of their
initial model are described below. This initial model only
specifies the levels and does not describe the KPAs and the
associated activities.
Level 1 (Initial). A level-1 process has no strict rules or
procedures for data management [33], [34], [35], [40]. Along
similar lines, a data warehousing process at level-1 maturity
lacks strict rules or procedures. Data resides in multiple
files and databases using multiple formats. Redundancy is
rampant. Independent, nonconforming data marts are likely
to be very common at this level. Changes are typically made
on the fly as requested by the application program
development unit. The quality of the data depends on the
skills of the technical programmer analysts, database
analysts and designers, and coders. Groups take on large
and complex projects with little knowledge of their impact,
resulting in project cancellations or warehouses with low-quality data and reports. At this level, redundant data marts
are often created, in addition to process and technology
redundancy [34]. DW projects at level 1 tend to be
expensive; while some are successful, many fail badly.
Level 2 (Repeatable). An organization at level-2 DWP
maturity has a data management policy that specifies how
and when data structures are created, changed, and
managed. Although a policy is in place, it has not been
institutionalized [40]. There are fewer independent data marts at this level than at level 1 [35]. A database administrator
(DBA) is usually assigned at this level. Some standard
practices such as managed schema changes, performance
monitoring, and database tuning are performed at this level.
Some organizations at this level may have several data
warehousing initiatives and associated activities. Some of
these initiatives may have robust plans and may track the
data warehousing efforts. Although repeatable processes
exist for a department or a line of business, they are followed
only by that group, not by the entire organization [40].
Level 3 (Defined). An organization at this level has a stated
policy of treating data as a corporate asset. Best practices for
developing, maintaining, and operating the data warehouse
are documented and used across the enterprise. The data
management policy becomes a core component of the
application development life cycle [40]. The policy is enforced
and tested to ensure that data quality requirements are met. A
level-3 organization typically understands the business
meaning of data and creates a data administration (DA)
function in addition to the DBA function; the two functions
usually interact well, and DW tools are used appropriately.
Usually, there are very few independent data marts, and more
projects tend to succeed than fail [35].
Level 4 (Managed). An organization at this level
introduces a managed metadata environment [40]. This
enables the data management group to catalog and
maintain metadata for corporate data structures. The
organization starts conducting data audits to measure data
quality. Measurable process goals are established for each
DW process [35]. Quantitative/statistical techniques are
used to analyze the collected measurements. DW projects
are consistently successful and the organization can predict
their future performance with reasonable accuracy [35].
Level 5 (Optimizing). An organization at this level uses
practices learned in levels 1 through 4 to continually
improve data access, data quality, and data warehouse
performance [40]. Very low levels of data, process, and
technology redundancy exist; any remaining redundancy is
well documented and understood [35]. The organization
aligns its processes with its strategic business goals and
tries to optimize its investments in data warehousing.
The DWP-M model that we propose includes and builds
on the five levels discussed above. Each of the KPAs that we
identify in this study is assigned to a specific level. KPAs for
development and operations are captured in the model.
Based on whether or not a firm performs the activities
within each KPA at a given level, its practices are deemed to
conform or not conform to that level.
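A minimal Python sketch of this conformance rule follows. It assumes, as in the staged CMM, that a KPA conforms only when all of its activities are performed and that a level is achieved only when every KPA at that level and at all lower levels conforms. The catalog and activity identifiers are hypothetical placeholders, not actual DWP-M content.

```python
from typing import Dict, List, Set

# Hypothetical catalog: maturity level -> {KPA id: [activity ids]}.
# The KPA ids echo the paper's numbering; the activity ids are invented placeholders.
CATALOG: Dict[int, Dict[str, List[str]]] = {
    2: {"2.1": ["2.1-a", "2.1-b"], "2.2": ["2.2-a"]},
    3: {"3.1": ["3.1-a"]},
}


def kpa_conforms(activities: List[str], performed: Set[str]) -> bool:
    # Assumption: a KPA conforms only if every one of its activities is performed.
    return all(activity in performed for activity in activities)


def maturity_level(catalog: Dict[int, Dict[str, List[str]]], performed: Set[str]) -> int:
    # Level 1 has no KPAs, so every organization starts there.
    achieved = 1
    for level in sorted(catalog):
        if all(kpa_conforms(acts, performed) for acts in catalog[level].values()):
            achieved = level
        else:
            break  # staged assumption: a gap at one level caps the rating
    return achieved


print(maturity_level(CATALOG, {"2.1-a", "2.1-b", "2.2-a"}))           # 2
print(maturity_level(CATALOG, {"2.1-a", "2.1-b", "2.2-a", "3.1-a"}))  # 3
print(maturity_level(CATALOG, {"2.1-a", "3.1-a"}))                    # 1
```

The early `break` encodes the staged assumption: performing Level-3 activities does not help an organization that has not yet satisfied every Level-2 KPA.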

4.3 Knowledge Acquisition


We interacted with DW managers and practitioners to elicit
the knowledge required for developing the model. Specifically, we conducted multiple brainstorming sessions and
interviews with key DW professionals from industry to
identify, analyze, and understand the maturity levels, as
well as the key process areas and associated activities for
each level.
Three workshops were organized in June 2003, February
2004, and June 2004. Invitations were extended to a number
of medium to large-sized US corporations. Knowledgeable
DW executives from 13 companies volunteered to participate. These companies covered the manufacturing (aviation, electronics, and computer), retail, service (hospital,
insurance, rental agency, and banking), and e-tailing
(Internet travel agency) industry sectors. The group
included data warehouse sponsors/users (covering business intelligence, e-intelligence, and data mining) and data
warehouse managers (covering ETL developers, data warehouse administrators, and end-user developers). The growing importance of the IS-business relationship function and
the emergence of IS as a service provider [42], [43]
prompted us to gather feedback from DW executives
working for a large cross section of industries and playing
diverse roles at different levels in their organizations. The
participants had, on average, five to seven years of
experience in data warehousing, and their job titles
included managers, directors, and vice presidents. The
panel consisted of 20 members from the 13 participating
companies. These members had either managed, used, or
sponsored very large data warehouse projects ranging from
600 gigabytes to 700 terabytes in size. The database
management systems used by these companies included
Teradata, Oracle, and SQL Server. Overall, the group
represented extensive expertise, experience, and diversity
in perspectives.
Information acquisition was done in multiple modes,
depending on the intent. At the workshops, we used
brainstorming sessions [36] with the group to generate ideas
for model conceptualization. Our objectives in this phase
were to assess the potential value of a DWP maturity model,
and to identify its maturity levels and the key process areas.
These sessions also exposed us to the DWP tasks in diverse
business domains and helped us develop the DWP-M model.
We used the consensus decision-making mode to evaluate the
evolving model. Such techniques are especially useful after
brainstorming with multiple experts [36]. Consensus decision making attempts to find the best solution to a problem by
letting the group weigh the advantages and disadvantages
of each alternative solution. We accomplished this by
collecting the panelists' judgments and votes on different
process areas in the DWP-M model at the end of each
workshop. We also subsequently sent e-mails to the panel
members to solicit their detailed opinions on various topics.
Last, we used the concept-sorting mode [36] to flesh out
the key process areas and the corresponding activities for
each maturity level. This mode of knowledge acquisition is
useful once the maturity model is outlined and the main
key process areas have been identified.

4.4 Evolution of the DWP-M Model


Designing the DWP-M model involved searching a large space
of possible solutions. We describe below the method we used
to make the search process manageable. The initial model that
we had designed went through several rounds of changes
and evolution based on feedback from industry experts.
We presented the initial version of the DWP-M model to
the industry group at the first brainstorming session in a
workshop held in June 2003. We had asked the participants
about the key outcomes of a mature data warehousing
process. We present below a summary of their consensual
views on the key outcomes, along with their rationale:
1. Predictability of data warehouse project duration. One of the main reasons for data warehouse failures is that planned project durations are not met. Data warehouse projects tend to be very expensive [30]. A mature DWP would help the organization develop the ability to create and deliver projects with predictable durations.
2. Ability to perform better data analysis. The major objective of a data warehouse is to provide high-quality data with a single version of the truth so that extensive analytics can be performed. A mature DWP would enable and facilitate better data analysis.
3. Good documentation. Data warehouse projects and operations involve many tools and people, as well as heterogeneous data sets. Coordination and intergroup communication are vital for these kinds of projects. A mature DWP provides a good set of documentation at every step, which is vital for the success of the project.
4. Trust in the data. Companies cannot survive with bad
data in today's environment. With a variety of
source systems and reporting/analysis tools, it is
absolutely necessary to have trust in the quality of
data. DWP maturity would force the organization to
develop good data quality and data governance
strategies so that the users and sponsors have trust
in the data and the reports.
5. Satisfied user base. A mature DWP will satisfy the user
base with high data quality and foster user trust.
6. Verifiable ROI. One of the biggest problems for the
data warehousing group within an organization is to
get funding for data warehouse projects. A mature
DWP will make the job of getting funding easier
because upper management will have a higher level
of confidence in the data warehouse team's ability to
deliver value.
7. Ability to see process improvement. The software
industry has made great strides in using the CMM
over the last two decades. From just a couple of firms
at CMM maturity level 5 (the highest of the five levels)
only two decades ago, there are now over a hundred
organizations at that level [56]. The success of these
firms has been attributed to the fact that they have
adhered to sound software engineering principles
and practices. Compared to the general field of
software engineering, data warehousing is a relatively new discipline. To better understand the
problems afflicting DW implementations, the DW
community needs to focus on data warehousing
as a process, similar to what the software engineering
field has done. Such an endeavor would go a long
way in helping organizations to better address DW
problems and overcome failures.
Further brainstorming sessions, based on the initial model, were held with the group at a February 2004 workshop.
The group session lasted for over two and a half hours. The
objective of this workshop was to allow the group to identify
as many relevant KPAs as they believed were necessary
using the initial version of the model as a starting point.
A first version (version 1.0) of a 5-level DWP maturity
model focusing on the development process was developed
in May 2004. This version was presented to the group of
around 20 experts in data warehousing at a workshop held
in June 2004. Once again, the group session lasted for over
two and a half hours. As with the earlier workshops,
brainstorming was the principal technique in the beginning
phase of the group session. The objective of this workshop
was not only to evaluate the comprehensiveness of the
DWP maturity model in terms of KPAs, but also to assess
whether these KPAs had been appropriately assigned to the
maturity levels. A number of suggestions emerged for
(re)assignment of the KPAs.
We developed descriptions for each level of maturity and
a tentative list of 29 KPAs (vis-à-vis 19 KPAs in CMM) for
levels 2 through 5, along with activities to be performed
within each KPA, as shown in Table 2. This resulted in a
total of 157 activities covering the 29 KPAs: 43 activities
covering 10 KPAs at Level 2; 51 activities supporting
10 KPAs at Level 3; 27 activities covering 4 KPAs at Level 4;
and 36 activities covering 5 KPAs at Level 5.

SEN ET AL.: A MODEL OF DATA WAREHOUSING PROCESS MATURITY


TABLE 2
Data Warehousing Maturity Levels and KPAs (FIRST Version-June 2004)

We sent a second version of the DWP maturity model
(version 1.0.1) to each of the participants during July and
August 2004 via e-mail with a request to take a detailed
look and provide their feedback. Eighty percent of the
original participants responded with comments/critiques
on version 1.0.1.
At the end of these major sessions, we found that there
were still quite a few conflicting ideas about the KPAs,
particularly about where and how to assign them. After
another round of revisions, we developed a third version
of the model (version 1.1). While moving to version 1.1, we
gained several new insights. First, our DWP-M model
actually follows the "level" concept espoused by other
maturity models (see Section 3). Second, our assumption
that DWP is a process quite different from software
engineering, IT services, and other processes was validated.
Third, as expected, we found that some tasks at each of the
DWP-M levels share similarities with tasks in other
maturity models.

4.5 Design Evaluation


As noted earlier, one of the guidelines of design-science
research is that the designed artifact should be evaluated
and that the evaluation should provide the feedback needed
to refine the artifact being developed [20].
In the evaluation phase, it was necessary to verify
whether the DWP-M model (version 1.1) that we had
developed was sufficiently comprehensive, represented
the right KPAs and associated activities, and could form a
basis for progress within firms adopting this model toward
more consistent, repeatable, and better development and
management of DW projects.
One of the researchers made site visits during the fall of
2004 to four of the 13 companies that had participated in the
workshops for individual face-to-face meetings. Two of
them were from engineering, one from retail, and one from
the service industry sector. These firms were chosen to obtain
a reasonably representative sample of DWP efforts, whose
durations ranged from two years to more than five years. During a
typical visit, the researcher met with three to four members
of the company for one-on-one interviews, each lasting for 2
to 3 hours.
We made significant modifications and refinements to
the model based on the suggestions of the interviewees.
This was the first time that the participants were evaluating
the entire model instead of brainstorming the model
specifics. Many interesting observations were made, ranging from simple evaluations to in-depth remarks. Simple
evaluations included critically studying the DWP tasks at
each level and their eventual positioning in one of the
maturity levels. The in-depth remarks were directed more
toward a major change in the philosophy and anchoring of
the model building process. For example, participant mgr-1
in company eng1 (names of the companies and the
executives are suppressed to protect their identities)
suggested, "...we will address these levels...how do you
check these levels? ...they seem kind of little different from
the project side versus the operations side..." Another
participant, mgr-2, also in company eng1, supported this
delineation of project and production, since "we may be...in the
project side...in level 5, but in the production side we may
be in chaos." Similar thoughts were echoed by others
participating in the interview process. In essence, the seeds
were sown for a radical departure: extending the model to
include DW operations in addition to the processes
emphasizing DW development.
We had initially focused on the DW development
process. Similar to CMM, version 1.1 of our DWP-M model
captured only the development aspects. Although we had
suspected that DWP might need to go beyond mere DW
development, we had not specifically considered these
aspects in the design of our DWP-M model. However,
during the on-site interviews, it became apparent that,
compared to traditional software development, data warehouses require much more continued post-development
support in the form of DW operations and customer service.
Furthermore, a DW environment involves multiple stakeholder groups at all times.
Recognizing the need for significant changes to the
model with respect to customer support and services, we
reviewed in greater detail the theory in services and
relationship marketing [47], as well as its application in
the general IS literature [26], [27], [51] and in IT services
[42], [43], [65], [66]. Drawing upon ideas from those studies
and incorporating the feedback we received during the on-site interviews, we developed a fourth version of the DWP-M
model (version 1.2). For instance, a number of aspects of the
ITIL framework are covered by the operations/service
KPAs of our DWP-M model (see Fig. 2). Specifically,
KPA 2.10 accommodates ITIL's service support aspects 1a,
1b, and 1c; KPA 3.18 covers ITIL's 1e and 1f; KPA 4.5 covers
ITIL's service delivery aspect 2a; KPA 3.16 covers ITIL's 2b;
KPA 3.4 covers ITIL's 2c, 2d, and security management;
KPA 4.6 covers ITIL's 2e; KPA 3.7 covers ITIL's ICT
infrastructure management; and KPA 3.17 covers ITIL's
software asset management.
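The KPA-to-ITIL correspondence above is easier to audit as a lookup table. This Python sketch only transcribes the mapping stated in the text (aspect labels abbreviated as in the text) and inverts it, so one can ask which DWP-M KPA covers a given ITIL aspect; it adds no mapping beyond what is stated.

```python
from collections import defaultdict
from typing import Dict, List

# DWP-M KPA -> ITIL aspects it covers, transcribed from the text.
KPA_TO_ITIL: Dict[str, List[str]] = {
    "2.10": ["1a", "1b", "1c"],
    "3.18": ["1e", "1f"],
    "4.5": ["2a"],
    "3.16": ["2b"],
    "3.4": ["2c", "2d", "security management"],
    "4.6": ["2e"],
    "3.7": ["ICT infrastructure management"],
    "3.17": ["software asset management"],
}


def invert(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return ITIL aspect -> list of KPAs that cover it."""
    covered_by: Dict[str, List[str]] = defaultdict(list)
    for kpa, aspects in mapping.items():
        for aspect in aspects:
            covered_by[aspect].append(kpa)
    return dict(covered_by)


ITIL_TO_KPA = invert(KPA_TO_ITIL)
print(ITIL_TO_KPA["security management"])  # ['3.4']
print(len(ITIL_TO_KPA))                    # 13 distinct ITIL aspects
```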
The DWP-M model, however, is substantially different
from ITIL. First, its focus is on both development and
service, unlike ITIL, which focuses primarily on service
activities. Second, our model places heavy emphasis on
continuous improvement, including prevention (in maturity level 5), while some research suggests that ITIL needs
to be complemented by Six Sigma techniques to bring an
engineering orientation and lean techniques to promote
continuous improvement [72]. Third, the focus of our model
is exclusively on the data warehousing process, rather than
on IT governance in general. Finally, unlike our DWP-M
model, which requires a clear progression through the five
maturity levels and associated practices, ITIL can be
implemented on an as-needed basis, focusing on those
parts of the IT service delivery and management processes
that are broken [46].
In the new version of the model, the number of KPAs
increased from 29 to 40, and the total number of activities
covering those KPAs increased from 157 to 221, focusing on
both development and operations/customer service aspects. More important than the increase in the number of KPAs and
activities was the expansion of the model's scope to include DW
operations and services. A number of KPAs were
common to these two aspects. At this point, it was
also necessary to identify and decide which
category of stakeholders would need to interact with the
DW teams in the real world, in light of the different
emphases: development and operations/support services.
A fourth workshop was organized in March 2005.
Invitations were extended to the same companies that had
participated in the three earlier workshops. This was the
first time that the entire group of participants would see a
very different model with two different emphases and a
number of new KPAs.
Based on their comments and critiques, a fifth version
(version 1.5) was developed and presented to the panel of
DW experts via e-mail in the summer of 2005. The
participants were asked to propose, discuss, and arrive at
a consensual assignment of each KPA to DW development
staff and/or DW operations staff, and identify their
connections with the business users.
About 60 percent of the participants responded with
comments and critiques on version 1.5 of the model. After a
major cleanup, we created a sixth version (version 2.0) and
organized a fifth workshop in September 2005. Eight of the
original 13 companies and 12 experts from these companies
participated this time. The group session, as before, lasted
for about two and a half hours. The participants again
engaged in brainstorming to generate additional KPAs/
activities and ideas with respect to KPA assignment.
Based on the feedback, we revised the model. A seventh
version (version 2.1) was e-mailed to the participants. We
received individual feedback from a subset of the
participants this time. We created another revised DWP-M model (version 3.0) and sent out invitations for a sixth
workshop in March 2006. Nine companies and 10 experts
from these companies participated. Version 3.0 of the
DWP-M model was presented to the group of experts
with a request to examine each maturity level and the
assigned KPAs (and their associated activities) in detail
one more time. At the end of this session, the group
appeared to have reached closure. While assigning the
KPAs to different maturity levels, the workshop participants also identified the dependencies among them. We
describe those relationships below.

4.5.1 Relationships among Process Areas


In this section, we describe how we developed and
captured the relationships among process areas. The
relationships depict the interactions and help us understand
how a KPA builds on other KPAs. As in CMMI, an
interaction in our model shows the flow of information and
artifacts from one KPA to another, and the prerequisite
KPAs that need to be satisfied before a KPA can be
implemented successfully.
We acquired the knowledge of interactions among KPAs
from the panel of experts through various channels, including brainstorming sessions, teleconferences, and e-mail
discussions. The panel helped us develop the relationships
in multiple steps and over multiple sessions. First, we
developed the relationships among KPAs for levels 2 and 3,
then those for levels 4 and 5, and finally the
relationships between KPAs across different maturity levels.
At each subsequent workshop/
teleconference/e-mail session, the experts evaluated the
existing set of interactions and provided feedback to modify
and extend the model.
In Fig. 1a, we show the interdependencies among KPAs
at Level 2 of the DWP-M model. The information relationships (shown as solid arrows) are annotated with the
documents and data that flow between the KPAs. The
activities of these KPAs are performed to create a compelling business case to scope out and plan for a DW project.
DW sponsor assurance (KPA 2.1) targets potential sponsors to
generate awareness and interest in DW. Based on assurances from the sponsors, plans for performing DW tasks
and managing the DW process are established (KPA 2.2:
DW program planning). Senior management sponsorships for
DW projects are obtained and guidelines for scope, time,
and budget are established. Based on these inputs, the
Business justification KPA (2.4) identifies the goals and
objectives for DW projects and develops a business case,
which includes benefit, cost, and time estimates. The
business justification is then used to garner and sustain
interest in DW within the organization.
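The Level-2 information relationships described above form a small directed graph whose edges carry artifacts. As an illustrative Python sketch, restricted to the two flows the prose names (Fig. 1a may contain more), the function below walks the graph backward to list every upstream KPA and artifact that feeds a given KPA. The artifact labels paraphrase the text and are not the figure's exact annotations.

```python
from collections import deque
from typing import List, Tuple

# (source KPA, target KPA, artifact that flows), paraphrased from the text.
FLOWS: List[Tuple[str, str, str]] = [
    ("2.1 DW sponsor assurance", "2.2 DW program planning", "sponsor assurances"),
    ("2.2 DW program planning", "2.4 Business justification",
     "scope, time, and budget guidelines"),
]


def upstream(flows: List[Tuple[str, str, str]], target: str) -> List[Tuple[str, str]]:
    """Breadth-first walk against the arrows: (source, artifact) pairs feeding `target`."""
    found: List[Tuple[str, str]] = []
    queue, seen = deque([target]), {target}
    while queue:
        node = queue.popleft()
        for src, dst, artifact in flows:
            if dst == node:
                found.append((src, artifact))
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
    return found


for source, artifact in upstream(FLOWS, "2.4 Business justification"):
    print(f"{source} -> {artifact}")
```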
Fig. 1b shows several information and prerequisite
relationships (prerequisite relationships are shown as dashed arrows) among some
KPAs at Levels 3 and 4. These KPAs include activities for
data warehouse governance and service level assurance.
The DW governance KPA (KPA 3.14) outlines the decision
rights and underlying processes to ensure consistent
enforcement of the accountability framework. As the
governance structure needs to be communicated to different organizational units, a list of stakeholders should be
available from the Stakeholder management process KPA
(KPA 3.5), along with communication plans developed by
the Intergroup coordination KPA (KPA 3.8). The DW
governance KPA is also responsible for enunciating and
enforcing the DWP principles created by DWP definition
KPA (KPA 3.1), as well as enforcing organizational
activities described by Organizational process focus
(KPA 3.2). Using resource plans from Resource management
(KPA 3.17), the DW governance KPA develops and proposes
DW investment and modernization plans. To develop
service commitments in Service level agreement (KPA 3.13),
one has to know the governance structure and recovery
plan (Recovery management, KPA 3.4) so that it is clear
whom to call in case of emergencies.
Fig. 1b also shows the KPAs at Level 3 that need to be
satisfied before two of the KPAs at Level 4 can be
implemented. For example, Quantitative process management (KPA 4.2) has to ensure that measurement data are
collected as per the procedures identified in the quality
plans for KPAs 3.13 and 3.15. Similarly, the measuring
domains established by the SLA in KPA 3.13 must be
adhered to before Service level management (KPA 4.5) can
be implemented.
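The prerequisite relationships lend themselves to a mechanical consistency check. This Python sketch is limited to the prerequisites stated in this paragraph and assumes that a KPA's level can be read from the first component of its identifier (e.g., 4.2 is at Level 4). It flags any prerequisite that sits at a higher level than its dependent, which would contradict the staged ordering, and uses the standard library's `graphlib` (Python 3.9+) to derive an implementation order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# KPA -> KPAs that must be satisfied first (only the cases named in the text).
PREREQS: Dict[str, Set[str]] = {
    "4.2": {"3.13", "3.15"},  # Quantitative process management
    "4.5": {"3.13"},          # Service level management
}


def level_of(kpa: str) -> int:
    # Assumption: the level is the part of the KPA id before the dot.
    return int(kpa.split(".")[0])


def level_violations(prereqs: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (dependent, prerequisite) pairs whose prerequisite is at a higher level."""
    return sorted(
        (kpa, pre) for kpa, pres in prereqs.items() for pre in pres
        if level_of(pre) > level_of(kpa)
    )


# static_order() yields every prerequisite before the KPAs that depend on it;
# the relative order of unrelated KPAs is not fixed.
order = list(TopologicalSorter(PREREQS).static_order())
print(level_violations(PREREQS))  # []
print(order)
```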
After all the KPAs and their relationships were identified, the research team sent the refined maturity model
(Version 3.1) to the participants to get their final confirmation of the accuracy and fidelity of the captured knowledge.
Feedback from the participants indicated that theoretical
saturation had been reached; this latest version was agreed
upon as the final version of the DWP-M model. The final
version of the model consists of a total of 41 KPAs and
219 activities with 11 KPAs and 47 activities in Level 2;
19 KPAs and 104 activities in Level 3; 6 KPAs and
35 activities in Level 4; and 5 KPAs and 33 activities in
Level 5. The final DWP-M model (Version 3.2) is shown in
Table 3 and Fig. 2. The details of these KPAs and their
assignment to each level and across the two aspects (DW
development and DW operations/customer service and
support), along with illustrations of a few KPAs and their
associated activities, are provided in the Appendix, which
can be found in the Computer Society Digital Library at
http://doi.ieeecomputersociety.org/10.1109/TSE.2011.2.
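The per-level counts reported for version 1.0 (Table 2) and for the final model can be cross-checked against their stated totals. This Python snippet only re-adds the figures given in the text.

```python
from typing import Dict, Tuple

# level -> (number of KPAs, number of activities), as reported in the text
VERSION_1_0: Dict[int, Tuple[int, int]] = {2: (10, 43), 3: (10, 51), 4: (4, 27), 5: (5, 36)}
FINAL: Dict[int, Tuple[int, int]] = {2: (11, 47), 3: (19, 104), 4: (6, 35), 5: (5, 33)}


def totals(version: Dict[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum KPAs and activities over all levels."""
    return (sum(k for k, _ in version.values()), sum(a for _, a in version.values()))


print(totals(VERSION_1_0))  # (29, 157)
print(totals(FINAL))        # (41, 219)
```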

4.5.2 Validation of the Final DWP-M Model


Evaluation consists of both verification and validation. We
have already described the verification process, which
entailed checking if the model has been developed
correctly. Validation, on the other hand, focuses on
determining whether the model satisfies the users' needs.
The International Organization for Standardization (ISO)
has developed a set of HCI standards, which include those
for usability and product quality. Usability is defined as
"the extent to which a product can be used by specified
users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use" (ISO
9241-11). The term "quality in use" is also used to cover this
broader objective of usability. It measures the degree of
excellence, and can be used to validate the extent to which a
product meets the needs of the users [3]. Its components
include functionality (effectiveness), productivity (efficiency), and usability (satisfaction).
Functionality is assessed based on measures such as
accuracy, completeness, and suitability. Productivity is
determined based on the time, mental effort (ease of use),
and resources needed to complete a representative task
using the product. Usability is defined in terms of understandability, learnability, operability, and attractiveness.
Fig. 1. Relationships among KPAs in DWP maturity model (partial): (a) Relationships in level 2. (b) Relationships in levels 3 and 4.

Based on the ISO standards, we developed an instrument (which includes metrics for functionality, usability,
and productivity) for validating the quality in use of the
DWP-M model. The instrument consists of 25 statements,
including those for assessing functionality/effectiveness,
such as:

• The KPAs in Level X have been correctly assigned.
• The DWP-M model consists of the KPAs needed to correctly define DW process maturity.
• The KPAs in the DWP-M model are effective in determining DW process maturity.
• Overall, the DWP-M model is effective at communicating the activities that my organization needs to perform in key areas to achieve higher levels of process maturity.

and also several statements for assessing productivity and
usability of the model, based on metrics for ease of use,
satisfaction, and understandability, such as:

• It takes a lot of mental effort to assess if my organization satisfies a KPA in the DWP-M model.
• The description of the KPAs in Level X is not easy to understand.
• The rationale for assigning the KPAs to their corresponding levels is not difficult to understand.
• Overall, I am satisfied with the DWP-M model.
TABLE 3
FINAL Version of Data Warehousing Maturity Levels and KPAs (Distinguishing Development and Operations)

Fig. 2. Data warehousing process maturity model and KPAs (FINAL version).

We sought the services of three separate data warehousing experts for validating the final model. The first
expert, a Director of Data Warehousing at one of the largest
web-based travel companies, was also one of the key
persons involved during the knowledge acquisition phase
of model development. The two other experts were not
involved in the model development at all. Because they had
not seen the model before, their assessments can be
considered free of any bias from the model-building process. One of them is a BI
Analyst at a major US-based manufacturing firm and the
other is Director of BI & Data Warehousing at one of the
largest healthcare companies in the US.

All three experts were provided with detailed documentation of the DWP-M model. The experts were asked to
respond to each of the 25 statements on a 5-point Likert
scale, with 1 being "strongly disagree" and 5 being
"strongly agree." To guard against mechanistic responses,
several questions were framed in the negative (e.g., "The
DWP-M model is not helpful in assessing my organization's DW
process maturity"). We determined the interrater reliability
among the three raters using the intraclass correlation
coefficient [37], computed in SPSS. The average-measures
intraclass correlation coefficient for the subjective scores
assigned by the three raters is 0.901, which is significant at
the p < 0.001 level. The coefficient is higher than the commonly used
threshold of 0.80, indicating a high degree of
reliability among the scores assigned by the three experts.
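For readers who want to reproduce this kind of reliability figure outside SPSS, the sketch below computes an average-measures intraclass correlation from a two-way ANOVA decomposition. The text does not say which ICC model was selected in SPSS, so this Python sketch assumes the two-way random-effects, absolute-agreement form, ICC(2,k), with rows as rated items (here, the 25 statements) and columns as raters. The usage data are the classic 6-target, 4-judge illustration from Shrout and Fleiss (1979), not the study's ratings.

```python
from typing import List


def icc_2k(ratings: List[List[float]]) -> float:
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` has one row per rated item and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2k(example), 2))  # 0.62
```

If the consistency (rather than absolute-agreement) form were used, the rater-variance term would drop out and the formula would reduce to (MS_rows - MS_error) / MS_rows.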
We followed the practice of past assessment studies
that employed an overall measure of usability. For
example, Brooke [5] developed the System Usability Scale
(SUS), which provides a global view of subjective
usability assessments. SUS computes a composite measure
representing overall usability by summing the adjusted scores on
its items and rescaling the sum to a 0-100 range.
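For comparison with the DWP-M instrument, standard SUS scoring works as follows: each of its ten 1-5 responses is converted to a 0-4 contribution (response minus 1 for the positively worded odd-numbered items, 5 minus response for the negatively worded even-numbered items), and the 0-40 sum is multiplied by 2.5. A minimal Python sketch:

```python
from typing import List


def sus_score(responses: List[int]) -> float:
    """System Usability Scale score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([5, 1] * 5))  # 100.0
print(sus_score([3] * 10))    # 50.0
```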
We developed our instrument in a similar fashion, but
instead of using a 10-item questionnaire, we made it more
comprehensive by including 25 items. We determined the
overall quality in use of the DWP-M model, which is a
composite score based on the individual items. The overall
score is computed by summing the scores across all the
25 items, with each item contributing between 0 and 4 (for a
positive item, the contribution toward the overall score is
the item score minus 1, and for a negatively phrased item,
the contribution is 5 minus the item score). The overall
quality in use score therefore ranges from 0 to 100. Based on
the responses of the three experts, the overall quality in use
scores for the DWP-M model are 73, 75, and 74. These scores
are quite high when one considers the fact that human
respondents are typically averse to rating any product at
extreme ends of a scale. The average quality in use score is
74, providing strong evidence of the external validity of our
model (in the eyes of DW experts) with respect to
functionality, productivity, and usability.
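The scoring rule described above can be written down directly. The text does not identify which of the 25 items are negatively phrased, so the set `negatives` in this Python sketch is hypothetical. The function itself follows the stated rule: score minus 1 for positive items, 5 minus score for negative items, summed to a 0-100 total.

```python
from typing import Dict, Set


def quality_in_use(responses: Dict[int, int], negative_items: Set[int]) -> int:
    """Overall quality-in-use score (0-100) for the 25-item instrument.

    responses: item number (1-25) -> Likert score (1-5).
    negative_items: item numbers that are phrased negatively.
    """
    if sorted(responses) != list(range(1, 26)):
        raise ValueError("expected one response for each of items 1-25")
    total = 0
    for item, score in responses.items():
        if not 1 <= score <= 5:
            raise ValueError(f"item {item}: score {score} outside 1-5")
        # Each item contributes 0-4 points after reversing negatively phrased items.
        total += (5 - score) if item in negative_items else (score - 1)
    return total


negatives = {3, 9, 17}  # hypothetical: the text does not list the negative items
responses = {i: (2 if i in negatives else 4) for i in range(1, 26)}
print(quality_in_use(responses, negatives))  # 75
print(sum([73, 75, 74]) / 3)                 # 74.0, the reported average
```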
We also asked the experts to provide their estimates of
the time it would take their organizations to transition to the
next higher level of maturity (as described in the DWP-M
model). The first two experts indicated it would take two
years, while the third expert indicated it would take one
year. Interestingly, the first two experts both rated the
current DWP maturity of their organizations at Level 2,
while the third expert rated his at Level 1. The longer time
frames estimated by experts 1 and 2 may reflect the greater
time needed to climb the maturity ladder at higher levels.
Finally, we asked two questions to solicit the experts'
responses on whether achieving higher levels of DWP
maturity would help their organizations: 1) implement DW
projects more consistently with respect to time, cost, and
quality targets and 2) reduce the overall cost of providing
correct information and DW services over the long term. The
three experts were in near-unanimous agreement and
rated the statements very high (an average of 4.67 on a 1-5 scale),
strongly implying that higher maturity levels would result
in significant cost savings.

DISCUSSION

In this paper, we have described the development of a
design artifact: the DWP-M model. A data warehousing
process revolves around data. The DWP-M model reflects
this emphasis by focusing on activities geared toward data
extraction, data transformation, data loading, data analysis,
data change management, metadata management, data
quality assurance, data warehouse governance, etc.
In instances where KPAs overlap between DWP-M and
CMM/CMMI, a KPA may be assigned to different levels in
the two models. For example, the
configuration management KPA, which is at level 2 in
CMM, is at level 3 in the DWP-M model. It is important to
note that the objective of configuration management in data
warehousing is different from that in software development. In software development, the process involves,
among other things, products like use cases, object classes,
messages, operations, packages, components, etc., which
are closely linked to one another. On the other hand, a DW
process involves diverse products, such as ETL scripts,
operational data, historical data, operational metadata, ETL
metadata, end-user metadata, OLAP cubes, etc.,
which need to be configured. These DW products belong to
development and/or operations, and are used differently in
many subprojects with their own lifecycles. These subprojects are usually staffed by different teams that can be
dispersed in different parts of the world. For example, one
of the experts in the panel belongs to a company where ETL
projects are done offshore, while reporting services projects
are done in the US. Configuration management in DWP
integrates multiple heterogeneous components and projects
across different sites, and is quite different from, and more
demanding than, configuration management for conventional software.
Note, however, that some activities of the configuration management KPA in CMM are indirectly supported by level 2 KPAs in DWP-M, such as DW program planning, scope design and verification, and issue tracking. It is also important to note that the interdependencies among KPAs (see Fig. 1) helped us group them and assign them to appropriate DWP maturity levels. For instance, configuration management, along with integrated infrastructure management (KPA 3.7) and DW product engineering (KPA 3.10), interacts with alignment of architecture (KPA 3.6), which requires that it reside at level 3. In contrast, the KPAs at level 2 are more fundamental than configuration management, focusing on activities such as sponsor assurance, program planning, project planning, requirements management, and scope design.
Another example of a disparity between level assignments in CMM and DWP-M is the assignment of the data change management KPA to level 4 in DWP-M, in contrast to CMM, where change management KPAs are usually assigned to level 5. The metadata change management KPA is assigned to level 5, adhering to the convention, used in the OMG Meta Data Architecture [52], of placing metadata one level higher than data and metamodels one level higher than models.
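The level-assignment reasoning above can be expressed as a simple consistency check. The following is a minimal, hypothetical sketch rather than a tool used in developing DWP-M. The KPA names and their levels come from this section, but the prerequisite edges are illustrative assumptions standing in for the dependencies of Fig. 1. The check enforces two rules: a prerequisite KPA may not sit at a higher level than the KPA that depends on it, and a metadata KPA sits exactly one level above its data counterpart.

```python
from typing import Dict, List, Tuple

# DWP-M levels of a few KPAs, as stated in the text.
LEVELS: Dict[str, int] = {
    "DW program planning": 2,
    "scope design and verification": 2,
    "issue tracking": 2,
    "configuration management": 3,
    "alignment of architecture": 3,              # KPA 3.6
    "integrated infrastructure management": 3,   # KPA 3.7
    "DW product engineering": 3,                 # KPA 3.10
    "data change management": 4,
    "metadata change management": 5,
}

# ASSUMPTION: illustrative (prerequisite, dependent) edges only; the
# actual KPA dependencies are those depicted in Fig. 1.
PREREQUISITES: List[Tuple[str, str]] = [
    ("DW program planning", "configuration management"),
    ("scope design and verification", "configuration management"),
    ("issue tracking", "configuration management"),
    ("configuration management", "data change management"),
    ("data change management", "metadata change management"),
]

# (data KPA, metadata KPA) pairs under the "metadata one level higher" rule.
META_PAIRS: List[Tuple[str, str]] = [
    ("data change management", "metadata change management"),
]


def check_levels(levels: Dict[str, int],
                 prerequisites: List[Tuple[str, str]],
                 meta_pairs: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable rule violations (empty list if consistent)."""
    problems: List[str] = []
    for pre, dep in prerequisites:
        # A prerequisite must be achievable no later than its dependent.
        if levels[pre] > levels[dep]:
            problems.append(f"{pre!r} (level {levels[pre]}) is a prerequisite "
                            f"of {dep!r} (level {levels[dep]})")
    for data_kpa, meta_kpa in meta_pairs:
        # Metadata KPA must sit exactly one level above its data KPA.
        if levels[meta_kpa] != levels[data_kpa] + 1:
            problems.append(f"{meta_kpa!r} should be one level above {data_kpa!r}")
    return problems


if __name__ == "__main__":
    issues = check_levels(LEVELS, PREREQUISITES, META_PAIRS)
    print("consistent" if not issues else "\n".join(issues))
```

With the levels stated in the text and these assumed edges, the check reports no violations. Moving configuration management to level 5, for example, would be flagged because it would then sit above data change management, which depends on it in this illustration.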
The DWP-M model development process went through several rounds of rigorous knowledge acquisition and evaluation sessions. These iterations allowed us to obtain critical feedback from the industry experts during evaluation, which we used to extend and refine the design artifacts. We rigorously verified each version of the model using multiple approaches (see Section 4.5). We complemented the on-site visits with group processes involving multiple industry experts from different organizations. In essence, individual site visits (and the multiple detailed interviews conducted individually on-site) were interspersed with group-based processes involving industry experts at several workshops, organized specifically to capture group dynamics and collective learning. This approach served as a powerful mechanism for evaluating the intellectual capital underlying the final DWP-M model.
Drawing upon ISO standards relating to functionality, productivity, and usability, we also had the DWP-M model validated by DW experts. These validation results provide strong support for the "quality in use" of our model in terms of functionality, productivity, and usability. Another important finding is the consensus among the experts that higher levels of DWP maturity would help an organization implement its DW projects more consistently with respect to time, cost, and quality targets, and would also help it reduce the overall cost of providing services.
Our work should be viewed as an initial attempt to develop a design theory [18], [68] for a data warehousing process. We have defined in detail the constructs needed to represent the entities of interest in the DW domain and composed the maturity artifact from those constructs. We clearly specified the model's purpose and scope, that is, its metarequirements. Our initial design theory provides an architecture, albeit partial, in terms of maturity levels and KPAs. It lays the groundwork for future implementation as a full-scale DWP maturity assessment system.

CONCLUSION AND FUTURE DIRECTIONS

The main contribution of this study is the development of an innovative artifact, the DWP-M model, which addresses the pressing issues associated with a DWP. The model defines several KPAs and activities that would enable a firm to examine its DWP, identify problems, and attain a higher level of maturity by addressing those problems. These KPAs and activities are also design artifacts, more specifically the constructs or representations of interest in the DWP maturity domain. The DWP-M model also captures the relationships among process areas, depicting the interactions among the KPAs in terms of information flow and prerequisite KPAs.
The DWP-M model addresses several important and
relevant problems that organizations face in their DW
initiatives, including those related to data quality, data
changes, metadata management, data warehouse governance, trust, and end-user satisfaction. The model has a total of
41 KPAs, several of which are unique to a data warehousing
process. For example, it includes KPAs such as DWP
definition, business metadata management, DW product
engineering, information delivery management, DW governance, integrated metadata quality management, data
change management, metadata change management, DW
technology change management, etc., that are germane to
DW process maturity but that fall outside the scope of
traditional software process maturity assessment.
We proposed that an immature DWP could be a major reason behind the failure of so many DW initiatives. By providing consistent, high-quality, single-version-of-the-truth data to business managers and executives in a timely manner, a mature DWP could mitigate large-scale failures. Yet because the concept is still nascent, DWP maturity has hardly been addressed.

VOL. 38,

NO. 2,

MARCH/APRIL 2012

A mature DWP promotes the ability to manage data warehouse development and operations effectively and efficiently. It accurately communicates the DWP steps so that development, operations, and service personnel can carry out activities in conformity with a planned process. In a mature setting, DWP steps are systematically enforced and documented, and there is scope for continuous improvement. There is organization-wide involvement, and managers are able to monitor and predict, on an objective basis, the quality of DW products and of the processes that produce them. The comprehensive DWP-M model presented in this paper would help organizations effectively address the problems they face with respect to immature DW processes.
DW compatibility and complexity can influence the implementation and diffusion (spread of use) of data warehousing within an organization [53]. Technical incompatibilities with respect to standards, data modeling, data staging, and platforms can negatively influence DW implementation and diffusion. Deploying a data warehouse in the workplace may not only change operational processes and individual work roles, but may also require major rewrites or reprogramming of existing systems, greatly increasing complexity. To address issues related to DW compatibility and complexity, we included two KPAs: data warehouse governance and data governance. The data warehouse governance KPA is unique in that it sets up an organizational structure that outlines decision rights and underlying processes to ensure consistent enforcement of accountability. The data governance KPA develops and enforces a plan of interaction with the end-user community and DW project workers, focusing on issues such as data quality, data availability, and data accessibility, to understand and address their concerns.
The DWP-M model can be employed to assess the maturity of a firm's data warehousing process. Using the model, a firm can identify the strengths and weaknesses of its DWP based on the extent to which it satisfies a set of core KPAs. The model would also provide useful guidelines to firms interested in transitioning to higher maturity levels. Systematic use of the DWP-M model will help a firm measure its ability, commitment, goals, and roadblocks with respect to its performance on the KPAs.
We envision the following streams of research emerging from our maturity model. Following the software process maturity paradigm [29], the first stream of research would focus on organizational attempts at characterizing DW practices by empirically examining the consensual benefits attributed to a mature DWP. For instance, the maturity model should be used to systematically measure a company's ability, commitment, goals, and roadblocks, both to evaluate its performance on the KPAs and to develop benchmarks for transitioning to higher levels of maturity. In this research stream, the basic premise is that consistent application of well-defined and measured DW processes, coupled with continuous process improvement, will streamline DW project management and substantially improve the productivity and data quality of data warehouses. Such an endeavor would require the development of a DWP Assessment Instrument consisting of detailed metrics and a process for calibrating and assessing the DWP maturity of DW units within firms.
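One way such an assessment instrument might turn KPA-level findings into a maturity rating is the staged rule used in the CMM: an organization is rated at level k only if it satisfies every KPA at levels 2 through k, with level 1 as the default. Whether DWP-M would adopt exactly this rule is an assumption here. The KPA table below is a small subset of the 41 KPAs, taken from names and levels mentioned in this paper, and the assessment outcome is hypothetical. The function also returns the unsatisfied KPAs that block the next level, which corresponds to the "weaknesses" a firm would need to address.

```python
from typing import Dict, List, Set, Tuple

# Subset of DWP-M KPAs and their levels, as mentioned in this paper.
KPA_LEVELS: Dict[str, int] = {
    "sponsor assurance": 2,
    "DW program planning": 2,
    "requirements management": 2,
    "configuration management": 3,
    "alignment of architecture": 3,
    "data change management": 4,
    "metadata change management": 5,
}


def staged_maturity(kpa_levels: Dict[str, int],
                    satisfied: Set[str]) -> Tuple[int, List[str]]:
    """Return (maturity level, sorted unsatisfied KPAs blocking the next level).

    ASSUMPTION: CMM-style staged rating. Level 1 needs no KPAs; a unit
    reaches level k only if every KPA at levels 2..k is satisfied.
    """
    level = 1
    for k in range(2, max(kpa_levels.values()) + 1):
        missing = sorted(name for name, lv in kpa_levels.items()
                         if lv == k and name not in satisfied)
        if missing:
            return level, missing  # stop at the first level with a gap
        level = k
    return level, []


if __name__ == "__main__":
    # Hypothetical assessment outcome for one DW unit.
    satisfied = {"sponsor assurance", "DW program planning",
                 "requirements management", "configuration management"}
    print(staged_maturity(KPA_LEVELS, satisfied))
```

For this hypothetical unit, the sketch reports level 2 and identifies alignment of architecture as the gap blocking level 3. A continuous representation, which rates each process area separately as in CMMI, would be an alternative design if per-KPA profiles were preferred over a single level.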

A second stream of research could focus on the elements of the DWP-M model itself. Based on the evaluation results, the model appears to be comprehensive and complete, but it is unclear whether all the KPAs and their activities are of equal value for DWP maturity assessment. There is also a need to determine whether all the activities within each KPA are quantifiable and measurable. It would also be interesting to conduct field studies (in the form of surveys) that relate a number of organizational determinants (e.g., size, system architecture, structural attributes, resources, management attitude, and culture) and environmental determinants (e.g., industry maturity, institutional and competitive forces, and industry and technology support structures) to the efforts that firms exert in pursuing initiatives to upgrade their DWP maturity levels.
Finally, the case study approach could be used to investigate the results of applying the DWP-M model in a real-world organizational setting. Yin [75] defines the scope of a case study as "an empirical inquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between the phenomenon and context are not clearly evident." This is pertinent to our study because an important future direction would be to employ our model to assess DWP maturity in different organizational settings and, based on those assessments, to test a set of hypotheses relating to the consequences of maturity.

ACKNOWLEDGMENTS
The authors would like to thank the workshop panel
members for their active participation, support, and feedback during the development of the DWP maturity model.
The authors would also like to thank Dr. Hausi Müller and
the three anonymous referees for their insightful and
constructive reviews, which helped improve the quality of
the manuscript significantly.

REFERENCES
[1] M. Agarwal and K. Chari, "Software Effort, Quality, and Cycle Time: A Study of CMM Level 5 Projects," IEEE Trans. Software Eng., vol. 33, no. 3, pp. 145-156, Mar. 2007.
[2] S. Adelman and L. Moss, "Data Warehouse Failures," The Data Administration Newsletter, www.tdan.com/i014fe01.htm, Oct. 2000.
[3] N. Bevan, "International Standards for HCI," http://nigelbevan.com/papers/International_standards_HCI.pdf, May 2006.
[4] M. Beyer, "Key Issues for Data Warehousing," Gartner Reports, ID Number: G00147102, Mar. 2007.
[5] J. Brooke, "SUS: A Quick and Dirty Usability Scale," Usability Evaluation in Industry, P.W. Jordan, P. Thomas, B.A. Weerdmeester, and A.L. McClelland, eds., Taylor and Francis, 1996.
[6] V. Chiesa, P. Coughlan, and C. Voss, "Development of a Technical Innovation Audit," J. Product Innovation Management, vol. 13, no. 2, pp. 105-136, 1996.
[7] D. Clifford and J.V. Bon, Implementing ISO/IEC 20000 Certification: The Roadmap. Van Haren Publishing, 2008.
[8] P.B. Crosby, Quality Is Free: The Art of Making Quality Certain. New Am. Library, 1979.
[9] Dataflux, "Enterprise Data Management Maturity Model," www.dataflux.com, 2005.
[10] W. Eckerson, "Achieving Business Success through a Commitment to High Quality Data," The Data Warehouse Inst. Report Series, www.dw-institute.com, pp. 1-33, 2002.
[11] W. Eckerson, "Gauge Your Data Warehouse Maturity," DM Rev., Nov. 2004.
[12] L.P. English, "Information Quality Management Maturity: Toward the Intelligent Learning Organization," white paper, Information Impact Int'l, Inc., May 2004.
[13] P. Fraser and M. Gregory, "A Maturity Grid Approach to the Assessment of Product Development Collaborations," Proc. Ninth Int'l Product Development Management Conf., May 2002.
[14] T. Friedman, "Data Quality Firewall Enhances Value of the Data Warehouse," Gartner Reports, Apr. 2004.
[15] T. Friedman, "Key Issues of Data Quality: 2007," Gartner Reports, Mar. 2007.
[16] B. Golden, Succeeding with Open Source. Addison-Wesley, 2005.
[17] S. Gregor, "The Nature of Theory in Information Systems," MIS Quarterly, vol. 30, no. 3, pp. 611-642, 2006.
[18] S. Gregor and D. Jones, "The Anatomy of a Design Theory," J. Assoc. for Information Systems, vol. 8, no. 5, pp. 312-335, 2007.
[19] J.D. Herbsleb, D. Zubrow, D. Goldenson, W. Hayes, and M. Paulk, "Software Quality and the Capability Maturity Model," Comm. ACM, vol. 40, no. 6, pp. 30-40, 1997.
[20] A.R. Hevner, S.T. March, J. Park, and S. Ram, "Design Science in Information Systems Research," MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004.
[21] W.S. Humphrey, Managing the Software Process. Addison-Wesley, 1989.
[22] W.S. Humphrey, A Discipline for Software Engineering. Addison-Wesley, 1995.
[23] W.S. Humphrey and M. Kellner, "Software Process Modeling: Principles of Entity Process Models," Technical Report CMU/SEI-89-TR-2, Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh, Pa., 1989.
[24] W.C. Ibbs and Y.H. Kwak, "Assessing Project Management Maturity," Project Management J., vol. 31, no. 1, pp. 32-43, 2000.
[25] W.H. Inmon, Building the Data Warehouse, third ed. Wiley, 2002.
[26] J.J. Jiang, G. Klein, and C.L. Carr, "Measuring Information System Service Quality: SERVQUAL from the Other Side," MIS Quarterly, vol. 26, no. 2, pp. 145-166, 2002.
[27] W.J. Kettinger and C.C. Lee, "Global Measures of Information Service Quality: A Cross-National Study," Decision Sciences, vol. 26, no. 5, pp. 569-588, 1995.
[28] R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit. Wiley, 1998.
[29] M.S. Krishnan and M.I. Kellner, "Measuring Process Consistency: Implications for Reducing Software Defects," IEEE Trans. Software Eng., vol. 25, no. 6, pp. 800-815, Nov./Dec. 1999.
[30] D. Laney, "Data Warehouse DBMS Support Costs Either an Arm or a Leg," white paper, Meta Group, June 2000.
[31] R.G. Little and M.L. Gibson, "Perceived Influences on Implementing Data Warehousing," IEEE Trans. Software Eng., vol. 29, no. 4, pp. 290-296, Apr. 2003.
[32] V. Malheiros, F.R. Paim, and M. Mendonça, "Continuous Process Improvement at a Large Software Organization," Software Process Improvement and Practice, vol. 14, no. 2, pp. 65-83, www.interscience.wiley.com, Mar./Apr. 2009.
[33] D. Marco, "Capability Maturity Model: An Introduction," Information Management Magazine, www.information-management.com/issues/20020801/5567-1.html, Aug. 2002.
[34] D. Marco, "Capability Maturity Model: Applying CMM Levels to Data Warehousing," Information Management Magazine, www.information-management.com/issues/20021001/5800-1.html, Oct. 2002.
[35] D. Marco, "Capability Maturity Model: Applying CMM Levels to Data Warehousing," Information Management Magazine, www.information-management.com/issues/20021101/5989-1.html, Nov. 2002.
[36] K.L. McGraw and K. Harbison-Briggs, Knowledge Acquisition: Principles and Guidelines. Prentice-Hall, 1989.
[37] K.O. McGraw and S.P. Wong, "Forming Inferences about Some Intraclass Correlation Coefficients," Psychological Methods, vol. 1, no. 1, pp. 30-46, 1996.
[38] G.A. Moore, Crossing the Chasm, revised ed. Collins Business, Aug. 2002.
[39] M. Mullaly, "Longitudinal Analysis of Project Management Maturity," Project Management J., vol. 37, no. 3, pp. 62-73, 2006.
[40] C. Mullins, "The Capability Maturity Model from a Data Perspective," The Data Administration Newsletter, www.tdan.com, Dec. 1997.
[41] NASCIO (Nat'l Assoc. of State Chief Information Officers), "Enterprise Architecture Maturity Model: Version 1.3," white paper, http://www.nascio.org/, Dec. 2003.
[42] F. Niessink and H. van Vliet, "Towards Mature IT Services," Software Process Improvement and Practice, vol. 4, no. 2, pp. 55-71, 1998.
[43] F. Niessink, V. Clere, T. Tijdink, and H. van Vliet, "The IT Service Capability Maturity Model," working paper, Dept. of Computer Science, Vrije Universiteit, De Boelelaan, Amsterdam, 2005.
[44] Office of Government Commerce, Service Support. The Stationery Office, 2000.
[45] Office of Government Commerce, Service Delivery, IT Infrastructure Library. The Stationery Office, 2001.
[46] A. Orr, J. Turner, O.N. Kunka, and G. Bullen, "Harnessing the Power of ITIL," ExecBlueprints, pp. 1-18, www.execblueprints.com, 2008.
[47] A. Parasuraman, V.A. Zeithaml, and L.L. Berry, "A Conceptual Model of Service Quality and Its Implications for Future Research," J. Marketing, vol. 49, no. 4, pp. 41-50, Fall 1985.
[48] M.C. Paulk, "Practices of High Maturity Organizations," Proc. 11th Software Eng. Process Group Conf., Mar. 1999.
[49] M.C. Paulk, B. Curtis, M.B. Chrissis, and C.V. Weber, "Capability Maturity Model, Version 1.1," IEEE Software, vol. 10, no. 4, pp. 18-27, July 1993.
[50] M.C. Paulk, C.V. Weber, B. Curtis, and M.B. Chrissis, The Capability Maturity Model: Guidelines for Improving the Software Process. Addison-Wesley, 2003.
[51] L.F. Pitt, R.T. Watson, and C.B. Kavan, "Service Quality: A Measure of Information Systems Effectiveness," MIS Quarterly, vol. 19, no. 2, pp. 173-187, 1995.
[52] J. Poole, D. Chang, D. Tolbert, and D. Mellor, Common Warehouse Metamodel: An Introduction to the Standard for Data Warehouse Integration. Wiley, 2002.
[53] K. Ramamurthy, A. Sen, and A.P. Sinha, "Data Warehousing Infusion and Organizational Effectiveness," IEEE Trans. Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 38, no. 4, pp. 976-994, July 2008.
[54] P. Russom, "Taking Data Quality to the Enterprise through Data Governance," TDWI Report Series, The Data Warehousing Inst., pp. 1-24, www.tdwi.org, Mar. 2006.
[55] SEI, Capability Maturity Model Integration (CMMI℠), Version 1.1, CMU/SEI-2002-TR-029, Aug. 2002.
[56] SEI, Process Maturity Profile: Software CMM, 2004 Mid-Year Update Report, Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh, Pa., Aug. 2004.
[57] SEI, CMMI for Services: Initial Draft, Software Eng. Inst., Carnegie Mellon Univ., Sept. 2006.
[58] A. Sen and A.P. Sinha, "A Comparison of Data Warehousing Methodologies," Comm. ACM, vol. 48, no. 3, pp. 79-84, 2005.
[59] A. Sen and A.P. Sinha, "Toward Developing Data Warehousing Process Standards: An Ontology-Based Review of Existing Methodologies," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 1, pp. 17-31, Jan. 2007.
[60] A. Sen, A.P. Sinha, and K. Ramamurthy, "Data Warehousing Process Maturity: An Exploratory Study of Factors Influencing User Perceptions," IEEE Trans. Eng. Management, vol. 53, no. 3, pp. 440-455, Aug. 2006.
[61] J.P. Shim, M. Warkentin, J.F. Courtney, D.J. Power, R. Sharda, and C. Carlsson, "Past, Present, and Future of Decision Support Technology," Decision Support Systems, vol. 33, no. 2, pp. 111-126, 2002.
[62] H.A. Simon, The Sciences of the Artificial, third ed. MIT Press, 1996.
[63] SonicSoftware.com, "A New Service-Oriented Architecture (SOA) Maturity Model," research report, www.sonicsoftware.com/index.ssp, Sept. 2005.
[64] C. Todman, Designing a Data Warehouse: Supporting Customer Relationship Management. Prentice Hall, 2001.
[65] J. van Bon, M. Pieper, and A. Van Der Veen, Foundations of IT Service Management Based on ITIL, second ed. Van Haren Publishing, 2005.
[66] J. van Bon and V. Tieneke, Frameworks for IT Management. Van Haren Publishing, 2006.
[67] P. Vassiliadis, "Data Warehouse Modeling and Quality Issues," PhD thesis, Nat'l Technical Univ. of Greece, Athens, Greece, 2001.
[68] J.G. Walls, G.R. Widmeyer, and O.A. El Sawy, "Building an Information System Design Theory for Vigilant EIS," Information Systems Research, vol. 3, no. 1, pp. 36-59, 1992.
[69] H. Watson, T. Ariyachandra, and R.J. Matyska Jr., "Data Warehousing Stages of Growth," Information Systems Management, vol. 18, no. 3, pp. 42-51, Summer 2001.
[70] R.T. Watson, L.F. Pitt, and C.B. Kavan, "Measuring Information Systems Service Quality: Concerns for a Complete Canvas," MIS Quarterly, vol. 21, no. 2, pp. 209-221, 1998.
[71] Wikipedia, "Data Warehouse," http://en.wikipedia.org/wiki/Data_warehouse, 2011.
[72] Wikipedia, "Information Technology Infrastructure Library," http://en.wikipedia.org/wiki/Information_Technology_Library, May 2010.
[73] R. Winter and R. Burns, "Managing Data Warehouse Growth," Intelligent Enterprise, www.intelligententerprise.com/showArticle.jhtml?articleID=193105574, Nov. 2006.
[74] B.H. Wixom and H.J. Watson, "An Empirical Investigation of the Factors Affecting Data Warehousing Success," MIS Quarterly, vol. 25, no. 1, pp. 17-41, Mar. 2001.
[75] R.K. Yin, Case Study Research: Design and Methods, third ed. Sage Publications, 2002.
Arun Sen received the MTech degree in electronics from the University of Calcutta, India, and the MS degree in computer science and the PhD degree in information systems from Pennsylvania State University. He is a professor in the Department of Information and Operations Management, Mays Business School, Texas A&M University. He has published more than 45 research papers in journals such as MIS Quarterly, Information Systems Research, IEEE Transactions on Software Engineering, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Engineering Management, Decision Sciences, Communications of the ACM, Information Systems, Computers and OR, Omega, European Journal of Operational Research, Decision Support Systems, Journal of Management Information Systems, and Information and Management. He has served as an associate editor of the Journal of Database Management. He has also been an editor of special issues for Decision Support Systems, Communications of the ACM, Database, and Expert Systems with Applications. He was the chair of the INFORMS College on Information Systems and a program chair for the Workshop on Information Technologies and Systems (WITS) in 1996. His research interests include data warehouse maturity, decision support systems, database management, repository management and software reuse, case-based reasoning, and e-commerce.

K. (Ram) Ramamurthy received the PhD degree in business with an MIS concentration from the University of Pittsburgh. He is a professor and James R. Mueller Distinguished Scholar of MIS at the Sheldon B. Lubar School of Business, University of Wisconsin-Milwaukee. He has 20 years of industry experience, holding several senior technical and executive positions. He served as an associate editor of MIS Quarterly for four years. He has published more than 46 research articles in major scholarly journals, including MIS Quarterly, Journal of Management Information Systems, IEEE Transactions on Software Engineering, IEEE Transactions on Systems, Man, and Cybernetics, Decision Sciences, Decision Support Systems, European Journal of Information Systems, Information & Management, Journal of Organizational Computing and Electronic Commerce, International Journal of Electronic Commerce, IEEE Transactions on Engineering Management, International Journal of Production Research, International Journal of Human-Computer Studies, Journal of International Marketing, OMEGA, and INFOR. His current research interests include electronic commerce with interorganizational systems/EDI and the Internet; adoption, assimilation, and diffusion of modern IT; data resource management and data warehousing; IT business value; IT outsourcing; decision and knowledge systems for individuals and groups; and TQM including software quality. He is a charter member of the Association for Information Systems.

Atish P. Sinha received the PhD degree in business, with a concentration in artificial intelligence, from the University of Pittsburgh. He is a professor of MIS at the Sheldon B. Lubar School of Business, University of Wisconsin-Milwaukee. His research has been published in several journals, including Communications of the ACM, Decision Support Systems, IEEE Transactions on Engineering Management, IEEE Transactions on Software Engineering, IEEE Transactions on Systems, Man, and Cybernetics, Information Systems Research, International Journal of Human-Computer Studies, Journal of the Association for Information Systems, and Journal of Management Information Systems. He chaired the Sixth Design Science Research in Information Systems and Technology (DESRIST) Conference in 2011 and the 16th Workshop on Information Technologies and Systems (WITS) in 2006. He served as an associate editor of MIS Quarterly for special issues in design science research and business intelligence. His current research interests are in the areas of business intelligence, data mining, text mining, data warehousing, web analytics, and service-oriented computing. He is a member of the ACM, AIS, and INFORMS.
