Decision Support Systems 42 (2006) 1449–1473
www.elsevier.com/locate/dss
A UML-based data warehouse design method

Nicolas Prat (a), Jacky Akoka (b), Isabelle Comyn-Wattiau (c)
(a) ESSEC, Avenue Bernard Hirsch, BP 50105, 95021 Cergy Cedex, France
(b) CEDRIC-CNAM and INT, 292 rue Saint-Martin, 75141 Paris Cedex 03, France
(c) CEDRIC-CNAM and ESSEC, 292 rue Saint-Martin, 75141 Paris Cedex 03, France
Received 7 August 2004; received in revised form 20 July 2005; accepted 1 December 2005
Available online 23 January 2006
Abstract
Data warehouses are a major component of data-driven decision support systems (DSS). They rely on multidimensional models.
The latter provide decision makers with a business-oriented view to data, thereby easing data navigation and analysis via On-Line
Analytical Processing (OLAP) tools. They also determine how the data are stored in the data warehouse for subsequent use, not
only by OLAP tools, but also by other decision support tools. Data warehouse design is a complex task, which requires a
systematic method. Few such methods have been proposed to date. This paper presents a UML-based data warehouse design
method that spans the three design phases (conceptual, logical and physical). Our method comprises a set of metamodels used at
each phase, as well as a set of transformations that can be semi-automated. Following our object orientation, we represent all the
metamodels using UML, and illustrate the formal specification of the transformations based on OMG's Object Constraint
Language (OCL). Throughout the paper, we illustrate the application of our method to a case study.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Data warehouse; On-Line Analytical Processing (OLAP); Decision support; Conceptual design; Logical design; Physical design
Corresponding author. E-mail addresses: prat@essec.fr (N. Prat), akoka@cnam.fr (J. Akoka),
wattiau@cnam.fr (I. Comyn-Wattiau).
0167-9236/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.dss.2005.12.001

1. Introduction

Data warehouses are an essential component of data-driven decision support systems (DSS) [34]. They have
become the focal point for decision support in organizations today [40]. Moreover, empirical evidence suggests
that DSS users can improve decision performance by implementing a data warehouse [26]. In order to gain
business insight from data stored in data warehouses, decision makers typically use OLAP, query and reporting,
and/or data mining tools. For OLAP tools alone, [36] estimates the worldwide market at 6 billion dollars
in 2007, compared with 1 billion dollars in 1996. OLAP systems are based on a multidimensional model. This
model provides managers with a business-oriented view of data. It facilitates data navigation, analysis, and
ultimately decision making. Depending on the underlying database, OLAP tools are traditionally subdivided into
two main categories [10]:
• Multidimensional OLAP (MOLAP) tools store data in a proprietary multidimensional database system. The
  multidimensional component of Oracle 9i Release 2 (formerly known as Express and subsequently referred to
  as Oracle MOLAP), and Hyperion Essbase, are representative of this category.
• Relational OLAP (ROLAP) tools simulate a multidimensional model with a relational database, usually
  based on Kimball's star or snowflake schemas [16].
From the above considerations, it ensues that the multidimensional model (1) provides decision makers
with a view to the data warehouse, via OLAP tools, and
(2) determines how the data are stored in the data warehouse for subsequent use, not only by OLAP tools, but
also by other decision support tools, e.g. data mining [33]. Consequently, multidimensional modeling is
central to data warehouse design.
The data warehouse design process is crucial and should be supported by an appropriate method. A
design method is essential not only to ensure the data warehouse quality, but also to facilitate its frequent
evolutions imposed by the environment or the decision makers' changing requirements.
A complete data warehouse design method should span the three abstraction levels recommended by
ANSI/X3/SPARC, namely conceptual, logical and physical. These levels are widely accepted as a sound
framework to guide the modeling process of operational databases. We strongly argue that this framework
remains relevant in the context of data warehouses, even if the three levels need to be adapted to this
context. The need to adapt these three levels to the domain of data warehousing is frequently acknowledged
[13,14], although there is no agreement concerning the content of each level and hence of each data
warehouse design phase.
Despite the existence of several methods presented in the literature, the data warehouse community still
lacks a widely accepted method, covering all aspects of data warehouse design and specifying the various
design steps. This lack of a general data warehouse design method follows from the absence of a standard,
tool-independent multidimensional formalism, i.e. multidimensional metamodel.¹ The multidimensional
metamodel is central to data warehouse design; however, no consensus has emerged concerning this metamodel, as
opposed to the relational standard in the On-Line Transactional Processing (OLTP) world.
¹ Following the convention adopted by the Object Management Group [25], we will use the term metamodel to
refer to modeling languages (e.g. relational, UML, etc.).
Considering the need for a systematic data warehouse design method and the limitations of current
methods, this paper presents a UML-based data warehouse design method that spans conceptual, logical and
physical design. Starting from user requirements, the conceptual phase leads to a UML model. To this end,
UML is enriched with concepts relevant to multidimensional systems. The logical phase maps the enriched
UML model into a multidimensional schema, independently of any implementation tool. The physical phase
maps the multidimensional schema into a physical database schema depending on the target implementation
tool. Our method comprises a set of metamodels used at each phase (in particular a unified multidimensional
metamodel used at the logical phase), as well as a set of transformations that can be semi-automated.
Following our object orientation, we represent the metamodels using UML and specify the transformations
formally using OMG's Object Constraint Language (OCL) [25]. The paper is organized as follows. Section 2
describes related research. Section 3 presents our unified multidimensional metamodel, which is at the center
of our design method. Section 4 presents the design method and illustrates step by step how the method is
applied to a case study. Section 5 concludes and describes further research.

2. Related research

There exists a fair number of papers describing multidimensional metamodels [5,39]. The Common Warehouse
Metamodel (CWM) [25] is a recent endeavor to standardize data warehousing and business intelligence
applications based on UML. The OLAP package describes the multidimensional metamodel, independently of any
ROLAP or MOLAP implementation. The multidimensional package is a metamodel for MOLAP tools. MOLAP
tool-specific metamodels (e.g. the metamodel of Oracle MOLAP) are defined as extensions of this metamodel.
Data warehouse design approaches can be analyzed from three vantage points. The first one is related to
OLAP tools. Vendors of such tools claim that cube design (i.e. multidimensional modeling) is an intuitive
and quasi-immediate process, which does not need a sophisticated design method. Oddly enough, the very
same arguments were used by relational database vendors 20 years ago. There are plenty of failure stories in
database design due to lack of methods. The second vantage point is related to multiple source integration
approaches where the data warehouse schema results from an integration of source database schemas [15].
The hypothesis underlying these approaches is that the data warehouse must contain exactly the existing data
sources. This hypothesis is rarely realistic. The third vantage point is related to the adaptation of classical
database design methods. The main advantage is the clear separation between semantic considerations and
physical constraints. Moreover, in this approach, data warehouse design explicitly takes into account user
requirements. However, it must be followed by a data confrontation phase enabling the mapping between the
data warehouse schema and the data sources. In this
paper, we propose a step forward: we enrich this classical approach by defining a data warehouse design
method starting from user requirements and leading to a physical schema by means of specific transformation
rules. The following state-of-the-art review will concentrate on this third stream.
Published models or methods, very often, encompass only some phases of data warehouse design. These
previous research works can be characterized according to the following criteria:
• The underlying conceptual model, such as the Entity-Relationship (ER) model [8], an extended
  ER model [29,32], an object-oriented model like UML [3,21,23], a specific model [16]. Let us mention
  that some design methods don't have an explicit conceptual model [9].
• The associated paradigm, i.e. bottom-up when starting from operational data sources [22], top-down
  based on user requirements [37], or mixed when combining the two approaches [32].
• The components of the system taken into account, such as the data warehouse [13], the data marts [6],
  the ETL (Extract-Transform-Load) process [21], etc.
• The target system being implemented, such as ROLAP environments [14], MOLAP environments
  [2], both environments [8]. Other approaches are independent of implementation issues [37].
• The degree of formalism used, ranging from a rough set of guidelines [12] to a rigorously formalized
  notation including meta-modeling [21].
• The abstraction levels covered by the method, namely the conceptual, logical and physical
  phases. As an illustration, some approaches are dedicated to the conceptual phase [37], others
  merge the conceptual and logical phases [32], and others concentrate on a unique logical-physical
  phase [16].
• The level of automation made possible: this level encompasses simple design rules [16] and can be
  spanned to a CASE tool [28].
To the best of our knowledge, no single method is widely accepted. Based on this categorization, we can
characterize our method as follows:
• It is based on the UML notation, facilitating the process by referring to a well-known formalism.
• It combines a top-down approach (data warehouse design starts from user requirements) with a bottom-up
  approach (the resulting data warehouse is confronted with operational data sources).
• Its vocation is to enable the design of all components including the data warehouse, the data marts and the
  ETL processes. However, the current version concentrates on the data warehouse design.
• It is implementation-independent by providing rules for MOLAP as well as ROLAP tools.
• The formalization is based on meta-models describing the three abstraction levels (conceptual, logical,
  and physical). Transformation rules are provided to map one level into the following one and are
  formalized using the OCL notation.
• A high level of automation is obtained by inserting the best practices in the rules by means of heuristics.
  A CASE tool is under prototyping.
• Standardization relies on the combination of UML with its associated OCL formalism.
[21] seems to be the most comprehensive method for data warehouse design meeting the main requirements
described above. We propose to go beyond by providing a higher level of automation and a higher
degree of standardization. Moreover, our contribution also encompasses an important effort in unifying
multidimensional models. The next section is dedicated to the description of our unified multidimensional
metamodel.

3. A unified multidimensional metamodel

Among the many multidimensional metamodels proposed in the literature, no standard emerges. Furthermore,
no consensus has been reached yet concerning the level of the multidimensional metamodel (physical,
logical or conceptual). The star and snowflake models presented in [16] have often been considered being at
the physical level. More recent publications have placed the multidimensional metamodel at the logical level
[25,39] or even at the conceptual level [13,14].
We firmly contend that the multidimensional metamodel belongs to the logical level. This claim is justified
by the following considerations:
• This metamodel does not belong to the physical level, since multidimensional models are often used
  as pivot models, enabling mapping to different physical tool implementations.
• Moreover, it should not be considered at the conceptual level since its concepts (e.g. the concept of
  dimension) do not have the semantic richness of traditional conceptual metamodels, e.g. ER or
  UML. Actually, proposals based on a combination of multidimensional metamodels with ER or UML
  concepts are often justified by the poor semantics of the basic multidimensional metamodel.
• If we situate the multidimensional metamodel at the conceptual level, then it seems coherent to consider
  at the logical level metamodels like MOLAP or ROLAP. However, once a conceptual multidimensional
  metamodel has been defined, it makes little sense to define a standard logical MOLAP metamodel,
  since no such standard exists in reality. Furthermore, in the case of ROLAP, the relational layer
  must be considered as the physical structure for data storage [5], therefore ROLAP cannot be
  considered belonging to the logical level either.
• There is indeed a strong parallel between the relational metamodel in the OLTP world and the
  multidimensional metamodel in the OLAP world, e.g. the definitions or attempts to define an associated
  query language and a normalization theory (even though the concept of normalization in the OLAP
  context differs from the traditional concept of normalization in the OLTP context). The analogy
  between these two metamodels is often made [2,11], confirming that the multidimensional metamodel
  should be situated at the same level as the relational metamodel in transactional database
  design, i.e. the logical level.

After an overview of the basic multidimensional concepts, we present our unified multidimensional
metamodel and its associated graphical notation. The metamodel is a central component of our data warehouse
design method, making the link between the conceptual and physical design phases. Among extant
multidimensional metamodels, some key concepts appear recurrently, but we found no metamodel encompassing
all these concepts and accurately expressing their relationships. Therefore, we defined a multidimensional
metamodel, unifying the concepts of the main multidimensional metamodels. This metamodel is generic,
i.e. can be mapped into any OLAP tool (this paper illustrates MOLAP tool implementation; ROLAP
implementation is described in [30]).

3.1. Multidimensional concepts

Multidimensional metamodels store data in hypercubes, often simply referred to as cubes. Fig. 1 illustrates
a cube, which represents the fact Sale.

[Figure: a cube for the fact Sale. Its cells hold the measure sale amount; its axes are the dimensions Time
(dimension levels Day, Week, Month, Quarter, Year), Product (dimension levels Product, Subcategory, Category;
dimension level attributes product name, weight) and Geography (dimension levels City, District, Region).
The legend distinguishes measures, dimensions, dimension levels, dimension level attributes and hierarchies.]
Fig. 1. Multidimensional representation of data.

The key multidimensional concepts are cubes, which represent facts of interest for analysis, and dimensions,
i.e. the axes of the cubes. Facts are described by measures. A measure is an indicator, a value of interest for
the decision maker. Measures are quantifying values; they are generally numeric. In a cube, the measures
correspond to the cells. Every dimension may consist of one or several aggregation level(s), called dimension
levels. Dimension levels are organized in hierarchies, i.e. aggregation paths between successive dimension
levels (for example, Day→Week→Quarter→Year in Fig. 1). Hierarchies are crucial to multidimensional
modeling since they are used in conjunction with aggregation functions to aggregate (rollup) or detail
(drill-down) measures. Hierarchies are often completed with the special dimension level All, thereby enabling
aggregation of measures at the highest possible level. Dimension levels may be described by attributes.
Contrary to measures, dimension level attributes are not the object of multidimensional analysis.
Instances of dimension levels are called dimension members. For a given measure in an n-dimensional
(hyper)cube, a combination of n dimension members, e.g. (3 March 02, P1, Paris), uniquely identifies a
cell and therefore a measure value (4000 Euros). More specifically, for each axis, the dimension members used
as coordinates are instances of the least aggregated dimension level. In the sequel, we will refer to these
dimension levels (in our example, Day, Product and City) as the base dimension levels of their respective
dimensions (Time, Product and Geography).

3.2. The unified multidimensional metamodel

The key concepts of the unified multidimensional metamodel (Fig. 2) are dimensions and facts. These
two concepts are interrelated and composed of hierarchies and measures, respectively. Dimensions are
defined by grouping dimension levels into hierarchies (through classification relationships) and then
hierarchies into dimensions. A classification relationship links a child dimension level to a parent dimension
level. Classification relationships are sometimes called drilling or rollup relationships in other
multidimensional metamodels. Similarly to [38], we define a hierarchy as a meaningful sequence of classification
relationships where the parent dimension level of a classification relationship is also the child of the next
classification relationship. In other words, a hierarchy is a meaningful aggregation path between dimension
levels. An aggregation path is meaningful if valid sequences of drill-down and/or rollup operations can
be performed by following the path. Note that different hierarchies may share common dimension levels and
even common classification relationships. From the definition of hierarchies, dimensions are in turn defined
as meaningful groupings of hierarchies. Dimension levels own dimension level attributes, which may be
identifying or non-identifying attributes. Some multidimensional metamodels do not distinguish between
dimension levels and dimension level identifying attributes. However, dimension levels and their identifying
attributes are not semantically equivalent; therefore they need to be represented as distinct concepts in the
multidimensional metamodel. Using the isTime attribute, we distinguish between temporal and non-temporal
dimension levels. This distinction is implicit at the physical level, i.e. in OLAP tools, which offer built-in
functions to handle time. Due to the ubiquity of time in data warehouses, as well as the specific semantics
associated with it, we believe, similarly to [25], that the distinction between temporal and non-temporal
dimension levels should not be postponed to the physical level and should already appear at the logical
level, i.e. in the multidimensional metamodel.

[Figure: UML class diagram. A MultidimensionalSchema (a ModelElement with name : Name) owns
MultidimensionalSchemaElements, specialized into Dimension and Fact. A Dimension groups Hierarchies; a
Hierarchy is an ordered sequence of ClassificationRelationships, each linking a child and a parent
DimensionLevel (isTime : Boolean). A DimensionLevel owns ordered DimensionLevelAttributes, specialized
{disjoint, complete} into IdentifyingAttribute and NonIdentifyingAttribute. A Fact owns ordered Measures and
is linked to DimensionLevels through Dimensioning (minimal : Boolean). ApplicableAggregationFunctions
(aggregationFunctions : Set(AggregationFunction), levels : Integer) relates hierarchies and measures.]
Fig. 2. Unified multidimensional metamodel.

Facts are composed of measures. We do not require measures to be of numeric type, as long as their values
are totally ordered. For example, a measure can be an enumeration type. Some facts have no measure (this
corresponds to the notion of factless fact table in [16]).
Facts are dimensioned by dimension levels, called "base dimension levels" in Section 3.1. The relationship
between a fact and each of its dimension levels is called dimensioning. Among the dimension levels
related to a fact, some may not be necessary to functionally determine the fact. They are defined as not
minimal in the multidimensional schema (in reference to the concept of minimal functional dependency in
relational theory). For example, a fact Rental may be dimensioned by Vehicle, Customer and Date. All these
dimension levels are relevant for analyzing the rental and its associated measures. However, a given Vehicle
and a given Date uniquely identify a Rental. Therefore, the dimension level Customer is not minimal. It is worth
noting that the concept of "minimal" is used with various meanings in multidimensional modeling. Our
definition of minimal dimension level, based on functional dependencies, is in line with the notion of
multidimensional normal form presented in [19]. However, following [24], we may consider that a multidimensional
schema is minimal if it contains only dimensions mentioned in user queries (or more generally, user
requirements). In this sense, a multidimensional schema may be minimal even if it contains non-minimal dimension
levels. When they are derived from user requirements, non-minimal dimension levels are rightly represented in
multidimensional schemas. However, they will often be implemented specifically in MOLAP or ROLAP tools.
Therefore they are represented explicitly as non-minimal dimension levels.
Different facts may share the same set of dimension levels (in our example, we could imagine the fact
Stock, indicating the quantity stored by Product, Day and City).
In order to enable correct aggregation of measures along hierarchies, the definition of applicable
aggregation functions is crucial [19,20,27]. For every measure, for every dimension level dimensioning the measure
(i.e. dimensioning the fact which bears the measure), the set of aggregation functions applicable along the
different hierarchies starting from the dimension level has to be specified. Based on the aggregation functions
available in OLAP tools, the unified multidimensional metamodel considers the following functions: SUM,
AVG, MIN, MAX, MED (median), VAR (variance), STDDEV and COUNT. Following [27,31], we distinguish
between three classes of aggregation functions. The first class of functions, which includes all aggregation
functions ({SUM, AVG, MIN, MAX, MED, VAR, STDDEV, COUNT}), is applicable to measures that can
be summed. The second class of functions ({AVG, MIN, MAX, MED, VAR, STDDEV, COUNT}) applies
to measures that can be used for average calculations. The last class contains the single function COUNT. In
the Sale example, considering the measure sale amount, the function SUM (and therefore all other
aggregation functions) applies to all the hierarchies starting from the dimension levels Day, Product and
City. Our model explicitly acknowledges that for a given measure, applicable aggregation functions depend
not only on dimensions, but on hierarchies within dimensions. Moreover, for a given hierarchy, the
aggregation functions can be applicable only to the first n levels of the hierarchy (this will be illustrated in
Section 4 with the case study).
Our unified multidimensional metamodel shares many concepts with the OLAP metamodel in the
CWM architecture [25]. However, OMG's OLAP metamodel is currently not acknowledged as the standard
multidimensional metamodel by the research community. Furthermore, this metamodel lacks some important
concepts found in other multidimensional metamodels (for example, the notion of dimension level attribute
should be represented as a concept in its own right). OMG's OLAP metamodel also incorporates some
concepts that are peripheral to multidimensional modeling and/or are related to implementation, i.e. belong to
the physical level. In order to enable user interaction and visualization of multidimensional schemas by means
of a graphical notation, we argue that logical and physical concepts must remain separate and the number of
multidimensional concepts must be kept relatively small. Fig. 3 illustrates the graphical notation associated
with the unified multidimensional metamodel, using the Sale example. By default, the name of a dimension is
the name of its base dimension level; otherwise the name of the base dimension level is followed by the name
of the dimension. When a dimensioning is not minimal (this does not happen in the Sale example), it is
represented with a dotted line instead of a plain line.

[Figure: the Sale example in the graphical notation. The fact Sale (measure sale amount) is dimensioned by
the base dimension levels Day (Time), Product and City (Geography). The Time dimension contains the
temporal dimension levels Day, Week, Month, Quarter, Year and All; the Product dimension contains Product
(identifying attribute product code; non-identifying attributes product name, weight), Subcategory and
Category; the Geography dimension contains City, District and Region. The legend shows the symbols for a
fact with its measures, a dimension level with its identifying and non-identifying attributes, a temporal
dimension level, a base dimension level, the fact - dimension level relationship (dimensioning) and the
inter - dimension level relationship (classification).]
Fig. 3. Graphical notation for multidimensional schemas.

In order to define our graphical notation, we have used built-in extensibility mechanisms of UML, namely
stereotypes and tagged values. We have also defined a specific graphical notation for stereotypes, as permitted
by the UML specification. This specific notation makes multidimensional schemas more concise and
more readable. However, alternatively, the standard UML notation may be used, e.g. for representing
multidimensional schemas with a UML CASE tool. In Fig. 4, using an excerpt from Fig. 3, we illustrate the
correspondence between our graphical notation (left part of the schema) and the standard UML notation
(right part of the schema).

[Figure: an excerpt of the Sale schema (Subcategory, Product, Day (Time), City (Geography) and Sale) drawn
twice. On the left, the specific graphical notation; on the right, standard UML classes carrying the
stereotypes <<Dimension level>>, <<Base dimension level>>, <<Temporal dimension level>> and <<Fact>>,
associations carrying the stereotypes <<classification relationship>> (with roles +child and +parent) and
<<dimensioning>>, and identifying attributes marked with the tagged value {id}.]
Fig. 4. Representing multidimensional schemas in standard UML notation.

A graphical multidimensional schema must be completed with information on applicable aggregation
functions: for each measure of each fact, the applicable aggregation functions must be specified for each
dimension level. Since the specification of aggregation functions is generally complex and may reduce readability
of graphical schemas, it should be presented in a separate table [14].
The unified multidimensional metamodel presented in this section is used as a pivot metamodel in our
design method.
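The main concepts of Section 3.2 can be sketched in a few lines of Python. This is only an illustrative approximation, not the metamodel of Fig. 2 itself: all class, field and constant names are assumptions, dimensions and schemas are omitted, the attribute names of Vehicle, Customer and Date are hypothetical, and `levels` is interpreted as the number of hierarchy levels, counted from the base dimension level, along which the functions apply.

```python
from dataclasses import dataclass, field

# The eight aggregation functions considered by the metamodel (Section 3.2).
ALL_FUNCTIONS = frozenset(
    {"SUM", "AVG", "MIN", "MAX", "MED", "VAR", "STDDEV", "COUNT"}
)
# The three classes of aggregation functions described in the text.
SUMMABLE = ALL_FUNCTIONS               # measures that can be summed
AVERAGEABLE = ALL_FUNCTIONS - {"SUM"}  # measures usable for average calculations
COUNT_ONLY = frozenset({"COUNT"})      # the last class: COUNT alone


@dataclass
class DimensionLevel:
    name: str
    identifying_attribute: str
    non_identifying_attributes: list = field(default_factory=list)
    is_time: bool = False


@dataclass
class ClassificationRelationship:
    child: DimensionLevel
    parent: DimensionLevel


@dataclass
class Hierarchy:
    relationships: list  # ordered ClassificationRelationship objects

    def is_aggregation_path(self) -> bool:
        # The parent of each relationship must be the child of the next one.
        pairs = zip(self.relationships, self.relationships[1:])
        return all(prev.parent is nxt.child for prev, nxt in pairs)

    def level_names(self) -> list:
        if not self.relationships:
            return []
        first = self.relationships[0].child
        return [first.name] + [r.parent.name for r in self.relationships]


@dataclass
class Dimensioning:
    level: DimensionLevel
    minimal: bool = True  # False if not needed to determine the fact


@dataclass
class Fact:
    name: str
    measures: list
    dimensionings: list

    def non_minimal_levels(self) -> list:
        return [d.level.name for d in self.dimensionings if not d.minimal]


@dataclass
class ApplicableAggregationFunctions:
    measure: str
    hierarchy: Hierarchy
    functions: frozenset
    levels: int  # assumed meaning: levels counted from the base level


# Product hierarchy of the Sale example (Fig. 3).
product = DimensionLevel("Product", "product code", ["product name", "weight"])
subcategory = DimensionLevel("Subcategory", "subcategory")
category = DimensionLevel("Category", "category")
product_hierarchy = Hierarchy([
    ClassificationRelationship(product, subcategory),
    ClassificationRelationship(subcategory, category),
])

day = DimensionLevel("Day", "day", is_time=True)
city = DimensionLevel("City", "city")
sale = Fact("Sale", ["sale amount"],
            [Dimensioning(day), Dimensioning(product), Dimensioning(city)])

# SUM (hence every function) applies to "sale amount" along this hierarchy.
sale_amount_rule = ApplicableAggregationFunctions(
    "sale amount", product_hierarchy, SUMMABLE, levels=3)

# Rental example: Vehicle and Date identify a Rental, so Customer is not
# minimal. Attribute names here are hypothetical; measures are left out.
rental = Fact("Rental", [], [
    Dimensioning(DimensionLevel("Vehicle", "vehicle")),
    Dimensioning(DimensionLevel("Customer", "customer"), minimal=False),
    Dimensioning(DimensionLevel("Date", "date", is_time=True)),
])

print(product_hierarchy.level_names())
print(rental.non_minimal_levels())
```

Keeping the three function classes as nested sets makes the ordering between them explicit: whatever applies to a COUNT-only measure also applies to an averageable one, and whatever applies to an averageable measure also applies to a summable one.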
4. The data warehouse design method
Our method considers both user requirements and operational data sources for data warehouse development.
However, we believe that data warehouse designers should first define the information needed by users (decision
makers), without being limited by the information stored in operational databases. This is the only way to prevent
critical business needs from being overlooked. Therefore, our method is primarily user requirements-driven. Starting
from an informal statement of the information needed by decision makers, the method is decomposed into a
conceptual, a logical and a physical phase. Finally, the data confrontation phase takes operational data into considera-
tion, defining how and to what extent the decision makers' information needs can be fulfilled by operational data
sources. The four phases are represented in Fig. 5.
[Figure: the four phases as a chain of steps and artifacts. Conceptual design: user requirements, conceptual
modeling, UML model, enrichment/transformation, enriched/transformed UML model. Logical design: logical
mapping, unified multidimensional schema. Physical design: physical mapping, leading to a MOLAP Express
schema, a MOLAP Essbase schema, ..., a ROLAP star schema, a ROLAP snowflake schema, ... Data confrontation:
source confrontation, data warehouse metadata.]
Fig. 5. The four phases of the data warehouse design method.
The conceptual phase is decomposed into two steps. In step 1, the designer collects and represents the decision
makers' information requirements. To this end, he may use any UML-compliant system analysis method.
Requirements engineering techniques used in the OLTP world are applicable. More specifically, interviews and
joint sessions are used, as well as the study of existing reports and the prototyping of future reports. Planned
queries are also considered in order to fully specify the information needed by users for analysis and decision
making. User requirements are represented in a UML class diagram. This UML model is then enriched and
transformed in step 2, in order to ease the automatic mapping of the model into a logical multidimensional
schema.
In the logical phase, the UML model resulting from the conceptual phase is mapped into a multidimensional
schema expressed with the unified multidimensional metamodel presented in Section 3. This is achieved through a
set of mapping transformations.
The physical phase maps the multidimensional schema into a physical database schema, depending on the target
OLAP tool. A specific set of mapping transformations is defined for each type of tool. Due to space limitations, the
present paper focuses on Oracle MOLAP implementation.
The data confrontation phase consists in mapping the physical schema data elements to the data sources. It leads to
the definition of queries for extracting the operational data in the physical data warehouse schema. This is a crucial
phase in the data warehouse design method. However, the complex ETL techniques used in this phase are beyond
the scope of this paper.
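The chaining of the four phases can be summarized as a small data structure. The following Python sketch is illustrative only: the `Step` type, the `STEPS` list and the helper functions are hypothetical names, the artifact labels are taken from Fig. 5, and the single label "physical schema" stands in for whichever tool-specific MOLAP or ROLAP schema is targeted.

```python
from typing import NamedTuple


class Step(NamedTuple):
    phase: str   # design phase the step belongs to
    name: str    # step performed
    source: str  # artifact consumed by the step
    target: str  # artifact produced by the step


# Steps and artifacts as labelled in Fig. 5.
STEPS = [
    Step("conceptual design", "conceptual modeling",
         "user requirements", "UML model"),
    Step("conceptual design", "enrichment/transformation",
         "UML model", "enriched/transformed UML model"),
    Step("logical design", "logical mapping",
         "enriched/transformed UML model", "unified multidimensional schema"),
    Step("physical design", "physical mapping",
         "unified multidimensional schema", "physical schema"),
    Step("data confrontation", "source confrontation",
         "physical schema", "data warehouse metadata"),
]


def is_chained(steps):
    """Check that each step consumes what the previous step produced."""
    return all(a.target == b.source for a, b in zip(steps, steps[1:]))


def phases(steps):
    """Return the distinct phases in order of first appearance."""
    seen = []
    for step in steps:
        if step.phase not in seen:
            seen.append(step.phase)
    return seen


print(is_chained(STEPS))
print(phases(STEPS))
```

The unified multidimensional schema appears exactly once, as the output of the logical mapping and the input of the physical mapping, which is what makes it a pivot between the UML model and the tool-specific schemas.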
We detail below the content of the conceptual, logical and physical design phases, i.e. the metamodels used and
the transformations performed at each phase. The transformations are described in natural language and specified
formally in OCL. A transformation is represented as an operation, which is specified with pre- and post-conditions
written in OCL. The latter is well suited to the formal specification of operations; in particular, the declarative style
of OCL emphasizes the effect of operations rather than their actual implementation. The latest version of OCL
supports collections of collections, which is useful for the specification of some transformations of our design
method. We have checked the syntactic correctness of all our specifications using the OCL checker [17]. Due to
space limitations, this paper shows the OCL specification for the first transformation only. For other examples, the
reader may refer to [30].
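The idea of specifying a transformation as an operation with pre- and post-conditions can be illustrated outside OCL. The Python sketch below is not one of the transformations of the method: `set_identifier`, the dictionary-based model and the two condition functions are hypothetical, chosen only to show how a declarative contract describes the effect of an operation independently of its implementation. The untouched input model plays a role similar to the `@pre` state in an OCL postcondition.

```python
import copy


def precondition(model, class_name, attribute):
    """The attribute belongs to the class and no identifier is set yet."""
    cls = model[class_name]
    return attribute in cls["attributes"] and cls["identifier"] is None


def postcondition(before, after, class_name, attribute):
    """Only the identifier of the target class has changed."""
    others_unchanged = all(
        after.get(name) == before[name]
        for name in before if name != class_name
    )
    target = after[class_name]
    return (
        after.keys() == before.keys()
        and others_unchanged
        and target["attributes"] == before[class_name]["attributes"]
        and target["identifier"] == attribute
    )


def set_identifier(model, class_name, attribute):
    """Toy transformation: mark one attribute of a class as identifying."""
    if not precondition(model, class_name, attribute):
        raise ValueError("precondition violated")
    result = copy.deepcopy(model)  # the input model is left untouched
    result[class_name]["identifier"] = attribute
    if not postcondition(model, result, class_name, attribute):
        raise ValueError("postcondition violated")
    return result


# Hypothetical two-class model, loosely based on the Sale example.
uml_model = {
    "Product": {
        "attributes": ["product code", "product name", "weight"],
        "identifier": None,
    },
    "City": {"attributes": ["city"], "identifier": None},
}

enriched = set_identifier(uml_model, "Product", "product code")
print(enriched["Product"]["identifier"])  # prints "product code"
```

Any other implementation of `set_identifier` that satisfies the same two conditions would be equally acceptable, which is the sense in which the contract emphasizes the effect of the operation rather than how it is carried out.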
In order to illustrate the step-by-step application of our method, we use a case study related to media-planning
activities.
4.1. Conceptual design
The conceptual design phase of our method is grounded on the following principles. Having established that the
multidimensional metamodel belongs to the logical level, we need a conceptual modeling formalism to specify
users' information needs before proceeding to logical design. This conceptual modeling formalism needs to be
familiar to users and designers, like ER or UML for example. In order to ease automatic mapping of the conceptual
model into a multidimensional schema, we need to incorporate into the conceptual model some information specific
to multidimensional modeling. However, we should avoid incorporating too much multidimensional information
into the conceptual model: the multidimensional metamodel belongs to the logical level, and mixing multidimensional
concepts like dimensions or hierarchies with concepts like entities, relationships, classes or generalizations
can be a source of confusion when defining and validating the conceptual model with users. We have chosen UML
as the conceptual modeling formalism. This is in line with the current trend, e.g. the Common Warehouse
Metamodel. Furthermore, like [1,7], we see many benefits in applying the object-oriented approach to data
warehousing and OLAP. By choosing UML for conceptual data warehouse design, we benefit from the following
advantages:
Familiarity of designers with the formalism, capitalization on previous research work concerning not only UML but
also ER (since UML can be viewed as an extension to ER).
Simplicity of UML class diagrams and richer semantics than ER diagrams. In particular, UML offers many
varieties of the basic ER relationship (e.g. aggregation and generalization), which is very useful for the definition
of multidimensional hierarchies in the logical phase [3].
Possibility of mapping the conceptual UML model into a multidimensional schema in a quasi-automated way (as
the description of the logical design phase shall illustrate).
The conceptual design phase is subdivided into two steps. Step 1 leads to a UML model, more precisely to a class
diagram without operations. Step 2 enriches and transforms this model to facilitate its automatic mapping into a
unified multidimensional schema. Four types of transformations are conducted: the determination of identifying
attributes, the determination of measures, the migration of association attributes and the transformation of
generalizations.
4.1.1. Initial UML model definition

Using any UML-compliant requirements engineering method, the designer defines the UML class diagram
representing the decision makers' initial information requirements. This first step of the data warehouse design
method uses no multidimensional concepts, thereby enabling maximal reuse of systems analysis methods commonly
used for transactional systems engineering.
We illustrate this first step using the media-planning case study, presenting the decision makers' initial
requirements and the resulting UML class diagram (Fig. 6).
Fig. 6. Initial UML conceptual model for the media-planning example. [UML class diagram; classes shown:
Product_type, Product, Region, Target, Media_type, Media, Exposure, Purchase, Influence_of_campaign,
Advertising_campaign, Main_shareholder, Shareholder (specialized into Private_shareholder and Public_shareholder,
{overlapping, complete}), Private_shareholder (specialized into Person and Company, {disjoint, complete}),
Year, Quarter and Date.]
A company is faced with the definition of an optimal media-planning system. Periodically, the company
launches advertising campaigns for its products using several types of media. The media-planning problem
consists in choosing the media which will reach the maximal number of consumers and have the maximum
effect on the consumption of the advertised products. To assist managers in this complex, ill-structured problem,
the company launches a data warehousing project. The data warehouse will be built incrementally and made
available to managers via OLAP and data mining tools. The first step is the definition of the conceptual model
of the data warehouse, which is based on the following requirements (the actual case study has been simplified
for the sake of this paper). The consumer base is decomposed into predefined targets, i.e. segments. A target is
identified by a code and characterized by a marital status, a minimum and maximum age, a sex and a region
(e.g. all married female consumers aged between 30 and 49, living in the South-West). For each region, decision
makers need to know the respective percentage of the different targets located in this region. Products are
characterized by a code, a name and a product type which determines the product unit. The product unit is used
to measure the consumption of the product (e.g. for products of type beer, the consumption is measured in
liters). Media are characterized by a name (e.g. Channel n) and an advertising price. The latter is expressed
per media unit. The unit depends on the media type. In the case of TV for example, the unit is one minute and
the advertising price is expressed in dollars per minute. When choosing between alternative media, knowing the
shareholding structure of the media may sometimes prove useful. For every media, the decision makers must be
able to know its main shareholder at a given date, as well as the percentage of shares it holds. A shareholder
may be either public, private or both (jointly owned companies). Private shareholders may be either individuals
or companies. Public shareholders are characterized by their level (e.g. a city, a state, the country). The
exposure of targets to media (i.e. their media consumption) is measured on a quarterly basis. The exposure is
expressed in the media unit. Similarly, the purchase of products by target is measured every quarter. It is
expressed in total quantity (based on the product unit) and in total dollar amount. Finally, the influence of past
campaigns on purchases is measured by an influence coefficient, which is estimated with data mining
techniques.
4.1.2. Enrichment/transformation of the UML model

This second step of conceptual design aims at facilitating subsequent mapping of the UML conceptual model into a
logical multidimensional schema. To achieve this, we need to extend the standard UML metamodel, adding as few
concepts as necessary to ease the automatic mapping. To extend UML, we use the extensibility mechanisms of
stereotypes and tagged values [25]. The extended UML metamodel is presented in Fig. 7 (the extensions are
represented in italics).
Classes which are not association classes are called ordinary classes. Similarly, associations which are not
association classes are called ordinary associations. The main extensions are the attribute measure of class
Attribute (which indicates if the attribute is a measure of interest), and the attribute identifyingAttribute, which
indicates if an attribute identifies its owner class. The attribute identifyingAttribute is only necessary for the
attributes of ordinary classes, since an association class is identified by the n-tuple of identifying attributes of the
participating classes.
Fig. 7. Extended UML metamodel. [UML class diagram; metaclasses shown: UMLModel, UMLModelElement, Constraint,
GeneralizationConstraint, Attribute (measure : Boolean), AttributeOfOrdinaryClass (identifyingAttribute : Boolean),
AttributeOfAssociationClass, Class, OrdinaryClass, AssociationClass, AssociationEnd (aggregation : AggregationKind,
multiplicity : Multiplicity), Relationship, Association, OrdinaryAssociation and Generalization.]
The four transformations performed on the UML conceptual model in order to ease its subsequent logical
mapping are presented below. Each of the four transformations is applied in turn to the media-planning case study,
explaining how the initial UML model of Fig. 6 is progressively transformed into the final model of Fig. 8. In
addition to the natural language definition, we illustrate the formal specification of the first transformation with
OCL. A transformation is represented with OCL as an operation of the UML model considered, associated with
pre- and post-conditions.
4.1.2.1. Determination of identifying attributes. Since the notion of identifying attribute is not defined in the UML
standard, we need to determine explicitly the identifying attributes of classes in order to be able to define the
dimensions of the multidimensional schema at the logical level. For each ordinary class of the UML model, the
data warehouse designer and the user have to decide which attribute identifies the class. If necessary, a specific
attribute is created in order to identify the class. Identifying attributes are specified using {id} tagged values, added to
each identifying attribute.
Our method does not allow ordinary classes to be identified by a composite identifier. If an ordinary class is
identified by several attributes, the latter are concatenated into a single identifying attribute, thus simplifying
implementation into ROLAP or MOLAP tools. This is in line with assumptions made in related work (for example,
dimension tables in Kimball's star schemas are defined as having a single part primary key [16]). The process of
determining identifying attributes of ordinary classes can be synthesized by the following transformation
(semi-formally defined and then expressed in OCL):
Box 1
Transformation Tcc1: Each attribute of an ordinary class is either an identifying attribute or not.
context UMLModel::Tcc1(ordinaryClass:OrdinaryClass)
post: -- All attributes owned by the ordinary class before the
      -- transformation have a determined status. Among these
      -- attributes, at most one is an identifying attribute.
  ordinaryClass.attribute@pre->forAll(a1:AttributeOfOrdinaryClass |
    a1.identifyingAttribute=true or a1.identifyingAttribute=false) and
  ordinaryClass.attribute@pre->select(a1:AttributeOfOrdinaryClass |
    a1.identifyingAttribute=true)->size()<=1
post: -- If no identifying attribute has been found among the
      -- attributes of the ordinary class, a new identifying
      -- attribute has been created and inserted as the first
      -- attribute of the class.
  if not ordinaryClass.attribute@pre->exists
    (a1:AttributeOfOrdinaryClass | a1.identifyingAttribute=true)
  then ordinaryClass.attribute->size()=
      ordinaryClass.attribute@pre->size()+1 and
    ordinaryClass.attribute->at(1).oclIsNew() and
    ordinaryClass.attribute->at(1).identifyingAttribute=true
  else true
  endif
In the media-planning case study, all ordinary classes are identified by their first attribute; therefore no special
attribute definition is required. Concerning the generalization hierarchy, the subclasses of Shareholder inherit the
identifying attribute shareholder_name.
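To make the two postconditions of Box 1 more concrete, the following Python sketch mimics Tcc1 on a toy in-memory model. It is only an illustration under our own assumptions: the classes `Attribute` and `OrdinaryClass`, the function `tcc1` and the naming rule for a generated identifier (`<class name>_id`) are hypothetical and are not part of the method or of any UML tool.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    identifying: bool = False  # plays the role of the {id} tagged value


@dataclass
class OrdinaryClass:
    name: str
    attributes: list = field(default_factory=list)


def tcc1(cls, chosen=None):
    """Mark `chosen` as the identifying attribute of `cls`.

    If no attribute is chosen, create one and insert it first,
    mirroring the second postcondition of Box 1.
    """
    before = list(cls.attributes)
    for attr in cls.attributes:
        attr.identifying = (attr.name == chosen)
    if not any(attr.identifying for attr in before):
        # Hypothetical naming rule for the generated identifier.
        new_id = Attribute(f"{cls.name.lower()}_id", identifying=True)
        cls.attributes.insert(0, new_id)

    # Postcondition 1: at most one pre-existing attribute is identifying.
    assert sum(attr.identifying for attr in before) <= 1
    # Postcondition 2: otherwise a new identifying attribute comes first.
    if not any(attr.identifying for attr in before):
        assert len(cls.attributes) == len(before) + 1
        assert cls.attributes[0].identifying
        assert all(cls.attributes[0] is not attr for attr in before)
    return cls
```

In this sketch the choice of the identifying attribute is passed in by the caller, which reflects the fact that the decision is taken by the designer and the user rather than derived automatically.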
4.1.2.2. Determination of attributes representing measures. We differentiate between attributes representing
measures, and attributes expressing qualitative values. As described in Section 3, this distinction is not based on data types
even if measures are generally numeric and qualitative attributes are not. Therefore this differentiation cannot be
performed automatically. The designer and the user have to decide jointly which attributes are measures. Note that this
is unnecessary for identifying attributes determined previously, since an identifying attribute cannot be a measure. In
our UML model, attributes representing measures are specified by the {meas} tagged values. This process can be
synthesized as follows:
Box 2
Transformation Tcc2: Each attribute is either a measure or not.
In our example, all attributes of association classes are defined as measures. These attributes are:
percentage_of_region, media_exposure, quantity, amount, influence_coefficient and percentage_of_shares.
Concerning ordinary classes, the data warehouse designer and user decide that the following attributes should be
considered as measures of interest: advertising_price (class Media) and number_of_inhabitants (class
Region).
4.1.2.3. Migration of association attributes. This step is concerned with N–1 and 1–1 associations having specific
attributes (since they bear attributes, these associations are actually association classes). This case is rarely
encountered in practice. If specific attributes are present in these associations, the designer should first check the validity of
this representation. Even if their presence cannot be questioned, these attributes cannot be mapped into
multidimensional schemas by using hierarchies, since multidimensional hierarchies do not contain information. Therefore, N–1
association attributes must migrate from the association to the participating class on the N side. 1–1 association
attributes can migrate to either of the two participating classes. After migrating the attributes of an N–1 or
1–1 association, the latter is transformed into an ordinary association unless it itself participates in other
associations. The transformations for migrating association attributes are expressed as follows:
Box 3
Transformation Tcc3a: Each attribute belonging to a 1–1 association is transferred to either one of the classes
involved in the association.
Transformation Tcc3b: Each attribute belonging to an N–1 association is transferred to the N-class, i.e. the class
involved several times in the association.
In the media-planning conceptual model, transformation Tcc3b is applied to the association class between the
classes Region and Target: the attribute percentage_of_region is transferred to Target, and the association class
becomes an ordinary association.
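The migration performed by Tcc3b can be sketched in a few lines of Python. The data layout is an assumption made for illustration only: classes are a dictionary from class name to attribute names, and an association records, for each participating class, the upper bound of its association end ("1" or "*"). The function name `migrate_n1_attributes` is hypothetical, and the sketch ignores the case where the association class itself participates in other associations.

```python
def migrate_n1_attributes(classes, association):
    """Move the attributes of an N-1 association to the class on the N side.

    classes: dict mapping class name -> list of attribute names.
    association: dict with
        "ends": dict mapping class name -> upper bound of its end ("1" or "*"),
        "attributes": list of attribute names borne by the association.
    Returns the name of the class that received the attributes.
    """
    ends = association["ends"]
    if len(ends) != 2 or sorted(ends.values()) != ["*", "1"]:
        raise ValueError("Tcc3b only applies to binary N-1 associations")
    # The N-class is the one whose instances occur several times,
    # i.e. the class attached to the end with upper bound "*".
    n_class = next(name for name, upper in ends.items() if upper == "*")
    classes[n_class].extend(association["attributes"])
    # Without attributes, the association class is an ordinary association.
    association["attributes"] = []
    return n_class


classes = {
    "Region": ["region", "number_of_inhabitants"],
    "Target": ["target_code", "status", "minimum_age", "maximum_age", "sex"],
}
region_target = {
    "ends": {"Target": "*", "Region": "1"},
    "attributes": ["percentage_of_region"],
}
receiver = migrate_n1_attributes(classes, region_target)
```

Applied to the Region and Target classes of the case study, the sketch moves percentage_of_region to Target, as described above.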
4.1.2.4. Transformation of generalizations. The generalizations of the UML notation cannot be mapped directly
into hierarchies in the multidimensional schema, since the semantics of hierarchies in object-oriented and
multidimensional formalisms differ (in object-oriented models, hierarchies are constructed based on
generalization–specialization relationships between classes, whereas multidimensional hierarchies group dimension
levels based on aggregation/composition relationships between these levels). However, we want to preserve the
information contained in UML generalization
hierarchies and transform them to enable their automatic mapping into multidimensional hierarchies in the logical
phase. To this end, we transform the generalizations into aggregations and classes following the transformation
suggested by [22] for ER models. We have adapted this transformation to UML and extended it to consider the
different cases of incomplete and/or overlapping specialization. The corresponding transformation is described
below:
Box 4
Transformation Tcc4: For each class C such that C is the root of a specialization hierarchy, for each
specialization level Li of C, a class named Type-C-i is created. The occurrences of these classes define all the
specializations of C. In case of overlapping between specializations, a special value is created for each overlapping
between two or more sub-classes of C. In case of incomplete specialization, the special value others is created.
An aggregation is created between class C and each of the classes Type-C-i (Type-C-i is the aggregate, with
multiplicity 1; C is the part, with multiplicity *).
Since Tcc4 transforms the subclasses of class C into the classes Type-C-i, the attributes and/or relationships of these
subclasses are transferred to class C. This strategy may result in information loss and/or null values. However, we find
it preferable to simply omitting the attributes/relationships in the transformed conceptual model. Each of the classes
Type-C-i has only one attribute, which identifies Type-C-i (in order to specify this information, transformation Tcc1
should be applied to all the classes Type-C-i). The mapping of UML generalizations into multidimensional schemas is
a subject in its own right. This issue is developed more thoroughly in [3], which presents a set of transformations for
defining dimension hierarchies from UML generalizations and aggregations.
In the case study, the specialization hierarchy rooted in class Shareholder has two specialization levels. Therefore,
two classes Type-shareholder-1 and Type-shareholder-2 are created and renamed as Shareholder_type and
Private_shareholder_type respectively. The set of occurrences of Shareholder_type (i.e. the set of values of its identifying
attribute shareholder_type) is {private, public, both}. The special value both results from the overlapping
constraint in the initial conceptual model. Similarly, the set of occurrences of Private_shareholder_type is {person,
company, others}. An aggregation is created for Shareholder_type and Private_shareholder_type. The attributes of
the subclasses (attributes manager_name and public_shareholder_level) are transferred to the class Shareholder. The
enriched/transformed conceptual model, resulting from transformations Tcc1 to Tcc4, is represented in Fig. 8.
After applying these transformations, the resulting UML model is ready for logical mapping.
Fig. 8. Enriched/transformed media-planning UML model. [UML class diagram; same classes as in Fig. 6, with
identifying attributes tagged {id} and measures tagged {meas}; percentage_of_region now belongs to Target; the
subclasses of Shareholder are replaced by the classes Shareholder_type and Private_shareholder_type, and
public_shareholder_level and manager_name now belong to Shareholder.]
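The way Tcc4 derives the occurrences of a class Type-C-i from the generalization constraints can be sketched as follows. This is a simplified illustration, not the authors' specification: the function `type_values` is hypothetical, and overlap values are named by joining subclass names with "+", whereas the case study uses the friendlier name both. To reproduce the value others at the second level, the sketch treats that level as incomplete with respect to the root class (a public shareholder is neither a person nor a company); this reading is our assumption.

```python
from itertools import combinations


def type_values(subclasses, overlapping, complete):
    """Occurrences of a Type-C-i class for one specialization level of C.

    subclasses: names of the subclasses at this level.
    overlapping: True if instances may belong to several subclasses.
    complete: True if every instance of C belongs to some subclass.
    """
    values = list(subclasses)
    if overlapping:
        # One special value per combination of two or more subclasses.
        for size in range(2, len(subclasses) + 1):
            values += ["+".join(combo) for combo in combinations(subclasses, size)]
    if not complete:
        # Special value for instances belonging to none of the subclasses.
        values.append("others")
    return values


shareholder_type = type_values(["private", "public"], overlapping=True, complete=True)
private_shareholder_type = type_values(["person", "company"], overlapping=False, complete=False)
```

With three overlapping subclasses the function would generate four overlap values (three pairs and one triple), which illustrates how quickly the number of special values can grow.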
4.2. Logical design
This phase maps the enriched and transformed UML conceptual model into a logical schema expressed with the
concepts of the unified multidimensional metamodel. The logical multidimensional schema is generated by means of
the transformations detailed below. The step-by-step application of the transformations is illustrated with the case study.
The final multidimensional schema, represented with the graphical notation introduced in Section 3, appears in Fig. 9.
Logical design is decomposed into five steps, each step elaborating on the results of the previous steps: (1) Definition
of facts (transformations Tcl1a and Tcl1b). The definition of a fact includes the definition of its measures and of the
dimension levels related to it. (2) Definition of hierarchies (transformation Tcl2). (3) Definition of dimensions
(transformation Tcl3). (4) Definition of dimension level attributes (transformation Tcl4). (5) Definition of the
aggregation functions that may be applied to measures along hierarchies (transformation Tcl5).
The facts and measures of the logical multidimensional schema are defined from the N–M and n-ary associations
of the enriched/transformed UML conceptual model (transformation Tcl1a), and from the ordinary classes of the UML
model which own at least one attribute characterized as a measure (transformation Tcl1b):
Box 5
Transformation Tcl1a: Every N–M or n-ary association of the conceptual model is mapped into a fact in the
logical multidimensional schema. The attributes of the association (if any) are mapped into measures of the fact.
The fact is dimensioned by dimension levels defined by mapping the ordinary classes directly or indirectly
involved in the association.
Transformation Tcl1b: For every conceptual ordinary class with at least one attribute which is a measure of
interest, a fact is defined in the logical multidimensional schema. The attributes of the ordinary class which are
measures of interest are mapped into measures of the fact. The fact is dimensioned by one dimension level,
defined by mapping the class.
The association mapped by transformation Tcl1a may have one (or several) association class(es) among its
participating classes. In this case, the transformation replaces the association class with its participating classes.
This principle is applied recursively so that ultimately, all the classes directly or indirectly involved in the mapped
association are ordinary classes. These ordinary classes may be mapped directly into dimension levels. When mapping
an ordinary class into a dimension level, transformations Tcl1a and Tcl1b map the identifying attribute of the class into
the identifying attribute of the dimension level. Depending on the semantics of the class, the dimension level is
characterized as temporal or non-temporal.
When the association mapped by transformation Tcl1a is n-ary, it may have one (or several) association end(s) with
an upper bound (maximal cardinality) of 1. In this case, the corresponding dimensioning (i.e. the link between the fact
defined from the association and the dimension level defined from the class linked to the association end) is
characterized as non-minimal. This information will be useful for subsequent physical implementation.
Transformation Tcl1a also applies to the particular case of N–M and n-ary associations without attributes, which corresponds to
Kimball's concept of factless fact table [16].
Applying transformation Tcl1a, the following N–M and n-ary associations of the enriched/transformed
media-planning conceptual model are mapped:
The association Purchase is mapped into a fact, with the measures quantity and amount. The dimension levels
Product, Target and Quarter are created in order to characterize the fact Purchase. These dimension levels are
identified by the attributes product_code, target_code and quarter respectively. The dimension level Quarter
is characterized as temporal.
Similarly, the association Exposure is mapped into a fact, with the measure media_exposure. This fact is
dimensioned by Media, Target and Quarter.
The association Influence_of_campaign between the class Advertising_campaign and the class Purchase is
mapped into a fact. This fact includes the measure influence_coefficient and is dimensioned by the
dimension levels Advertising_campaign, Product, Target and Quarter (since the class Purchase is an
association class, the last three dimension levels correspond to the ordinary classes involved in the
association class Purchase).
The association Main_shareholder becomes a fact with the measure percentage_of_shares. The fact is
dimensioned by Media, Date (a temporal dimension level) and Shareholder. The dimensioning between
the fact Main_shareholder and the dimension level Shareholder is not minimal (following the multiplicity of
1 in the conceptual model).
The N–M association between Region and Media is mapped into the fact Gets, dimensioned by Region and
Media. This fact bears no measure. Similarly, the association between Media and Advertising_campaign is
mapped into the fact In.
In the enriched/transformed media-planning conceptual model, the ordinary classes Region, Target and
Media have an attribute which is a measure of interest. Therefore, applying transformation Tcl1b, the
facts Region_fact, Target_fact and Media_fact are created with the measures number_of_inhabitants,
percentage_of_region and advertising_price respectively. These facts are dimensioned by the dimension levels
Region, Target and Media respectively (these dimension levels have been created by transformation Tcl1a).
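The recursive replacement of association classes by their participating classes, used by Tcl1a to find the dimension levels of a fact, can be illustrated with the sketch below. The representation is an assumption for illustration: `association_classes` maps each association class to its participants, any class absent from that mapping is taken to be an ordinary class, and the function names `dimension_levels` and `tcl1a` are hypothetical.

```python
def dimension_levels(participants, association_classes):
    """Ordinary classes directly or indirectly involved in an association."""
    levels = []
    for participant in participants:
        if participant in association_classes:
            # Replace an association class with its own participating classes.
            expanded = dimension_levels(association_classes[participant],
                                        association_classes)
        else:
            expanded = [participant]
        for cls in expanded:
            if cls not in levels:
                levels.append(cls)
    return levels


def tcl1a(name, attributes, association_classes):
    """Map an N-M or n-ary association class into a fact description."""
    return {
        "fact": name,
        "measures": list(attributes),
        "dimension_levels": dimension_levels(association_classes[name],
                                             association_classes),
    }


# Participants taken from the media-planning example.
association_classes = {
    "Purchase": ["Product", "Target", "Quarter"],
    "Influence_of_campaign": ["Advertising_campaign", "Purchase"],
}
influence = tcl1a("Influence_of_campaign", ["influence_coefficient"],
                  association_classes)
```

For Influence_of_campaign, the association class Purchase is expanded into Product, Target and Quarter, which matches the dimensioning given in the case study.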
The hierarchies of the logical multidimensional schema are defined from the N–1 (and 1–1) associations of the
enriched/transformed UML conceptual model:
Box 6
Transformation Tcl2: For each dimension level DL dimensioning at least one fact in the logical multidimensional
schema, a set of hierarchies is defined. These hierarchies are defined starting from the conceptual ordinary
class OC(DL) corresponding to DL. Each acyclic path of N–1 (or 1–1) associations between ordinary classes
starting from OC(DL), noted OC(DL) → OC2 → ... → OCn, is mapped into a hierarchy DL → DL2 → ... → DLn → All,
where dimension level DLi (i = 2, ..., n) is defined by mapping ordinary class OCi. If no acyclic path of N–1 (or 1–1)
associations links OC(DL) to other ordinary classes, only the hierarchy DL → All is defined.
The dimension levels dimensioning at least one fact are those resulting from transformations Tcl1a and Tcl1b.
In relational terms, a path OC(DL) → OC2 → ... → OCn of N–1 or 1–1 associations between ordinary classes (starting
from OC(DL)) may be interpreted as a path of functional dependencies. Note that only acyclic paths are considered.
Thus, transformation Tcl2 cannot generate cyclic multidimensional hierarchies. This is meant to prevent recursion
problems when performing aggregations based on the hierarchies. N–1 associations are frequently aggregations or
compositions. Therefore, the hierarchies generated by transformation Tcl2 often correspond to aggregation/composition
paths in the UML conceptual model. As a consequence of transformation Tcl2, the dimension level All is the
upper level of all hierarchies. This will enable users of the implemented multidimensional schema to perform
aggregations at the highest possible level.
Applying transformation Tcl2 to each of the dimension levels already defined in the media-planning logical
multidimensional schema, we obtain the following hierarchies:
For Product: Product → Product_type → All
For Target: Target → Region → All
For Quarter: Quarter → Year → All
For Media: Media → Media_type → All
For Advertising_campaign: Advertising_campaign → Product → Product_type → All and
Advertising_campaign → Quarter → Year → All
For Date: Date → Quarter → Year → All
For Shareholder: Shareholder → Shareholder_type → All and Shareholder → Private_shareholder_type → All
For Region: Region → All.
The new dimension levels resulting from transformation Tcl2 are: All, Product_type, Year, Media_type,
Shareholder_type and Private_shareholder_type. These dimension levels are created with their identifying attribute.
Year is a temporal dimension level.
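The path enumeration underlying Tcl2 can be sketched as a depth-first traversal of the N–1 associations. The sketch rests on two assumptions of ours: the dictionary `N1_ASSOCIATIONS` (from a class to the classes on the "1" side) is reconstructed from the hierarchies listed for the case study, and only maximal acyclic paths are returned, which is what the listed hierarchies suggest. The function name `hierarchies` is hypothetical.

```python
# N-1 associations of the case study, from the N side to the 1 side
# (reconstructed from the hierarchies listed in the text).
N1_ASSOCIATIONS = {
    "Product": ["Product_type"],
    "Target": ["Region"],
    "Quarter": ["Year"],
    "Media": ["Media_type"],
    "Advertising_campaign": ["Product", "Quarter"],
    "Date": ["Quarter"],
    "Shareholder": ["Shareholder_type", "Private_shareholder_type"],
}


def hierarchies(base_level, n1_associations):
    """Enumerate maximal acyclic paths from base_level, each ending with All."""
    result = []

    def walk(path):
        # Skipping classes already on the path guarantees acyclic hierarchies.
        successors = [cls for cls in n1_associations.get(path[-1], [])
                      if cls not in path]
        if not successors:
            result.append(path + ["All"])
        for cls in successors:
            walk(path + [cls])

    walk([base_level])
    return result


campaign_hierarchies = hierarchies("Advertising_campaign", N1_ASSOCIATIONS)
```

A class without outgoing N–1 associations, such as Region, yields the single hierarchy Region → All.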
Dimensions are defined by grouping hierarchies using transformation Tcl3:
Box 7
Transformation Tcl3: For each dimension level DL dimensioning at least one fact in the logical multidimensional
schema, a dimension is defined. This dimension is composed of all hierarchies defined for DL.
DL is the base dimension level. By default, the name of the dimension is the name of DL. Another name may be
chosen if it expresses the semantics of the dimension more appropriately.
Applying transformation Tcl3, a dimension is defined for each of the following dimension levels: Product,
Target, Quarter, Media, Advertising_campaign, Date, Shareholder and Region. The dimension defined for each
dimension level comprises all hierarchies defined for this dimension level in transformation Tcl2. The dimension
Date is renamed Time.
Tcl4 completes the definition of dimension levels with their non-identifying attributes:
Box 8
Transformation Tcl4: For each dimension level DL of the logical multidimensional schema, the non-identifying
attributes of DL are defined from the conceptual ordinary class OC(DL) corresponding to DL: each
non-identifying attribute of OC(DL) which is not a measure of interest is mapped into a non-identifying attribute
of DL.
Transformation Tcl4 yields the following non-identifying attributes for the dimension levels of the
media-planning multidimensional schema: For Product: product_name; For Target: status, minimum_age,
maximum_age and sex; For Shareholder: public_shareholder_level and manager_name; For Product_type:
product_unit; For Media_type: media_unit. The graphical representation of the media-planning
multidimensional schema, resulting from transformations Tcl1a to Tcl4, is shown in Fig. 9 (for the sake of readability,
the upper level of all hierarchies, i.e. the dimension level All, is not represented).
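Tcl4 amounts to a simple filter on the attributes of an ordinary class. In the sketch below, each attribute is a pair made of its name and the set of tagged values it carries ("id", "meas" or none); this encoding and the function name `tcl4` are assumptions made for illustration.

```python
def tcl4(attributes):
    """Keep the attributes that are neither identifying nor measures."""
    return [name for name, tags in attributes
            if "id" not in tags and "meas" not in tags]


# Attributes of class Target in the enriched/transformed model.
target_attributes = [
    ("target_code", {"id"}),
    ("status", set()),
    ("minimum_age", set()),
    ("maximum_age", set()),
    ("sex", set()),
    ("percentage_of_region", {"meas"}),
]
target_level_attributes = tcl4(target_attributes)
```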
The last transformation of the logical design phase, Tcl5, determines the aggregation functions that may be applied
to measures along hierarchies. The applicable aggregation functions are presented in a table which complements the
graphical multidimensional schema. Following [20], we distinguish between measures expressing a flow (i.e.
recording a cumulative effect over a period of time), measures expressing a stock (i.e. recording a level at specific
points in time), and measures expressing a value per unit (e.g. the price of an item, an exchange rate or a percentage).
Based on this distinction, transformation Tcl5 contains guidelines adapted from the summarizability rules defined in
[20]. However, these guidelines do not always hold: in order to determine correctly the aggregation functions that
may be applied to a measure, the data warehouse designer should carefully consider the semantics of both the measure
and the associated dimension levels.
Fig. 9. Media-planning logical multidimensional schema. [Diagram; facts shown: Region_fact, Target_fact,
Media_fact, Gets, In, Exposure, Purchase, Main_shareholder and Influence_of_campaign; dimension levels shown:
Product, Product_type, Target, Region, Media, Media_type, Shareholder, Shareholder_type,
Private_shareholder_type, Advertising_campaign, Date (Time), Quarter and Year. The legend distinguishes facts
with their measures, dimension levels with their identifying and non-identifying attributes, temporal dimension
levels, base dimension levels (dimensions), fact - dimension level relationships (dimensioning) and
inter - dimension level relationships (classification).]
Box 9
Transformation Tcl5: For each measure M of each fact F in the logical multidimensional schema, for each
dimension Di dimensioning F, the set of aggregation functions applicable along Di (i.e. along the hierarchies of
Di) is determined based on the following principles:
If M expresses a flow, the set of applicable aggregation functions is normally {SUM, AVG, MIN, MAX, MED,
VAR, STDDEV, COUNT}, whatever Di. If M expresses a value per unit, the set of applicable aggregation functions
is normally {AVG, MIN, MAX, MED, VAR, STDDEV, COUNT}, whatever Di. If M expresses a stock, the set of
applicable aggregation functions is normally {SUM, AVG, MIN, MAX, MED, VAR, STDDEV, COUNT} if the
base dimension level of Di is non-temporal, and {AVG, MIN, MAX, MED, VAR, STDDEV, COUNT} otherwise.
The set of applicable aggregation functions may in some cases be more restricted than suggested by these
principles. Therefore, the application of these principles has to be checked against the exact semantics of M for
each dimension level of every hierarchy of all Di; for a given dimension Di, the applicable aggregation functions
may depend on the hierarchies of Di, or even on the levels of the same hierarchy.
By definition, a fact F is dimensioned by a dimension Di if F is dimensioned by the base dimension level of Di. As
mentioned in Section 3, a set of applicable aggregation functions depends on the levels of a hierarchy if the
aggregation functions apply only to the first n levels of the hierarchy.
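The default rules of Box 9 can be written as a small lookup. The sketch only encodes the "normal" sets stated in the box; the subsequent check against the exact semantics of each measure remains a manual step. The function name `default_aggregations` and the string labels for the three kinds of measures are our own.

```python
ALL_FUNCTIONS = ["SUM", "AVG", "MIN", "MAX", "MED", "VAR", "STDDEV", "COUNT"]


def default_aggregations(kind, base_level_is_temporal):
    """Default set of aggregation functions for a measure along a dimension.

    kind: "flow", "stock" or "value_per_unit".
    base_level_is_temporal: True if the base dimension level of the
        dimension is temporal.
    """
    if kind not in ("flow", "stock", "value_per_unit"):
        raise ValueError(f"unknown kind of measure: {kind}")
    # SUM is allowed for flows along any dimension, and for stocks
    # along non-temporal dimensions only; never for values per unit.
    sum_allowed = kind == "flow" or (kind == "stock" and not base_level_is_temporal)
    return [f for f in ALL_FUNCTIONS if sum_allowed or f != "SUM"]


# number_of_inhabitants is a stock, dimensioned by the non-temporal level Region.
inhabitants_functions = default_aggregations("stock", base_level_is_temporal=False)
```

The result is only a starting point: as stated in Box 9, the set may have to be restricted for particular hierarchies or hierarchy levels.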
Due to space limitations, we are not able to detail the applicable aggregation functions for the media-planning
example. They result from applying the guidelines of transformation Tcl5 and then checking their validity:
The measures quantity, amount and media_exposure express a flow; therefore all aggregation functions
normally apply along all dimensions dimensioning these measures. The measures influence_coefficient,
percentage_of_shares, percentage_of_region, and advertising_price express a value per unit; therefore the
functions {AVG, MIN, MAX, MED, VAR, STDDEV, COUNT} apply along all dimensions dimensioning
these measures. Finally, the measure number_of_inhabitants expresses a stock; therefore all aggregation
functions apply to this measure along the dimension Region (Region is a non-temporal dimension level and
the only dimension level which dimensions Region_fact).
Checking the validity of the aggregation functions, we restrict the applicable aggregation functions for
quantity, media_exposure, and advertising_price: the unit for quantity purchased depends on the product
type, therefore along the Product dimension, aggregations are not valid for the last level of the hierarchy
Product→Product_type→All. Aggregations are only valid for Product→Product_type. For example, the
quantity purchased may be summed for all products of type beer, but the quantity of beer may not be
summed with the quantity of books. Similarly, media_exposure and advertising_price depend on the media
unit, which depends on the media type (e.g. television, magazine, etc.). As a result, media_exposure and
advertising_price may not be aggregated for different media types.
Let us stress that in our design method, applicable aggregation functions are determined for each hierarchy within each
dimension of each measure. This way, we can define applicable aggregation functions accurately, and specify explicitly
when aggregation functions apply only to the first n levels of hierarchies (e.g. Product→Product_type or Media→Media_type).
Our proposal goes beyond the traditional distinction between additive, semi-additive and non-additive facts,
proposed in [16]. This latter distinction is centered on the SUM function, and makes no explicit distinction between the
hierarchies within a given dimension. In our approach, we can say that a measure is additive (like amount in our case
study) if the SUM function may be used along all dimension levels of all hierarchies of all dimensions for this measure. A
measure is semi-additive (like quantity) if the SUM function may be used only along some hierarchies or parts of
hierarchies. A measure is non-additive (like advertising_price) if the SUM function may not be used at all for this measure.
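The three definitions above can be expressed as a small classification function. The sketch below is an assumption-laden illustration: the encoding (for each hierarchy, the number n of leading levels along which SUM is valid, together with the hierarchy depth) and the name classify_additivity are hypothetical, and the sample values are illustrative rather than taken from the case study.

```python
def classify_additivity(sum_scope):
    """Classify a measure from the scope of SUM along each hierarchy.

    sum_scope maps a hierarchy name to a pair (n, depth): SUM is valid
    along the first n of the hierarchy's depth levels. By convention in
    this sketch, n == 0 means SUM may not be used along that hierarchy.
    """
    if not sum_scope:
        raise ValueError("at least one hierarchy is required")
    scopes = list(sum_scope.values())
    if all(n == depth for n, depth in scopes):
        return "additive"
    if all(n == 0 for n, _ in scopes):
        return "non-additive"
    return "semi-additive"


# Illustrative encoding: SUM stops before the level All of the Product hierarchy.
quantity_scope = {
    "Product>Product_type>All": (2, 3),
    "Target>Region>All": (3, 3),
    "Quarter>Year>All": (3, 3),
}
print(classify_additivity(quantity_scope))
```

Unlike a single additivity flag per fact, this encoding keeps the information per hierarchy, which is the point stressed in the paragraph above.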
4.3. Physical design
This phase maps the multidimensional schema (resulting from logical design) into a physical database schema. The
physical schema depends on the target MOLAP or ROLAP tool. Therefore, a set of transformations is defined for each
type of tool. In [30], we have defined transformations for ROLAP star implementation. In this paper, we consider
Oracle MOLAP [4], which is representative of the MOLAP tool category. We present the Oracle MOLAP metamodel
and then the transformations that map a logical multidimensional schema into a physical Oracle MOLAP schema.
In [25], the Oracle MOLAP package is defined as a non-normative extension to the Common Warehouse
Metamodel. The metamodel shown in Fig. 10 is an adapted version, with the main concepts of the Oracle MOLAP tool.
Physical Oracle MOLAP dimensions are the equivalent of logical dimension levels. Variables correspond to logical
measures or dimension level attributes, while logical hierarchies are implemented by means of relations. Both variables
and relations are dimensioned. The dimensions of a variable or a relation are specified between angle brackets (< >) in the definition of
the variable or relation. A relation is a functional dependency between its dimension(s) and its reference dimension.
Dimensions are temporal or non-temporal. Possible types for non-temporal dimensions are ID (a small text, no more than
8 characters), Text (a long text), and Integer (in this case, the values of the dimension will be assigned automatically,
forming a series of consecutive integers). Possible types for temporal dimensions are Day, Week, Month, Quarter and
Year. For variables, types are Boolean, Date, Decimal, ID, Integer, Short decimal, Short integer and Text.
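[Figure 10: UML class diagram. Elements recoverable from the figure: ModelElement; OracleMolapSchema (name : Name), which owns 1..* OracleMolapSchemaElement (roles +ownedElement and +oracleMolapSchema); Dimension (isTime : Boolean, dataType : OracleMolapDimensionType); DimensionedObject, linked to Dimension by an ordered many-to-many association (roles +dimensionedObject and +dimension); Variable (dataType : OracleMolapVariableType); Relation, linked to exactly one reference Dimension (roles +relation and +referenceDimension).]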
Fig. 10. Oracle MOLAP metamodel (adapted from [25]).

The transformations described below (transformations Tle1 to Tle6) map a logical multidimensional schema into a
physical Oracle MOLAP schema, using the concepts of the Oracle MOLAP metamodel. The transformations map
successively: (1) the logical dimension levels (transformation Tle1), (2) the logical measures and facts (transformations
Tle2 to Tle4), (3) the logical hierarchies, i.e. their classification relationships (transformation Tle5), and (4) the
logical dimension level attributes (transformation Tle6). We have defined no transformation for mapping the
aggregation functions associated with the measures and hierarchies of the logical multidimensional schema. This is
due to the fact that such a mapping cannot be performed easily into Oracle MOLAP concepts. This mapping is only
possible at the cost of some programming, e.g. using the Oracle MOLAP Objects development environment [4]. The
application of the transformations is illustrated using the media-planning example.
Transformation Tle1 maps the dimension levels of the logical multidimensional schema into dimensions of the
physical Oracle MOLAP schema:
Box 10
Transformation Tle1: Each logical dimension level is mapped into a physical dimension. A non-temporal logical
dimension level is mapped into a non-temporal physical dimension with data type ID, Text or Integer. A temporal
logical dimension level is mapped into a temporal physical dimension with data type Day, Week, Month, Quarter
or Year.
The application of transformation Tle1 to the media-planning logical multidimensional schema yields the
following physical Oracle MOLAP dimensions (the name of each physical dimension is followed by its type;
a physical dimension has the same name as the corresponding logical dimension level, unless the name has to be
changed to comply with the naming constraints of Oracle MOLAP Administrator):
Physical dimension   Type
Product_type         Text
Product              ID
Region               Text
Target               ID
Advert_campaign      ID
Year_                Year
Quarter_             Quarter
Date_                Day
Media_type           Text
Media                Text
Shareholder          Text
Shareholder_type     ID
Pri_shareho_type     ID
All_                 ID
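As an illustration of transformation Tle1, the following Python sketch maps a logical dimension level into a physical dimension and checks that the chosen data type is allowed for a temporal or non-temporal dimension. The class PhysicalDimension and the function map_dimension_level are hypothetical names; the renaming needed to comply with the naming constraints of the tool is left to the caller, since those constraints are not detailed here.

```python
from dataclasses import dataclass

TEMPORAL_TYPES = {"Day", "Week", "Month", "Quarter", "Year"}
NON_TEMPORAL_TYPES = {"ID", "Text", "Integer"}


@dataclass(frozen=True)
class PhysicalDimension:
    """Mirrors the Dimension class of the metamodel (isTime, dataType)."""
    name: str
    data_type: str
    is_time: bool

    def __post_init__(self):
        allowed = TEMPORAL_TYPES if self.is_time else NON_TEMPORAL_TYPES
        if self.data_type not in allowed:
            raise ValueError(
                f"type {self.data_type!r} is not allowed for dimension {self.name!r}"
            )


def map_dimension_level(level_name, is_temporal, data_type, physical_name=None):
    """Tle1 sketch: one logical dimension level gives one physical dimension.

    physical_name lets the designer supply a renamed identifier (for
    example Quarter_ for the level Quarter); by default the name is kept.
    """
    return PhysicalDimension(physical_name or level_name, data_type, is_temporal)


quarter = map_dimension_level("Quarter", True, "Quarter", physical_name="Quarter_")
product = map_dimension_level("Product", False, "ID")
```

The choice of data type remains a designer decision, which is why it is passed as a parameter rather than inferred.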
Logical measures are mapped by transformation Tle2:
Box 11
Transformation Tle2: Each logical measure is mapped into a physical variable with data type Boolean, Date,
Decimal, ID, Integer, Short decimal, Short integer or Text. The variable is dimensioned by the physical
dimensions defined by mapping the logical dimension levels associated with the measure.
The logical dimension levels related to a measure are the ones dimensioning the fact of this measure.
Note that these dimension levels have been mapped into physical dimensions by transformation Tle1.
We get the following Oracle MOLAP variables for the media-planning example (each variable is specified by its
name, type and dimensioning dimensions between < >):
Quantity          Short decimal  <Quarter_ Product Target>
Amount            Short decimal  <Quarter_ Product Target>
Influence_coeff   Short decimal  <Quarter_ Advert_campaign Product Target>
Media_exposure    Short decimal  <Quarter_ Media Target>
Percent_of_share  Short decimal  <Date_ Shareholder Media>
Nb_inhabitants    Integer        <Region>
Percent_region    Short decimal  <Target>
Advert_price      Short decimal  <Media>
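Transformation Tle2 lends itself to a direct sketch. In the Python fragment below, the names Variable, map_measure and physical_names are hypothetical; physical_names stands for the result of Tle1 (logical dimension level name to physical dimension name), and the rendered string follows a "name type <dimensions>" notation rather than actual tool syntax.

```python
from dataclasses import dataclass

VARIABLE_TYPES = {
    "Boolean", "Date", "Decimal", "ID",
    "Integer", "Short decimal", "Short integer", "Text",
}


@dataclass(frozen=True)
class Variable:
    name: str
    data_type: str
    dimensions: tuple

    def render(self):
        """Render as: name, type, then the dimensions between angle brackets."""
        return f"{self.name} {self.data_type} <{' '.join(self.dimensions)}>"


def map_measure(name, data_type, fact_levels, physical_names):
    """Tle2 sketch: a measure becomes a variable dimensioned by the physical
    dimensions of the levels dimensioning its fact."""
    if data_type not in VARIABLE_TYPES:
        raise ValueError(f"unknown variable type: {data_type!r}")
    dims = tuple(physical_names[level] for level in fact_levels)
    return Variable(name, data_type, dims)


physical_names = {"Quarter": "Quarter_", "Product": "Product", "Target": "Target"}
quantity = map_measure(
    "Quantity", "Short decimal", ["Quarter", "Product", "Target"], physical_names
)
print(quantity.render())
```

As with dimensions, the data type of each variable is supplied by the designer.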
Transformation Tle3 maps logical facts with at least one non-minimal dimensioning:
Box 12
Transformation Tle3: Each logical fact with at least one non-minimal dimensioning is mapped by defining a
physical relation for each non-minimal dimensioning FDi. The reference dimension of the relation is the physical
dimension defined by mapping the logical dimension level of FDi; the relation is dimensioned by the physical
dimensions defined by mapping the other logical dimension levels dimensioning the fact.
By definition, among the dimension levels dimensioning a logical fact, a dimension level has a non-minimal
dimensioning if it is functionally determined by the others. Therefore, non-minimal dimensionings can be mapped
naturally using the Oracle MOLAP concept of relation.
In the media-planning logical multidimensional schema, the fact Main_shareholder is dimensioned by the
dimension levels Shareholder (with a non-minimal dimensioning), Date and Media. Therefore, we get the
following Oracle MOLAP relation (specified by its name, reference dimension, and dimensioning dimensions
between < >): Main_shareholder Shareholder <Date_ Media>.
Tle4 maps the other logical facts, unless they have at least one measure which is always defined:
Box 13
Transformation Tle4: Every logical fact without (1) at least one non-minimal dimensioning and (2) at least one
measure which is always defined, is mapped into a physical Boolean variable. The variable is dimensioned by the
physical dimensions defined by mapping the logical dimension levels of the fact.
A logical fact with only minimal dimensionings (i.e. a fact not mapped by transformation Tle3) cannot be mapped
into an Oracle MOLAP relation. Consequently, for such a fact, we define a dummy Boolean variable in Oracle
MOLAP. For each set of values of the dimension levels dimensioning the fact, the corresponding instance of the
dummy variable will indicate whether the instance of the fact exists or not. Note that if a logical fact has at least one
measure, this measure has been mapped into an Oracle MOLAP variable by transformation Tle2. If the measure is
always defined (i.e. it has the same lifetime as the fact), the variable is defined whenever the fact is defined. Therefore,
in this case, a dummy variable is not necessary.
From the media-planning logical multidimensional schema, only the fact Main_shareholder has been mapped by
transformation Tle3. Among the remaining facts, the facts Gets and In have no measure (all measures of all other
facts are always defined). Therefore, a dummy variable is defined for the fact Gets, in order to indicate which
regions get which media. Similarly, a dummy variable is defined for the fact In: Gets Boolean <Media Region>,
In Boolean <Advert_campaign Media>.
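The decision made by transformations Tle3 and Tle4 can be summarized in a short Python sketch. The names Fact and map_fact are hypothetical, dimension names are assumed to be already mapped by Tle1, and the naming of relations when a fact has several non-minimal dimensionings is an assumption of this sketch (the text only shows the single-relation case).

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    name: str
    levels: list                                      # physical dimensions of the fact
    non_minimal: list = field(default_factory=list)   # levels with non-minimal dimensioning
    has_always_defined_measure: bool = False


def map_fact(fact):
    """Return the physical objects defined for the fact itself (not its measures)."""
    if fact.non_minimal:
        # Tle3: one relation per non-minimal dimensioning.
        relations = []
        for ref in fact.non_minimal:
            others = [lvl for lvl in fact.levels if lvl != ref]
            # Assumption: reuse the fact name when there is a single relation.
            name = fact.name if len(fact.non_minimal) == 1 else f"{fact.name}_{ref}"
            relations.append(f"{name} {ref} <{' '.join(others)}>")
        return relations
    if not fact.has_always_defined_measure:
        # Tle4: dummy Boolean variable recording which fact instances exist.
        return [f"{fact.name} Boolean <{' '.join(fact.levels)}>"]
    # An always-defined measure (mapped by Tle2) already marks existence.
    return []


main_shareholder = Fact(
    "Main_shareholder", ["Shareholder", "Date_", "Media"], non_minimal=["Shareholder"]
)
gets = Fact("Gets", ["Media", "Region"])
print(map_fact(main_shareholder))
print(map_fact(gets))
```

A fact with only minimal dimensionings and an always-defined measure yields an empty list, since no extra object is needed.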
Transformation Tle5 maps logical hierarchies by mapping their classification relationships:
Box 14
Transformation Tle5: Every logical classification relationship DLi→DLi+1 is mapped into a physical relation.
The reference dimension of the relation is the physical dimension defined by mapping the logical dimension level
DLi+1; the relation is dimensioned by the physical dimension defined by mapping the logical dimension level
DLi.
In the media-planning logical multidimensional schema, the following classification relationships have been
defined (in transformation Tcl2): Product→Product_type, Product_type→All, Target→Region, Region→All,
Quarter→Year, Year→All, Media→Media_type, Media_type→All, Date→Quarter, Advertising_campaign→Product,
Advertising_campaign→Quarter, Shareholder→Shareholder_type, Shareholder_type→All,
Shareholder→Private_shareholder_type and Private_shareholder_type→All. An Oracle MOLAP relation is defined
for each of these classification relationships:
Relation          Reference dimension  Dimensioned by
Type.product      Product_type         <Product>
All.product_type  All_                 <Product_type>
Region.target     Region               <Target>
All.region        All_                 <Region>
Year.quarter      Year_                <Quarter_>
All.year          All_                 <Year_>
Type.media        Media_type           <Media>
All.media_type    All_                 <Media_type>
Product.campaign  Product              <Advert_campaign>
Quarter.campaign  Quarter_             <Advert_campaign>
Quarter.date      Quarter_             <Date_>
Share_type.share  Shareholder_type     <Shareholder>
All.share         All_                 <Shareholder_type>
Pri_type.share    Pri_shareho_type     <Shareholder>
All.private       All_                 <Pri_shareho_type>
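Transformation Tle5 can be sketched as follows. The functions map_classification and map_hierarchy are hypothetical, as is the naming scheme "Parent.child" used for generated relations: it reproduces some of the names above (e.g. Region.target) but not the abbreviated ones, which were chosen by hand.

```python
def map_classification(child, parent, physical_names):
    """Tle5 sketch: the classification relationship child -> parent becomes a
    relation whose reference dimension is the parent level and which is
    dimensioned by the child level."""
    name = f"{parent}.{child.lower()}"  # hypothetical naming scheme
    return f"{name} {physical_names[parent]} <{physical_names[child]}>"


def map_hierarchy(levels, physical_names):
    """Map every consecutive pair of levels of one hierarchy path."""
    return [
        map_classification(child, parent, physical_names)
        for child, parent in zip(levels, levels[1:])
    ]


# Result of Tle1 for the levels used here (logical name -> physical name).
physical_names = {
    "Target": "Target",
    "Region": "Region",
    "Quarter": "Quarter_",
    "Year": "Year_",
    "All": "All_",
}
for relation in map_hierarchy(["Quarter", "Year", "All"], physical_names):
    print(relation)
```

Because each relation depends only on one pair of adjacent levels, a classification relationship shared by several hierarchies would be generated once per distinct pair.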
Finally, transformation Tle6 maps logical dimension level attributes. Note that transformation Tle1 deals with the
identifying attributes of dimension levels (the values of each physical dimension defined by transformation Tle1 will
be the values of the identifying attribute of the corresponding logical dimension level). Consequently, Tle6 maps non-
identifying dimension level attributes only:
Box 15
Transformation Tle6: Every logical non-identifying dimension level attribute is mapped into a physical variable
with data type Boolean, Date, Decimal, ID, Integer, Short decimal, Short integer or Text. The variable is
dimensioned by the physical dimension defined by mapping the logical dimension level containing the given
attribute.
Applying transformation Tle6 to each non-identifying dimension level attribute of the media-planning logical
multidimensional schema, we get the following variables:
Variable         Type           Dimensioned by
Product_unit     Text           <Product_type>
Product_name     Text           <Product>
Status_          ID             <Target>
Minimum_age      Short integer  <Target>
Minimum_age      Short integer  <Target>
Sex              ID             <Target>
Media_unit       Text           <Media_type>
Pub_share_level  ID             <Shareholder>
Manager_name     Text           <Shareholder>
Thanks to these transformations, the definition of the Oracle MOLAP schema in the Oracle MOLAP command
language can be generated from the logical multidimensional schema in a quasi-automated way. The generation of
dimensions is straightforward. The process also generates new variables, going beyond the explicit requirements, such
as the variable Gets, which materializes the relationship between regions and media. This example illustrates the
richness of the conceptual modeling aspects. It would not have been easy to obtain this variable without relying on the
conceptual modeling process. Moreover, the choice of implementing the dimension level All results in a list of
relations All.* (All.region, All.year, etc.).
4.4. Discussion
To the best of our knowledge, very few detailed data warehouse design methods have been proposed to date.
The most comprehensive method we have found is the one proposed in [21]. This method and ours have both
adopted the OO paradigm (UML and OCL). They both encompass classical phases of design, development and
implementation. However, they differ in the way of tackling abstraction levels: [21] merges logical and physical
design phases whereas we propose to adapt the three ANSI/X3/SPARC abstraction levels to multidimensional
modeling. Moreover, we consider that multidimensional concepts are not suitable for conceptual modeling. [21]
provides a way of defining users' views (Business Models) which are not explicitly taken into account in our
approach. Finally, [21] defines an ETL process which is not yet implemented in our method. Summing up, the
contribution of this paper can be characterized as follows. We provide (1) a unified multidimensional metamodel,
(2) a formalization of transformation rules between metamodels and (3) a high level of standardization based on
the CWM proposal. Our method spans conceptual, logical and physical design. In particular, the same logical
multidimensional schema may be implemented in different physical environments. Our method adopts a top-down
approach starting from user requirements. However, it also considers operational data sources, in the data
confrontation phase.
5. Conclusion and further research
We have described a comprehensive UML-based method for data warehouse design. Capitalizing on
transactional database design techniques, we proposed a conceptual design phase aimed at defining a UML
model, followed by an enrichment/transformation of this model. The resulting conceptual model is then
mapped into a logical multidimensional schema, which may then be mapped into any physical
MOLAP or ROLAP platform, as illustrated with the Oracle MOLAP environment. The content of the
conceptual, logical and physical design phases was detailed by presenting the metamodels and the
associated transformations. Our design method was applied to a case study.
Elaborating from previous research, we have defined a unified multidimensional metamodel. This
metamodel is used to represent the logical multidimensional schema in our data warehouse design method.
The physical data warehouse schema results from mapping this logical schema into a physical
multidimensional (MOLAP) or relational (ROLAP) database schema. The resulting database is accessible not only
to OLAP tools, but also to other decision support tools, e.g. data mining. Furthermore, although this
perspective is outside the scope of the present paper, our unified multidimensional metamodel could be used
in the context of MOLAP or ROLAP tools, in order to provide decision makers with a standard, high-level
view to the data warehouse. The need for a unified, MOLAP and ROLAP-independent multidimensional
formalism within OLAP tools is today widely acknowledged (the Unified Dimensional Model,
included in the 2005 version of Microsoft SQL Server, is a recent example).
The transformations of our data warehouse design method may be semi-automated, more specifically in
the logical and physical design phases. Human interaction is required to validate the step-by-step
application of the transformations, or to provide information (this information concerns mainly the applicable
aggregation functions during logical design, and the definition of data types during physical design). A
CASE tool is under prototyping. The tool uses Java along with OMG's XMI standard (XML Metadata
Interchange) for the definition and storage of the logical multidimensional schema. Our CASE tool
aims at supporting data warehouse designers, based on our method and on the quality metrics presented in
[35].
Several questions remain open. The mapping transformations need to be enriched. The method has to be
further tested on extensive case studies. Non-data-oriented requirements need to be handled; this can
be achieved by exploiting the dynamic concepts of UML, e.g. sequence diagrams for the definition of
OLAP operations such as rotate, pivot, slice or dice [21]. Lastly, the transformations could be formalized
with CWM's Transformation package, with the aim of tracing mappings between the different design phases.
We are currently working on these issues, as well as on the derivation of data marts from the data
warehouse and the integration of ETL processes in the design phases.
Acknowledgements
The authors are grateful to the reviewers for their useful and helpful comments.
References
[1] A. Abello, J. Samos, F. Saltor, Benefits of an object-oriented multidimensional data model, ECOOP 2000 Symposium on Objects and Databases, Sophia-Antipolis, France, 2000 (June).
[2] R. Agrawal, A. Gupta, S. Sarawagi, Modeling multidimensional databases, 13th International Conference on Data Engineering (ICDE '97), Birmingham, UK, 1997 (April).
[3] J. Akoka, I. Comyn-Wattiau, N. Prat, Dimension hierarchies design from UML generalizations and aggregations, 20th International Conference on Conceptual Modeling (ER 2001), Yokohama, Japan, 2001 (Nov.).
[4] S. Arkhipenkov, D. Golubev, Oracle Express OLAP, Charles River Media, 2002.
[5] M. Blaschka, C. Sapia, G. Höfling, B. Dinter, Finding your way through multidimensional data models, DEXA Workshop on Data Warehouse Design and OLAP Technology (DWDOT '98), (Vienna, Austria), 1998 (Aug.).
[6] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, S. Paraboschi, Designing data marts for data warehouses, ACM Transactions on Software Engineering and Methodology 10 (4) (2001 (Oct.)) 452-483.
[7] J.W. Buzydlowski, I.-Y. Song, L. Hassell, A framework for object-oriented on-line analytical processing, 1st ACM Workshop on Data warehousing and OLAP (DOLAP '98), (Washington DC, USA), 1998 (Nov.).
[8] L. Cabibbo, R. Torlone, A logical approach to multidimensional databases, 6th International Workshop on Extending Database Technology (EDBT '98), (Valencia, Spain), 1998 (March).
[9] L. Carneiro, A. Brayner, X-META: a methodology for data warehouse design with metadata management, 4th International Workshop on Design and Management of Data Warehouses (DMDW 2002), (Toronto, Canada), 2002 (May).
[10] S. Chaudhuri, U. Dayal, An overview of data warehousing and OLAP technology, SIGMOD Record 26 (1) (1997 (March)) 65-74.
[11] A. Datta, H. Thomas, The cube data model: a conceptual model and algebra for on-line analytical processing in data warehouses, Decision Support Systems 27 (3) (1999 (Dec.)) 289-301.
[12] S.R. Gardner, Building the data warehouse, Communications of the ACM 41 (9) (1998 (Sept.)) 52-60.
[13] M. Golfarelli, S. Rizzi, A methodological framework for data warehouse design, 1st ACM Workshop on Data Warehousing and OLAP (DOLAP '98), (Washington DC, USA), 1998 (Nov.).
[14] B. Hüsemann, J. Lechtenbörger, G. Vossen, Conceptual data warehouse design, 2nd International Workshop on Design and Management of Data Warehouses (DMDW 2000), (Stockholm, Sweden), 2000 (June).
[15] M. Jarke, M. Lenzerini, Y. Vassiliou, P. Vassiliadis, Fundamentals of Data Warehouses, second ed., Springer-Verlag, 2003.
[16] R. Kimball, The Data Warehouse Toolkit, John Wiley & Sons, 1996.
[17] Klasse Objecten, OCL Checker, version 0.3, (Feb. 2001), http://www.klasse.nl/ocl/ocl-checker.html.
[19] W. Lehner, J. Albrecht, H. Wedekind, Normal forms for multidimensional databases, 10th International Conference on Statistical and Scientific Database Management (SSDBM '98), (Capri, Italy), 1998 (July).
[20] H.-J. Lenz, A. Shoshani, Summarizability in OLAP and statistical data bases, 9th International Conference on Statistical and Scientific Database Management (SSDBM '97), (Olympia, Washington, USA), 1997 (Aug.).
[21] S. Luján-Mora, J. Trujillo, A comprehensive method for data warehouse design, 5th International Workshop on Design and Management of Data Warehouses (DMDW 2003), (Berlin, Germany), 2003 (Sept.).
[22] D.L. Moody, M.A.R. Kortink, From enterprise models to dimensional models: a methodology for data warehouse and data mart design, 2nd International Workshop on Design and Management of Data Warehouses (DMDW 2000), (Stockholm, Sweden), 2000 (June).
[23] T.B. Nguyen, A.M. Tjoa, R. Wagner, An object-oriented multidimensional data model for OLAP, 1st International Conference on Web-Age Information Management (WAIM), 2000.
[24] T. Niemi, J. Nummenmaa, P. Thanisch, Constructing OLAP cubes based on queries, 4th ACM Workshop on Data warehousing and OLAP (DOLAP '01), (Atlanta, USA), 2001 (Nov.).
[25] Object Management Group, OMG Modeling and Metadata Specifications, http://www.omg.org/technology/documents/modeling_spec_catalog.htm.
[26] Y.-T. Park, An Empirical Investigation of the Effects of Data Warehousing on Decision Performance, Information and Management, in press (available online at www.sciencedirect.com, April 2005).
[27] T.B. Pedersen, C.S. Jensen, Multidimensional data modeling for complex data, 15th International Conference on Data Engineering (ICDE '99), (Sydney, Australia), 1999 (March).
[28] V. Peralta, R. Ruggia, Using design guidelines to improve data warehouse logical design, 5th International Workshop on Design and Management of Data Warehouses (DMDW 2003), (Berlin, Germany), 2003 (Sept.).
[29] C. Phipps, K.C. Davis, Automating data warehouse conceptual schema design and evaluation, 4th International Workshop on Design and Management of Data Warehouses (DMDW 2002), (Toronto, Canada), 2002 (May).
[30] N. Prat, J. Akoka, From UML to ROLAP Multidimensional Databases Using a Pivot Model, 18èmes Journées Bases de Données Avancées, (Evry, France, Oct. 2002), http://faculty.essec.fr/n.prat/BDA02_Prat_Akoka.pdf.
[31] M. Rafanelli, F. Ricci, Proposal of a logical model for statistical databases, 2nd International Workshop on Statistical Database Management (SSDBM '83), (Los Altos, California), 1983 (Sept.).
[32] C. Sapia, M. Blaschka, G. Höfling, B. Dinter, Extending the E/R Model for the multidimensional paradigm, International Workshop on Data Warehousing and Data Mining (DWDM '98) in conjunction with the 17th Int. Conf. on Conceptual Modeling (ER '98), (Singapore), 1998 (Nov.).
[33] Z. Shi, Y. Huang, Q. He, L. Xu, S. Liu, L. Qin, Z. Jia, J. Li, H. Huang, L. Zhao, MSMiner: A Developing Platform for OLAP, Decision Support Systems, in press (available online at www.sciencedirect.com, Dec. 2004).
[34] J.P. Shim, M. Warkentin, J.F. Courtney, D.J. Power, R. Sharda, C. Carlsson, Past, present, and future of decision support technology, Decision Support Systems 33 (2) (2002 (June)) 111-126.
[35] S. Si-Saïd Cherfi, N. Prat, Multidimensional schemas quality: assessing and balancing analyzability and simplicity, International Workshop on Conceptual Modeling Quality (IWCMQ '03) in conjunction with the 22nd International Conference on Conceptual Modeling (ER '03), (Chicago, Illinois, USA), 2003 (Oct.).
[36] The OLAP Report: Market share analysis, (2005), http://www.olapreport.com/Market.htm.
[37] N. Tryfona, F. Busborg, J.B. Christiansen, starER: a conceptual model for data warehouse design, 2nd ACM Workshop on Data Warehousing and OLAP (DOLAP '99), (Kansas City, USA), 1999 (Nov.).
[38] A. Tsois, N. Karayannidis, T. Sellis, MAC: conceptual data modeling for OLAP, 3rd International Workshop on Design and Management of Data Warehouses (DMDW 2001), (Interlaken, Switzerland), 2001 (June).
[39] P. Vassiliadis, T. Sellis, A survey of logical models for OLAP databases, SIGMOD Record 28 (4) (1999 (Dec.)) 64-69.
[40] H.J. Watson, C. Fuller, T. Ariyachandra, Data warehouse governance: best practices at Blue Cross and Blue Shield of North Carolina, Decision Support Systems 38 (3) (2004 (Dec.)) 435-450.
Nicolas Prat holds a PhD in Information Systems from the University
of Paris 9 and is an alumnus from ESSEC Graduate School of Business
Administration. He is currently Assistant Professor of Information
Systems at ESSEC Business School. His research and publication
areas include information systems design, DSS, data warehouses,
knowledge management and case-based reasoning.
Jacky Akoka received an M.S. degree in Computer Science and the
Doctoral degree in Operations Research from the University of Paris
6, and a PhD degree in Management Information Systems from the
Sloan School of Management at MIT. He is currently Professor of
Information Systems and holds the Chair of Information Systems at
the Conservatoire National des Arts et Métiers in Paris. He has
published over 120 conference and journal papers on information
and decision systems, including in the DSS journal. His research
interests include information systems methodologies, DSS and data
warehouse design and implementation.
Isabelle Comyn-Wattiau received an Engineer Degree in Computer
Science from the Institut d'Informatique d'Entreprise in Paris, an M.S.
and a PhD degree in Computer Science from the University of Paris 6.
She is currently Professor of Information and Computer Systems at the
Conservatoire National des Arts et Métiers in Paris. She has published
more than 70 journal and conference papers on information and
database systems. Her research interests include information systems
design, data warehouse design and database integration.
