www.elsevier.com/locate/datak
a Universidad Catolica Argentina, Argentina
b Department of Computer Science, Universidad de Buenos Aires, Buenos Aires, Argentina
Abstract
Enhancing multidimensional database models with aggregation hierarchies allows viewing data at different levels of aggregation. Usually, hierarchy instances are represented by means of so-called rollup functions. Rollups between adjacent levels in the hierarchy are given extensionally, while rollups between connected non-adjacent levels are obtained by means of function composition. In many real-life cases, this model cannot accurately capture the meaning of common situations, particularly when exceptions arise. Exceptions may appear due to corporate policies, unreliable data, or uncertainty, and their presence may make the notion of rollup composition unsuitable for representing real relationships in the aggregation hierarchies. In this paper we present a language that allows augmenting traditional extensional rollup functions with intensional knowledge. We denote this language IRAH (Intensional Redefinition of Aggregation Hierarchies). Programs in IRAH consist of redefinition rules, which can be regarded as patterns for: (a) overriding natural composition between rollup functions on adjacent levels in the concept hierarchy; (b) canceling the effect of rollup functions for specific values. Our proposal is presented as a stratified default theory. We show that a unique model for the underlying theory always exists, and can be computed in a bottom-up fashion. Finally, we present an algorithm that computes the revised dimension in polynomial time, although under more realistic assumptions, complexity becomes linear in the number of paths in the hierarchy of the dimension instance.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Data warehousing; OLAP; Dimensions; Hierarchies; Belief revision; Default logic
* Corresponding author. Tel./fax: +5411-4902-0421.
E-mail addresses: mminuto@uca.edu.ar (M.M. Espil), avaisman@dc.uba.ar (A.A. Vaisman).
0169-023X/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0169-023X(02)00181-7
226 M.M. Espil, A.A. Vaisman / Data & Knowledge Engineering 45 (2003) 225–256
1. Introduction
The development of tools for OLAP (On-line Analytical Processing) has attracted the attention of the database community in the last six years. In models for OLAP [1–3], data is represented as a set of dimensions and facts. Facts are seen as points in a multidimensional space with coordinates in each dimension and an associated set of measures. Dimensions provide appropriate contextual meaning to facts [4], and are usually organized as hierarchies, supporting different levels of data aggregation. A dimension schema is represented as a directed acyclic graph, where the nodes are the dimension levels, with a unique bottom level and a unique distinguished top level, denoted All. Each level is associated with a set of coordinate values. An instance of a dimension is given by a functional relationship defined extensionally between the coordinates in two consecutive levels.
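This extensional representation and the composition of rollups between non-adjacent levels can be sketched as follows; this is a minimal illustration in Python, with dictionary names invented for the running Borrower example (they are not from the paper):

```python
# Extensional rollups between consecutive levels of the Borrower dimension,
# represented as plain dictionaries (coordinate -> coordinate).
rollup_borrower_to_category = {"b1": "A", "b2": "B", "b3": "B", "b4": "C"}
rollup_category_to_grade = {"A": "Good", "B": "Standard", "C": "Poor"}

def compose(f, g):
    """Rollup between non-adjacent levels, obtained by function composition."""
    return {x: g[f[x]] for x in f}

# Rollup between the non-adjacent levels borrowerId and grade.
rollup_borrower_to_grade = compose(rollup_borrower_to_category,
                                   rollup_category_to_grade)
```

Note that the non-adjacent rollup is entirely determined by composition; this is exactly the rigidity that exceptions will break.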
The model described above cannot express situations where exceptions occur [5]. Information in a data warehouse does not always reflect unquestionable facts in an organization. Sometimes, data should be considered, at least partially, as representing simple beliefs about the state of the world it intends to model. Thus, data in the warehouse may not be entirely reliable, and could be subject to revision based on the analyst's information or intuition about possible exceptions.
For instance, in an insurance company, all customers may be considered as Reliable if they are between 40 and 50 years old, except those who have been fined for driving at high speed more than once. As another example, exceptions arise when rating financial investments, as we will show in the next section. Thus, when a hierarchy is built from imprecise knowledge, exceptions are likely to appear.
1.1. Motivation
A credit company maintains an operational database holding information about its loans, organized as follows: the loan identification code, the borrower identification, the identifier of the branch that approved the credit, the approval date, and the amount of the loan. The approved loans are depicted in the fact table of Fig. 1, call it Loans.
Here, borrowerId, branchId and date represent the bottom levels of the dimensions Borrower, Branch and Time, respectively. Amount is a measure. Fig. 2(a) depicts the schema of dimension Borrower, and Fig. 2(b) shows an instance of this dimension.
The arrows in Fig. 2(b) define aggregation rules. For instance, the arrows linking borrowerId b3 with category B, and category B with grade Standard, can be seen as rules stating: "All loans to borrower b3 should be assigned category B", and "All loans to a borrower with category B should be given grade Standard".
Each level in dimension Borrower is described by attributes. Let us assume that for level borrowerId, attributes name and income are defined. Each category in level category corresponds to an income between two values represented by attributes lower and upper. Finally, level grade is described by attributes lower and upper, which define the bounds for the interest rates corresponding to these grades.
Example 1. Let us suppose the following query: "List the total amount of loans summarized by grade". According to the hierarchy of Fig. 2, borrowers b2 and b3 will contribute to category B, which receives a total of $253,000 if we consider the fact table Loans, and category B will contribute to grade Standard, which receives the same amount. However, assume that although the income of customer b3 determines that she belongs to category B (i.e. her loans will contribute to grade Standard), we are interested in giving her loans a better grade, say, Good. Thus, we define the following exception: "borrower b3 must be graded Good". In this case, grade Good will total $265,000, while grade Standard will total only $3,000.
In order to tackle situations like the one presented in Example 1, we must be able to introduce additional rules, expressing exceptions like the ones introduced above. For instance, we may introduce a rule like:
borrowerId : b3 / grade : Good
meaning that all loans from borrower b3 must be graded Good, leading to modify the extension of dimension Borrower in Fig. 2 to a new extension, as shown in Fig. 3, the dashed lines indicating the new path which will be followed by the aggregation algorithm when computing a query. Thus, the rule above turns the rules representing the arrows in Fig. 2(b) into mere defaults when applying them to loans for borrower b3, overriding the rule "All loans to a borrower with category B should be given grade Standard".
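The effect of such an exception on aggregation can be sketched as follows, using the Borrower instance of Fig. 2(b); the per-loan amounts below are hypothetical, chosen only to be consistent with the totals quoted in Example 1:

```python
# Hypothetical loans: loan id -> (borrower, amount). Amounts are invented but
# consistent with the totals in Example 1 ($253,000 for category B, etc.).
loans = {"l1": ("b1", 15000), "l2": ("b2", 3000),
         "l3": ("b3", 250000), "l4": ("b4", 50000)}

rollup_to_category = {"b1": "A", "b2": "B", "b3": "B", "b4": "C"}
rollup_to_grade = {"A": "Good", "B": "Standard", "C": "Poor"}
exceptions = {"b3": "Good"}   # the rule borrowerId : b3 / grade : Good

def grade_of(borrower):
    # The exception takes priority over the composed rollup.
    if borrower in exceptions:
        return exceptions[borrower]
    return rollup_to_grade[rollup_to_category[borrower]]

def totals_by_grade():
    """Total loan amount summarized by grade, following the revised paths."""
    totals = {}
    for borrower, amount in loans.values():
        g = grade_of(borrower)
        totals[g] = totals.get(g, 0) + amount
    return totals
```

With the exception in place, b3's loans follow the dashed path of Fig. 3, so grade Good totals $265,000 and grade Standard only $3,000, as in Example 1.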
1.2. Contributions
In this paper we present a rule language that allows intensionally expressing redefinitions of aggregation hierarchies like the one in Fig. 2. We denote this language IRAH (standing for Intensional Redefinition of Aggregation Hierarchies). IRAH also provides a way of handling contradictory exception expressions. Formally, rules in IRAH are restricted forms of normal default schemas, and our proposal considers IRAH statements and dimension instances together as a stratified prioritized default theory. We show that, given an IRAH program, there exists a unique model for the underlying theory, which defines the program semantics and can be computed in a bottom-up fashion.
We explained above that a dimension instance can be seen as an extension of a dimension schema. Thus, in the presence of exceptions, this extension must be revised. We use IRAH rules for defining the set of exceptions that hold over a dimension instance, and present an algorithm which computes the revised dimension. We show that the time complexity of this algorithm is polynomial in the number of paths in the dimension instance (i.e. it lies within PTIME), although under realistic assumptions, it is linear in the number of those paths. As we will discuss later in this paper, other works deal with irregular hierarchies produced at design time. However, to the best of our knowledge, our work is the first one dealing with dimension instance revision after the design has been made (i.e. at production time).
The remainder of this paper is organized as follows: In Section 2 we present the model, briefly reviewing concepts of default reasoning. In Section 3 we introduce the IRAH language. In Section 4 we address the semantics of revision. In Section 5 we present an algorithm for revising extensions of dimension hierarchies, a comprehensive example, and a discussion on the algorithm's complexity. Section 6 compares our approach with related work. We conclude in Section 7 and propose future lines of research.
2. The model
Multidimensional databases are usually presented as base collections of concrete facts and dimension instances, stored in relational implementations called base fact tables and dimension tables [2,6,7], respectively. These approaches, however, do not capture accurately the notion of declarative revisions defining exceptions. In order to overcome this drawback of the classical approach, we present the multidimensional model from a formal, many-sorted, axiomatic and proof-theoretic perspective.
2.1. Facts
Let us consider again the example presented in Section 1. Facts (in our example, loans) are almost meaningless per se, i.e. they only behave as identifiers of their own existence as instances of some class Loans (like object IDs behave in object-oriented models). In our approach, facts are considered as pure logical concepts, no matter their nature or class. We define a general sort A, mathematically representing the (countable) set (its carrier) of all abstract facts. In our running example, loans l1 to l4 are abstract facts, members of the carrier set of sort A. Classification of abstract facts is provided by predicates which we call class predicates.
Definition 1 (Fact classification). A class predicate with a term of sort A as argument is defined for each possible classification of facts, identifying the class an abstract fact belongs to. We define fact classifications as ground atoms formed upon class predicates. From our logical perspective, an abstract fact a belongs to some class C if and only if C(a) holds.
Facts are usually presented associated with measures that provide a description of their intrinsic value for analysis. More than one measure may appear associated with a fact, and the nature of these measures may not necessarily be uniform. As was the case with facts, we abstract measures as sorts, each sort representing a carrier set of admitted measure values.
Definition 2 (Fact valuation). We associate facts with measures by means of a family of parametric predicates Valuate_M (where M is a measure sort), with signature A → M. We call any ground atom formed upon the predicate Valuate_M a fact valuation. We say an abstract fact a has a value m as its M measure if and only if Valuate_M(a, m) holds.
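In this proof-theoretic reading, classifications and valuations are simply ground atoms that either can or cannot be proved; a toy sketch, with all concrete values hypothetical (they are not taken from Fig. 1):

```python
# Abstract facts of the running example: the loans behave only as identifiers.
abstract_facts = ["l1", "l2", "l3", "l4"]

# The class predicate Loans holds for every loan fact: C(a) as a ground atom.
classifications = {("Loans", f) for f in abstract_facts}

# Valuate_Amount associates a loan with its Amount measure; the amounts
# below are hypothetical placeholders.
valuations = {("Valuate_Amount", "l1", 15000), ("Valuate_Amount", "l2", 3000)}

def holds(atom, theory):
    """An atom represents true information only if it is proved (here: membership)."""
    return atom in theory
```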
2.2. Dimensions
Facts are given their real meaning when presented associated with dimensions for analysis. As we have said before, dimension instances are usually presented in the form of tables. In our framework we provide a logical counterpart to dimension tables, allowing reasoning about fact aggregation according to the contents of dimension instances. Let us consider the following sorts and their associated finite sets: a sort D of dimension names, a sort L of level names, and a sort C of coordinate values.
Example 3. Let us consider the dimension Borrower in Section 1. A schema for dimension Borrower is the pair (LevelSet_Customer, ≺_Customer), such that LevelSet_Customer = {borrowerId, category, grade, All} and ≺_Customer is defined according to the precedence relation depicted in Fig. 2. We can also produce possible schemas for dimensions Branch and Time; a schema for dimension Branch is a pair (LevelSet_Branch, ≺_Branch), where LevelSet_Branch = {branchId, state, country, All}, and ≺_Branch is given by the precedence relation branchId ≺_Branch state, state ≺_Branch country, country ≺_Branch All; a schema for dimension Time is the pair (LevelSet_Time, ≺_Time), such that LevelSet_Time = {date, month, year, All}, and ≺_Time is defined by the usual precedence relation date ≺_Time month, month ≺_Time year, year ≺_Time All.
Definition 5 (Level attributes). Let T be a data sort like Integer, Character, String, Date, and so on. A set of functions, call them generically Desc_{d:l} (standing for level descriptors for level d : l), with signature Desc : Iset(d : l) → T, is provided in our model in order to admit attributes that describe coordinates in levels.
Example 4. The attributes name and income in the table representing dimension Borrower are examples of level descriptors. The first row in this table reads in our model: name(b1) = "J. Smith", income(b1) = 90,000.
We still need a mechanism for associating facts with coordinates in all the dimensions describing them, at a certain level of aggregation detail. The following definition allows performing this association, abstracting it as generally as possible.
Example 5. In our running example, the following atoms are examples of fact aggregations, corresponding to the first row in table Loans: aggr(l1, Borrower, borrowerId, b1), aggr(l1, Branch, branchId, 2), aggr(l1, Time, date, 11/2000).
Fact classifications, fact valuations, and fact aggregations provide a good way of representing information on classification of facts, and associations between facts, measures, and level members. Nevertheless, because our approach is proof-theoretic, we must prove that fact classifications, fact valuations or fact aggregations hold, in order to consider them as representing true information in the modeled world. Fact classifications, fact valuations, and fact aggregations for the bottom level of a dimension can be considered as proved, because they are not derived from any previous knowledge. Fact aggregations for non-bottom levels of a dimension must be deduced from the contents of the dimension instance. Hence, we need a proof-theoretic inference mechanism that allows determining which fact aggregations hold for any non-bottom level of the dimension. In this work, we have chosen to represent relationships (rollups) between coordinates in consecutive levels of the dimension hierarchy as normal default rules [8,9], and base our model of dimension instances on extensions of default theories [10] and the concept of priorities among them [11]. We will briefly review these concepts.
for every position k in p, the default p[k] is applicable to the set In(p[1, k−1]).
We say that a process p is generated by a strict well order ⊑ if and only if, for all positions i in p, p[i] is the minimal default w.r.t. ⊑ that occurs in p[i, ∞) and is applicable to In(p[1, i−1]).
Finally, a set of beliefs E is called a prioritized extension of a theory T = (W, D) w.r.t. a partial order <, if and only if there exists a strict well order ⊑ on the defaults in D which contains <, and E = In(p) for some process p generated by ⊑.
As we have pointed out above, a normal default theory is a pair T = (W, D), where W is a set of first order sentences called axioms that provides a basis for logical inference, and D is a set of normal default rule schemas that defines the inference mechanism. We thus create a theory T(d), for a dimension d in D, in order to prove fact classifications, fact valuations, and fact aggregations, and call it a dimension theory.
Definition 7 (Axiom set). Let d be a dimension in D. We define the axiom set for theory T(d) as the set W(d) = {aggr(X, d, l_bottom, b) | l_bottom is the bottom level of dimension d, b ∈ Iset(d : l_bottom)}.
W(d) contains fact aggregation schemas (with abstract fact variable X) for the bottom level of dimension d.
Note. In the rest of the paper, constants will be shown in bold type when necessary.
Example 6. Let us analyze Fig. 2 again. The rollup Borrower : category : B ↦ Borrower : grade : Standard is mapped to the following aggregation rule:

  aggr(X, Borrower, category, B)
  ----------------------------------------------------------------
  aggr(X, Borrower, grade, Standard) ∧ λ_{Borrower:grade:Standard}

where λ_{Borrower:grade:Standard} ≡ ¬aggr(X, Borrower, grade, Good) ∧ ¬aggr(X, Borrower, grade, Poor).
Rule schemas of the form:

  aggr(X, d, l_bottom, a)
  -------------------------------------    (1)
  aggr(X, d, All, all) ∧ λ_{d:All:all}

where l_bottom is the bottom level of dimension d, are called implicit aggregation rule schemas.¹
Definition 8 (Rule set). Let d be a dimension in D. We define D(d), the rule set for theory T(d), as the set of all aggregation rules defined for dimension d.
¹ Note that l_bottom and All may not be consecutive levels.
n_{l_bottom} = P(d)
n_{l_i} = ⋃_{l_j ≺_d l_i} n_{l_j} ∪ {aggr(p, d, l_i, c) | aggr(p, d, l_i, c) ∈ e, e is a prioritized extension of (n_{l_j}, D(d : l_i)) w.r.t. ≺_d, l_j ≺_d l_i}
where D(d : l_i) stands for the subset of D(d) containing only level d : l_i-aggregation rule schemas.
We say that an aggregation fact aggr(p, d, l, c) is proved for a dimension instance I(d) if and only if aggr(p, d, l, c) ∈ M, where M is a model for I(d).
Example 8. According to the instance depicted in Fig. 2(b), the following aggregation rule schemas (call them defaults d1, d2 and d3) are members of the set D(Borrower : grade):

Default d1:
  aggr(X, Borrower, category, A)
  --------------------------------------------------------
  aggr(X, Borrower, grade, Good) ∧ λ_{Borrower:grade:Good}

Default d2:
  aggr(X, Borrower, category, B)
  ----------------------------------------------------------------
  aggr(X, Borrower, grade, Standard) ∧ λ_{Borrower:grade:Standard}

Default d3:
  aggr(X, Borrower, category, C)
  --------------------------------------------------------
  aggr(X, Borrower, grade, Poor) ∧ λ_{Borrower:grade:Poor}
Suppose now that all the fact aggregations have already been proved for levels borrowerId and category; then, the following aggregation facts are already proved, and, because of the inductive definition of n_{category}, they are members of n_{Borrower:category}:

aggr(Skolem(Borrower, borrowerId, b1), Borrower, category, A),
aggr(Skolem(Borrower, borrowerId, b2), Borrower, category, B),
aggr(Skolem(Borrower, borrowerId, b3), Borrower, category, B),
aggr(Skolem(Borrower, borrowerId, b4), Borrower, category, C).
Now, defaults d1, d2 and d3 in D(Borrower : grade) become applicable. The set n_{Borrower:grade} is defined in terms of the contents of set n_{Borrower:category} and the defaults in set D(Borrower : grade). Defaults d1, d2 and d3 become all applicable, and their application leads to conclude the following aggregation facts as members of n_{Borrower:grade}:

aggr(Skolem(Borrower, borrowerId, b1), Borrower, grade, Good),
aggr(Skolem(Borrower, borrowerId, b2), Borrower, grade, Standard),
aggr(Skolem(Borrower, borrowerId, b3), Borrower, grade, Standard),
aggr(Skolem(Borrower, borrowerId, b4), Borrower, grade, Poor).
The order of application is irrelevant in this case because the preference relation among rules is
empty. Because there is an ordering relation between levels in a dimension, we can assume a linear
order such that we can number levels accordingly.
Theorem 1. There exists at least one prioritized extension n of each theory (n_{l_{i−1}}, D(d : l_i)) w.r.t. ≺_d, with n_{l_{i−1}} and D(d : l_i) defined as above.
Proof. A propositional prioritized normal default theory always has an extension if its set of rules is finite [10]. Then, we need to prove that D(d : l_i) is finite. As we have defined before, D(d) is a finite set of rule schemas. Because D(d : l_i) ⊆ D(d) holds for every i, we only need to prove that each rule schema in D(d) has only a finite number of instances. This follows immediately, since rule schemas in D(d) have only one variable, variable X of sort A. Variable X in rule schemas is substituted by Skolem terms on constants of sorts D, L, and C. All carrier sets for these sorts have been defined as finite sets; therefore there can be only a finite number of Skolem terms formed upon members of these carrier sets.
Theorem 2. The set of all aggregation facts in a prioritized extension n of each theory (n_{l_{i−1}}, D(d : l_i)) w.r.t. ≺_d is finite.
Proof. We have defined in Section 2.3.1 that an extension of a prioritized theory is a set of first order formulae E = In(p), with p a process, that is, a sequence of defaults. Because members of E = In(p) are consequents of defaults in p, or formulae implied by those consequents, every aggregation in a prioritized extension of a theory (n_{l_{i−1}}, D(d : l_i)) w.r.t. ≺_d is either the consequent of an instance of a default in D(d : l_i) or is implied by those consequents. We only need to prove that the consequent of any instance of a rule schema in D(d : l_i) entails only a finite number of positive literals, because in Theorem 1 we proved that D(d : l_i) is finite. The proof is straightforward, since the consequent of any rule schema in D(d : l_i) is a finite conjunction of literals, with only one positive literal among them, an aggregation fact.
Thus, all dimension instances have a finite model, the Herbrand interpretation of all fact aggregation instances, with Skolem terms as abstract facts, that can be proved from the dimension instance regarded as a theory. A model of a dimension instance is better described in terms of paths.
Definition 11 (Paths). Let I(d) be a dimension instance, and M its model. A path p in I(d) is the maximal set of fact aggregations of the form aggr(p, d, l, c), where c is a coordinate in level l of dimension d, in the model M of I(d), for a fixed given Skolem term p.
Example 9. For instance, a path in the dimension instance exhibited in Fig. 2(b) is:
The relationship between paths in our proof-theoretic perspective and the relational tabular implementation of dimension instances is immediate. Skolem terms are implemented in dimension tables by row IDs, with rows implementing paths.
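Assuming a hypothetical row layout, this path/row correspondence can be sketched as:

```python
# A toy dimension table for Borrower: each row id plays the role of a Skolem
# term, and the row lists one coordinate per level, i.e. one complete path.
dimension_table = [
    # (row_id, borrowerId, category, grade, All)
    (1, "b1", "A", "Good", "all"),
    (2, "b2", "B", "Standard", "all"),
    (3, "b3", "B", "Standard", "all"),
    (4, "b4", "C", "Poor", "all"),
]

def path_of(row, levels=("borrowerId", "category", "grade", "All")):
    """The set of fact aggregations encoded by one row (one path)."""
    row_id, *coords = row
    return {("aggr", row_id, "Borrower", lvl, c) for lvl, c in zip(levels, coords)}
```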
3. The IRAH language

In our model a dimension instance is regarded as a theory formed upon a set of path axioms, and a set of predefined aggregation rule schemas. In order to tackle the revision problem, that is, the possibility of partially overriding the effect of the rules in the dimension instance, we provide a language which allows expressing this overriding as a set of extended redefinition rules, such that a preference relation prioritizes the application of redefinition rules over the rules originally defined in the dimension instance. We call this language IRAH, and the redefinition rules IRAH rules. In this section, we present the language's syntax, and give an intuitive meaning to statements in the language. In the next section we formalize the semantics of IRAH statements, that is, we show how a dimension instance is revised due to the application of IRAH rules.
IRAH is a typed rule language, i.e. variables and constants appearing in IRAH rules are typed. The types in IRAH are: D for dimension type; L for level type; C for coordinate type. Only variables of type C are admitted, and they must be bound to instance levels. Usual basic types like integers, booleans, characters, strings, dates, and so on, are also supported. A special symbol = denotes the equality predicate for each type. Analogously, the symbols ≤ and ≥ are given the usual meaning. A set of strongly typed function signatures is included in the language, and an implicit set of functions for level descriptors is assumed, in order to admit expressions composed of references to level attributes. Binding of variables in IRAH must be declared, as will become clear below. An IRAH statement has a name (a string) and always refers to a dimension name (a constant of type D). Names and dimensions must be declared explicitly.
The basic constructs in IRAH rules are coordinate expressions and level expressions.
A coordinate expression is an expression of the form l : t, where l is a constant of type L and t is a term (a variable or a constant) of type C. Some constraints are imposed on coordinate expressions: constant l appearing in a coordinate expression l : t must be a level defined in the schema of dimension d; if term t is a constant, it must be a member of set Iset(d : l); otherwise, if it is a variable, this variable can be substituted only by members of set Iset(d : l). A coordinate expression l : t therefore serves as a binding declaration for typing purposes, and, informally, asserts that a path with term t in level l exists in the model for dimension instance I(d).
Notation. In the examples that follow, for the sake of brevity, we will represent coordinate variables with letters V, W. Recall that variables in IRAH must be declared before they can be used; thus, this distinction is unnecessary.
Example 10. Coordinate expressions for dimension Borrower corresponding to Fig. 2 are: category : B; borrowerId : b3; grade : Standard.
A level expression E is an expression of the form pred(V.A, t1, ..., tm), where pred is a predicate symbol, V is a variable of type C, A is a descriptor of a level l, and t1, ..., tm, m ≥ 0, are well formed ground terms of types T1, ..., Tm, respectively. Thus, a level expression E = pred(V.A, t1, ..., tm) is bound to level l by variable V. Level expressions define subsets of the instance set of level l in the dimension.
Example 11. The level expression V.income = 1000 states that income in the borrowerId level equals 1000. Notice that the variable V appearing at the beginning of the expression must be declared; otherwise we cannot use it in IRAH.
We can build level formulae from level expressions and propositional connectors ∧, ∨ and ¬ in the usual manner, provided that all the involved expressions are bound to some level l by the same variable V. The resulting formula is bound to l by V. For instance, the following expression is a level formula that is bound to level borrowerId by variable V:

V.income ≥ 1000 ∧ V.income ≤ 1500
Example 12. category : B and borrowerId : V ∧ V.income ≥ 1500 are examples of coordinate formulae.
Notice that the second formula contains a coordinate expression borrowerId : V that serves as a variable binding declaration for V. This declaration constrains the values that substitute V to be members of Iset(Borrower : borrowerId).
A redefinition rule (an IRAH rule) for a dimension d is an expression of the form B1(l1), ..., Bk(lk) / l : c, where l1, ..., lk, and l are levels in d, such that for all i, 0 < i < k, l_i ≺_d l_{i+1}, and l_k ≺_d l; B1(l1), ..., Bk(lk) are coordinate formulae for levels l1, ..., lk respectively, and we call them body formulae; c is a coordinate in Iset(d : l), and we call the term l : c the head formula of the rule.
formula of the rule.
Because IRAH rules are conceived as redenitions that produce changes on a dimension in-
stance I of some dimension d, the meaning of an IRAH rule must be stated with respect to the
redened dimension instance I 0 . An intuitive meaning might be enunciated as a constraint over
M.M. Espil, A.A. Vaisman / Data & Knowledge Engineering 45 (2003) 225256 239
paths in the redened instance Id0 , as follows: every path p in a modied instance I 0 of dimension
d that contains fact aggregations aggrp; d; l1 ; c1 ; . . . ; aggrp; d; lk ; ck , where cj 2 Isetd; lj , and
cj satises Bj lj ; 81 < j 6 k, should also contain the fact aggregation aggrp; d; l; c.
Example 13. Let us consider again the exception of Example 1, expressed in IRAH as:

borrowerId : b3 / grade : Good

An intuitive meaning of this rule is that every path p of an instance of dimension Borrower, once redefined by the rule, must satisfy the following: if path p contains fact aggregation aggr(p, Borrower, borrowerId, b3), then path p should also contain the fact aggregation aggr(p, Borrower, grade, Good).
Example 14. Let us present another example of a rule expressing an exception: "Money lent to a borrower with an income under $28,000, belonging to a category with upper interest rate over 0.5, should be graded Poor".
This exception acts like a constraint, reducing risks by means of reclassifying low income borrowers. Fig. 4 shows that borrower b3 is affected by this constraint (i.e. the value of attribute income in level borrowerId is less than $28,000 and the value of attribute upper in level grade is greater than 0.5 for the Standard grade, that is, the grade of the category of borrower b3). Thus, the path departing from b3 is altered. The exception in IRAH reads:

borrowerId : V ∧ V.income ≤ 28,000 ∧ category : W ∧ W.upper ≥ 0.5 / grade : Poor

This rule states intuitively that every path p in a dimension instance of dimension Borrower, once redefined by the rule, must satisfy the following: if p contains fact aggregations aggr(p, Borrower, borrowerId, V) and aggr(p, Borrower, category, W), such that V.income ≤ 28,000 and W.upper ≥ 0.5, then p should contain fact aggregation aggr(p, Borrower, grade, Poor).
It could be the case that contradictory exceptions occur over an instance of a dimension. For instance, let us assume two exceptions holding over the instance of dimension Borrower: the exception in Example 1, and the following one: "a borrower with income between $20,000 and $30,000 should be graded Standard". If we assume that the income of b3 is between $20,000 and $30,000, a path departing from b3 matches the latter exception but also matches the former one. This situation may lead to grade money lent to borrower b3 in two different ways. Therefore, uncertainty is present and two approaches could be followed: a credulous approach or a skeptical approach. A credulous approach leads to considering alternative hierarchies, an undesired result. A skeptical approach prevents grading those loans. We have chosen the second approach, yielding the hierarchy shown in Fig. 5 for the previous example. Notice that the path departing from b3 is now undecided at level grade; thus, no fact aggregation for level grade must be present in that path.
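The skeptical policy can be sketched as follows; the function name and the use of None to stand for an undecided coordinate are our assumptions:

```python
# Skeptical resolution at one level of one path: keep the rollup value when no
# exception applies, adopt the exception when all matching rules agree, and
# leave the path undecided when the matching rules contradict each other.
def revise_level(proposals, default):
    """proposals: head coordinates proposed by the matching redefinition rules."""
    proposed = set(proposals)
    if not proposed:
        return default              # no exception applies: keep the rollup value
    if len(proposed) == 1:
        return proposed.pop()       # all matching rules agree
    return None                     # contradictory exceptions: undecided
```

Under a credulous policy, the last case would instead branch into one alternative hierarchy per proposed coordinate, which is the undesired result mentioned above.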
4. Semantics of revision
The meaning of the IRAH rules presented in the previous sections, although intuitive, does not define precisely how a dimension instance should be modified in the presence of IRAH rules. We will capture such meaning by augmenting the theories introduced in Definition 9 with new rules and priorities among them.
As was the case with rollups, we need to interpret IRAH rules as defaults, in order to incorporate them in the theory of a dimension instance. The algorithm below produces a mapping from an IRAH rule to a default rule schema.
(1) Rename all variables of type C appearing in a rule, preventing the use of the same variable in different formulae in the body. Variable X is reserved and not used in the renaming process.
(2) Replace every coordinate expression of the form d : l : t appearing in the rule by an atom of the form aggr(X, d, l, t).
(3) Connect with ∧ every coordinate formula appearing in the body of the rule, generating the prerequisite of the resulting default rule schema.
(4) Take the head of the rule, transformed in step 2, build the uniqueness guarantee λ_{d:l:c}, and connect both formulae with ∧, producing the consequent of the resulting default rule schema.
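The four steps can be sketched as a string-level transformation; the rule representation (variables marked with a leading "?") and the textual output format are our assumptions:

```python
import itertools

def map_irah_to_default(dim, body_exprs, head):
    """Map an IRAH rule to a (prerequisite, consequent) pair.
    body_exprs: list of (level, term), terms starting with '?' are variables;
    head: (level, coordinate)."""
    fresh = (f"V{i}" for i in itertools.count(1))
    # Step 1: rename coordinate variables apart (X is reserved for the fact).
    renamed = [(lvl, next(fresh) if t.startswith("?") else t)
               for lvl, t in body_exprs]
    # Step 2: turn each coordinate expression l : t into an atom aggr(X, d, l, t).
    atoms = [f"aggr(X, {dim}, {lvl}, {t})" for lvl, t in renamed]
    # Step 3: conjoin the body atoms into the prerequisite.
    prerequisite = " AND ".join(atoms)
    # Step 4: head atom plus the uniqueness guarantee forms the consequent.
    lvl, c = head
    consequent = f"aggr(X, {dim}, {lvl}, {c}) AND lambda_{dim}.{lvl}.{c}"
    return prerequisite, consequent
```

Applied to the rule of Example 13, the mapping yields the prerequisite aggr(X, Borrower, borrowerId, b3) and a consequent asserting the Good grade together with its uniqueness guarantee.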
Defaults resulting from the algorithm presented above are also aggregation rule schemas. We alternatively call them l-redefinition rule schemas, in order to distinguish them from the defaults resulting from mapping rollups to default rule schemas (Section 2.4).
Example 15. Let us consider again the rule "Money lent to borrower b3 must be graded Good". The grade-redefinition rule is:
4.2. Revisions
We have shown above how each rule in an IRAH statement can be mapped to a normal default rule schema. Let d be a dimension, I(d) an instance of d, and let R(d) = ⋃_{i=1..m} d_i be the union of all redefinition rule schemas d_i for an IRAH statement named R.
Definition 12 (Preference). Every instance of an l-redefinition rule schema resulting from the
mapping of an IRAH rule must precede all instances of an l-aggregation rule schema resulting
from the mapping of rollups.
Let us denote ≺_d the preference relation of Definition 12. Moreover, Definition 12 implies
empty preference relations ≺_d for unrevised dimension instances, because no redefinition rule
schemas exist in this case.
Definition 13 (Revision). We call a parameterized function ρ_R, with parameter R, that maps di-
mension instances of the form I_d = (P_d, D_d, ≺_d) to dimension instances of the form
I'_d = (P_d, D_d ∪ R_d, ≺_d), a revision under R for dimension d.
Example 16. Let us come back to the instance in Fig. 2(b). The following aggregation rule schema
(call it default δ1) is a member of the set D_{Borrower:grade}:

  aggr(X, Borrower, category, B)
  -----------------------------------------------------------------
  aggr(X, Borrower, grade, Standard) ∧ μ_{Borrower:grade:Standard}
We want to revise the instance with the IRAH rule of Example 13:

  borrowerId : b3 ⇒ grade : Good

Recall that the resulting grade-redefinition rule is (call it default δ2):

  aggr(X, Borrower, borrowerId, b3)
  ---------------------------------------------------------
  aggr(X, Borrower, grade, Good) ∧ μ_{Borrower:grade:Good}
Rule schema δ2 is then added to the set D_{Borrower:grade}, and the pair (δ2, δ1) is added to the preference
relation ≺_Borrower.
Suppose now that all fact aggregations have already been proved for levels borrowerId and
category. Thus, the following aggregation facts are already proved:

  aggr(Skolem(Borrower, borrowerId, b3), Borrower, borrowerId, b3),
  aggr(Skolem(Borrower, borrowerId, b3), Borrower, category, B)

and both aggregation facts are the only members of π_{Borrower:category} for the path Skolem(Borrower,
borrowerId, b3).
Default δ2 is applied before δ1, because it is the minimal default with respect to ≺_Borrower that is a
member of D_{Borrower:grade} and is applicable to π_{Borrower:category}. Thus, the following is con-
cluded:

  aggr(Skolem(Borrower, borrowerId, b3), Borrower, grade, Good) ∧ μ_{Borrower:grade:Good}
The semantics for dimension instances (Definition 9), and their corresponding models,
are not affected by the definition of revisions. Revisions simply augment the current logical
theory.
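The effect of the preference relation in Example 16 can be made concrete with a toy Python sketch. This is purely illustrative (it is not the paper's formalism): defaults are triples of a name, a prerequisite fact, and a concluded (level, value) pair; the uniqueness guarantee is modeled by recording which levels are already decided.

```python
facts = {("borrowerId", "b3"), ("category", "B")}   # proved aggregations for one path

# defaults: (name, prerequisite fact, (level, value) concluded)
d1 = ("d1", ("category", "B"), ("grade", "Standard"))   # ordinary aggregation rule
d2 = ("d2", ("borrowerId", "b3"), ("grade", "Good"))    # redefinition rule
prefer = {("d2", "d1")}                                  # d2 precedes d1

def apply_defaults(defaults, facts, prefer):
    facts = set(facts)
    decided_levels = set()            # uniqueness guarantee: one value per level
    pending = list(defaults)
    while True:
        applicable = [d for d in pending
                      if d[1] in facts and d[2][0] not in decided_levels]
        if not applicable:
            return facts
        # pick a minimal applicable default w.r.t. the preference relation
        chosen = next(d for d in applicable
                      if not any((e[0], d[0]) in prefer for e in applicable))
        facts.add(chosen[2])
        decided_levels.add(chosen[2][0])
        pending.remove(chosen)

result = apply_defaults([d1, d2], facts, prefer)
print(("grade", "Good") in result)      # True: d2 fired first
print(("grade", "Standard") in result)  # False: d1 blocked by the uniqueness guarantee
```

The redefinition default fires first; the ordinary aggregation default is then blocked, exactly as in the example above.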
Theorems 1 and 2 are still valid, although we need to deal with the case of coordinate variables
in redefinition rule schemas (which is not a problem, given that the carrier set for sort C has
been defined as finite).
When multiple revisions apply over an original dimension instance, the preference relation al-
ways prefers redefinition rule schemas over ordinary aggregation rule schemas, but it does not
make any distinction among redefinition rule schemas themselves. A series of n successive revi-
sions, resulting from the successive application of n IRAH statements to an original dimension
instance, is equivalent to a single revision containing all the rules appearing in the n IRAH
statements.
The previous remark implies that, if two l-redefinition rules are applicable and conflict with
each other, no preference is defined between them; thus, no conclusion can be obtained from
them. The intersection of the extensions produced by their alternative application is empty.
In the relational tabular implementation of the dimension instance model, a null value will ap-
pear in the column corresponding to the conflicting level of the row representing the revised
path.
The semantics presented in Section 4 interprets dimension instance models in terms of the relation aggr.
This interpretation, however, is clearly not a good choice for representing the contents of a di-
mension instance in computational terms. As we have pointed out before, we can do better by
representing paths as rows in relational tables. For instance, for the revised dimension
of Fig. 3 we would get the contents depicted in the table of Fig. 6.
Algorithm 1 takes a table d representing a dimension instance model and a set of IRAH rules as
input, and builds the revised dimension instance model (a new dimension table) as output. Only
modified paths (rows) are produced. The original contents of the dimension table are not altered.
This choice allows considering multiple hypothetical scenarios as revisions. The algorithm visits
the IRAH rules and minimizes the number of visits to paths in the dimension instance. We
represent levels and identify IRAH rules by means of consecutive non-negative integers. We define
the following data structures:
- Cond, a two-dimensional sparse array defined over rules and levels. Each non-empty cell
  Cond[i, j] points to the coordinate formula for level j (body or head) of rule i.
- An array Srules over levels. Each cell Srules[j] points to the set of rules whose first body for-
  mula is on level j.
- An array Erules over levels. Each cell Erules[j] points to the set of rules with level j in the head.
- A set Paths of candidate revised paths. A path is represented as a row, with a column for each
  level in the dimension schema. A Boolean variable IsModified is associated with each column in
  a row, indicating whether the coordinate for the level has been modified or not. Rows in set
  Paths are allocated dynamically, and the set is implemented as a balanced search tree with a
  row Id, a pointer to the row, and a pointer to a set of rules (see below) in the nodes.
- A set TargetPaths of objective paths. Paths are represented by rows as in set Paths. No Boolean
  variable is needed in this case. The set is again implemented as a balanced search tree.
- An array Rpaths over rules. Each cell Rpaths[i] points to the set of paths satisfying the formula
  in the body of rule i.
- A set (implemented as a linked list) Prules defined on each path. Each set k.Prules points to the
  set of rules i such that path k satisfies the formulae in the body of rule i.
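As a concrete illustration, the bookkeeping structures above might be set up as follows. This is a Python sketch with plain dictionaries and sets standing in for the sparse array and balanced trees described above; the two rules shown are hypothetical placeholders, not the rules of any example in the paper.

```python
LEVELS = ["emp", "unit", "group", "division"]   # illustrative level names

# Each rule: {level_index: condition} for the body, plus (level_index, value)
# for the head. The dict Cond plays the role of the sparse array Cond(i, j).
rules = {
    0: {"body": {0: "emp = e1"}, "head": (2, "g2")},     # hypothetical rule
    1: {"body": {1: "unit = u3"}, "head": (3, "d1")},    # hypothetical rule
}

Cond = {(i, j): cond for i, r in rules.items() for j, cond in r["body"].items()}
for i, r in rules.items():
    Cond[(i, r["head"][0])] = r["head"]

# Srules[j]: rules whose FIRST body formula is on level j.
Srules = {j: set() for j in range(len(LEVELS))}
for i, r in rules.items():
    Srules[min(r["body"])].add(i)

# Erules[j]: rules whose head is on level j.
Erules = {j: set() for j in range(len(LEVELS))}
for i, r in rules.items():
    Erules[r["head"][0]].add(i)

Rpaths = {i: set() for i in rules}      # paths satisfying the body of rule i
Prules = {}                             # path id -> set of rules it satisfies

print(Srules)   # {0: {0}, 1: {1}, 2: set(), 3: set()}
print(Erules)   # {0: set(), 1: set(), 2: {0}, 3: {1}}
```

The real implementation uses balanced search trees for Paths and TargetPaths and linked lists for Prules, as described above; plain sets suffice for the sketch.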
Algorithm 1
Main
  Build data structures from rules;
  FOR level j := 1 TO maxlevels DO
    RevisePaths(j);
    ActivatePaths(j);
    DeletePaths(j);
  END FOR
  Update dimension instance d with paths in Paths;

RevisePaths(level j)
  A := { };
  FOR EACH rule i in Erules[j] DO
    A := A ∪ {coordinate c in Cond[i, j]};
  TargetPaths := {subpaths p in dimension d | p.j ∈ A}
Algorithm 1 visits the hierarchy of levels in a bottom-up manner. First, head formulae in the
current level are examined, and a revision is performed on the current level in non-conflicting
paths. Null values are inserted in conflicting paths. Every body formula involving the level is then
examined. If the formula is the first body formula in a rule, and a path in memory satisfies the
condition, the path is added to the set of paths potentially satisfying the rule. Paths satisfying the
condition and not present in memory are retrieved from the dimension instance derived from the
rollups. If the formula is not the first body formula in a rule, we unlink from the rule every path
not satisfying it. Finally, the algorithm updates the dimension instance.
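The bottom-up pass just described can be sketched as follows. This is a heavily simplified, self-contained Python sketch: each rule has a single body condition and a single head assignment, conflict handling (TestConflict, null insertion) and retrieval from secondary storage are elided, and all rule and path contents are hypothetical.

```python
LEVELS = ["emp", "unit", "group", "division"]

paths = {                                   # in-memory candidate paths (rows)
    "p1": {"emp": "e1", "unit": "u1", "group": "g1", "division": "d2"},
    "p2": {"emp": "e2", "unit": "u2", "group": "g3", "division": "d2"},
}

# Each rule: one body condition (level, value) and one head assignment.
rules = [
    {"body": ("emp", "e1"), "head": ("group", "g2")},      # hypothetical
    {"body": ("group", "g2"), "head": ("division", "d1")}  # hypothetical
]

def revise(paths, rules):
    rpaths = {i: set() for i in range(len(rules))}     # plays the role of Rpaths
    for j in LEVELS:                                   # bottom-up over levels
        # RevisePaths: apply heads targeting level j to the paths linked so far
        for i, r in enumerate(rules):
            if r["head"][0] == j:
                for p in rpaths[i]:
                    paths[p][j] = r["head"][1]         # merge with the target path
        # ActivatePaths: link paths whose column j satisfies a body condition
        for i, r in enumerate(rules):
            if r["body"][0] == j:
                for p, row in paths.items():
                    if row[j] == r["body"][1]:
                        rpaths[i].add(p)
        # DeletePaths would unlink paths failing later body formulae (elided)
    return paths

revised = revise(paths, rules)
print(revised["p1"]["group"], revised["p1"]["division"])   # g2 d1
```

Note how the revision of p1 at level group (to g2) makes it satisfy the second rule's body, so its division column is revised in turn at the next level; this cascading is the same effect observed for path p10 in Example 17 below.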
Example 17. Let us consider a dimension Employee, with levels emp, unit, group and division, such
that emp → unit, unit → group, group → division, and division → All. Fig. 7 shows a table describing
level division. In this table, for the sake of notational simplicity, we assume that di is a row identifier,
and Di is the corresponding value in the instance set of the dimension level. The remaining values
represent level attributes. The tables in Figs. 8 and 9 describe levels unit and emp, respectively.
The table in Fig. 10 represents an instance of this dimension. Column PathID is the row
identifier. In order to show how the algorithm would proceed in an actual implementation, each
value in this table holds the identifier of the associated row in a table describing the corresponding
dimension level. Thus, divisionId represents the rowId in the table above.
The following IRAH program represents exception rules. Note that conditions are not stated
over row identifiers, but over level attributes.
Note also that coordinate expressions like employee : 10 act in fact as a shorthand for coor-
dinate formulas of the form employee : w ∧ w.emp = 10, where emp is the coordinate value col-
umn of the corresponding level table.
We will now run Algorithm 1 on this input. Initially, the two-dimensional array Cond is filled as
depicted in Fig. 11. The sets Paths, Rpaths and Prules are all empty. Arrays Srules and Erules
have, respectively, the following contents:

  Srules[emp] = {rule 1, rule 2, rule 5};  Srules[unit] = {rule 4, rule 6};
  Srules[group] = {rule 3};  Srules[division] = { };
  Erules[emp] = { };  Erules[unit] = { };
  Erules[group] = {rule 1, rule 2, rule 4, rule 5};  Erules[division] = {rule 3, rule 6}.
In the first iteration (for level emp), since the set Erules[emp] is empty, procedure RevisePaths does
nothing; ActivatePaths adds new rows to the set Paths. The state of Paths is depicted in Fig. 12.
Fig. 12. The state of the set Paths after the first iteration.
Function DeletePaths does nothing, because there is no rule, absent from both Srules[emp] and
Erules[emp], for which Cond is non-empty in the column for level emp.
In the iteration for level unit, RevisePaths does nothing, because the set Erules[unit] is empty.
Then, for each rule i in Srules[unit], ActivatePaths populates a set A with the paths in the set Paths
satisfying Cond[i, unit]. The paths in A are then added to the set Rpaths[i]. For each of these paths,
the set Prules is also updated: Rpaths[rule 4] = {p4, p5}; Rpaths[rule 6] = {p4, p5}; the rest of the
Rpaths sets remain unchanged. Prules[p4] = {rule 1, rule 4, rule 5, rule 6}; Prules[p5] = {rule 1,
rule 4, rule 6}.
ActivatePaths repeats the procedure in order to find the set of paths in the dimension instance d
satisfying Cond[i, unit] and not already present in Paths, for each of the rules i in Srules[unit]. In this
case there are no new paths in the dimension instance to be considered; thus, no change is applied
to Rpaths, Prules, or Paths.
Rules 1 and 5 are present neither in Srules[unit] nor in Erules[unit], and Cond is non-empty in
column unit for these rules. Thus, DeletePaths computes, for rule 1, a set A of paths in the set
Rpaths[rule 1] not satisfying Cond[rule 1, unit]. Paths p4 and p5 belong to A. They are then re-
moved from Rpaths[rule 1], and rule 1 is removed from Prules[p4] and Prules[p5]. Because neither
Prules[p4] nor Prules[p5] becomes empty, the set Paths remains unchanged. DeletePaths proceeds in
the same fashion with rule 5. As paths p4 and p5 satisfy Cond[rule 5, unit], DeletePaths does not
produce any change in this case. Thus, we have: Rpaths[rule 1] = {p1}; the rest of the Rpaths sets
remain unchanged. Prules[p4] = {rule 4, rule 5, rule 6}; Prules[p5] = {rule 4, rule 6}; the other
sets remain the same.
In the iteration for level group, the set Erules for group is not empty. Then, RevisePaths retrieves
all paths in the dimension instance such that the coordinate value at level group matches a co-
ordinate value in the head of some of these rules. As group → division, a projection over groupId
and divisionId is performed. The set TargetPaths is {⟨g2, d1⟩, ⟨g3, d2⟩}.
After this, RevisePaths applies the head of each rule i in Erules[group] to every path in
Rpaths[rule i] as follows: (a) rule 1 modifies path p1, merging it with path ⟨g2, d1⟩ in TargetPaths
(path p1 matches rule 1); (b) rule 2 modifies path p10, merging it with path ⟨g3, d2⟩ in Tar-
getPaths (notice that path p10 matches rule 2).
Paths p4 and p5 match rule 4. The algorithm modifies path p5, merging it with path ⟨g2, d1⟩ in
TargetPaths, and invokes subprogram TestConflict, finding a contradiction between rules 4
and 5 with respect to path p4, and solving this conflict by assigning null values from level group up in the
hierarchy.
The algorithm proceeds analogously with path p5.
ActivatePaths then looks for rules in Srules[group]. Only rule 3 is present in Srules[group]. The
set A of paths in set Paths satisfying Cond[rule 3, group] is then considered. All paths in A are
added to Rpaths[rule 3]. In our example, only path p10 satisfies Cond[rule 3, group]; thus, p10 is
added to Rpaths[rule 3], and rule 3 is added to Prules[p10]. It is interesting to note that p10 did not
satisfy Cond[rule 3, group] before RevisePaths changed its value in column groupID to g3.
The same procedure is repeated, retrieving the set of paths in the dimension instance satisfying
Cond[rule 3, group] and not present in Paths. Only path p8 is retrieved; it is added to Paths and
Rpaths[rule 3]. Also, rule 3 is added to Prules[p8]. Finally, Rpaths[rule 3] = {p8, p10}; Prules[p8] =
{rule 3}; Prules[p10] = {rule 2, rule 3}.
Paths now has the contents shown in Fig. 13 (the old values are shown between parentheses). Again,
DeletePaths(group) performs no action.
Finally, in the iteration for division, the set Erules for division is not empty. Thus, RevisePaths
retrieves all paths in the dimension instance matching the values in the heads of rules 3 and 6. The
set TargetPaths is {⟨d1⟩, ⟨d2⟩}. RevisePaths matches the head of rule 3 with every path in
Rpaths[rule 3], and the head of rule 6 with every path in Rpaths[rule 6]; rule 3 modifies paths p8
and p10, merging them with path ⟨d1⟩ in TargetPaths. No conflict arises here. Note that this is the
second revision produced on the value in column divisionID of path p10. Rule 6 modifies paths p4
and p5, merging them with path ⟨d2⟩ in TargetPaths. Again, no conflict occurs. Notice that path p4
updates the null value in column divisionID, set in the previous iteration, to d2. This situation shows
how an exception can reintroduce certainty.
Because Srules[division] is empty, ActivatePaths does nothing. The final state of set Paths, that
is, the output of the algorithm, is shown in the table of Fig. 14.
Fig. 13. The state of the set Paths after the iteration for group.
Let us define the following metrics for the input of Algorithm 1: R, the number of rules in the
program; L, the number of levels in the dimension; P, the number of paths in the dimension
instance.
It is straightforward that Algorithm 1 lies within PSPACE, because the revised dimension
instance (the output of the algorithm) only contains data from the input instance and constants
from the IRAH program. Further analysis of the size of the data structures allows concluding that
the algorithm requires the following space: O(R · L) for arrays Cond and Srules; O(P) for Paths
and TargetPaths; O(R · P) for Rpaths and Prules. With respect to the local space for each
subroutine, we need O(L) for set A in RevisePaths, and O(P) for A in ActivatePaths and DeletePaths.
As P always dominates L, it is easy to see that the space complexity of the algorithm is O(R · P).
Let us study the time complexity of Algorithm 1. First, function Main iterates over the set of
levels; for each level, Main calls three other routines, yielding a bound for its complexity given by
O(L · max(O(RevisePaths), O(ActivatePaths), O(DeletePaths))). As RevisePaths and ActivatePaths
iterate over Erules and Srules respectively, and both arrays of sets contain exactly R rules in total, we
propagate the iteration over levels into each subprogram and analyze RevisePaths and Acti-
vatePaths separately. Time complexity for RevisePaths is bounded by O(L · R + R · L · P +
R · R · P + O(TestConflict) + max(O(Merge), O(PutNulls))). The complexity of TestConflict is
bounded by O(R); Merge and PutNulls are bounded by O(L); thus, the time complexity of Revise-
Paths is bounded by O(R · P · (R + L)), the dominant term. In a normal situation R should
dominate L, because L is fixed and R is not; thus, the bound for time complexity is given by
O(R² · P). The complexity of ActivatePaths is given by O(L · R + R · P + R · P · log P) =
O(R · P · log P), as P · log P dominates L. The complexity of DeletePaths is bounded by O(R · P · L).
Finally, assuming that P dominates L and R, the dominant term for the three routines above is
O(ActivatePaths), with complexity O(R · P · log P). Then, the time complexity of Algorithm 1 lies
within PTIME.
The previous analysis of the time complexity of Algorithm 1 is conservative, and does not
reflect the expected behavior of the algorithm in normal cases. In subroutine RevisePaths, the
main cycle (the one beginning with FOR EACH rule i in Erules[j]) contributed to time complexity
the dominant term O(R · P · (R + L)); however, the P in this term is an extremely rough
estimation for the worst-case situation, which depends on the number of paths present in a set
Rpaths for some rule i. Given that those paths are bound to rule i because at least one condition in
the rule has been satisfied in the path, and considering that the normal case is the definition of an
exception, it is clear that the selectivity of the satisfied conditions must be high. Fixing an upper
bound (independent of P) on the number of paths normally satisfying conditions in IRAH rules, the dominant
term may be reduced to O(R · (R + L)), and to O(R²), because R normally dominates L, as we have
pointed out before. Under this normal assumption, the dominant term for RevisePaths should be
O(L · P). An analogous analysis can be performed for ActivatePaths and DeletePaths, yielding
bounds of O(R · P) and O(L · R), respectively.
From the considerations above, in a normal situation, we can expect a time complexity bound
of O(R · P) for Algorithm 1. This result shows that the number of visits is proportional to P, the
number of tuples in the dimension table. This bound can be improved by means of indexing
mechanisms that avoid full scans of the dimension table, reducing the factor P.
5.2. Discussion
The results on complexity above clearly provide theoretical bounds on the performance of an
algorithm performing a revision of the contents of a dimension instance. The reader may wonder,
however, about the applicability of this approach; that is, when and how frequently revisions
could be effectively and usefully applied in OLAP applications. Intuitively, although the time com-
plexity of Algorithm 1 is low, its real impact on response time may be significant, since it relies
heavily on the size of the dimension instance table to be revised, the selectivity factor of the bodies
of the IRAH rules, and the physical structures supporting the logical model (e.g. indexing). As
final users in OLAP demand fast response times for their queries, it is unlikely that they will be
directly involved with tools that perform revisions embedded in on-line queries. Nevertheless, the
revision mechanism presented here could be evaluated at a low cost running in the background,
while final users are querying the data warehouse, because Algorithm 1 does not actually alter the
contents of the dimension instance being revised; rather, it produces the exception paths and stores
them separately. The same argument applies to cube revisions [13]. Queries
may therefore refer to the original contents of a dimension instance or to the contents of instances
produced by different revisions. This constitutes a first setting where our approach can be applied.
A second setting, in fact the more promising one, considers the revision mechanism presented here
as a maintenance tool, to be applied when exceptions appear and a redesign of the dimension
hierarchy becomes necessary. Because a redesign may entail re-computing every dimension table
and every data cube present in the warehouse, exceptional situations like the ones described, for
instance, in Section 1 would be extremely expensive to deal with, in terms of the time needed
for data downloading and reloading. Our approach can be applied instead, and the benefit in this
case would be unquestionable. Moreover, as pointed out before, the analyst could issue
queries during the entire revision process.
In summary, the revision mechanism presented here can be applied as a tool for maintaining
the contents of the warehouse and, running in the background, as a device for providing hypo-
thetical scenarios. On the contrary, it would not perform well when embedded in an OLAP final-
user tool.
6. Related work
Several techniques have been proposed for dealing with irregular hierarchies, like many-to-
many dimensions [14] and multiple hierarchies [15]. The former technique assigns probabilistic
weights to rollup alternatives, yielding probabilistic aggregations. The latter deals with alternatives
in rollups, namely alternative dimension instances; that approach is strictly operational
(algebraic), and no priorities are imposed among alternatives. Niemi et al. [16] present a technique
dealing with hierarchies with shortcuts, called non-transitive rollups. This approach is the closest
to ours. However, the study applies only to hierarchies from a design point of view. They address
incomplete hierarchies, propose adding "not known" values to level instances and, if exceptions
occur, propagate these values up in the hierarchy. The meaning of these "not known" values is not
clearly defined. Our approach uses null values to rule out possible contradictions. Hurtado and
Mendelzon [17] discuss hierarchies that lead to heterogeneous aggregation. Irregular hierarchies
can also be found in works on hypothetical queries in OLAP [18] and multiple scenarios [3]. While
these works deal with "what if" sorts of queries and consider queries rather than revision of rollup
functions, our work incorporates revisions from the beginning, with the rollup function concept
embedded.
The works cited above deal with irregular hierarchies produced at design time. To our
knowledge, our work is the first one that deals with the problem of revision (an
operation that turns regular hierarchy instances into irregular ones) after the design has been
made (i.e. at production time).
Previous work has studied dimension updates. A set of basic and complex operators has been
proposed [6,7] in order to update dimension hierarchies. These operators, however, cannot cap-
ture exceptions, as they are strongly based on rollup functions that lead to homogeneous aggre-
gation [19].
The model for multidimensional data we present here follows, from a conceptual point of view,
the work of Abello et al. [4,20]. Nevertheless, in these works cube cells are considered complex
facts, and therefore there is no clear distinction between facts and cells. In other words, in-
formation contents and information representation are not clearly separated. As we have pointed
out in Section 2, the notion of rollups as members of a dimension instance has been borrowed
from the work of Cabibbo and Torlone [21], and Hurtado et al. [6,7].
Regarding our proof-theoretic approach to modeling, we have chosen to represent rollups
and IRAH rules as normal default schemas (rules in IRAH are, as pointed out before,
restricted forms of default rule schemas). Other formalisms could be used instead. For instance,
Datalog with negation, provided we accept Skolem terms as arguments of predicates, would
produce a notion of model for dimension instances akin to ours, although the equivalence does
not extend to our complete notion of a dimension instance regarded as a theory. Similar arguments
apply to defeasible logic [22], logic programming without negation as failure [23], and
logic programming with exceptions [24]. They all constitute alternatives for representing rollups
and IRAH rules in this context. However, our choice of default theories aims at giving a more
general framework, in the sense that it can be more easily extended to support con-
straints, intra-reasoning on priorities [25], and declarations of aggregation functions based on the
concept of iterators [26]. Using default logic as a framework, we gain in expressive power and
extensibility, because default logic is strictly more expressive than the other cited logics. The se-
mantics for models of dimension instances presented in this work is close to the notion of
stratified default logic extensions [27,28]. A dimension instance model can be viewed, under this
notion, as an extension of a stratified default theory, skeptical at the level of strata.
As a final comment, the choice of normal default rules for representing rules in IRAH could
easily be changed to semi-normal default rules [29]. We must simply omit uniqueness guar-
antees in the consequents, in order to avoid a loss in meaning. However, if we want to extend the se-
mantics and also consider aggregating negative facts, our semantics applied to semi-normal
theories fails to provide the same meaning.
Belief revision has been a topic of active research in the area of non-monotonic reasoning for
Artificial Intelligence. Formally, a revision is a function that maps an epistemic state (a logic
theory or set of logic sentences) and a newly acquired belief (a sentence) to a new, revised epis-
temic state. For a long time, classical belief revision has been accepted by the community as
satisfying the well-known eight AGM postulates of minimal change [30]. Recently, however, some
of these postulates have been the subject of controversy. Boutilier et al. [31] argue against the pos-
tulates dealing with inconsistent theories: in that work, inconsistent sentences are rejected when
revising a theory with them, whereas the AGM postulates state that a theory revised by a sentence must always
incorporate the new sentence into the revised theory, whether the sentence is consistent or not.
More radically different approaches have been introduced, like the one presented by Brewka [25]. In
Brewka's opinion, the mechanics of revision should not be differentiated from a process consisting
in giving a new belief to some (non-monotonic) inference system and letting the system work with
it, contrasting the given sentence with the current epistemic state. This approach leads to simpler
algorithms, and is applicable to common situations where the revision problem becomes compu-
tationally tractable. However, the AGM postulates are no longer valid in this setting. An in-
termediate position has been stated in other works [32,33], where the postulates have been
rewritten in order to consider classical indisputable knowledge as well as defeasible knowledge.
This choice seems reasonable, since defeasible knowledge can always be defeated by new
knowledge. Our approach follows these lines, although we add rules for considering revisions,
while the aforementioned works only add literals for that purpose.
In this work we have introduced IRAH, a language of intensional rules that allows redefining
dimension instances in order to support exceptions, which partially override rollup function com-
position and cancel the effect of rollups in the presence of contradictions.
We have presented a clear semantics for rollups and IRAH rules together, based on a priori-
tized default logic theory, and a model for the underlying theory has been defined. The
default logic framework emphasizes the non-monotonic nature of the aggregation process in
OLAP.
We have also introduced an algorithm that computes the revised dimension instance, given
rollup extensions and rules in the form of an IRAH program. We have proved that the algorithm's
complexity lies within PTIME. Moreover, we have shown that, under some realistic assumptions
regarding exceptions, the algorithm indeed behaves linearly on the size of the dimension instance.
A clear future step consists in developing efficient algorithms for revising materialized views
with aggregation, particularly cube views. In this sense, the revision algorithm presented in this
paper clearly identifies the coordinate changes in the aggregation paths. Our approach to the
semantics of rollups and rules can be slightly modified to allow including uncertain and negative
knowledge (three-valued models), and the use of negation in the body and in the head of rules,
augmenting the expressive power of rules in IRAH programs. These changes raise an interesting
question: how can these kinds of rules, different from those presented in this paper, be exploited in
modeling multidimensional data with constraints? Exploiting priorities for modeling plausibility
on inconsistent data sources is another promising research topic arising from the work presented
here. We are currently working on determining how to use non-monotonic reasoning to express
aggregation. Several classes of aggregate functions can be expressed by means of default rules with
uniqueness guarantees. This line of work, however, lies beyond the scope of this paper.
References
[26] H. Wang, C. Zaniolo, User-defined aggregates in database languages, in: Proceedings of the 7th International
Workshop on Database Programming Languages, 1999, pp. 43–60.
[27] P. Cholewinski, Stratified default theories, in: Computer Science Logic, 1994, pp. 456–470.
[28] P. Cholewinski, W. Marek, A. Mikitiuk, M. Truszczynski, Experimenting with nonmonotonic reasoning, in:
Proceedings of the International Conference on Logic Programming, 1995, pp. 267–281.
[29] D. Etherington, Formalizing nonmonotonic reasoning systems, Artificial Intelligence 31 (1) (1987) 41–85.
[30] C. Alchourrón, P. Gärdenfors, D. Makinson, On the logic of theory change: partial meet functions for contraction
and revision, Journal of Symbolic Logic 50 (1985) 510–530.
[31] C. Boutilier, N. Friedman, J. Halpern, Belief revision with unreliable observations, in: Proceedings of AAAI,
1998, pp. 127–134.
[32] G. Antoniou, D. Billington, G. Governatori, M. Maher, Revising non-monotonic belief sets: the case of defeasible
logic, in: Proceedings of the 23rd German Conference on Artificial Intelligence, LNAI, vol. 1701, 1999, pp. 101–112.
[33] M. Williams, G. Antoniou, A strategy for revising default theory extensions, in: Proceedings of the 6th
International Conference on Principles of Knowledge Representation and Reasoning (KR), 1998, pp. 24–35.
Mauricio Minuto Espil received a degree in Computer Science in 1985 from the University of Buenos Aires. He
became Chief Consultant of several banking institutions in Argentina, in the areas of enterprise automation,
financial operations and CRM. In 1988, he was elected Councilor at the Faculty of Exact and Natural
Sciences of the University of Buenos Aires, being a member of the ruling and teaching steering committees.
Since then, he has worked in academia, teaching courses in several schools and universities, particularly on
Logics and Databases. Currently he is a Professor at Universidad Catolica Argentina, Universidad Nacional
de La Matanza and Universidad de Belgrano, and advisor of Inter-Cultura, a group with interests in Com-
puter Human Interaction and the Web from the viewpoint of Social Sciences. He is currently working
towards a Ph.D. degree at the University of Buenos Aires, and his research interests are Data Warehousing,
Data Integration, Non-monotonic Logics, Ontologies and the Semantic Web.
Alejandro A. Vaisman was born in Buenos Aires. He is a Civil Engineer and Computer Scientist, and holds a
Ph.D. in Computer Science from the University of Buenos Aires. He was a visiting researcher at the Uni-
versity of Toronto, Canada, and an invited lecturer at the Universidad Politecnica de Madrid, Spain. He has
authored and co-authored several scientific papers presented at major database conferences such as ICDE and
VLDB, among others. His research interests are in relational and deductive databases, OLAP and data
warehousing, temporal databases, data mining, and web-based information systems. He has worked in the design
and operation of database systems, and he is currently Vice-Dean at the University of Belgrano, Argentina.