
Design and Implementation of an Extensible Database Management System
Supporting User Defined Data Types and Functions

V. Linnemann, K. Küspert, P. Dadam, P. Pistor,
R. Erbe, A. Kemper*, N. Südkamp, G. Walch, M. Wallrath

IBM Scientific Center Heidelberg, Tiergartenstrasse 15
D-6900 Heidelberg, West Germany

*University of Karlsruhe, Fakultät für Informatik
D-7500 Karlsruhe, West Germany

Abstract

Current query languages for relational databases usually are fixed,
i.e. they provide only a fixed set of data types and operations. It is
usually not possible to extend this set by user defined data types or
functions. This is a major drawback especially in advanced applications
like engineering applications or office automation. In these areas
special data types and special functions are needed quite frequently,
e.g. a data type for matrices and a function for matrix multiplication.
Since matrices and matrix multiplication are not provided in
conventional query languages, the user has to model matrices by low
level constructs such as byte strings, and to write a rather cumbersome
application program in a conventional programming language for
interpreting these byte strings as matrices and for multiplying them.
Another example of a missing function is one as simple as the square
root function. Therefore, a mechanism is needed that allows the user to
define his own data types and functions and add them to the DBMS such
that they can be used within the query language in the same way as a
normal built-in function on basic data types. This paper describes an
extension mechanism for data types and functions that has been
implemented at the IBM Scientific Center in Heidelberg. The mechanism
is based upon HDBL, an SQL based query language for complex objects.
The functions themselves are written in a conventional programming
language, in our case PASCAL, which allows general algorithms to be
formulated. One important aspect is the interface between the data
types of the DBMS on the one side and the data types of the programming
language on the other side. In our case, this mapping is more
complicated than in other approaches because our type system supports
complex objects directly and not via long strings as other authors do.
Moreover, we use the PASCAL type system to a large extent in order to
allow type checking at compile time.

The work described in this paper was done within the R2D2 (A Relational
Robotics Database System with Extensible Data Types) project. R2D2 is a
cooperation project (started in 1986) between the IBM Scientific Center
Heidelberg and the University of Karlsruhe, Fakultät für Informatik.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the VLDB copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Very Large Data Base Endowment. To copy otherwise, or
to republish, requires a fee and/or special permission from the
Endowment.

Proceedings of the 14th VLDB Conference
Los Angeles, California 1988

1. Introduction

Current database management systems (DBMSs) and their database
languages only offer a fixed set of data types and operations. Whenever
the set of data types supported by the DBMS is insufficient for a given
application, the system has to be misused in some way to handle the new
type of data. Examples of that kind are large numerical vectors and
matrices, long texts, geometrical data, image data, etc. In such cases
the DBMS is used just as a byte container. As a consequence, search
predicates on the contents of these fields are usually not supported.
The manipulation of these attribute values by the DBMS's DML can only
be performed in a very rudimentary way. Moreover, a high dependency
between the physical data representation and
the application programs is again re-established. To lessen this kind
of dependency was one of the major reasons why DBMSs have been
developed.

Several ongoing research projects attempt to overcome limitations of
current DBMSs by more powerful data models. Some of them support a
richer set of basic data structures based on or influenced by nested
relations or by variants of the entity-relationship model (/HL82, Da87,
DaKü86, La84, RKB85, SS86, PaSc87, Ha87, AB84, Di86, VKC86/). Others
try to solve the problem by staying with a more rigid basic data model
but by supporting some kind of dynamic references. This is done in
POSTGRES /RoSt87, St87, St86a, St86b, St84/ by introducing procedures
as attribute values. These procedures consist, among others, of
statements in POSTQUEL, the database language of POSTGRES. Procedures
as attribute values provide a powerful extension to the relational data
model. In /St87/ it is shown how procedures as attribute values can be
used to model complex objects.

Even with more powerful data structures offered, there will always
remain cases, especially in novel application areas, where some new
type of data cannot adequately be supported. Though some of the data
structures offered by an advanced DBMS may be useful for efficiently
storing the data, the search capabilities provided by the DBMS will
usually be unsatisfactory. Missing functions are not only a problem for
strange data types. Assume for example the square root of a specific
field or a specific group of fields. In a standard query language for a
relational database, as for example SQL /Ch76, Ch81, IBM81/, one cannot
express this problem because square root operations are usually not
provided. The only way for the user is to forget about the query
language and to write a rather cumbersome application program in a
programming language. This is, of course, only possible if an interface
to a programming language is provided.

One solution for this specific problem would be, of course, to ask the
implementor of the DBMS to add the square root to the query language.
But then the next user may ask for another function, e.g. matrix
inversion. It obviously does not make sense to keep adding functions to
the query language from the very beginning because there will always be
applications which need other functions. What is really needed is a
mechanism that allows the user to specify a new function and to provide
appropriate interfaces to the query language. That is, to make the DBMS
itself extensible by user defined data types and operations.

Currently, the area of extended data base technology is quite heavily
investigated. Some of the work reported in the literature shall be
reviewed briefly.

The PETERLEE RELATIONAL TEST VEHICLE (PRTV /To76/), which is known as
one of the first running prototypes of a relational DBMS, had already a
simple mechanism for so-called user extensions: The user could provide
his own procedures (written in PL/I) which could then be used in query
statements and called by the DBMS at run time. Since PRTV tables were
always in first normal form (1NF), complex (hierarchical) data
structures as procedure input and output could not be processed.

Galileo /ACO85/ is a strongly-typed, interactive conceptual language
for database applications designed, among others, to support the
abstraction mechanisms of modern programming languages. The main
contributions of Galileo are a flexible type system, the inclusion of
type hierarchies and a mechanism to support abstract data types.

TAXIS /MBW80/ is a language for the design of interactive information
systems. It offers, among others, database management facilities which
are integrated into a single language through the concepts of class,
property, and the IS-A relationship.

In addition to procedures consisting of POSTQUEL statements, POSTGRES
/RoSt87, St87, St86a, St86b, St84/ also supports procedures written in
a conventional programming language such as LISP or C. Moreover, the
concept of abstract data types is supported by POSTGRES, but only on a
rather low level as far as the representation of an abstract data type
is concerned. The representation is an unstructured storage area. Only
the length of the area is given, i.e. there is no strong typing as far
as the representation of an abstract data type is concerned. This is
also the method for passing parameters to functions written in LISP or
C /St86a/.

PROBE /Da87, GO87/ distinguishes between entities and functions. Access
to the attribute values of an entity is only provided by invoking the
corresponding function. Functions can be system provided functions or
user defined functions.

The STARBURST project /Sch86, LMP87/ is investigating, among others,
how to design the DBMS architecture such that storage alternatives for
relations and foreign indexes can be supported.

GENESIS /Bat86/ and EXODUS /Ca86/ are, in essence, software engineering
tools for configuring a DBMS according to a given specification.
GENESIS, for example, relies on database components whose interfaces
have been standardized in such a way that they become exchangeable. One
goal of EXODUS is to provide kernel DBMS facilities and software tools
for the semi-automatic generation of application-specific DBMSs. Under
the assumption that in the future there will exist large libraries of
application area oriented data types and respective functions which can
be optionally added to a database kernel (customization), tools like
GENESIS or EXODUS will be very helpful if not even mandatory to
configure these systems.

Extensibility of a DBMS has several aspects. One is how to make new
data types and functions available to the user, that is, how to reflect
them in the query language and in the application program interface.
Another aspect is how to implement these functions, that is, how to
program them (what is the reference basis?), how to plug them into the
system, and how to actually execute them at run time. A third aspect is
how to support also user defined indexes within the DBMS, how to
evaluate them during query optimization and execution, and how to
integrate them into the system's concurrency control and recovery
mechanisms.

In the R2D2 project we are currently mainly concentrating on the first
two issues. In /KLW87/ the concept of abstract data types on top of
nested relations is described. Our paper describes the extensibility of
the underlying DBMS by user defined data types and functions and how
they are reflected in its query language. The functions themselves are
written in a conventional programming language, in our case in PASCAL,
to allow for general algorithms. The underlying DBMS is a further
development of the Advanced Information Management Prototype, called
AIM-P in the sequel for short. AIM-P is an experimental DBMS developed
at the IBM Heidelberg Scientific Center since 1983 for application
oriented research purposes in advanced application areas (cf. e.g.
/DaKü86, KDG87, Lu84, Lu85, Pi87/). AIM-P has been extended according
to R2D2's needs. The link between AIM-P's database language and a user
defined function is provided by mapping the data model of AIM-P to
appropriate PASCAL structures and vice versa. It should be noted that
the approach is not restricted to PASCAL. Any programming language
which supports static types could be used as well, for example MODULA
/Wi83/.

The paper is organized as follows: Section 2 discusses possible
alternatives for adding types and functions to a DBMS by concentrating
on the alternatives: static types versus dynamic types. Moreover, the
relationship between abstract data types and so called encapsulated
types is discussed. Section 3 recalls some database language constructs
which are necessary for understanding section 4 which in turn is the
central part of the paper. It describes by examples the function
extension mechanism we have implemented. Implementation details are
discussed in section 5. Section 6 gives some conclusions and an outlook
for future work.

2. Static Types Versus Dynamic Types, Abstract Data Types Versus
Encapsulated Types

If one talks about types and functions, there are two main
alternatives: The types may be static or dynamic, i.e. the types of the
parameters and the result of a function may be statically known, or
they may vary dynamically. In POSTGRES /RoSt87, St87, St86a, St86b/,
even the type of a tuple in a table may vary from tuple to tuple. This
results from the fact that each function stored in an attribute value
may produce a value of an arbitrary type. In contrast to a normal
attribute, the structure (value type) is therefore not known prior to
the access to the attribute value and to the execution of it (that is,
of the function/procedure it contains). This approach provides a lot of
flexibility. On the other hand, only dynamic type checking can be
provided, i.e. type errors show up only at run time. To write an
application program for processing tables with tuples of unpredictable
types is rather difficult and error prone. Therefore, we think that for
the standard user more secure mechanisms should be provided. Moreover,
optimization is easier if types are known.

Improved security and efficiency can be achieved by binding functions
to static types. On the database programming language side, probably
the most significant contributions supporting static types were
PASCAL/R /Schm77/ and Galileo /ACO85/. By using static types, the
result type of a function can be determined or derived at function
definition time. Thus it can be described in the catalog (cf. Sect.
5.1.2). By doing so, the data structures returned when executing a
function are already known at compile time of the application program.
Hence, there are no surprises at execution time.

For these reasons we have decided for R2D2 to bind the functions with
respect to their parameters and return values to static types.
Therefore, only type compatible attribute values, constants, and query
expressions can be passed as actual parameters to these functions.

In R2D2 functions are not limited to basic data types like integer,
real, string, etc. A function can be defined on any data structure
supported by AIM-P; even a complete table as data type is allowed.
Therefore more emphasis than in the flat table case had to be put on
providing a reasonable basis for the implementation of these functions.

At this point some comments should be made on abstract data types. We
feel that the database kernel should provide more than pure abstract
data types. Binding of functions to only one type is too narrow because
there are applications where a
function belongs to two or more types. Consider, for example, the
problem of converting the value of one abstract data type to a value of
another abstract data type. This conversion function belongs to both
abstract data types. If only pure abstract data types are supported,
the conversion function has to be added artificially to one of the two
abstract data types. Therefore, we decided to directly support only the
information hiding concept of abstract data types by introducing so
called encapsulated types. The structure of encapsulated types is not
known to the user; values of an encapsulated type can be accessed and
changed only by appropriate functions. Functions can refer to several
encapsulated types. Encapsulated types are similar to the concept of
hidden types introduced by the programming language MODULA /Wi83/. The
concept of abstract data types is, in a sense, a special case of
encapsulated types because an abstract data type is an encapsulated
type together with functions restricted to this type.

One of the main goals of R2D2 is to provide an environment where adding
new functions to the underlying DBMS should be possible without
requiring much database specific knowledge. In particular, it should
not require knowledge about internals of the underlying DBMS,
especially the internal data representation. Every experienced
application programmer should be able to program these functions. In
order to make this a safe task, the functions should be implemented
with programming language structures which represent the corresponding
data model types as naturally as possible. This means that a tuple, for
example, should be mapped to a record structure rather than to a byte
string with offset pointers.

To understand our approach for solving the function implementation
problem, a brief explanation of the underlying data model has to be
given first.

3. Data Model and Language

The data model supported by the Advanced Information Management
Prototype is an object-oriented generalization of Non-First-Normal-Form
(NF2) relations, i.e. nested relations. It has an SQL-like language
interface, the Heidelberg Data Base Language (HDBL) /PT86, PA86, Pi87/.

The object types HDBL can deal with are: set valued, list valued, tuple
valued or atomic. Atomic data types are: DATE, REAL, INTEGER, BOOLEAN,
CHARACTER, STRING and SURROGATE. The elements or attributes of any
object type (except for objects consisting just of an atomic value) can
again be of any of the types listed above. That is, the attributes of a
tuple valued object, for example, can be either atomic, or set valued,
or list valued, or again tuple valued. Objects need not occur as
elements of a table. A list of lists of REAL values (which is a two
dimensional matrix) can occur as element in another list or set or as
attribute value within a tuple or as a single standing object (having
an object name). Figure 1.a shows a graphical representation of this
data model; both the 1NF data model and the pure NF2 data model are
special cases of this more general data model.

As we will use HDBL statements later on to show the embedding of user
defined functions and types, we give here a brief introduction into
this language. A comprehensive treatment of this subject can be found
in /PT86, PA86, Pi87/.

The following example shows a CREATE statement and some simple queries.
The example will later on also serve as reference basis for the
discussion of user defined data types and functions. To keep the
explanation from becoming unnecessarily complicated we have selected a
rather simple structure. It should be clear, however, that HDBL can
deal with much more complex structures and operations on those
(projection, selection, join) as well.

As an example we use a part of a geographic information system which
allows one to store information about specific properties. Each
property is defined by its boundary, which is given by a list of
points. We can create a corresponding table in HDBL as follows:

    CREATE properties
    { [ id: string(10),
        owners: { [ name: string(30),
                    share: real ] },
        points: < [ x_c: real,
                    y_c: real ] > ] }
    END

Sets are indicated by curly brackets ({...}), tuples by square brackets
([...]), and lists (ordered sets) by sharp brackets (<...>) /ALPS88/.
Thus, the properties example represents a set of tuples. Each tuple has
three attributes: The first one, called id, is an identifier of the
property. id is a string, i.e. a flat attribute in the conventional
sense. The second one, owners, represents all the owners of the
property. owners is a set of tuples. Each tuple contains the name of an
owner and the percentage of the ownership. The third one, called
points, represents the boundary of the property. It is a list of
tuples. Each list element represents a limiting point. A list element
is a tuple containing the x and y coordinates of the limiting point.
For example, the property depicted by the picture

    (3,4) +------------------------------------+ (7,4)
          | id: 'SQUARE'                       |
          | owners: 'Miller, Jim', 50 percent  |
          |         'Miller, Jane', 50 percent |
    (3,8) +------------------------------------+ (7,8)

can be modelled by the following properties table entry:

    id       { owners }               < points >
             name           share     x_c    y_c
    ----------------------------------------------
    SQUARE   Miller, Jim    50.0      3.0    4.0
             Miller, Jane   50.0      7.0    4.0
                                      7.0    8.0
                                      3.0    8.0

In HDBL, the information about properties 1234 and 5678 can be
retrieved by

    SELECT p FROM p IN properties
    WHERE p.id = '1234' OR p.id = '5678'

If one is only interested in the owners of the specified properties,
one can formulate the following projection:

    SELECT [p.owners] FROM p IN properties
    WHERE p.id = '1234' OR p.id = '5678'

If one is interested in all properties such that a specific point
occurs in the limiting points, one can express this in HDBL as follows:

    SELECT p FROM p IN properties
    WHERE EXISTS (point IN p.points):
          point.x_c = 13.7 AND point.y_c = 39.8

In addition to queries, HDBL provides operations for changing tables
(ASSIGN, INSERT and DELETE). ASSIGN assigns a new value to a specific
field whereas INSERT inserts one or several elements into an existing
set or list. DELETE deletes one element or a whole set or list. Assume,
for example, that a property share is split into two parts. This can be
expressed by the following two HDBL statements:

    ASSIGN owner.percentage * 0.5
    TO owner.percentage
    FROM owner IN p.owners, p IN properties
    WHERE p.id = 'XYZ' AND owner.name = 'Mr. X'

    INSERT { [ name: 'Mr. Y',
               percentage: owner.percentage ] }
    INTO p.owners
    FROM p IN properties, owner IN p.owners
    WHERE p.id = 'XYZ' AND owner.name = 'Mr. X'

4. User Defined Types and Functions

Although quite powerful, HDBL does not allow certain queries. One group
of queries involves, among others, the computation of the transitive
closure of a relation. This problem could be solved by introducing
recursive queries over nested relations /Li87/. Another group of
queries involves the computation of arbitrary functions involving,
among others, mathematical expressions. Assume, for example, the query:

    Find all properties such that the length of the boundary is larger
    than a certain value.

This query cannot be expressed in current HDBL because it involves
looping over all limiting points and the computation of the square
root. One way to solve problems of that kind is to use the application
program interface /ESW87, EW87/ for fetching the objects of interest
and to perform the computation itself in the application program.

[Figure 1: three diagrams. a) HDBL Data Model (Extended NF2 Data
Model), b) NF2 Data Model, c) 1NF Data Model]

Figure 1. Data Model Comparison: HDBL, 1NF, NF2. Terms written in
capital letters indicate legal object types. Objects of these types can
be created within the data model with a CREATE statement.
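
The application-program approach described above (fetch the objects,
then compute in the program) can be sketched as follows. This is a
minimal illustration in Python, not AIM-P's application program
interface: the properties table is modelled as nested lists and
dictionaries, and the names PROPERTIES, boundary_length and
properties_longer_than are hypothetical. It only shows the two steps
that current HDBL cannot express, namely looping over the limiting
points and taking a square root.

```python
import math

# Hypothetical in-memory stand-in for the HDBL "properties" table:
# each entry is a tuple with a set of owners and a list of points.
PROPERTIES = [
    {
        "id": "SQUARE",
        "owners": [
            {"name": "Miller, Jim", "share": 50.0},
            {"name": "Miller, Jane", "share": 50.0},
        ],
        "points": [
            {"x_c": 3.0, "y_c": 4.0},
            {"x_c": 7.0, "y_c": 4.0},
            {"x_c": 7.0, "y_c": 8.0},
            {"x_c": 3.0, "y_c": 8.0},
        ],
    },
]


def boundary_length(points):
    """Sum the distances between consecutive limiting points,
    including the closing edge from the last point back to the first."""
    total = 0.0
    for i, p in enumerate(points):
        q = points[(i + 1) % len(points)]  # wrap around to the first point
        total += math.hypot(q["x_c"] - p["x_c"], q["y_c"] - p["y_c"])
    return total


def properties_longer_than(table, limit):
    """Application-side filter: fetch every property, compute, compare."""
    return [p["id"] for p in table if boundary_length(p["points"]) > limit]


print(properties_longer_than(PROPERTIES, 10.0))  # prints ['SQUARE']
```

In this sketch every application that needs the boundary length has to
carry its own copy of the fetch-and-compute logic, and the DBMS cannot
use the computed value in its own search predicates.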
For computations which are needed frequently, especially if they are
needed in various applications (think, for example, of a missing square
root or standard deviation function), this approach is too cumbersome.
Hence a mechanism should be provided to make the DBMS itself extensible
by user defined functions such that they become part of the DBMS's
query language. This section describes how this has been achieved in
the Advanced Information Management Prototype.

For the user, the most obvious solution to the query: Find all
properties such that the length of the boundary is larger than a
certain value, would be to define a function get_length which computes
the length of a boundary and then use this function in the following
HDBL statement:

    SELECT p FROM p IN properties
    WHERE get_length(p.points) > 123456.7

It should be possible to program get_length in a programming language
like PASCAL. One important point has to be solved for that end: The
world of PASCAL types has to be connected to the world of HDBL types,
because PASCAL functions like get_length require parameters of PASCAL
type. In our approach, this is accomplished as follows: A special
DECLARE TYPE statement is added to HDBL which allows the user to define
types which can be used in CREATE statements or within other DECLARE
TYPE statements. Once a type has been declared, the system will
generate corresponding PASCAL representations (type declarations) for
this type. For example, the statement

    DECLARE TYPE point
    [ x_c: real,
      y_c: real ]
    END

defines a type point as a tuple with an x coordinate and a y
coordinate. The translation of the DECLARE TYPE statement results in
the generation of a corresponding PASCAL type declaration as follows
(cf. Sect. 5.3 for the details):

    TYPE point = RECORD
                   x_c: real;
                   y_c: real
                 END;

By the statement

    DECLARE TYPE boundary < point > END

a type boundary is defined to be a list of points. For this HDBL type
the following PASCAL types would be generated:

    TYPE boundary$R = ARRAY [ 1..65535 ] OF point;
         boundary = RECORD
                      ACT_ELEM: 1..65535;
                      ALO_ELEM: 1..65535;
                      val : ^boundary$R;
                    END;

These types need some explanation: Since PASCAL, like many other
programming languages, does not support dynamic arrays, special
solutions have to be used to overcome the problems of representing
lists or sets of variable length. In our example, a default limit of
65535 is used, since no limit was given in the declare statement. The
components ACT_ELEM, ALO_ELEM and val simulate a dynamic array. In the
val component, the list elements are stored. ACT_ELEM indicates the
current length of the list. ALO_ELEM is needed for storage allocation.
This is described in more detail in Sect. 5.3.3. With these types, the
properties table can now be defined as

    CREATE properties
    { [ id: string(10),
        owners: { [ name: string(30),
                    percentage: real ] },
        points: boundary
    ] }
    END

From the database point of view this statement is equivalent to the
first CREATE statement, since the types are not declared to be
encapsulated. If a type is declared to be encapsulated, the internals
of a value of such a type are known only to the functions which have a
value of such a type as a parameter. For example, by the declaration

    DECLARE TYPE boundary < point > ENC END

boundary is declared to be encapsulated, i.e. the elements of a
boundary list cannot be accessed directly but only by functions which
have a boundary as a parameter.

After having declared the necessary types, we can introduce our
function get_length. This is done in two steps: First, get_length is
made known to the database system by the statement

    DECLARE FUNCTION get_length(b: boundary): real

In a second step, the body of the function is written in PASCAL by
using an auxiliary function line_length:

    FUNCTION line_length(p1, p2: point): real;
    VAR x, y: real;
    BEGIN
      x := p2.x_c - p1.x_c; y := p2.y_c - p1.y_c;
      line_length := SQRT( x*x + y*y );
    END;

    FUNCTION get_length(b: boundary): real;
    VAR len: real; i: integer;
    BEGIN
      WITH b DO
      BEGIN
        IF ACT_ELEM <= 2
        THEN BEGIN
               len := 0.0;
               error_exit(....)  /* WRONG DATA: boundary must
                                    have at least 3 points */
             END
        ELSE BEGIN
               len := line_length(val^[ACT_ELEM], val^[1]);
               FOR i := 1 TO ACT_ELEM-1 DO
                 len := len + line_length(val^[i], val^[i+1])
             END
      END;
      get_length := len
    END;

This PASCAL program is now compiled by the PASCAL compiler and added to
the database software. More details are given in Sect. 5. get_length
can now be used within HDBL wherever a real value (= value type of the
result) is allowed. One example was already given:

    SELECT p FROM p IN properties
    WHERE get_length(p.points) > 123456.7

Another example uses get_length for the construction of an attribute
value. It gives the properties table together with the lengths of the
boundaries:

    SELECT [ id: p.id, owners: p.owners, points: p.points,
             length: get_length(p.points) ]
    FROM p IN properties

In the next example, we want to change our properties table such that
the name is split into first name and last name. For this example, we
use another alternative for defining types, namely a DERIVE TYPE
statement which derives a type from an existing table. We declare types
for the owners as follows:

    DECLARE TYPE new_owner
    [ first_name: string(30), last_name: string(30),
      percentage: real ]
    END

    DERIVE TYPE old_owner AS o
    FROM o IN pr.owners, pr IN properties

The system generates the following PASCAL types:

    TYPE new_owner = RECORD
                       first_name: string(30);
                       last_name: string(30);
                       percentage: real
                     END;

    TYPE old_owner = RECORD
                       name: string(30);
                       percentage: real
                     END;

A function for splitting names is made known to the system by

    DECLARE FUNCTION name_split (old: old_owner): new_owner

The corresponding PASCAL implementation can be sketched as follows:

    FUNCTION name_split (old: old_owner): new_owner;
    VAR result: new_owner;
    BEGIN
      result.last_name := ....
      result.first_name := ....
      result.percentage := old.percentage;
      name_split := result
    END;

Now we can create our new table:

    CREATE new_properties
    { [ id: string(10),
        owners: { new_owner },
        points: < [ x_c: real,
                    y_c: real ] > ] }
    END

and convert the old table as follows:

    INSERT
      SELECT [ id: old_prop.id,
               owners: SELECT name_split(old_owner)
                       FROM old_owner IN old_prop.owners,
               points: old_prop.points ]
      FROM old_prop IN properties
    INTO new_properties

5. Implementation of Types and Functions

In the following we will explain how user defined data types and
functions have actually been implemented within the DBMS. We address
database catalog extensions (Sect. 5.1), the run time management of
user defined functions and type instances (Sect. 5.2) and, finally, the
PASCAL data structures onto which the HDBL types are mapped (Sect.
5.3).

5.1 Types and Functions in the Database Catalog

5.1.1 The Object Catalog of AIM-P

The database catalog of AIM-P is composed of
three parts: the object catalog, the type catalog, and the function
catalog. The last two catalogs reflect the extensions of our system by
types and functions; the first one was the catalog in the initial
version of AIM-P.

The object catalog records descriptive information - meta data - about
all database objects, i.e. - in the HDBL data model - about sets,
lists, tuples, and scalars (cf. Figure 1).

The object catalog itself is again an HDBL object (in fact a table).
Each tuple in that table contains some general information about the
related database object as a whole (one_object_description), such as
its external and internal name, its creator and creation date, etc. The
most important part of the object catalog, however, is the attribute
description, which is a list of tuples where each tuple
(one_attribute_description) describes exactly one attribute. Via the
attribute description the catalog manager of the DBMS also keeps track
of the parent-child relationships between attributes on different
levels, i.e. the attribute description reflects the hierarchical
structure within a database object of arbitrary size and complexity.

5.1.2 Catalog Extensions for Types and Functions

Information about types and functions is maintained in two separate
tables, a type catalog and a function catalog.

As explained in Sect. 4, any flat or nested HDBL structure (list, set,
tuple, scalar) may be used in exactly the same way both in a type
declaration (DECLARE TYPE ... or DERIVE TYPE ...) and in a database
object declaration (CREATE ...). It is therefore not surprising that
the structure of the type catalog is quite similar to that of the
object catalog. In particular, the attribute description is done in
exactly the same way.

For each user defined data type the catalog manager records, again, the
internal and external name, the creator and creation date, etc. The
corresponding PASCAL data structure is automatically derived from the
type declaration (see Sect. 5.3) and could be stored in the type
catalog. Moreover, several use counts are provided to keep track of a
type's usage in other types, in functions, and in database objects. We
will come back to the semantics of these use counts at the end of this
subsection.

In the function catalog the DBMS keeps track of the names and the
interface descriptions (parameter names and types) of all user defined
functions. Parameters of functions - as well as the function result -
may either be of basic HDBL types (BOOLEAN, INTEGER, REAL, etc.) or of
user defined types. The basic HDBL types are always implicitly known to
the DBMS; the user defined types must have been explicitly declared
before they can be used in a function definition.

Certain attributes in the function catalog, like function_result_type
and parameter_type, are used by the DBMS at DML parsing time to check
the applicability of a given function in a DML statement. These type
specifications for function input and output are also used to trigger
implicit type conversions (as far as possible) if the actual and formal
parameters do not fully match. A parameter_name attribute is also
provided to automatically generate the corresponding PASCAL function
declaration to be used in a function implementation.

Our type and function concept had to be designed and implemented such
that the DBMS is able to keep track of all interdependencies between
types, functions, and database objects. These interdependencies can be
expressed in the following diagram:

[Diagram: interdependencies between types, functions, and database
objects]

The figure reads as follows: Types may be used in the declaration of
objects, functions, and (other) types. A consequence is that whenever a
type is to be dropped the DBMS must check whether that type is still in
use somewhere (in objects, functions, or types). We decided to permit
type drop operations only if the type to be dropped is no longer in
use. An alternative would have been to implicitly drop also those
objects, functions, and types in which the type to be dropped is
currently used; we believe, however, that for the normal user the
effects of such cascading drop operations would be rather unpredictable
and therefore extremely unsafe.

5.2 Handling of User Defined Functions and Type Instances at Run Time

5.2.1 Function Dispatcher

Obviously, user defined functions cannot be hardwired since neither the
function names nor the number and kind of parameters are known when the
DBMS code is written. Extensibility means that new user defined
functions can be brought into the DBMS at any time. The DBMS must be
able to call these functions and to provide the proper set of
parameters. It would not be really satisfying if one had to change the DBMS source code every time the user defines a new function. A mechanism is therefore required in the DBMS to call any function with any set of parameters. We call that mechanism the function dispatcher.

In the following the function dispatcher will be briefly explained by presenting the scenario in which a function call within a DML statement is actually processed by the DBMS:

When the DML statement is parsed the parser checks whether the specified function exists or not. This is done via a look up operation in the function catalog (Sect. 5.1.2). The parser also checks the applicability of the function in the given context of the DML statement, i.e. whether actual and formal parameters really match in number and type (if necessary, type conversions are prepared).

At DML execution time the function must actually be called by the DML evaluator (run time evaluation part of the DML processor) and the required parameters must be passed. This function call is done indirectly via a call to the function dispatcher. The function dispatcher basically gets three input parameters, namely the function name, the number of parameters, and an address vector with one address for each parameter.

The function dispatcher has some internal bookkeeping (function address table) to keep track of all function names and function start addresses. Whenever a new user defined function is brought into the system its function name and interface description are not only recorded in the function catalog, but also in the function address table.

The function dispatcher - written in assembly language - takes the given function name to look up the corresponding function start address and then executes this function. The address vector provided by the caller is passed to the function.

Once the user defined function has been executed the result is returned to the function dispatcher and - finally - to the DML evaluator where that result can be used e.g. as input for another function, for display at the screen, or for use in predicate evaluation, etc.

5.2.2 The Type Instance Loader/Unloader

In order to provide the actual (input) parameter set for user defined functions, AIM-P must be able to load (complex) type instances from the database into the respective Pascal structures. In our example in Sect. 4, where an instance of type boundary is input for the function get-length, the required boundary data - which are a list of point tuples - must be read from the database and must thereby be transformed into the Pascal representation. Both steps are performed by the type instance loader.

Moreover, AIM-P must also be able to unload (complex) type instances from a Pascal representation (back) to the database. An example was also given in Sect. 4: The result data of the name-split function, which are of type new-owner, are finally inserted into the database. Data transformation from the Pascal representation to the database format and writing to the database are done by the type instance unloader.

The type instance loader and unloader shield the higher level DBMS components from details of type instance implementation, such as storage allocation, address and pointer representation, Pascal data layout, etc. For the AIM-P query processor there is functionally no difference between loading/unloading a simple scalar value (e.g. a 4 byte INTEGER) on the one hand and loading/unloading a large, complex type instance of size 4 MB on the other hand: both are done via a single call to the type instance loader/unloader which is - in case of data loading - also responsible for storage space allocation.

The type instance loader and unloader are fully catalog driven: They have both the information about an object's database format (via the object catalog) and about its Pascal format (via the type catalog). They can thereby - on the fly - perform the necessary conversions between the database format and the respective Pascal format. These conversions are not restricted to atomic data occurrences (e.g. transformation of an INTEGER vector in the database to a REAL vector in Pascal and vice versa); other transformations such as LIST (n) to SET (m) etc. can be done as well.

5.3 Generation of PASCAL Data Structures for HDBL Types

When creating PASCAL data structures for HDBL types, three different strategies are conceivable:

User defined PASCAL structures: The user who defines a new type may specify whatever PASCAL data structure he would like to see for that type. To gain efficiency, to save storage space, and to write compact program code, the user can therefore tailor the PASCAL structure to the operations to be performed on that structure. A major disadvantage of this approach is, however, that no general mapping mechanism can be provided by the DBMS to transform data from the internal database format to the PASCAL format and vice versa. The user would have to implement these mapping routines himself, a task that is both cumbersome and extremely error-prone; incorrect mapping routines could even
destroy the database. We therefore did not follow that approach.

System defined PASCAL structures: One standard PASCAL equivalent is defined for each HDBL construct, i.e. for sets, lists, tuples, and scalars (cf. Figure 1). For any user defined type the DBMS can therefore automatically generate the PASCAL equivalent for that type. The mapping algorithm to transform data from the internal database format to the PASCAL format and vice versa is also fixed (catalog driven) and can thus be provided by the DBMS; no user driven transformation is required. An obvious disadvantage of system defined PASCAL structures is that the user has no means for optimization and customization; the user defined functions have to be coded on - and tailored to - the PASCAL data structures as they are provided by the DBMS.

User customized PASCAL structures: In principle, there are again standard PASCAL equivalents for HDBL types, as in the case of system defined PASCAL structures. To a certain degree, however, the user may customize these PASCAL structures in the sense that the DBMS offers him a number of different PASCAL equivalents for each basic HDBL construct. An HDBL list, for instance, could be represented in PASCAL via an array, pointer array, or linked list implementation. Among these different choices the user may now select the most appropriate PASCAL data structures for his personal needs. Since customized PASCAL structures are not fully user defined but still DBMS controlled, general mapping routines can be provided by the DBMS and need not be manually coded by the user.

The approach of user customized PASCAL structures seems to be a fairly good compromise between user defined PASCAL structures on the one side (which require too much manual interaction) and system defined PASCAL structures on the other side (which are sometimes not appropriate for the implementation of specific algorithms). In our system, the last two approaches are supported. Further details on customization can be found in /KKLW87/.

Whenever a new type is defined (via DECLARE TYPE ... or DERIVE TYPE ...), an equivalent PASCAL declaration is created as well which can then be embedded into the source code of PASCAL functions working with that type. Some small examples for these PASCAL data structures have already been shown in Sect. 4 (see e.g. PASCAL types boundary and point).

In the following subsections further examples for HDBL types and their PASCAL equivalents will be given together with a set of more general rules on how to create system defined PASCAL data structures for given HDBL type definitions.

5.3.1 PASCAL Types for Basic HDBL Types and Previously Defined Types

For a type declaration

    DECLARE TYPE user_defined_name any_type_name END

the corresponding PASCAL type declaration is

    TYPE user_defined_name = any_type_name;

For example, for the declaration

    DECLARE TYPE my_own_real REAL END

a PASCAL declaration with the contents

    TYPE my_own_real = REAL;

is created.

5.3.2 PASCAL Types for HDBL Tuples

Assume now an HDBL tuple type definition with attribute names a_i and type names t_i:

    DECLARE TYPE user_defined_name
      [ a_1 : t_1, ..., a_n : t_n ]

A PASCAL record type serves as the programming language construct onto which an HDBL tuple type is mapped:

    TYPE user_defined_name = RECORD
           a_1 : t_1; ...; a_n : t_n
         END;

Instead of the type names t_i, the user may also specify any other DDL construct, e.g. another HDBL tuple, as the following example of nested tuples illustrates:

    DECLARE TYPE nested_tuples
      [ attribute_1 : REAL,
        attribute_2 : [ attribute_3 : INTEGER,
                        attribute_4 : CHAR ] ]
    END

Two PASCAL types are created for the two tuples in that HDBL type definition:

    TYPE nested_tuples$attribute_2 =
           RECORD
             attribute_3 : INTEGER;
             attribute_4 : CHAR
           END;

         nested_tuples =
           RECORD
             attribute_1 : REAL;
             attribute_2 : nested_tuples$attribute_2
           END;

nested_tuples$attribute_2 is a system generated PASCAL type name.

5.3.3 PASCAL Types for HDBL Sets and Lists

For any HDBL set/list type definition, e.g.

    DECLARE TYPE user_defined_name <n FIX element_type> END
    DECLARE TYPE user_defined_name <n element_type> END
    DECLARE TYPE user_defined_name {n element_type} END
where element_type is the name of the element type and n is the maximal (variable length) or actual (fixed length) number of elements, the PASCAL representation looks as follows:

    TYPE user_defined_name$$ = ARRAY [1 .. n] OF element_type;

         user_defined_name =
           RECORD
             ACT_ELEM : 0 .. n;
             ALO_ELEM : 0 .. n;
             VAL      : ↑user_defined_name$$
           END;

ACT_ELEM gives the actual number of elements; ALO_ELEM gives the number of elements for which storage space has been allocated (ALO_ELEM ≥ ACT_ELEM). ALO_ELEM has been introduced since - in order to save space in main memory - one does not always want to allocate the array in its maximal length (n) which might be rather large (see also Sect. 4). In programming languages which directly support arrays of variable length, this construct could be simplified.

Instead of the type name element_type, the user may again specify any other DDL construct, e.g. another HDBL set or a tuple, thus defining sets of sets, sets of tuples, etc. without having to perform explicit type declarations for the lower level sets and tuples.

6. Status and Conclusions

In this paper we have described a mechanism for adding user defined data types and functions to a DBMS. We have outlined how functions are reflected in the query language, how they are to be implemented, and how they are executed at run time. Moreover, we have described the system extensions performed in order to support these tasks.

Though described for the Advanced Information Management Prototype, the solution is generally applicable. At present, only functions written in a programming language are supported. We have therefore concentrated on those in this paper. We plan, however, to support functions written in HDBL as well.

Because the functions are compiled (machine code) they are nearly as efficient as comparable standard built-in functions, although some extra overhead - caused by in-core data movement and data conversions - clearly has to be paid for supplying the functions with their parameter values and for putting their results back into the DBMS's internal representation.

Supporting compiled instead of interpreted functions certainly increases the risk that a malfunction of a user provided function may cause the DBMS to stop. This risk could be avoided by putting the code of the function into a separate address space, a solution which has also been suggested for POSTGRES. This, however, would cause some additional performance penalty (task switch). For the time being, we execute both user defined functions and normal DBMS code within the same address space. As rather conventional data structures are provided to program these functions (no trick programming is required) and as dynamic storage allocation and de-allocation are done via dedicated allocation routines /KKLW87/, this risk seems to be tolerable.

Acknowledgement

The authors would like to thank A. Blaser, manager of the IBM Scientific Center in Heidelberg, for the continuous support of the Advanced Information Management Project.

References

AB84  S. Abiteboul, N. Bidoit: Non First Normal Form Relations: An Algebra Allowing Data Restructuring. Rapports de Recherche No 347, Institut de Recherche en Informatique et en Automatique, Rocquencourt, France, Nov. 1984
ACO85  A. Albano, L. Cardelli, R. Orsini: Galileo: A Strongly-Typed, Interactive Conceptual Language. ACM Transactions on Database Systems, Vol. 10, No. 2, June 1985, pp. 230-260
ALPS88  F. Andersen, V. Linnemann, P. Pistor, N. Südkamp: Advanced Information Management Prototype: User Manual for the Online Interface of the Heidelberg Data Base Language (HDBL) Prototype Implementation, Release 2.0, Jan. 1988, IBM Scientific Center Heidelberg TN 86.01
Bat86  D.S. Batory et al.: GENESIS: A Reconfigurable Database Management System. Dept. of Comp. Science, University of Texas at Austin, TR-86-07, March 1986
Ca86  M.J. Carey, D.J. DeWitt, D. Frank, G. Graefe, M. Muralikrishna, J.E. Richardson, E.J. Shekita: The Architecture of the EXODUS Extensible DBMS. Proc. 1986 IEEE Intern. Workshop on Object Oriented Database Systems, Pacific Grove
Ch76  D.D. Chamberlin et al.: SEQUEL2: A Unified Approach to Data Definition, Manipulation and Control. IBM Journ. Res. Devel. 20 (1976)
Ch81  D.D. Chamberlin et al.: Support for Repetitive Transactions and Ad Hoc Queries in System R. ACM TODS, Vol. 6, No. 1, March 1981, pp. 70-94
Da87  U. Dayal, F. Manola, A. Buchmann, U. Chakravarthy, D. Goldhirsch, S. Heiler, J. Orenstein, A. Rosenthal: Simplifying Complex Objects: The PROBE Approach to Modelling and Querying Them. Informatik Fachberichte 136, Springer-Verlag 1987, pp. 17-37
DaKu86  P. Dadam, K. Küspert, F. Andersen, H. Blanken, R. Erbe, J. Günauer, V. Lum, P. Pistor, G. Walch: A
DBMS Prototype to Support Extended NF2 Relations: An Integrated View on Flat Tables and Hierarchies. Proc. ACM SIGMOD Conf., Washington, D.C., 1986, pp. 356-367
Di86  K.R. Dittrich: Object Oriented Database Systems: The Notion and the Issues. Proc. 1986 IEEE International Workshop on Object Oriented Database Systems, Pacific Grove, pp. 2-6
ESW87  R. Erbe, N. Südkamp, G. Walch: An Application Program Interface for a Complex Object Database. IBM Heidelberg Scientific Center Tech. Report TR 87.10.008, to appear in Proceedings of the 3rd Intern. Conference on Data and Knowledge Bases, Jerusalem, 1988
EW87  R. Erbe, G. Walch: An Application Program Interface for an NF2 Database Language or How to Transfer Complex Object Data Into an Application Program. IBM Heidelberg Scientific Center, Tech. Rep. TR 87.04.003, April 1987
GO87  D. Goldhirsch, J.A. Orenstein: Extensibility in the PROBE Database System. Data Engineering, Vol. 10, No. 2, June 1987, pp. 24-31
Hä87  T. Härder, K. Meyer-Wegener, B. Mitschang, A. Sikeler: PRIMA: a DBMS Prototype Supporting Engineering Applications. Proc. VLDB, Brighton, U.K., September 1987, pp. 433-442
HL82  R.L. Haskin, R.A. Lorie: On Extending the Functions of a Relational Database System. Proc. SIGMOD 82, Orlando, June 1982, pp. 207-212
IBM81  SQL/Data System, Concepts and Facilities. IBM Corporation, GH24-5013, Jan. 1981
IBM85  PASCAL/VS Language Reference Manual. IBM Corporation, Program Number: 5798-PNQ, 1985
KDG87  K. Küspert, P. Dadam, J. Günauer: Cooperative Object Buffer Management in the Advanced Information Management Prototype. Proc. VLDB, Brighton, U.K., Sept. 1987, pp. 483-492
KKLW87  A. Kemper, K. Küspert, V. Linnemann, M. Wallrath: Pascal Structures for HDBL Types: Layout, Naming Conventions, Storage Allocation, and Usage in Functions. IBM Heidelberg Scientific Center Tech. Note TN 87.05, Oct. 1987
KLW87  A. Kemper, P.C. Lockemann, M. Wallrath: An Object-Oriented Database System for Engineering Applications. Proc. ACM SIGMOD Conf., San Francisco, May 1987, pp. 299-311
La84  W. Lamersdorf: Recursive Data Models for Non-Conventional Database Applications. Proc. First Intern. IEEE Conference on Data Engineering, Los Angeles, 1984, pp. 143-150
Li87  V. Linnemann: Non First Normal Form Relations and Recursive Queries: An SQL-Based Approach. Proc. Third IEEE Conference on Data Engineering, Los Angeles, 1987, pp. 591-598
LMP87  B. Lindsay, J. McPherson, H. Pirahesh: A Data Management Extension Architecture. Proc. SIGMOD 1987, San Francisco, May 1987
Lu84  V. Lum, P. Dadam, R. Erbe, J. Günauer, P. Pistor, G. Walch, H.-D. Werner, J. Woodfill: Designing DBMS Support for the Temporal Dimension. Proc. SIGMOD 84, Boston, Mass., June 1984, pp. 115-130
Lu85  V. Lum, P. Dadam, R. Erbe, J. Günauer, P. Pistor, G. Walch, H.-D. Werner, J. Woodfill: Design of an Integrated DBMS to Support Advanced Applications. Proc. Int. Conf. on Foundations of Data Organization (Invited Talk), Kyoto, Japan, May 1985, pp. 21-31
MBW80  J. Mylopoulos, Ph.A. Bernstein, H.K.T. Wong: A Language Facility for Designing Database-Intensive Applications. ACM Trans. on Database Systems, Vol. 5, No. 2, June 1980, pp. 185-207
PA86  P. Pistor, F. Andersen: Designing a Generalized NF2 Data Model with an SQL-type Language Interface. Proc. VLDB, Kyoto, Japan, Aug. 1986, pp. 278-285
PaSc87  H.-B. Paul, H.J. Schek, M.H. Scholl, G. Weikum, U. Deppisch: Architecture and Implementation of the Darmstadt Kernel System. Proc. SIGMOD 87, San Francisco, pp. 196-207
Pi87  P. Pistor: The Advanced Information Management Prototype: Architecture and Language Interface Overview. 3. Journées de Bases de Données Avancées, Port Camargue, France, May 1987; also: IBM Heidelberg Scientific Center Tech. Rep. TR 87.06.004, June 1987
PT86  P. Pistor, R. Traunmüller: A Database Language for Sets, Lists, and Tables. Information Systems, Vol. 11, No. 4, pp. 323-336
RKB85  M.A. Roth, H.F. Korth, D.S. Batory: SQL/NF: A Query Language for ¬1NF Relational Databases. Dept. of Comp. Science, Univ. of Texas, Austin, TR-85-19, Sept. 1985
RoSt87  L.A. Rowe, M. Stonebraker: The Postgres Data Model. Proc. VLDB, Brighton, U.K., Sept. 1987, pp. 83-96
Sch86  P. Schwarz, W. Chang, J.C. Freytag, G. Lohman, J. McPherson, C. Mohan, H. Pirahesh: Extensibility in the Starburst Database System. IBM Almaden Research Center, San Jose, Cal., RJ 5211 (54671) 1986; also in Proc. 1986 IEEE Intern. Workshop on Object Oriented Database Systems, Pacific Grove, pp. 85-93
Schm77  J.W. Schmidt: Some High Level Language Constructs for Data of Type Relation. ACM Transactions on Database Systems, Vol. 2, No. 3, September 1977, pp. 247-261
SS86  H.J. Schek, M. Scholl: The Relational Model with Relation-Valued Attributes. Information Systems, Vol. 11, No. 2, 1986, pp. 137-147
St84  M. Stonebraker et al.: Quel as a Data Type. Proc. ACM SIGMOD Conf., Boston, Mass., June 1984, pp. 208-214
St86a  M. Stonebraker: Inclusion of New Types in Relational Data Base Systems. Proc. Second Intern. Conference on Data Engineering, Los Angeles, Feb. 1986, pp. 262-269
St86b  M. Stonebraker, L.A. Rowe: The Design of Postgres. Proc. ACM SIGMOD Conf., Washington, D.C., 1986, pp. 340-355
St87  M. Stonebraker, J. Anton, E. Hanson: Extending a Database System with Procedures. ACM Trans. on Database Systems, Vol. 12, No. 3, Sept. 1987, pp. 350-376
To76  S.J.P. Todd: The Peterlee Relational Test Vehicle - A System Overview. IBM Systems Journal, Vol. 15, No. 4, 1976, pp. 285-308
VKC86  P. Valduriez, S. Khoshafian, G. Copeland: Implementation Techniques of Complex Objects. Proc. 12th Intern. Conf. on Very Large Data Bases, Kyoto, Japan, August 1986, pp. 101-110
Wi83  N. Wirth: Programming in MODULA-2. Springer Verlag 1983
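
The use counts of Sect. 5.1.2 and the rule that a type may only be dropped once it is no longer in use can be illustrated by a minimal sketch. It is written in Python purely for brevity and is not the AIM-P catalog manager; the class TypeCatalog, its method names, and the three count categories as dictionary keys are assumptions made for this illustration.

```python
class TypeCatalog:
    """Toy type catalog that only tracks use counts (hypothetical API)."""

    KINDS = ("types", "functions", "objects")

    def __init__(self):
        self._use_counts = {}  # type name -> use count per kind of dependent
        self._uses = {}        # type name -> types its declaration refers to

    def declare_type(self, name, uses=()):
        """Register a type; `uses` lists already declared types it refers to."""
        for used in uses:
            self._use_counts[used]["types"] += 1
        self._use_counts[name] = dict.fromkeys(self.KINDS, 0)
        self._uses[name] = list(uses)

    def add_use(self, name, kind):
        """Record that a function or database object uses type `name`."""
        self._use_counts[name][kind] += 1

    def drop_type(self, name):
        """Drop a type only if nothing uses it; no cascading drop."""
        counts = self._use_counts[name]
        if any(counts.values()):
            raise ValueError(f"type {name} is still in use: {counts}")
        # The dropped type no longer counts as a user of other types.
        for used in self._uses.pop(name):
            self._use_counts[used]["types"] -= 1
        del self._use_counts[name]


catalog = TypeCatalog()
catalog.declare_type("point")
catalog.declare_type("boundary", uses=["point"])
catalog.drop_type("boundary")  # allowed: nothing refers to boundary
catalog.drop_type("point")     # allowed only because boundary is gone
```

Rejecting the drop instead of cascading keeps the effect of a drop operation local, which is the design choice argued for in Sect. 5.1.2.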

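The call path of Sect. 5.2.1 (function name, number of parameters, address vector) can be sketched as follows. The sketch is in Python rather than assembly language, so the function address table becomes a dictionary of callables and the address vector a list of argument values. The class FunctionDispatcher, the register call, storing the parameter count in the table, and the behaviour of get_length (length of a polyline) are assumptions made for this illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple


class FunctionDispatcher:
    """Toy stand-in for the function address table plus the dispatch step."""

    def __init__(self) -> None:
        # function address table: name -> (callable, number of parameters)
        self._table: Dict[str, Tuple[Callable[..., Any], int]] = {}

    def register(self, name: str, func: Callable[..., Any], n_params: int) -> None:
        """Record a newly defined function (hypothetical registration call)."""
        self._table[name] = (func, n_params)

    def dispatch(self, name: str, n_params: int, address_vector: List[Any]) -> Any:
        """Look up the function by name and call it with the given parameters."""
        if name not in self._table:
            raise KeyError(f"unknown function: {name}")
        func, expected = self._table[name]
        if n_params != expected or len(address_vector) != n_params:
            raise TypeError(f"{name} expects {expected} parameter(s)")
        return func(*address_vector)


def get_length(boundary):
    """Hypothetical user defined function: length of a list of (x, y) points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(boundary, boundary[1:]):
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total


dispatcher = FunctionDispatcher()
dispatcher.register("get_length", get_length, 1)
result = dispatcher.dispatch("get_length", 1, [[(0, 0), (3, 4), (3, 10)]])
```

New functions are added by a further register call at run time; the dispatch code itself never changes, which is the point of the indirection.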

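The rule of Sect. 5.3.2 for nested tuples, one PASCAL record per tuple with a system generated name for the inner one, can be sketched as a small generator. It is written in Python for brevity and is not the AIM-P implementation. The function pascal_records and the list-of-pairs encoding of a tuple type are assumptions, and so is the extension of the naming rule to deeper nesting (name$attribute$attribute), since the paper only shows one level.

```python
from typing import List


def pascal_records(type_name: str, attrs: list) -> List[str]:
    """Return PASCAL record declarations for a (possibly nested) tuple type.

    `attrs` is a list of (attribute_name, type) pairs, where type is either
    a type name (str) or another such list for a nested tuple. Nested tuples
    are declared first, under the name <type_name>$<attribute_name>.
    """
    decls: List[str] = []
    fields: List[str] = []
    for attr_name, attr_type in attrs:
        if isinstance(attr_type, str):
            fields.append(f"  {attr_name} : {attr_type}")
        else:
            nested_name = f"{type_name}${attr_name}"
            decls.extend(pascal_records(nested_name, attr_type))
            fields.append(f"  {attr_name} : {nested_name}")
    body = ";\n".join(fields)
    decls.append(f"{type_name} =\nRECORD\n{body}\nEND;")
    return decls


nested_tuples = [
    ("attribute_1", "REAL"),
    ("attribute_2", [("attribute_3", "INTEGER"), ("attribute_4", "CHAR")]),
]
print("TYPE " + "\n\n".join(pascal_records("nested_tuples", nested_tuples)))
```

For the nested_tuples example this prints the two record declarations of Sect. 5.3.2, with nested_tuples$attribute_2 declared before the record that refers to it.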