Integrating Object Persistence To Relational Databases: Sampo Nurmentaus

HELSINKI UNIVERSITY OF TECHNOLOGY Department of Computer Science and Engineering Laboratory of Information Processing Science
Sampo Nurmentaus
Integrating Object Persistence to Relational Databases
Master's Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Technology.
Espoo, May 01, 2004
Supervisor: Professor Eljas Soisalon-Soininen
Instructor: Professor Eljas Soisalon-Soininen
HELSINKI UNIVERSITY OF TECHNOLOGY
ABSTRACT OF THE MASTER'S THESIS

Author: Sampo Nurmentaus
Name of the thesis: Integrating Object Persistence to Relational Databases
Date: May 01, 2004
Number of pages: 60
Department: Department of Computer Science and Engineering
Professorship: T-79
Supervisor: Prof. Eljas Soisalon-Soininen
Instructor: Prof. Eljas Soisalon-Soininen
Both object-oriented development and relational databases are here to stay. They are both mature technologies that are used in a wide range of software projects. We have used both successfully in an embedded environment. Both technologies are good in their own fields: object-oriented methods are good at modelling real-world problems, and relational databases are practical for storing and retrieving data effectively. Integrating these technologies, however, is not trivial. There are many mismatches, ranging from the process level to the implementation level. Yet relational databases are often available for persistent object storage, and they may contain data that the application under development must be able to access. There are many cases where a combination of these technologies is required despite the mismatches between them.

In this thesis we have examined the possibility of integrating these technologies using a persistence layer, a class library that provides reusable tools for storing C++ objects to and restoring them from different relational databases. The goal of our solution was to provide extra flexibility to application development by reducing the coupling between the application and the database solution used, and to provide a reusable solution for persistence questions. We developed a design for a persistence layer, together with some implementation tests, to answer questions such as: is it worth the effort to develop one, and what kind of process changes are required?

It was discovered that the development of a persistence layer is a large project that requires a lot of effort. It would have to be reused in several projects to be worth implementing, but in the long run it might be worth the investment in terms of reduced application development and maintenance effort.

Keywords: object-orientation, persistence, relational databases, object-relational mapping, agile data, embedded systems, C++, serialization
TEKNILLINEN KORKEAKOULU
ABSTRACT OF THE MASTER'S THESIS (DIPLOMITYÖN TIIVISTELMÄ)

Author: Sampo Nurmentaus
Name of the thesis: Olioiden tallennus relaatiotietokantaan (Storing Objects in a Relational Database)
Date: 01.05.2004
Number of pages: 60
Department: Department of Computer Science and Engineering
Professorship: T-79
Supervisor: Prof. Eljas Soisalon-Soininen
Instructor: Prof. Eljas Soisalon-Soininen

Relational databases and object-oriented programming are established technologies in general use. They are used in almost all kinds of software systems. In our case both technologies are used in an embedded environment. Both technologies are good at their intended purposes. Relational databases are used to manage large volumes of data, and object-oriented programming is used to model complex real-world problems. Combining these technologies is nevertheless not straightforward; problems arise at both the implementation and the process level. Often, however, there is a need to use object-oriented programming and relational databases together. Relational databases are widely used in organisations, and they contain data that an object-based application must be able to access. These technologies are often used together even though their joint use has clear problems.

This thesis studies the joint use of object and relational technologies by means of a persistence layer. By this we mean a class library or software framework whose purpose is to offer the application a service for storing objects. The work concentrates specifically on implementing a persistence layer between a C++ object application and relational databases. One of the main goals was to add flexibility to application development by reducing the coupling between the application and the database, and to develop a reusable solution for storing objects.

Within the work a design for implementing the persistence layer was developed, together with a few tests of the implementation. The aim was to estimate the amount of work required and whether the benefits gained are worth the investment. We also assessed what kind of process changes the introduction of a persistence layer would cause. The main observation was that implementing a persistence layer is a laborious project with many details to take into account. The persistence layer should be reusable in several projects for its implementation to pay off, but over a longer period it would be a worthwhile investment thanks to easier application development and maintenance.

Keywords: object-oriented programming, persistence, relational database, object-relational mapping, agile data model
Acknowledgements
Hard to believe, but it is done. It has required some hard work, but finally I am here, typing the last part of my thesis, and I do feel great relief. But there is also some melancholy in the air. Typing this makes me look over my shoulder at my past studies at Helsinki University of Technology. This has surely been the most interesting era of my life thus far. I have studied several interesting subjects and got to know many new people. Spending the next 40 years in that nine-to-five scene sounds a bit frightening to me, so it might very well be that I will continue my studies some day. I wish to thank my supervisor and instructor, Professor Eljas Soisalon-Soininen, for all the help and advice during the writing process. I would also like to thank Kaj Björklund, Baris Boyvat, Ilkka Pelkonen and Markku Rontu for comments on my thesis and interesting discussions on persistence questions, and Cristoffer Von Bundstorf for help with the language. My gratitude also goes to my parents for the financial support that made it possible for me to concentrate fully on my work. Finally, I would like to thank my lovely fiancée for tolerating this stress-suffering geek on her sofa.
On a sunny spring day in Nurmijärvi, May 19, 2004
Sampo Nurmentaus
Contents
1 Acknowledgements
2 Introduction
  2.1 The Structure of This Document
3 Relational Data Bases
  3.1 Operations on Relational Databases
  3.2 Views
  3.3 Referential Integrity
  3.4 Transactions
  3.5 Locking
  3.6 Cursors
4 Object Oriented Development
  4.1 Relationships Between Classes
  4.2 Extensibility
  4.3 Full Encapsulation of Persistence Mechanisms
  4.4 Object Oriented Frameworks
5 The Object-Relational Impedance Mismatch
  5.1 Design Time of Relations
  5.2 Representing Objects as Tables
  5.3 Object Identifier
  5.4 Representing Collections in a Relational Database
  5.5 Representing Object Relationships
  5.6 Representing Inheritance Hierarchies
    5.6.1 Whole hierarchy in one table
    5.6.2 Each concrete class to a table of its own
    5.6.3 Each class to its own table
    5.6.4 Map inheritance hierarchies to a generic structure
  5.7 Comparison of Different Mapping Strategies
  5.8 Abstraction of Queries
  5.9 Mapping Query Results to Objects
6 Our Embedded System
  6.1 Proxy Objects
  6.2 Cache
  6.3 Abstraction of Cursors
  6.4 Multi-object Actions
7 Implementation Language
  7.1 Exceptions for Error Handling
  7.2 Lack of Reflectivity
8 Issues with Legacy Data and Applications
  8.1 Several Persistence Mechanisms
  8.2 Multi-object Actions
  8.3 Multiple Connections
9 Requirements for a Persistence Layer
  9.1 Questions we are looking answers for
  9.2 Goals for Our Implementation
10 Our Solution
11 Logical View
  11.1 Representing Queries as Objects
12 Development View
13 Process View
14 Data View
15 Scenarios
16 Using the Persistence Layer from an Application
17 Analyzing Results
18 Alternative Solutions
  18.1 Alternatives To A Persistence Layer
  18.2 Object Oriented Data Bases
  18.3 Alternatives to Persistence Interface to The Application
19 Future Developments
  19.1 Management Utility
  19.2 Security
  19.3 Database Schema Versioning
  19.4 Fine grained Versioning of Data
  19.5 Storing Temporary State Between Sessions
20 Summary
References
A Glossary
2 Introduction
Object-oriented development is one of the most popular programming paradigms today, influencing programming tools and methods as well as development processes. In a modern object-oriented development process, the data structures of a program tend to reflect the problem domain in order to fulfil user requirements as well as possible. Relational databases, on the other hand, are designed to store data that has a relatively static structure and to provide fast operations on this data. Traditionally, the structure of a relational database used by an application is designed very early in the development process, and an effort is made to keep it fixed during development.

Relational databases and object-oriented development are both used in many different applications, ranging from embedded systems to large-scale business solutions. Both are good in their own fields, but they have completely different design goals and principles. Relational databases have their roots in relational algebra, whereas object-oriented development has grown out of years of experience in software engineering. Relational databases emphasise a good design decided beforehand, whereas object-oriented development promotes flexibility. This makes combining these technologies a non-trivial problem.

One key advantage of object-oriented development is flexibility: it is relatively easy to adapt an object-oriented design (more specifically, a good object-oriented design) to the changing requirements of the problem domain. In practice any application must change to remain useful [4]. Often this flexibility is limited by existing relational database structures that contain huge amounts of data [6, 2].

Combining these techniques is still an interesting question. Relational databases often already exist in organisations, and they are very good at handling huge amounts of data. Object-oriented software development, in turn, is one of the most promising programming paradigms today. An effective way to combine the two would therefore give real added value.

Many systems that use both technologies do this manually: queries against relational databases are embedded in the classes describing the problem domain. This requires extra development effort and is hard to maintain [1, 6, 2]. A better solution would be to use a persistence layer that reduces the coupling between the database and the class structures of the application [1, 6]. This layer should provide services to retrieve objects from and store them to the persistence mechanism in question. The actual techniques used to store the data of an object should be hidden from the application, so that developers can concentrate on fulfilling user requirements. In addition, this layer can provide tools for exporting and importing data to and from both XML and other databases, as well as data versioning to support existing databases when the program is updated. Undo functionality and object access control can also be integrated into the persistence framework. The possibilities are almost unlimited.

In the field of embedded systems, application development faces many problems unfamiliar in normal desktop and server environments. Memory, storage space and computing power are limited, so extra care must be taken in programming. Embedded systems often have higher expectations of reliability than desktop computers: people are used to rebooting their PCs, but not their dishwashers. Still, the software components of an embedded system are expected to provide flexibility. It is thought to be easier to adapt a software system to changing requirements than a completely hardware-based solution, and this is often the motivation for implementing application logic in software. All this makes an embedded system a very interesting and challenging platform for an application developer. Integrating databases into embedded systems is a challenge, so reuse of existing solutions is more than welcome. This is a great opportunity for our persistence layer.
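The decoupling that a persistence layer is meant to provide can be illustrated with a minimal sketch. It is not the design developed in this thesis: the names `Person`, `PersonStore`, `InMemoryPersonStore` and `phoneOf` are hypothetical, and an in-memory map stands in for the relational database so that the example is self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical domain class; it contains no SQL and no storage details.
struct Person {
    int id;
    std::string name;
    std::string phone;
};

// Hypothetical persistence interface: the application sees only this.
class PersonStore {
public:
    virtual ~PersonStore() {}
    virtual void store(const Person& p) = 0;
    // Returns true and fills `out` if an object with this id exists.
    virtual bool retrieve(int id, Person& out) const = 0;
};

// Stand-in back end for the sketch. A real one would issue SQL here.
class InMemoryPersonStore : public PersonStore {
public:
    void store(const Person& p) override { rows_[p.id] = p; }
    bool retrieve(int id, Person& out) const override {
        std::map<int, Person>::const_iterator it = rows_.find(id);
        if (it == rows_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<int, Person> rows_;
};

// Application code depends on the interface only, so the back end can change.
inline std::string phoneOf(const PersonStore& store, int id) {
    Person p;
    return store.retrieve(id, p) ? p.phone : std::string();
}
```

Because `phoneOf` is written against `PersonStore`, replacing the in-memory back end with one that talks to a particular database would not require changes to the application code.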
2.1 The Structure of This Document
First, in Sections 3 and 4, we describe both relational databases and object-oriented development in further detail, concentrating on the features related to this project. Next, in Section 5, some fundamental differences between object orientation and relational databases are examined, and possible solutions to these problems are discussed. Embedded systems are discussed in Section 6, both in general and in our particular case. The limitations that the embedded environment sets on the persistence layer are also described, and we present a few solutions to these found in the literature. After this, in Section 7, we briefly discuss the implementation language and why we have chosen C++. Section 8 gives a short discussion of pre-existing (legacy) data, together with ideas about how a persistence layer can help. The requirements raised in the earlier sections are then summarised, together with a few new ones, in Section 9. In Section 10 we introduce the solutions we have designed, and in Section 17 we describe how our solution tries to solve the different problems encountered. Some alternative solutions to the persistence layer are described in Section 18. In Section 19 we make a few remarks about how the development of our persistence layer will continue. Finally, the summary in Section 20 briefly describes what we have achieved and lists the things that still remain unsolved.
3 Relational Data Bases
Relational databases are a 30-year-old concept for abstracting the actual physical layout of a database into relations that can then be managed with a query language. To define a relation more exactly (see [7] and [16], for example), we assume that the sets $S_1, S_2, \dots, S_n$ are given. An n-tuple with one element from each set forms a record in the database, and a set of these n-tuples is called a relation. More mathematically, $R$ is a relation on the sets $S_1, S_2, \dots, S_n$ if and only if $R \subseteq S_1 \times S_2 \times \dots \times S_n$. The sets $S_1, S_2, \dots, S_n$ are called the domains of the relation. For example, if $S_1$ were a set of person names and $S_2$ a set of phone numbers, the relation $R(S_1, S_2)$ would associate a phone number with a person. Each 2-tuple in the relation $R$ states that the person with this name has this phone number.

The primary key of a relation is a subset of its domains whose values uniquely identify a tuple in the relation. In our phone number example a person name would work as a primary key (see footnote 2). If the primary key of a relation is present in another relation, it is called a foreign key there. In this way relations can refer to each other to form more complicated data structures.

Relations thus form a data model that can be accessed and modified with a query language. The operations are defined by relational algebra, which gives a mathematical foundation to the database languages used in the real world [7]. The most often used language for data operations on relational databases is called Structured Query Language, or SQL for short [8].

In practical database terminology, relations are often referred to as tables and domains as attributes or columns. Tuples are called rows, and the set of elements from a single domain in a relation forms a column. This convention is not a part of the relational model, but it provides a more practical viewpoint on the mathematical world of relations. In the following sections I describe some of the common features of relational databases that should be taken into account in our project.
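The definitions above can be made concrete with a small sketch of the phone number example. The type names `PhoneTuple` and `PhoneRelation` and the function `nameIsPrimaryKey` are hypothetical and chosen only for illustration; the relation is modelled directly as a set of 2-tuples, and the function checks whether the name domain alone identifies every tuple.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// A 2-tuple (name, phone number): one element from each domain S1 and S2.
typedef std::pair<std::string, std::string> PhoneTuple;

// A relation R(S1, S2) is a set of such tuples, i.e. a subset of S1 x S2.
typedef std::set<PhoneTuple> PhoneRelation;

// True if the name works as a primary key: no two tuples share a name.
inline bool nameIsPrimaryKey(const PhoneRelation& r) {
    std::set<std::string> seen;
    for (PhoneRelation::const_iterator it = r.begin(); it != r.end(); ++it) {
        if (!seen.insert(it->first).second) return false;  // duplicate name
    }
    return true;
}
```

As soon as one person has two phone numbers, the name no longer identifies a tuple uniquely, which is why the example would not survive contact with real data.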
3.1 Operations on Relational Databases
Different operations can be performed against a relational database to retrieve and manipulate the data stored in relations. Some of the operations are based on relational algebra; some are included for purely practical reasons.

Projection of a relation means that we pick some of its columns into a result set [33]. Assume a relation $R$ with the columns name and phonenumber. The operation $\pi_{name}(R)$ then produces a relation that has only the one column name. Whereas projection is used to limit the number of columns in the result set, selection is used to limit the number of rows [33]. Selection returns only the rows that fulfil given conditions. For example, $\sigma_{name = \text{'Ringo Star'}}(R)$ produces only the rows with the name Ringo Star.

The cartesian product of two relations means that the relations are combined in all possible ways [33]. If we have the relations $A$ and $B$, each with one attribute, $a$ and $b$ respectively, their cartesian product is a relation with both attributes $a$ and $b$, in which every value of $A$ is combined with every value of $B$. So if the number of rows in the relation $A$ is $|A|$, and respectively $|B|$ for $B$, then $|A \times B| = |A||B|$, where $A \times B$ denotes the cartesian product of $A$ and $B$.

By itself the cartesian product of relations is rarely useful. Often a more limited operation, a join of two relations, is used instead [33]. In a join, rows of the relations are combined if they fulfil given conditions. For example, if we have the phone number relation above and another one, say $S$, that contains names and street addresses, a natural join of these would combine the records with matching names. In addition to the natural join, a theta join can be used to join two relations on an arbitrary condition. Query languages also support the notion of an outer join, which includes the rows of one relation in the result even if there is no matching row in the other relation. All these operations can be performed with the SQL select command and its join clauses [8].

Other commands, not based on formal relational algebra, include commands to insert, delete and update rows in a relation [33]. These are represented by the logically named SQL commands insert, delete and update [8]. In addition, SQL has many other commands related to views, indexes, defining relations and so on. The ones with a particular influence on our system are discussed in the following sections.

Footnote 2: As can clearly be seen, this would never work in the real world, but it serves well as an example here.
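The operations described above can be sketched over in-memory rows. This is only an illustration of the semantics, not of how a database management system evaluates queries; the struct and function names are hypothetical, and the join is a plain nested loop.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Phone   { std::string name; std::string number; };
struct Address { std::string name; std::string street; };
struct Joined  { std::string name; std::string number; std::string street; };

// Projection onto the name column; the set removes duplicate values.
inline std::set<std::string> projectName(const std::vector<Phone>& r) {
    std::set<std::string> out;
    for (std::size_t i = 0; i < r.size(); ++i) out.insert(r[i].name);
    return out;
}

// Selection: keep only the rows whose name equals `wanted`.
inline std::vector<Phone> selectByName(const std::vector<Phone>& r,
                                       const std::string& wanted) {
    std::vector<Phone> out;
    for (std::size_t i = 0; i < r.size(); ++i)
        if (r[i].name == wanted) out.push_back(r[i]);
    return out;
}

// Cartesian product: every row of a combined with every row of b.
inline std::vector<std::pair<Phone, Address> > cartesian(
        const std::vector<Phone>& a, const std::vector<Address>& b) {
    std::vector<std::pair<Phone, Address> > out;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out.push_back(std::make_pair(a[i], b[j]));
    return out;
}

// Natural join on the shared name column: only matching pairs survive.
inline std::vector<Joined> naturalJoin(const std::vector<Phone>& r,
                                       const std::vector<Address>& s) {
    std::vector<Joined> out;
    for (std::size_t i = 0; i < r.size(); ++i)
        for (std::size_t j = 0; j < s.size(); ++j)
            if (r[i].name == s[j].name) {
                Joined row = {r[i].name, r[i].number, s[j].street};
                out.push_back(row);
            }
    return out;
}
```

The join is the cartesian product followed by a selection on equal names and a projection that keeps the name only once, which is why the two functions share the same loop structure.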
3.2 Views
Data stored in a relational database is organised according to a database schema that describes the structure of the relations. This provides a well-organised way to store data, but no single model of the data is the right one for all purposes. We can use relational algebra to produce a new relational model from an existing one, providing an additional view of the data [33]. This new model can then be queried with the familiar operations. In this way a view acts as a kind of stored subquery for queries to come. Almost any operation applicable to relations can also be applied to views. However, there are certain limitations when it comes to updating views [33]. In many cases it is impossible for the database management system to figure out how to handle an update to a view, and support for updatable views tends to vary between database vendors. These issues make views an interesting way to provide easy, read-only access to the database for third-party applications, for example. When updates are required, however, the advantage of views is quite minimal, since an application accessing the database through views still has to be aware of the underlying database schema. One possible solution is to use triggers or stored procedures for updates. This provides more flexibility and allows some changes in the database schema without changes to the applications accessing the database.
3.3 Referential Integrity
Let there be relations A and B such that A refers to the relation B by storing the key of B as a foreign key in one of its columns. They are said to maintain referential integrity if it is guaranteed that the row in the relation B that is referred to by a row in the relation A actually exists [33]. In other words, referential integrity means that references between relations are in order. Relational database management systems include techniques to guarantee referential integrity in the database schema, for example foreign key constraints and triggers.

Referential integrity is also attended to at the application level. Applications do not allow the user to perform operations that would harm referential integrity, and operations that temporarily break the integrity of the database are performed inside transactions. These are discussed further in Section 3.4. Persistence layers also have to take referential integrity among objects into account, and thus perform yet another set of checks. One view, presented in [2], is that multiple layers of such checks only incur an unnecessary performance penalty. Database integrity checks should therefore be used only as a safety net during development and could be turned off in production use. Of course this does not hold if other applications also access the database. When a persistence layer is used, this approach might seem attractive.
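The kind of check involved can be sketched as follows. The example is an assumption-laden illustration, not part of the design: `RowA` and `danglingReferences` are hypothetical names, and the keys of relation B are given simply as a set of integers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Row of relation A; `bKey` is a foreign key referring to relation B.
struct RowA {
    int id;
    int bKey;
};

// Returns the ids of rows in A whose foreign key has no matching row in B.
// Referential integrity holds exactly when the result is empty.
inline std::vector<int> danglingReferences(const std::vector<RowA>& a,
                                           const std::set<int>& bKeys) {
    std::vector<int> broken;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (bKeys.count(a[i].bKey) == 0) broken.push_back(a[i].id);
    }
    return broken;
}
```

A database enforces the same condition declaratively through foreign key constraints; a persistence layer that repeats it in code is the duplicated checking that [2] warns about.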
3.4 Transactions
If database operations are executed one by one, concurrent access and failures in the applications or in the database system may easily leave the database in an inconsistent state. To solve this, database operations are grouped into logical sets called transactions. A transaction is basically a set of basic database operations executed on the database [33]. Database management systems are designed to execute transactions with the following properties.

Atomicity: either the whole transaction is executed or none of it is.
Consistency: if the database is in a consistent state before a transaction begins, it is still in a consistent state when the transaction finishes.
Isolation: other processes accessing the database simultaneously do not see the changes made by the transaction until it has finished.
Durability: after a transaction has finished, the changes it has made are permanently stored in the database.

When a transaction is cancelled, either because of a database error or by the application, it is said to be rolled back. If the transaction finishes successfully, it is committed. When an operation in the middle of a transaction fails, the operations already executed in the transaction are rolled back, leaving the database in the state it was in when the transaction began. If all the operations in the transaction succeed, the transaction is committed, after which the changes become permanent and visible to other processes.

Transactions are a way to guarantee that a database is not corrupted by database or application failures and that processes accessing the database simultaneously always see it in a consistent state. A persistence layer should also have some kind of support for transactions [6, 15, 2, 1]. Transactions can be created implicitly when objects are stored through the persistence layer, but application developers should also be able to create them explicitly when needed.

In a persistence layer, transactions involve more than data alone [2]; there are behavioural aspects as well. For example, undo/redo functionality is an example of an object transaction that can be reversed. Such transactions are not usually stored in the database; instead they are transactions inside business objects. Integrating undo/redo functionality is discussed further in Section ??. Error handling also often involves some kind of functionality, such as the collision handling of the optimistic locking scheme described in Section 3.5.
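Commit and rollback can be illustrated with a toy sketch. It assumes a hypothetical `Transaction` class that stages changes on a private copy of an in-memory map; a real database management system achieves atomicity with logging rather than by copying the data, so the sketch shows only the observable behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy key-value "database" standing in for a real DBMS connection.
typedef std::map<std::string, int> Database;

// Hypothetical transaction object: changes are staged on a private copy and
// become visible only on commit(); on rollback() they are discarded.
class Transaction {
public:
    explicit Transaction(Database& db) : db_(db), working_(db), done_(false) {}
    void set(const std::string& key, int value) { working_[key] = value; }
    void commit() {
        if (!done_) { db_ = working_; done_ = true; }
    }
    void rollback() { working_ = db_; done_ = true; }
private:
    Database& db_;
    Database working_;
    bool done_;
};

// Moves `amount` from one entry to another as a single atomic unit.
inline bool transfer(Database& db, const std::string& from,
                     const std::string& to, int amount) {
    Database::const_iterator src = db.find(from);
    Database::const_iterator dst = db.find(to);
    if (src == db.end() || dst == db.end()) return false;
    Transaction tx(db);
    tx.set(from, src->second - amount);
    tx.set(to, dst->second + amount);
    if (src->second < amount) {  // the result would be invalid: cancel it all
        tx.rollback();
        return false;
    }
    tx.commit();
    return true;
}
```

After a failed `transfer` the database is exactly as it was before the call, even though both staged updates had already been made inside the transaction.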
3.5 Locking
Most relational database management systems provide the ability to lock rows or entire relations while manipulating them. This prevents other processes from accessing them while they are being modified by the running process. Depending on the application in question, different locking strategies supported by the database may be applied. If there is a persistence layer in between, it should allow the same kind of flexibility [2, 6].

In [2] a few different approaches are described. The first one is to ignore locking totally. Here no locking is involved when objects are fetched from the database, modified and stored back. In this scenario, if someone modifies the objects in between, only the later update will be visible in the database. The opposite approach is to lock the records when an object is fetched from the database and to keep them locked until the object is stored back. This can also be achieved by keeping a transaction active during the whole operation. This prevents collisions, but it makes it impossible for anyone else to access the objects simultaneously. It does not work well for interactive applications, for example, where records may remain locked for a long time while the user is editing the data in them.

Between these two extremes is a solution in which a logical time stamp is stored with the object. This time stamp is advanced every time the object is modified in the database. When an object is stored back, it is checked whether the time stamp has changed since the last retrieval. If so, someone has updated the object in between. How these situations are then resolved is up to the application. The different locking solutions from [2] are summarised in Table 1.

In embedded systems, however, the database often has only one application accessing it at a time, so overly sophisticated locking would only introduce a performance penalty. The need for locking thus differs from one application to another, and a general-purpose persistence layer should support several different approaches.
Approach: Description
Overly optimistic: The whole locking issue is ignored. If concurrent access is allowed this will cause collisions, but when only one application accesses the database this strategy works fine.
Optimistic locking: Data is read in and stored inside separate transactions. While the application manipulates the data it remains unlocked, but some approach is used to detect collisions, which are then resolved.
Pessimistic locking: Data is kept locked while the application is manipulating it. This prevents any sort of collision but reduces concurrency, especially when there is no way to distinguish between read and write access to the objects.
Table 1: Different approaches for locking at the application level
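The optimistic approach with a logical time stamp can be sketched as below. The names `Record`, `Table` and `updateIfUnchanged` are hypothetical, and an in-memory map replaces the database table; in a real system the comparison and the update would have to happen inside one database transaction.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stored record with a logical time stamp that advances on every update.
struct Record {
    std::string data;
    unsigned long stamp;
};

// Toy table keyed by object id (stand-in for a database table).
typedef std::map<int, Record> Table;

// Optimistic update: succeeds only if the stamp is unchanged since the read.
// On a collision nothing is written; resolving it is left to the caller.
inline bool updateIfUnchanged(Table& table, int id, const Record& readCopy,
                              const std::string& newData) {
    Table::iterator it = table.find(id);
    if (it == table.end()) return false;
    if (it->second.stamp != readCopy.stamp) return false;  // collision
    it->second.data = newData;
    ++it->second.stamp;  // advance the logical time stamp
    return true;
}
```

If two users read the same record and both try to store it back, the first update succeeds and advances the stamp, so the second one is detected as a collision instead of silently overwriting the first.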
3.6 Cursors
To improve the efficiency of database applications, relational databases often provide a feature called cursors [11]. Cursors are a technique for retrieving data from a database little by little instead of in huge amounts at a time. For example, when a query against a database returns a large number of records, fetching them all to the client at once is not an option, since it requires a lot of memory, and transferring all the records at once makes the query execution take too long. To solve this, databases introduce the cursor, a handle to the result set of a query that is kept in the database management system. When the cursor is advanced, a new record of the result set is fetched to the client application. In this way the application can iterate through a large result set without wasting too many resources. The same kind of behaviour should be preserved by a persistence layer [6]. Integration of cursors into the object-oriented persistence layer is discussed in Section ??.
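The cursor idea can be sketched with a hypothetical `Cursor` class. Here the "server side" result set is simply a vector that the cursor refers to without copying, so the example shows only the access pattern, not any real database client API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cursor over a result set held on the "server" side.
// The client pulls one record at a time instead of copying the whole set.
class Cursor {
public:
    explicit Cursor(const std::vector<std::string>& resultSet)
        : rows_(resultSet), pos_(0) {}
    // Fetches the next record into `out`; returns false when exhausted.
    bool next(std::string& out) {
        if (pos_ >= rows_.size()) return false;
        out = rows_[pos_++];
        return true;
    }
private:
    const std::vector<std::string>& rows_;  // stays with the server; not copied
    std::size_t pos_;
};

// Client-side loop: only one record is held at a time, however large the
// result set is.
inline std::size_t countRows(Cursor& c) {
    std::string row;
    std::size_t n = 0;
    while (c.next(row)) ++n;
    return n;
}
```

Because the cursor keeps only a reference, the result set must outlive it, just as a database cursor is valid only while its query context remains open.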
4 Object Oriented Development
Today object-oriented development is the most promising programming paradigm, and it has been adopted in many organisations worldwide [24]. At the same time as object-oriented programming tools and methods are spreading across the planet, software processes are developing. Instead of a serial, waterfall-style [24] software process, more and more organisations are adopting some kind of flexible process, for example the Unified Process [15].

Object-oriented development models real-world subjects, both concrete and abstract ones, as objects [5]. Objects are instantiations of classes, where a class defines the attributes and the behaviour of the objects it represents. All we know about an object is described by its class, much like concrete subjects in the real world and Plato's ideas of them [22]. In a class, attributes are used to describe what the object knows (the information bound to it) and operations to describe what the object can do (its functionality). These are common to all objects of the class, but each object has its own identity and its own values for the attributes. Classes, and thus objects, are related to each other in various ways, discussed in more detail in Section 4.1.

Object orientation allows software processes to be more iterative and incremental in nature than traditional ones, taking full advantage of the flexibility of this modern paradigm [15, 24, 2]. In practice this means that software is specified, designed and developed little by little, adding a few new features in each iteration. The force driving the development is the user requirements, which are accepted to be vague; a moderate amount of change is natural to these processes. The data models in a program model the subjects of the problem domain and provide functionality whose major goal is to fulfil some of the user requirements.

The nature of object-oriented development requires completely new thinking when it comes to data models. Data models are no longer something completely static. Instead they must be flexible enough to allow some change during development [1]. The impact of this fact on the persistence layer is discussed further in the following sections.
4.1 Relationships Between Classes
In the design of an object oriented program the classes are related to each other in many dierent ways to form larger entities. A class describing a single book is related to a class describing its author, the book class may contain several chapter classes and the book may be a subtype of a more general class describing common properties of books, CDs etc. More exactly these relationships can be categorised as fallows [2]. Inheritance models a is-a relationship between two classes. For example lorry, car and ship can be modelled as subtypes of a vehicle class, which practically means that they all are vehicles. There are also other types of inheritance like private inheritance in C + + or implementation of an interface. Often inheritance is not even described as a relationship between classes, but it means that one class fully represents another. But when it comes to persistence of objects it is a relationship. This subject is discussed in further detail in Section 5.6. Composition means a relation where a class is a component of another, a isa-part-of relationship. For example engine, doors and wheels are components of a car. Aggregation means that one class may be made up of others. It is a kind of weaker form of composition. Both aggregation and composition tend to be asymmetric as an is-a-part-of relationship in the real world. It is also worth noticing that relationships as strong as aggregation and composition have also an eect to the lifetime of the objects. When an object representing a car is destroyed, also the classes describing its components are destroyed. Two classes are said to be associated together when they have access to each other through a pointer etc. This is a weaker form of class relation than the ones listed above. It has no inuence on the lifetime of the objects and it may be that the object at the other end is completely missing. There are also other types of relationships among classes. 
If an object or a class is passed as a parameter to a method of another class, or as a template parameter to a class, the classes in question depend on each other. A class can also implement an interface, which forms a realisation relationship between the class and the interface. Still, these kinds of relationships are not interesting from the point of view of persistence since they are more about the behaviour of objects than about data.
The number of objects in a relationship is called the multiplicity of the relationship. Both ends of the relationship have multiplicities of their own, which are usually defined during the design of the class model [15]. A multiplicity can be defined to be any range of natural numbers, but a few important special cases can be noted. In the case of composition and aggregation, the whole end of the relationship has a multiplicity of exactly one. For example an engine is a part of exactly one car. The multiplicity of the other end can vary. A car can have any number of doors from zero to six. A multiplicity of exactly one defines a referential integrity constraint for the relationship since it forces the object at the other end to exist. Association does not place any limitations like this on multiplicities.
These multiplicities are not explicitly expressed by the object oriented program, but they are an essential part of the class model and thus of the data model derived from it. If a data model is derived from a class model, the constraints for referential integrity are based on the multiplicities of the class model [2]. This makes it necessary for the persistence layer to be able to define and manage multiplicities to some extent.
The dependencies of classes have a great impact on the behaviour of the persistence subsystem when it comes to the lifetime of objects [2]. When an object representing a car is deleted, the objects representing its parts are deleted as well. So when an object has an aggregation or composition relation to another object, their lifetimes in persistent storage are related.
These kinds of chained operations set off by one operation are called cascading operations in [2]. Cascading deletes are suggested at least for objects with a composite relationship and possibly also for aggregate objects. But this does not work for associations in general, since the multiplicities might differ from the one-to-many model of composition and aggregation. Other possible cascading operations include cascading reads and saves. For example, when an object representing a car is read into memory, the objects representing its parts could be read as well. The same applies to saves. But these may still vary from one application to another. Cascading creation of objects is not an interesting question since it is traditionally taken care of by the constructors of objects.
Associations do not generate cascading operations on objects, but relations to other classes may have to be updated. For example, if a car object has a relation to the object representing its owner and the car object is destroyed, the reference to the car object must be removed from the owner object. When objects have an association relationship between them, the object at the other end of the relationship is automatically restored from the database when the association is followed by the application.
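The cascading rules of this section can be illustrated with a small sketch. It is not part of the design presented in this thesis: the names Relation, cascadingDelete and danglingAssociations are hypothetical, and relationships are assumed to be available as a plain list of OID pairs tagged with their kind.

```cpp
#include <set>
#include <vector>

// Hypothetical description of one relationship instance between two objects.
enum class RelationKind { Composition, Aggregation, Association };

struct Relation {
    long from;          // OID of the whole (or of the referring object)
    long to;            // OID of the part (or of the referred object)
    RelationKind kind;
};

// Collect the OIDs removed when `root` is deleted: the root itself plus,
// transitively, every part reached through composition or aggregation.
// Associations are deliberately not followed.
std::set<long> cascadingDelete(long root, const std::vector<Relation>& relations) {
    std::set<long> doomed;
    std::vector<long> pending{root};
    while (!pending.empty()) {
        long oid = pending.back();
        pending.pop_back();
        if (!doomed.insert(oid).second) continue;  // already visited
        for (const Relation& r : relations) {
            if (r.from == oid && r.kind != RelationKind::Association)
                pending.push_back(r.to);
        }
    }
    return doomed;
}

// Associations touching a deleted object do not cascade, but they must be
// removed so that no reference to a deleted object remains.
std::vector<Relation> danglingAssociations(const std::set<long>& doomed,
                                           const std::vector<Relation>& relations) {
    std::vector<Relation> result;
    for (const Relation& r : relations) {
        if (r.kind == RelationKind::Association &&
            (doomed.count(r.from) != 0 || doomed.count(r.to) != 0))
            result.push_back(r);
    }
    return result;
}
```

With a car that owns an engine and a wheel and is referred to by an owner object, deleting the car would remove the car and its parts, while the owner survives and only its reference to the car is reported for cleanup.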
4.2
Full Encapsulation of Persistence Mechanisms
One of the key aspects of object oriented design is that classes have clear responsibilities assigned to them [15]. No class should have responsibilities from several different domains in the application. The responsibilities of the classes of the problem domain are in that domain and no other responsibilities should be assigned to these classes. Problems related to databases and persistence are from a completely different domain, and that is why problem domain classes should know nothing about the persistence subsystem. Persistence mechanisms like databases should be fully encapsulated from the objects to be stored in the persistence system. Classes in the persistence layer should be orthogonal to the classes in the problem domain. In practice this may not be perfectly achieved, but at least the coupling between the persistence mechanism and the application should be minimised [6].
In [6] it is stated that business domain classes intended to be persistent should not inherit from a common superclass implementing persistence, since this generates too much coupling between the persistence layer and the application classes. On the other hand, [1] suggests a design where all classes stored through the persistence system are inherited from a class PersistentObject, which increases coupling between the persistence subsystem and the application classes but gives all persistent objects a similar interface to work with. The latter is easier to implement, but the former probably leads to a design that is easier to maintain. When programming in C++, the lack of reflexivity described in Section 7.2 leads to a solution where domain classes still have to be aware of the persistence layer, so the system can never be fully orthogonal to the domain classes.
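The two designs compared above can be contrasted in a short sketch. Apart from PersistentObject, which is named in [1], every identifier here (Row, Book, Author, Mapper, tableName, toRow) is an assumption made for illustration, and the interfaces are much smaller than a real persistence layer would need.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Row;  // column name -> value

// Design 1, in the spirit of [1]: persistent classes share a base class.
// The domain class is coupled to the persistence layer through inheritance.
class PersistentObject {
public:
    virtual ~PersistentObject() {}
    virtual std::string tableName() const = 0;
    virtual Row toRow() const = 0;
};

class Book : public PersistentObject {
public:
    explicit Book(const std::string& title) : title_(title) {}
    std::string tableName() const override { return "Book"; }
    Row toRow() const override {
        Row r;
        r["title"] = title_;
        return r;
    }
private:
    std::string title_;
};

// Design 2, in the spirit of [6]: the domain class is plain C++ and a
// separate mapper, specialised per class, knows how to store it.
struct Author {
    std::string name;
};

template <typename T> struct Mapper;  // primary template intentionally undefined

template <> struct Mapper<Author> {
    static std::string tableName() { return "Author"; }
    static Row toRow(const Author& a) {
        Row r;
        r["name"] = a.name;
        return r;
    }
};
```

In both variants somebody has to write the attribute-to-column code by hand, because C++ cannot enumerate the members of a class at run time. The second variant only moves that code out of the domain class.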
4.3
Extensibility
As mentioned above, object oriented programs can be developed little by little. As the program grows, more and more classes are added to it. Also, when a program is maintained, its class structures often change. These actions should be allowed by the persistence layer [1, 2]. The database schema should be flexible enough to adapt to new user requirements. These changes cannot generally be performed fully automatically, but the persistence layer should provide tools to ease changes to database schemas. This viewpoint is taken into account when comparing different mapping strategies in Section 5.
4.4
Object Oriented Frameworks
One key idea of object oriented development is reuse. In addition to the reuse of program code, general ideas of different solutions are also reused in the form of design patterns. Between the reuse of fully implemented class libraries and abstract design patterns lie object oriented frameworks. Different applications require different characteristics from class libraries, so the libraries should be extremely flexible. Frameworks try to provide a more general solution in the form of incomplete class libraries where some of the classes are abstract [18]. These classes have only descriptions of their required functionality, which is then implemented by an application programmer in the way that is most suitable for the application in hand. These kinds of extension points in the class structure of a framework, called hot spots, make a well designed framework an extremely flexible way of reuse.
A persistence layer could also be implemented as a framework. This would allow the application programmer to customise the behaviour of the persistence layer to optimise it for the application at hand.
The Object-Relational Impedance Mismatch
Object oriented development and relational databases, described in Sections 3 and 4, have quite different backgrounds. Object orientation is a practice based on experience in software development, whereas relational databases have a sound mathematical background [1, 6].
When an application is developed, two separate data models are designed [2]: an object oriented one that is used in the application to represent objects of the problem domain, and a relational one that is used to store data describing the problem domain persistently. This both adds extra modelling effort and may generate two models that are partly unrelated and may conflict with each other. Objects are designed to have responsibilities in terms of both data and behaviour, but relational databases are all about data, which easily leads to very different models of the same problem.
Also, people working with either of these tend to think about the development process differently [2]. Where data administrators want to start with a data model of the system, developers following an object oriented process start with user requirements and class models. In iterative and incremental development models are subject to change, whereas in the data oriented world data models are something rock solid.
In the following sections we describe some issues that these differences raise when storing objects in a relational database. We also describe some of the common solutions found in the literature.
5.1
Design Time of Relations
In traditional database related software development, data models for the application under development are often specified very early in the process [2, 6]. Now that software is increasingly developed in an iterative and incremental manner, this kind of predefined database structure tends to be somewhat inflexible [15, 6, 2]. The data model, instead of the user requirements, influences the structure of the software. In modern software development, it should be the user requirements that form the basis for the software design, and the database is only a tool that makes the software remember things between separate runs.
This is a political, process related question more than a technical one, so the solutions are usually not technical, but technology must support these new solutions [2]. The solution discussed in both [2] and [6] is to design the database at the beginning of the implementation of the software, when the core functionality and the data structures required are already clear. In our work we will also examine the possibility of letting the persistence layer generate a simple table structure by itself to minimise the effort spent by the application developer.
5.2
Representing Objects as Tables
To be stored in a relational database, objects have to be mapped to tables. Because of the different natures of these domains, this is not always trivial. Objects may be composed of other non-trivial objects, they do not have keys and they may inherit properties from other objects. Table rows do not have the same identity properties as objects do.
One possible solution is to map a single class to a single table and its atomic attributes, such as integers and strings, to attributes of the table. Complex attributes are handled as objects of their own with separate tables and foreign key references to the original table, and attributes containing collections are stored in the table describing relationships between objects [6, 1]. A simple object structure and the relational table structure representing it are shown in figure 1. Here one class, Class1, is composed of a few attributes and it aggregates another class, Class2. These are mapped to tables of their own, and the table for Class1 contains a foreign key from the table for Class2. Object identifiers are also shown in the mapping. These are discussed further in Section 5.3.
The key idea of this mapping is to be simple and to provide easy access to the database for third party applications as well. Still, various questions about flexibility must be solved, since changes in the class model generate changes in the database schema.
Figure 1: One-to-one mapping of objects to tables
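As a rough sketch of the one-class-to-one-table mapping, the following code derives a table definition from a hand-written description of a class. The types Column and ClassMapping, the function createTableSql and the column names are hypothetical, and the SQL produced is only meant to show the shape of the result: an OID primary key, one column per atomic attribute and a foreign key for the aggregated class.

```cpp
#include <string>
#include <vector>

// Hypothetical mapping metadata for one class.
struct Column {
    std::string name;
    std::string sqlType;   // may include a REFERENCES clause for foreign keys
};

struct ClassMapping {
    std::string table;             // one table per class
    std::vector<Column> columns;   // one column per atomic attribute
};

// Build a CREATE TABLE statement. Every table gets an `oid` column as its
// primary key (object identifiers are discussed in Section 5.3).
std::string createTableSql(const ClassMapping& m) {
    std::string sql = "CREATE TABLE " + m.table + " (oid INTEGER PRIMARY KEY";
    for (const Column& c : m.columns)
        sql += ", " + c.name + " " + c.sqlType;
    sql += ")";
    return sql;
}
```

For a class Class1 with a string attribute and a reference to Class2, this would give a statement such as CREATE TABLE Class1 (oid INTEGER PRIMARY KEY, name VARCHAR(64), class2_oid INTEGER REFERENCES Class2(oid)). Any change to the class means regenerating or altering the table, which is the flexibility problem noted above.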
Another approach to the mapping, suggested in [2], is to map objects into a general structure where classes, objects, attributes and their values are modelled with tables. This allows the database structure to remain the same whenever classes are added or their attributes are modified. This is discussed in greater detail in Section 5.6.4 and illustrated in figure 7. The advantage of a general structure like this is the improved flexibility of the data model, but as a drawback third party access to the database becomes more complicated and there might be performance penalties.
5.3
Object Identifier
Objects have identity whereas rows in a relational database lack this feature [6, 1, 2]. Two objects with exactly the same values are still separately accessible by the system, but in the case of rows in database systems, the identity of a row is defined only by its attribute values. This identity feature of objects should be somehow simulated by the persistence layer, which is usually done by adding an object identifier to the data of each object as so-called shadow information [2]. These identifiers are then stored in the persistence system and used as primary keys to retrieve objects and as foreign keys to refer to other objects. They are generated to be unique across all the tables [6]. Object identifiers, hereafter referred to as OIDs, can also carry type information to make it easier to access objects based on OIDs.
One technique related to OIDs that is used in persistence systems is so-called pointer swizzling [13]. This means that a strategy is developed to map OIDs in persistent storage to main memory pointers to minimise the overhead introduced by the persistence layer. There are several strategies for this, which are discussed in [13].
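One possible way to make OIDs unique across all tables and to carry type information in them is sketched below. The bit layout (16 bits of class identifier, 48 bits of sequence number) and the class OidGenerator are assumptions chosen for the example; the sources cited above do not prescribe a layout, and a real generator would also have to persist its counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical OID layout: the high 16 bits hold a class identifier and the
// low 48 bits hold a sequence number shared by all classes.
class OidGenerator {
public:
    OidGenerator() : counter_(0) {}

    std::uint64_t next(std::uint16_t classId) {
        ++counter_;
        assert(counter_ < (std::uint64_t(1) << 48));  // sequence must fit in 48 bits
        return (std::uint64_t(classId) << 48) | counter_;
    }

    // Recover the type information without touching the database.
    static std::uint16_t classOf(std::uint64_t oid) {
        return static_cast<std::uint16_t>(oid >> 48);
    }

    static std::uint64_t sequenceOf(std::uint64_t oid) {
        return oid & ((std::uint64_t(1) << 48) - 1);
    }

private:
    std::uint64_t counter_;  // one counter for all classes keeps OIDs unique across tables
};
```

Because the sequence number is shared, two objects never receive the same OID even if they belong to different classes, and classOf tells the persistence layer which table to consult for a given OID.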
5.4
Representing Collections in a Relational Database
In object oriented programs different collections are used to represent collections of objects belonging to a class or sharing the same base class [12]. There is no direct counterpart for these collections in the relational world, so they must be mapped to some structure. Collections define many-to-one or many-to-many relationships between objects, which are discussed further in Section 5.5. So by solving the mapping of collections we also solve questions raised by these object relations.
Objects in a collection may have a specified order or they might be keyed in some special way [2]. These features must be preserved by the mapping to the relational world. If there is an ordering in the collection, the order numbers of the objects must be stored in the database, and there should be a clear policy on how to handle cases such as inserting new objects in the middle of the collection, which may require renumbering of objects.
When mapping a single class to a single table, an additional table is used to represent collections [2]. It is often the same table that represents object relationships. This table then has one column that holds the ordering or keying information of the collection.
In the case of a generic mapping, like the one described in figure 7, collections must be somehow represented by attributes. The keying information can be added to the attribute table. This is not the most elegant solution since all attributes carry the keying information, but it does add flexibility since any attribute, and thus any class relationship, can change its multiplicity at any time.
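The ordering problem can be made concrete with a small in-memory model of a collection table. The row type MemberRow and the functions insertAt and loadOrdered are hypothetical, and the renumbering policy shown (shift every later member by one) is only the simplest of the possible policies.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical row of a collection table: which object owns the collection,
// which object is a member, and the member's order number.
struct MemberRow {
    long ownerOid;
    long memberOid;
    int position;
};

// Insert a member at `position`, renumbering the members that follow it.
// In a database this would be an UPDATE over the affected rows plus an INSERT.
void insertAt(std::vector<MemberRow>& table, long owner, long member, int position) {
    for (MemberRow& r : table) {
        if (r.ownerOid == owner && r.position >= position)
            ++r.position;
    }
    table.push_back({owner, member, position});
}

// Restore the collection of one owner in its stored order.
std::vector<long> loadOrdered(const std::vector<MemberRow>& table, long owner) {
    std::vector<MemberRow> rows;
    for (const MemberRow& r : table) {
        if (r.ownerOid == owner) rows.push_back(r);
    }
    std::sort(rows.begin(), rows.end(),
              [](const MemberRow& a, const MemberRow& b) { return a.position < b.position; });
    std::vector<long> members;
    for (const MemberRow& r : rows) members.push_back(r.memberOid);
    return members;
}
```

An insert in the middle touches every later row of the same collection. A policy that leaves gaps between order numbers would reduce the renumbering, at the price of occasional compaction.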
5.5
Representing Object Relationships
In object oriented development objects are related to each other in various ways, as discussed in Section 4.1 and [1]. Objects can be constructed from other non-trivial objects, they may refer to other objects in a one-to-one, one-to-many or many-to-many fashion, or they may inherit properties from each other. Some relationships also have properties of their own; for example, they can be indexed with a special key. All these relationships must be expressed somehow by the relational database structure.
Representing these relationships when classes are all mapped to tables of their own requires additional tables. As mentioned in Section 5.2, objects inside other objects can be stored in tables of their own with a foreign key reference stored in the original table. The same applies to one-to-one relationships. Using foreign keys is also the key idea in more complicated relationships. To remain flexible, one-to-many and many-to-one relationships must be modelled using a relationship table. This is also the case with relationships having special properties [1, 6]. As mentioned in Section 5.4, collections are used in the object oriented world to represent these relationships, so mapping collections and mapping complex relationships reduce to the same problem. In [2] it is also noted that simple relationships between classes could be implemented in tables of their own. This introduces a moderate performance penalty but provides extra flexibility.
In a relational database all relationships between relations are always bidirectional: they can be queried in both directions. This is not always the case in the object oriented world. The relationship between objects can also be unidirectional, a difference that must be taken into account by the mapping. An example of a complex relationship is provided in figure 2. In the figure Class1 includes a map of objects of Class2, the map being keyed with string objects.
If objects are mapped to a generic structure, relationships are modelled at class level in a table of their own. This is a flexible solution but may cause some performance penalty, since several tables might have to be accessed to fetch objects. A clear advantage is that when the properties of a relationship change or new relationships are added, only one table is updated. Attributes that carry foreign keys for relations can be stored as attributes in the database structure described in figure 7. Complex relationships that are keyed in some special way still require a carefully designed policy to handle them.
Inheritance is yet another relationship among classes, which is discussed in greater detail in Section 5.6.
Figure 2: Object-Relational Mapping from a Complex Relationship
5.6
Representing Inheritance Hierarchies
Inheritance is purely an issue of the object oriented domain with no counterpart in the relational world. As the persistence of objects concerns the data stored in the objects, we are not interested in types of inheritance where only interfaces of classes are implemented. What we are really interested in is the kind of inheritance where the base class has some data members to be stored in the relational database. This could be public inheritance modelling an is-a relationship between classes or private inheritance modelling an is-implemented-in-terms-of relationship [19].
There are different approaches to mapping an inheritance hierarchy to tables, discussed in the following sections. First we describe a few solutions that are applicable when mapping every class to a table of its own, and then we describe how inheritance is represented in a generic structure for classes. In each section we present a mapping of the example class hierarchy shown in figure 3. This is a simple library example where the classes Book and AudioCD inherit a common base class Item, and all the classes have attributes to be stored in the database. The base class is abstract, so it is never instantiated as is.
5.6.1
Whole hierarchy in one table
A table with columns for all the attributes in the hierarchy and one column defining the correct subtype of the object is used to store all the objects of one class hierarchy [2, 6]. The solution is illustrated in figure 4. This is quite easy to implement and efficient to use. Every object in the hierarchy can be fetched with a single query. On the other hand, the table containing columns for the attributes of all the classes in the hierarchy becomes very wide and contains a lot of empty fields. This makes it inefficient in terms of storage space. It is also very inflexible when the application changes [1]. Nor is this solution very elegant in terms of database design.
Figure 3: Example class hierarchy
Figure 4: Mapping the example hierarchy to one table
5.6.2
Each concrete class to a table of its own
In this solution each concrete class of the hierarchy is mapped to a table of its own. This means that attributes belonging to the base class have columns in each of these tables, as shown in figure 5 [1, 2]. This is an efficient solution both in terms of execution speed and storage space. No extra space is required and every class can be fetched and updated with a single query. Since the attributes of the base class are stored in different tables, changes in the base class structure generate big updates to database schemas. Another problem is that queries over all the objects of the base class are hard to implement.
5.6.3
Each class to its own table
In this solution each class of the hierarchy is mapped to a table of its own, as demonstrated in figure 6 [1, 6, 2]. This approach is somewhat harder to implement than the previous ones and it also has a performance penalty, as fetching and updating a single object requires operations on multiple tables. Here each table has the OID as its primary key, and the same OID appears in all tables representing the class of the object in question. As OIDs connect the rows in different tables together, they are also foreign keys in every table. As mentioned in Section 5.3, OIDs can also carry type information. If this is the case and the same OID is present in multiple tables, it should somehow be indicated that this OID actually refers to a class hierarchy. This is the recommended mapping in [6, 1].
Figure 5: Mapping each concrete class of the hierarchy to a table
Figure 6: Mapping each class of the hierarchy to a table
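The difference in access cost between the three table based strategies can be sketched by listing the tables that must be read to restore one object. The names Strategy, ClassInfo and tablesToRead are hypothetical, and the sketch assumes that a table is named after its class and that the single-table strategy names its table after the root class.

```cpp
#include <string>
#include <vector>

enum class Strategy { SingleTable, TablePerConcreteClass, TablePerClass };

// Hypothetical class metadata: a name and a link to the base class.
struct ClassInfo {
    std::string name;
    const ClassInfo* base;  // nullptr for the root of the hierarchy
};

// Tables that must be read to restore one object of class `cls`.
std::vector<std::string> tablesToRead(const ClassInfo& cls, Strategy strategy) {
    std::vector<std::string> tables;
    switch (strategy) {
    case Strategy::SingleTable: {
        // One wide table for the whole hierarchy, named after the root here.
        const ClassInfo* root = &cls;
        while (root->base) root = root->base;
        tables.push_back(root->name);
        break;
    }
    case Strategy::TablePerConcreteClass:
        // Base class attributes are repeated in the table of the concrete class.
        tables.push_back(cls.name);
        break;
    case Strategy::TablePerClass:
        // One table per class: walk up the hierarchy and join on the OID.
        for (const ClassInfo* c = &cls; c; c = c->base)
            tables.push_back(c->name);
        break;
    }
    return tables;
}
```

For the library example, restoring a Book reads one table under the first two strategies and two tables (Book and Item) under the third, which is the performance penalty mentioned for Section 5.6.3.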
5.6.4
Map inheritance hierarchies to a generic structure
When mapping objects to a generic structure, mapping inheritance hierarchies becomes quite simple [2]. Inheritance is just modelled as another kind of relationship between classes. As mentioned before, this structure can act as a solution to all mapping related questions, but its drawbacks lie in performance and in the fact that third party access to the database becomes more complicated.
A structure like the one presented in figure 7 is highly flexible. No changes in classes generate any changes to the database schema, including changes in inheritance hierarchies. Still, data structures must be versioned since the existing data may represent a previous class structure.
The problem with this kind of approach is that operations on objects require accessing several tables. Fetching one object would require accessing four tables. If the persistence layer supports changing the mapping approach for an existing application, this kind of generic structure can be used when developing and prototyping the application, when the rate of change in data structures is high. When the mapping becomes a bottleneck, the approach can be changed.
Figure 7: Mapping classes to a generic structure
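To show why a generic structure needs several table accesses per object, the following sketch models four tables in memory and restores one object from them. The table layout (ClassRow, AttributeRow, ObjectRow, ValueRow) is an assumption made for this example and may differ from the structure of figure 7; values are stored as strings for simplicity.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical four-table layout for a generic object store.
struct ClassRow     { int classId; std::string name; };
struct AttributeRow { int attrId; int classId; std::string name; };
struct ObjectRow    { long oid; int classId; };
struct ValueRow     { long oid; int attrId; std::string value; };

struct GenericStore {
    std::vector<ClassRow> classes;
    std::vector<AttributeRow> attributes;
    std::vector<ObjectRow> objects;
    std::vector<ValueRow> values;
};

// Restore one object as attribute name -> value pairs. The class name is
// returned under the reserved key "#class". All four tables are consulted.
std::map<std::string, std::string> loadObject(const GenericStore& db, long oid) {
    std::map<std::string, std::string> result;
    for (const ObjectRow& o : db.objects) {
        if (o.oid != oid) continue;
        for (const ClassRow& c : db.classes) {
            if (c.classId == o.classId) result["#class"] = c.name;
        }
        for (const ValueRow& v : db.values) {
            if (v.oid != oid) continue;
            for (const AttributeRow& a : db.attributes) {
                if (a.attrId == v.attrId) result[a.name] = v.value;
            }
        }
    }
    return result;
}
```

Adding an attribute to a class only adds a row to the attribute table, so the schema itself never changes. The price is the lookups over all four tables for every object restored.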
5.7
Comparison of Different Mapping Strategies
As described in the sections above, there are at least two different approaches to mapping classes to a relational database: to map each class to a table or to map classes to a generic structure.
The mapping of each class to a table is the recommended solution in [6] and [1]. It is very simple in the basic case, gives a simple database structure and is quite effective. Special cases like complicated relationships between classes and inheritance structures require special handling. The flexibility also has its limitations, since the schema must be changed whenever the class model changes.
A general structure to represent classes is described in [2]. It gives a clear advantage in terms of flexibility. The same database structure can be used to represent any class model. The main disadvantage mentioned in [2] is the performance penalty. The table containing all the attributes for every object, seen in figure ??, easily grows very large. Another problem is that the database schema is quite complicated to access for any third party application.
5.8
Abstraction of Queries
An application should not be coupled with the underlying database [1, 6]. If an application is tightly coupled with a database, it is very hard and expensive to port it to different database systems. The same rule also applies to the persistence layer [1]. The persistence layer should use some kind of abstraction for database access that is both natural in an object oriented environment and independent of the underlying database. This makes it easier to run the application on different databases or even on completely different persistence mechanisms like XML files. Even when database access is abstracted, the application should be able to take full advantage of the features provided by different databases [1].
This abstraction can be achieved in many ways. One way is to use a higher level query language that is then compiled to actual queries against the persistence mechanism in question. Another approach is to use a subset of a database language that is common to most database management systems, like the SQL92 standard. This has the drawback that the software cannot be ported to other than SQL databases, and it cannot take advantage of the latest features of those either. In addition, many database management systems that are thin enough to be run on an embedded system like the one described in Section ?? support only a subset of SQL92.
An even more sophisticated method would be to represent queries as objects, with the actual query then generated from this structure [1]. One approach of this kind for Java is described in [21]. These query objects can then be translated to SQL suitable for the database in question or even to the XML queries described in [29]. This problem is also present in object oriented visual query systems, where the visual presentation of queries is represented by objects which are then converted to database queries. Existing solutions are discussed in [23], where an implementation in Java is also described.
We will not focus on this question in our project, but it is clear that a fully capable persistence layer that totally abstracts the underlying persistence mechanism must introduce a query abstraction that is able to take full advantage of the database instead of treating it as a simple storage mechanism.
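The idea of representing queries as objects can be sketched as follows. The class Query and its methods where and toSql are hypothetical and far simpler than the query objects of [1] or [21]: conditions are only combined with AND, every value is rendered as a quoted string, and the column and operator arguments are trusted as given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical query object: the application describes what it wants and a
// back end specific method renders the actual query text.
class Query {
public:
    explicit Query(const std::string& table) : table_(table) {}

    // Add a condition; conditions are combined with AND.
    Query& where(const std::string& column, const std::string& op,
                 const std::string& value) {
        conditions_.push_back(column + " " + op + " " + quote(value));
        return *this;
    }

    // Render the query as generic SQL. Another back end could walk the same
    // structure and emit a different query language.
    std::string toSql() const {
        std::string sql = "SELECT * FROM " + table_;
        for (std::size_t i = 0; i < conditions_.size(); ++i)
            sql += (i == 0 ? " WHERE " : " AND ") + conditions_[i];
        return sql;
    }

private:
    // Quote a value as an SQL string literal, doubling embedded quotes.
    static std::string quote(const std::string& value) {
        std::string out = "'";
        for (char c : value) {
            if (c == '\'') out += '\'';
            out += c;
        }
        return out + "'";
    }

    std::string table_;
    std::vector<std::string> conditions_;
};
```

The application code only sees the Query interface, so replacing toSql with a generator for another database dialect or persistence mechanism would not require changes in the application.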
5.9
Mapping Query Results to Objects
When a programmer queries a database, the results returned should be presented to the application in a form that is natural to the object oriented paradigm. When the result set contains only objects from a single class this is easy, but often query results may span several classes. As the result sets provided by queries may be bigger than the available memory, some abstraction of the cursors described in Section 3.6 should be available.
Usually the database is asked for objects of a certain type. In this kind of query it is easy for the persistence layer to generate objects. But in [1] it is stated that the system should also allow arbitrary queries. For example, suppose we want to query the class structure described in figure 8 to display a list of music pieces including the names of the categories for each piece of music. If this is done by fetching all the objects into memory, both for the pieces of music and for their categories, it would introduce a large overhead compared to a direct query that simply joins the music piece and the category.
Figure 8: Class Structure Example
In our work we will concentrate on the pure object approach and on minimising the overhead involved. The mapping of arbitrary query results to objects is also discussed in connection with visual query systems.
Our Embedded System
In general, an embedded system refers to a computer system integrated into another device [17]. In practice these range from tiny watches and smart cards to high end firewalls and routers. Our system is integrated into an active loudspeaker, forming a device that can be programmed to play different audio files or streams. The whole system is managed and updated over the Internet [25]. The device runs the Linux operating system [27] on sixteen megabytes of RAM and uses flash memory or a hard disk as permanent storage. The main CPU of the system is a 25 MHz Etrax 100LX RISC processor which produces 100 MIPS of computing power [30]. The data managed by the system is stored in a custom relational database management system, the data files of which are located on the hard disk. If the system is run without the hard disk, the database is accessed over a TCP/IP network.
This kind of environment sets some limitations on the software running on it. As an embedded system this is quite a powerful one, but compared to usual database and web servers handling audio data, its resources are very limited. First of all, the amount of processing power and memory is limited. Dependencies on external libraries should also be minimal, since all the libraries used must be ported to the target system.
These limitations also affect the persistence layer to be used on the device. The overhead introduced by another layer of indirection should be moderate. Even though the persistence layer should provide persistence to different persistence mechanisms, it should be possible to compile it with only one mechanism in it. This is because accessing different databases usually requires different libraries, and not all of these should have to be ported to the embedded system.
There are different ways to take limited resources into account when designing a persistence layer. These are discussed in the following sections.
6.1
Proxy Objects
A proxy, when it comes to objects, is an object that is only partially in memory [6]. This means that only a minimal set of its attributes have values and the others are omitted. This way objects consume less memory, for example when listing objects in a database. When detailed information about an object is needed, the rest of it is fetched. In C++ proxies can be implemented effectively using smart pointers [15].
There is still the problem of when to prefetch proxies. This must somehow be expressed by the application using the persistence layer or be set up in the mapping data with a management application. In [2] the use of a graphical management system for persistence services is suggested. On the other hand, [21] describes an implementation where the path of objects to prefetch from the database is described in the application program. The first approach allows the performance of the application to be tuned after the development phase, and only when performance really becomes a bottleneck, whereas specifying the behaviour in the application binds the system tightly to the persistence framework and promotes optimisation during development. On the other hand, the application programmer often has the best insight into the program and knows where prefetching proxy objects will have the highest added value.
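A smart pointer based proxy could look roughly like the sketch below. The class template LazyPtr and the example class Track are hypothetical, and the loader function stands in for whatever call the persistence layer would use to restore an object by its OID. Only the OID is held until the object is first dereferenced.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical domain class used only for the example.
struct Track {
    std::string title;
};

// Proxy as a smart pointer: holds only the OID until the object is needed.
template <typename T>
class LazyPtr {
public:
    typedef std::function<std::unique_ptr<T>(long)> Loader;

    LazyPtr(long oid, Loader loader) : oid_(oid), loader_(std::move(loader)) {}

    long oid() const { return oid_; }
    bool loaded() const { return static_cast<bool>(object_); }

    // The first access triggers the fetch; later accesses reuse the object.
    // If the loader cannot find the object, the result is a null pointer.
    T* operator->() {
        ensureLoaded();
        return object_.get();
    }

private:
    void ensureLoaded() {
        if (!object_) object_ = loader_(oid_);
    }

    long oid_;
    Loader loader_;
    std::unique_ptr<T> object_;
};
```

A list of thousands of such proxies costs little memory, and the database is consulted only for the objects the application actually touches. Prefetching would amount to calling the loader for a chosen set of proxies ahead of time.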
6.2
Cache
One way to optimise database access speed is to cache data retrieved from the database [32]. When the same data is accessed again, it can be fetched from the cache instead of the database. This improves the speed of access, especially when the database is located behind a network connection. Of course cached data consumes memory, so caching is a compromise between speed of access and memory requirements.
Whether caching has real advantages depends on the application. If the application runs with tiny memory requirements and objects are known to be used only once, caching has no advantages. On the other hand, for an application that is interactive, and thus requires moderate access times to a database, and that may access the same data entries many times in a way that is hard for the application programmer to predict, caching can have real advantages.
Caching of data is a simple idea, but it raises complicated issues when concurrent access to the data behind the cache is allowed [9]. The problem is very much the same as with a shared memory multiprocessor system with private caches. One trivial solution to cache coherency problems is to keep cached records locked, but this practically prevents all concurrency. No simple solution exists.
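The compromise between speed and memory can be made explicit by giving the cache a fixed capacity. The sketch below is a minimal single-threaded cache with least-recently-used eviction. The class ObjectCache is hypothetical, cached objects are represented by strings for brevity, and the concurrency problems mentioned above are not addressed at all.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache keyed by OID with least-recently-used eviction.
class ObjectCache {
public:
    explicit ObjectCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value, or nullptr on a miss.
    const std::string* find(long oid) {
        auto it = index_.find(oid);
        if (it == index_.end()) return nullptr;
        // Move the entry to the front to mark it as most recently used.
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    void put(long oid, std::string value) {
        auto it = index_.find(oid);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        if (capacity_ == 0) return;  // caching disabled
        if (entries_.size() == capacity_) {
            // Evict the least recently used entry to bound memory use.
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        entries_.emplace_front(oid, std::move(value));
        index_[oid] = entries_.begin();
    }

private:
    typedef std::pair<long, std::string> Entry;
    std::size_t capacity_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<long, std::list<Entry>::iterator> index_;
};
```

On a device with sixteen megabytes of RAM the capacity would be small, and setting it to zero turns caching off for applications that read each object only once.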
6.3
Abstraction of Cursors
As described in Section 3.6, relational databases use a technique called cursors to minimise the memory requirements of query results. This is essential in embedded systems, where query results are likely not to fit in the memory available. The persistence layer should provide an abstraction for cursors so that the application can easily access a large set of objects without having them all in memory at once.
As stated in [1], the persistence layer should always return a collection of objects, whereas the principle of cursors is that results are accessed one by one. For this purpose there is a design pattern called iterator [10]. An iterator is an object that represents a cursor into a set of objects. A persistence layer can return an iterator as the result of a query. This iterator object then holds the database cursor. When the application iterates through the result set, the cursor fetches objects from the database when needed. This way the application can access result sets using cursors just like the native collection objects of standard libraries.
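The iterator described above can be sketched as a thin wrapper around a cursor. The interface Cursor, the in-memory stand-in VectorCursor and the class template ResultIterator are all hypothetical; a real cursor would wrap the handle provided by the database library, and rows are represented by strings for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal cursor interface.
class Cursor {
public:
    virtual ~Cursor() {}
    // Copies the next row into `row`; returns false when no rows are left.
    virtual bool fetchNext(std::string& row) = 0;
};

// In-memory stand-in so that the sketch can be run without a database.
class VectorCursor : public Cursor {
public:
    explicit VectorCursor(std::vector<std::string> rows)
        : rows_(std::move(rows)), pos_(0) {}
    bool fetchNext(std::string& row) override {
        if (pos_ == rows_.size()) return false;
        row = rows_[pos_++];
        return true;
    }
private:
    std::vector<std::string> rows_;
    std::size_t pos_;
};

// Iterator that builds one object per fetched row, so only the current
// object is held in memory. T must be default constructible and assignable.
template <typename T>
class ResultIterator {
public:
    typedef std::function<T(const std::string&)> Builder;

    ResultIterator(Cursor& cursor, Builder build)
        : cursor_(cursor), build_(std::move(build)), current_() {}

    // Advances to the next object; returns false at the end of the result set.
    bool next() {
        std::string row;
        if (!cursor_.fetchNext(row)) return false;
        current_ = build_(row);
        return true;
    }

    const T& current() const { return current_; }

private:
    Cursor& cursor_;
    Builder build_;
    T current_;
};
```

The application would typically loop with while (it.next()) and use it.current() inside the loop, so memory use stays at one object regardless of the size of the result set.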
6.4
Multi-object Actions
Many actions on data span several objects [1]. For example, operations that fetch or store multiple objects access a large set of objects in the database. These operations should be optimized to take place inside a single transaction, and queries should be combined when possible. For example, when objects are fetched automatically as the application accesses them through a reference from another object, several objects may have to be fetched. Access speed could be increased by allowing the persistence layer to prefetch objects in one large query. The problem is that the persistence layer cannot know when to do this kind of prefetching. One solution is described in [21], where the application programmer gives the persistence system hints about the paths along which to prefetch. In [1] a solution is described where the metadata used by the mapping includes hints for prefetching. The latter has the advantage that the behaviour can be tuned when needed simply by changing the metadata.
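Combining queries can be as simple as replacing one query per object with a single query over a set of identifiers. The helper below is a minimal sketch of that idea; the function name, the `id` column and the table name used in the example are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a single SELECT for a batch of object identifiers instead of
// issuing one query per object. The 'id' column name is hypothetical.
std::string batchedFetchQuery(const std::string& table,
                              const std::vector<long>& ids) {
    assert(!ids.empty());  // an empty IN () list is not valid SQL
    std::string sql = "SELECT * FROM " + table + " WHERE id IN (";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += std::to_string(ids[i]);
    }
    sql += ")";
    return sql;
}
```

A prefetch hint stored in the metadata would then decide which referenced identifiers are collected into such a batch before the application asks for them.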
Implementation Language
The implementation language used is C++, an object oriented programming language derived from C [31]. C++ is designed to be compatible with C, to fully support the object oriented paradigm and to be efficient [31]. This requirement for efficiency is also inherited by libraries written in C++, and it is often the reason for selecting C++ as an implementation language. It is also the reason why we have selected it as the application level language in our embedded system. It adds little overhead compared to C, but it has a rich standard library and is object oriented, which makes it more effective for us in terms of implementation time. Efficiency, in addition to object oriented flexibility, is also a major goal for our persistence system, both in terms of execution speed and memory requirements. The ways to achieve these are discussed in Section 6. Using C++ as an implementation language also sets some limitations when it comes to object persistence.
7.1
Exceptions for Error Handling
An object oriented application uses an error handling technique called exceptions [12]. When a routine fails, it creates an exception object that describes the error. It then throws the exception, and some routine in the chain of callers catches it. Exceptions are superior to traditional error codes returned by routines, since the program does not have to check a return value after every call. The persistence layer should also provide an abstraction for database errors in the form of exceptions.
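A minimal sketch of such an abstraction is shown below: native status codes are translated into a small exception hierarchy at the boundary of the persistence layer. The class names, the status codes and the function throwOnError are all hypothetical; a real database client library defines its own codes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical base class for all errors reported by a persistence layer.
class PersistenceError : public std::runtime_error {
public:
    PersistenceError(const std::string& what, int nativeCode)
        : std::runtime_error(what), nativeCode_(nativeCode) {}

    // The code of the underlying mechanism, kept for diagnostics.
    int nativeCode() const { return nativeCode_; }

private:
    int nativeCode_;
};

class ConnectionError : public PersistenceError {
public:
    using PersistenceError::PersistenceError;
};

class ConstraintViolation : public PersistenceError {
public:
    using PersistenceError::PersistenceError;
};

// Made-up native status codes standing in for those of a client library.
enum NativeStatus { kOk = 0, kConnectionLost = 1, kDuplicateKey = 2 };

// Translates a native status code into an exception; returns on success.
inline void throwOnError(int status) {
    switch (status) {
        case kOk:
            return;
        case kConnectionLost:
            throw ConnectionError("connection lost", status);
        case kDuplicateKey:
            throw ConstraintViolation("duplicate key", status);
        default:
            throw PersistenceError("unknown database error", status);
    }
}
```

Application code can then catch `PersistenceError` once, at a level where it can react, instead of testing a return code after every call.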
7.2
Lack of Reflectivity
Reflectivity is a feature of some object oriented programming languages that allows a program to access its type information at run time. For example, in Java this feature can be used to determine the attributes of a class to be stored. C++ has only very limited reflective programming capabilities. This means that there must be some way to tell the persistence system which attributes of each class it should store [1, 15]. This couples the classes of the business domain more tightly to the persistence system, which is undesirable, as discussed in Section 4.2. The solution we have used is described together with the example code in Section 16.
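To show what telling the persistence system about attributes can look like, here is one possible sketch based on pointers to members. It is not the solution of Section 16; AttributeInfo, stringAttribute, intAttribute, classData, toRow, fromRow and the Customer class are all assumed names chosen for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical attribute descriptor: a column name plus accessors that
// read and write one member as text.
template <typename T>
struct AttributeInfo {
    std::string name;
    std::function<std::string(const T&)> get;
    std::function<void(T&, const std::string&)> set;
};

// Builds a descriptor for a std::string member via a pointer to member.
template <typename T>
AttributeInfo<T> stringAttribute(const std::string& name, std::string T::*member) {
    return AttributeInfo<T>{
        name,
        [member](const T& obj) { return obj.*member; },
        [member](T& obj, const std::string& value) { obj.*member = value; }};
}

// Same for int members, converting through text.
template <typename T>
AttributeInfo<T> intAttribute(const std::string& name, int T::*member) {
    return AttributeInfo<T>{
        name,
        [member](const T& obj) { return std::to_string(obj.*member); },
        [member](T& obj, const std::string& value) { obj.*member = std::stoi(value); }};
}

// A business class declares its persistent attributes once.
struct Customer {
    std::string name;
    int age = 0;

    static const std::vector<AttributeInfo<Customer>>& classData() {
        static const std::vector<AttributeInfo<Customer>> attrs = {
            stringAttribute<Customer>("name", &Customer::name),
            intAttribute<Customer>("age", &Customer::age)};
        return attrs;
    }
};

// Generic code can now walk the attributes without knowing the class.
template <typename T>
std::map<std::string, std::string> toRow(const T& obj) {
    std::map<std::string, std::string> row;
    for (const auto& attr : T::classData()) row[attr.name] = attr.get(obj);
    return row;
}

template <typename T>
T fromRow(const std::map<std::string, std::string>& row) {
    T obj;
    for (const auto& attr : T::classData()) attr.set(obj, row.at(attr.name));
    return obj;
}
```

The coupling mentioned above is visible here: Customer has to carry a classData function that exists only for the benefit of the persistence code.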
Issues with Legacy Data and Applications
It is often the case that applications must be developed to co-operate with existing systems: existing databases, applications, web services and so on. This places a significant constraint on the application design. If the application must access a legacy database that is used by other applications, the database schemas are not easily changed. The data obtained from previously existing systems is called legacy data [2]. Data can be exported from existing applications or it can exist in databases. It may be enough to import the legacy data once into the new application, but often old and new systems must coexist. In this case the new application may also have to be able to update the existing data. Usually the existing data and applications do not reflect the requirements of the new application. If they did, there would probably be no need for a new application. Information may be missing, or the database may include extra information that must be handled in case of updates. It is also common that a database used for a long time in an organisation has outgrown its schema, meaning that database structures have become outdated as requirements have changed. New information has been added to existing fields that are then parsed in applications, or some fields have become unnecessary. All this makes accessing legacy data a complicated issue.
Even though legacy data issues must be taken into account in application design, they should not be the driving force of the design [2]. One should still focus on the requirements set for the application, design the application from scratch and then consider what kind of database structures are needed and how well existing databases support them [6]. Here a capable persistence layer may come in handy. A persistence layer can support connections to different databases and data exchange formats to import and export data to and from the application. A persistence layer can also provide flexibility to the development of the application, so that the user requirements, including the requirement of being able to access pre-existing data, can be satisfied easily.
8.1
Several Persistence Mechanisms
By a persistence mechanism we mean a storage system in which objects can be stored between program runs. There are several different persistence mechanisms available, ranging from flat files to object oriented databases and web services. All the technologies have multiple vendors with their own solutions. Often an application is written to support the one persistence mechanism that is currently in use and, if required, ported later to other mechanisms. A full featured persistence layer should support a large set of different persistence mechanisms [1]. This allows the application to be ported to different persistence mechanisms with minimal effort. When all the details of the storage systems used are completely hidden from the application programmer, the application becomes independent of the persistence mechanism used and is coupled only to the persistence layer. If the persistence layer supports plug-ins for different persistence mechanisms that can be linked dynamically with the application, a large set of mechanisms could be supported while still keeping the library small enough to be ported to embedded systems. One important requirement for the support of multiple persistence mechanisms is that the system should still be able to take full advantage of each of them. This makes the design of the persistence layer more complicated. One possible design is discussed in [1].
8.2
Multi-object Actions
Many actions on data span several objects, and these should be executed efficiently. Each object should not generate a query of its own. Instead, queries concerning different objects should be grouped together.
8.3
Multiple Connections
A persistence layer should be able to handle multiple connections to different persistence mechanisms simultaneously [1]. This gives it the ability to transfer data from one persistence mechanism to another or from one version of a database schema to another, which adds a lot of flexibility and extensibility (Section 4.3) to the system. If a persistence layer supports several different persistence mechanisms, as described in Section 8.1, together with multiple connections, it can be used as a powerful tool to export and import data between different persistence systems. For example, legacy data or data generated by legacy applications can be imported from XML and stored to a relational database, or data from relational databases can be queried and stored to XML to be used with third party applications. All this requires only minimal additions to the application code.
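The combination of interchangeable mechanisms and several open connections can be sketched as follows. The interface and both implementations are hypothetical in-memory stand-ins: MemoryConnection plays the role of a relational database, XmlConnection the role of an XML file, and the XML output is not escaped. Record and transfer are names assumed for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A stored object reduced to named text fields, for illustration only.
using Record = std::map<std::string, std::string>;

// Hypothetical interface that each persistence mechanism implements.
class PersistenceConnection {
public:
    virtual ~PersistenceConnection() = default;
    virtual void store(const Record& record) = 0;
    virtual std::vector<Record> fetchAll() const = 0;
};

// In-memory stand-in for a relational database connection.
class MemoryConnection : public PersistenceConnection {
public:
    void store(const Record& record) override { rows_.push_back(record); }
    std::vector<Record> fetchAll() const override { return rows_; }

private:
    std::vector<Record> rows_;
};

// Stand-in for an XML file: keeps each record as a serialized element.
// Field values are written as-is; a real exporter would escape them.
class XmlConnection : public PersistenceConnection {
public:
    void store(const Record& record) override {
        std::string element = "<object>";
        for (const auto& field : record) {
            element += "<" + field.first + ">" + field.second +
                       "</" + field.first + ">";
        }
        element += "</object>";
        elements_.push_back(element);
        records_.push_back(record);
    }
    std::vector<Record> fetchAll() const override { return records_; }
    const std::vector<std::string>& elements() const { return elements_; }

private:
    std::vector<std::string> elements_;
    std::vector<Record> records_;
};

// Copies every record from one open connection to another and returns
// the number of records transferred.
inline std::size_t transfer(const PersistenceConnection& from,
                            PersistenceConnection& to) {
    std::size_t count = 0;
    for (const Record& record : from.fetchAll()) {
        to.store(record);
        ++count;
    }
    return count;
}
```

Because transfer only sees the abstract interface, the same call exports from a database to XML or imports in the other direction, which is the import and export use described above.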
Requirements for a Persistence Layer
A general layer like a persistence layer has several stakeholders that all set different requirements for the layer. The stakeholders are summarised in Table 2.
| ID | Stakeholder | Description |
|----|-------------|-------------|
| S1 | Application Developer | Programmer that uses the persistence layer in his application. |
| S2 | User of the Application | User of an application that takes advantage of the persistence layer. |
| S3 | Database Administrator | Administrator responsible for database management. |
| S4 | Third party Application Developer | Programmer developing third party applications that interact with the one using the persistence layer. |
| S5 | Maintenance Developer | Programmer responsible for further development of the application after it has been taken into production use. |
Table 2: Stakeholders of the persistence layer
The main user of the persistence layer is the application developer who uses the persistence layer in his application. The developer wants to use persistence services with as little effort as possible without losing the flexibility of object oriented programming. He also has to deal with existing legacy data.
Whether the application is interactive or not, it always has a user in some sense. For an application user, the use of a persistence layer should be invisible. One important aspect of this is that the persistence layer handles different error conditions properly. The user should also be unable to notice other users accessing the database simultaneously, and the persistence layer should not generate extra latency.
The database administrator should be able to take advantage of the database and version independence provided by the persistence layer. He can switch from one database to another when needed, and data can be easily transferred between database management systems, program versions and other applications.
A third party developer wants to be able to exchange data with the application in a standard format that can be easily integrated into her application. She does not want to pay any attention to the internal behaviour of the application she interacts with.
The maintenance developer must handle issues with existing data that previous versions of the application have produced. Still she must be able to produce new features effectively and to refactor the old software while maintaining its quality.
An overview of the architecture of a system taking full advantage of a capable persistence layer is shown in Figure 9, modelled after [1] and [6]. The figure indicates the dependencies between the components of the system. The persistence layer provides transparent storage services for the classes of the problem domain, hiding the database totally from the application. In addition, the persistence layer can provide services to export data in different formats. In Figure 9, for example, the persistence layer transparently exports some data to XML that can then be used by other applications. The persistence layer can also help in dealing with legacy data, a problem that most database developers have to deal with. The database itself does not depend on the persistence layer, so it can also be accessed without the persistence layer.
Figure 9: Role of the persistence layer in a software system
From the literature we have found some problems involved with persistence layers that should be addressed. They are covered in the previous sections and are summarized in Table 3. We list the properties that a persistence layer should have, based on [1], [6] and [2]. The features are described in the form of statements that can also be used to validate the design and implementation of a persistence layer. References to the stakeholders of the project and a priority for each feature are also included in the table.
| ID | Feature | Description | Priority | Stakeholder |
|----|---------|-------------|----------|-------------|
| F1 | Minimize modelling effort | The persistence layer should be able to generate the initial database schema and mapping metadata. Section 5.1. | medium | S1, S3 |
| F2 | Store single objects | The system should be able to save and restore objects to and from a relational database. Section 5.2. | high | S1, S2 |
| F3 | Manage object relationships | The system should be able to maintain the relationships between stored objects and maintain referential integrity. Section 5.5. | high | S1, S2 |
| F4 | Lazy initialisation | The system restores objects from the database when they are referenced through another object. Section 4.1. | medium | S1 |
| F5 | Store inheritance hierarchies | The system should be able to store inheritance hierarchies to the database. Section 5.6. | high | S1 |
| F6 | Object identity | The persistence layer must preserve the identity properties of objects. Section 5.3. | high | S1 |
| F7 | Store collections | The persistence layer is able to store collections of objects, preserving their ordering properties. Section 5.4. | high | S1 |
| F8 | Represent query results | The persistence layer is able to map arbitrary query results to objects. Section 5.9. | low | S1 |
| F9 | Several persistence mechanisms | The system should be able to save objects to different persistence mechanisms, such as different databases and XML files. Section 8.1. | medium | S1, S5 |
| F10 | Full encapsulation of persistence mechanism | It should be possible to change the persistence mechanism without modification to the application code. Section 4.2. | medium | S1, S3, S5 |
| F11 | Support transactions | The persistence system should use transactions so that concurrent access to the objects in the mechanism does not corrupt the data. Section 3.4. | high | S1, S3 |
| F12 | Extensibility | The persistence layer should support the addition of classes to existing class models and updates to the database schema. Section 4.3. | medium | S1, S3 |
| F13 | Locking | The persistence layer should provide locking for objects in a persistence mechanism. Section 3.5. | medium | S1, S3 |
| F14 | Access control | The persistence layer can control the access to objects in a persistence mechanism. Section 19.2. | low | S1, S2, S4 |
| F15 | Cursors | When multiple objects are fetched at once, the persistence layer should be able to use a database cursor to fetch them one at a time. Section 3.6. | medium | S1 |
| F16 | Proxy objects | When only a few of the attributes of an object are needed, a proxy object with only the needed attributes is fetched. The others are fetched when needed. Section 6.1. | low | S1 |
| F17 | Cache | The persistence layer uses a cache to speed up database operations. Section 6.2. | low | S1 |
| F18 | Multiple connections | The persistence layer should be able to handle multiple simultaneous connections to persistence mechanisms so that it can transfer objects from one persistence mechanism to another. Section 8.3. | medium | S1, S2, S4 |
| F19 | Application interface | The persistence layer should provide the application programmer with a simple and clean interface to the persistence services that is compatible with the C++ standard template library and takes the limitations of the language into account. Section 7. | high | S1, S5 |
| F20 | Error handling | The system should provide an object oriented abstraction, in terms of exceptions, for the different error handling systems of different persistence mechanisms. Section 7.1. | high | S1, S5 |
Table 3: Common Problems With Persistence Layers
9.1
Questions We Are Seeking Answers To
The questions that we are seeking answers to in this thesis are listed in Table 4. The answers to these questions are discussed in Section 17.
| ID | Question |
|----|----------|
| Q1 | Is it worth the effort spent to build a persistence layer between applications and relational databases? It might be that a general purpose persistence layer turns out to be such a complicated system that implementing one is not worthwhile. |
| Q2 | Does the usage of a persistence layer require extra effort from the application programmer? If the persistence layer sets a lot of limitations on the application, using it may become a burden. Does the effort spent by the application programmer pay back both in the short term and in the long term? |
| Q3 | Is it possible to provide extra flexibility for iterative development? A real advantage of a persistence layer would be that an application could be developed iteratively without paying too much attention to the database structures in a relational database. |
| Q4 | Is it possible to provide flexible access to legacy data? It would add real value if a persistence layer could provide access to different formats of pre-existing data. |
| Q5 | Is it possible to provide easy transition between different databases? If a persistence layer could encapsulate all the database related operations, the application would become truly independent of the database, which would make the transition from one database to another fairly easy. |
| Q6 | Is it possible to achieve both flexibility and efficiency? Many questions related to a persistence layer involve a compromise between flexibility and performance. We are looking for solutions that achieve both to some degree. When this is not possible, we examine whether the persistence layer can be tuned per application to meet the requirements of the task at hand. |
| Q7 | Is it possible to use a persistence layer in our embedded systems? If a persistence layer can be implemented effectively enough in terms of execution speed, memory usage and code size, and if it does not depend on too many libraries, it could be used on our embedded Linux system. |
| Q8 | What kind of process changes does the introduction of a persistence layer require? When integrating a persistence solution at the process level, some changes may have to be made to the traditional way of doing things. What kind of changes are required, and what are their possible advantages and disadvantages? |
Table 4: Question set for this thesis
9.2
Goals for Our Implementation
In our solution the main focus is on rapid, flexible development, without forgetting the performance limitations set by an embedded system. Within the scope of this thesis no full featured implementation is given. Instead we present a design that we have tested with a simple test implementation. The main goal is to reduce the effort spent on data modelling, to make it easy to adopt an iterative, flexible process while using relational databases, and to promote the reuse of persistence related code.
Although a persistence layer probably introduces some performance penalty, we try to keep it moderate enough to make the solution usable on our embedded system. In addition to flexibility in development, we try to provide flexibility at the data access level. We will examine the possibility of changing the mapping from the class data model to the relational model for an existing application, for example for database optimisations. Though we concentrate on relational databases, which we try to abstract away from the application developer, we do not exclude the possibility that the underlying persistence mechanism could be something completely different, for example XML files. For legacy data access, we provide multiple simultaneous connections to exchange data between persistence mechanisms.
10
Our Solution
In the following, the architecture of our solution is described. We use the 4 + 1 viewpoints described in [14], except that instead of the physical viewpoint we use a data viewpoint. The use of different viewpoints is also promoted by IEEE standard 1471 [20]. We also provide references to the features listed in Table 3 to provide traceability for the different design solutions. We begin with the logical view in Section 11, the static structure of the system described by class diagrams. Then we discuss the development view of the persistence layer in Section 12. Here questions like the organisation of code into libraries and linking to applications are discussed. Solutions to issues related to performance, concurrency and other dynamic characteristics of the system are described in Section 13. In Section 14 we describe the database structures used and generated by the persistence layer. Finally, in Section 15 we provide scenarios that interconnect these different views. In each scenario a typical request from the application to the persistence layer is described, together with illustrations of how the components of the persistence layer interact to fulfil this request.
11
Logical View
Here the static structure of the persistence layer is described. We start from the higher level architecture and then descend to its different subsets. Figure 10 illustrates the overall architecture of the persistence layer. Only the major classes are displayed.
The class PersistentObject defines a templatized interface to be implemented by the classes that are stored through the system. It requires them to implement the method GetClassData, which is used to overcome the lack of reflectivity in C++. PersistentObject is also used to manage the state of the objects using the state design pattern described in [10]. The issues related to object states are discussed further in Section 13. The GetClassData method returns an object of type ClassData, an interface used to describe the structure of a class. It is implemented by the ClassDataObject template class, as illustrated in Figure 12.
The singleton class ClassManager is used to manage the class information and to form an object oriented representation of the metadata describing the data structures to be stored through the persistence system. All the attributes of the classes to be stored are listed in this structure. Templates are used to generalize different attribute types. This representation of the data model is stored to the persistence mechanism in question. It can then be tuned for performance and to map classes to a different database structure.
The namespace PersistenceMechanism describes the interface that a persistence mechanism has to implement. The idea is to fully abstract the persistence mechanism both from the application and from the rest of the persistence layer. This allows the application to change the persistence mechanism when needed, as stated by feature F9 in Table 3. The abstract class PersistenceConnection hides the details of a persistence mechanism from the application. All the operations on a single persistence mechanism are passed through this interface. When the application operates on multiple persistence mechanisms, it uses one object of this type for each of them to fulfil feature F18.
The abstract class Generator is used to generate queries from their object representations, as stated by features F19 and F10. The interface does not define any operations, since not all persistence mechanisms provide all the possible functionality. When queries are generated into the representation specific to the persistence mechanism, C++ templates are used to provide compile time checks of the syntax of the queries and of whether the mechanism used supports the features requested.
The Transaction interface is used to abstract the transactions of the persistence mechanisms, as required by feature F11. The interface class automatically rolls back uncommitted transactions in its destructor. This way transactions can be used in a natural object oriented manner, where an exception thrown in the middle of a transaction automatically reverses the changes made. The transaction objects are created using a factory method in the class PersistenceConnection, which is a friend class of the Transaction class, which in turn has a private constructor. This way transactions are always bound to a connection to the persistence system.
Just like transactions, database cursors are also abstracted in an object oriented manner. The abstract class PersistenceIterator is a subclass of the standard library iterator that is to be implemented by the persistence mechanisms. As concepts, iterators and cursors are quite similar, so this abstraction provides very easy access to cursors for object oriented applications.
The ObjectManager class is responsible for all the operations performed on the objects. It adds an object oriented layer over the abstraction of the persistence mechanisms. It converts the persistence related object operations into a form understandable by the PersistenceConnection interface. The routines used to fetch objects return iterators. The iterator class used can be extended by a persistence mechanism to provide an abstraction for cursors, as stated by feature F15. Lazy initialisation and proxy objects, features F4 and F16, would also be responsibilities of the object manager if implemented. When the persistence layer is extended to support different mappings, the object manager can be divided into a set of classes that encapsulate the mapping operations behind a common interface to be called from the ObjectManager. This way there could be a few different classes fulfilling this interface and a factory method to create the proper mapping object for each class to be stored.
The singleton class PersistenceFacade is used to manage the different operations on the persistence layer. Classes are registered through it, and it can be used to perform different operations on the main persistence mechanism. The main persistence mechanism is the one that is used to generate object identifiers and that serves as the primary storage for objects. The application does not have to take care of the management of this connection. The application can also create additional connections to save objects to different persistence mechanisms.
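The transaction behaviour described above (factory method on the connection, private constructor, rollback in the destructor) can be sketched in a few lines. The class names follow the text for orientation only; the method names and the in-memory statement log are assumptions made for this example and say nothing about the actual interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

class PersistenceConnection;

// Transaction guard: rolls back in the destructor unless committed.
class Transaction {
public:
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    // Moving hands over responsibility for the open transaction.
    Transaction(Transaction&& other) noexcept
        : connection_(other.connection_), open_(other.open_) {
        other.open_ = false;
    }

    ~Transaction();  // rolls back if still open
    void commit();

private:
    friend class PersistenceConnection;  // only a connection can create one
    explicit Transaction(PersistenceConnection& connection);

    PersistenceConnection& connection_;
    bool open_;
};

// In-memory stand-in for a connection; statements are kept in a log.
class PersistenceConnection {
public:
    // Factory method: a transaction is always bound to a connection.
    Transaction begin() { return Transaction(*this); }

    void execute(const std::string& statement) { pending_.push_back(statement); }
    const std::vector<std::string>& committed() const { return committed_; }

private:
    friend class Transaction;

    void commitPending() {
        committed_.insert(committed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }
    void rollbackPending() { pending_.clear(); }

    std::vector<std::string> pending_;
    std::vector<std::string> committed_;
};

inline Transaction::Transaction(PersistenceConnection& connection)
    : connection_(connection), open_(true) {}

inline void Transaction::commit() {
    assert(open_);  // committing twice would be a programming error
    connection_.commitPending();
    open_ = false;
}

inline Transaction::~Transaction() {
    if (open_) connection_.rollbackPending();
}
```

If an exception leaves the scope of a Transaction object before commit is called, stack unwinding runs the destructor and the pending statements are discarded, so the application needs no explicit rollback code in its error paths.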
Figure 10: The Architecture of The Persistence Layer
Figure 11: Abstraction of Persistence Mechanism
Figure 12: Class structures used to describe the data model of the application
11.1
Representing Queries as Objects
As mentioned in Section 5.8, queries should be generated by the application in a way that completely decouples it from the persistence mechanism. This design is presented to fulfil requirement F10 in Table 3. Figure 13 shows one way to represent queries as objects. It is partly modelled after [1], but modified to take the implementation language into account. In C++, templates provide a way to check the syntax of queries against a database at compile time. This class structure also abstracts the connectivity to the database by encapsulating data access behind the interface class PersistenceMechanism, as described in Figure 11. A query can be one that fetches objects from a database, one that updates existing objects or one that inserts a new one. The queries that update and fetch objects are given the conditions that an object should meet to be affected by the query. The complete class structure describing a where clause is shown in Figure 14. This object representation of a query is then turned into an actual query by an object that represents the current persistence mechanism and inherits from the abstract base class PersistenceMechanism. This way the abstract representation of a query can be translated into one that can be run effectively on the database currently in use, without coupling the application to the database [1]. Our abstraction, like the one presented in [1], is not nearly as powerful as SQL, but it is designed to be good enough for our persistence layer. Full featured object oriented query languages are far beyond our scope.
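The following sketch illustrates the general idea of a query held as an object tree and turned into text by a mechanism specific generator. It uses run time polymorphism for brevity, whereas the design above relies on templates for compile time checks. All class and method names (Generator, SqlGenerator, Condition, Equals, And, FetchQuery) are assumptions for this example and do not reproduce Figures 13 and 14; attribute and table names are inserted unescaped.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical generator interface: one implementation per mechanism.
class Generator {
public:
    virtual ~Generator() = default;
    virtual std::string equals(const std::string& attribute,
                               const std::string& value) const = 0;
    virtual std::string both(const std::string& left,
                             const std::string& right) const = 0;
    virtual std::string fetch(const std::string& source,
                              const std::string& condition) const = 0;
};

// Generator that renders the query objects as SQL text.
class SqlGenerator : public Generator {
public:
    std::string equals(const std::string& attribute,
                       const std::string& value) const override {
        return attribute + " = '" + escape(value) + "'";
    }
    std::string both(const std::string& left,
                     const std::string& right) const override {
        return "(" + left + " AND " + right + ")";
    }
    std::string fetch(const std::string& source,
                      const std::string& condition) const override {
        return "SELECT * FROM " + source + " WHERE " + condition;
    }

private:
    // Doubles single quotes so that values cannot end the SQL literal.
    static std::string escape(const std::string& text) {
        std::string out;
        for (char c : text) {
            out += c;
            if (c == '\'') out += c;
        }
        return out;
    }
};

// A node of the where clause tree.
class Condition {
public:
    virtual ~Condition() = default;
    virtual std::string generate(const Generator& generator) const = 0;
};
using ConditionPtr = std::shared_ptr<const Condition>;

class Equals : public Condition {
public:
    Equals(std::string attribute, std::string value)
        : attribute_(std::move(attribute)), value_(std::move(value)) {}
    std::string generate(const Generator& g) const override {
        return g.equals(attribute_, value_);
    }

private:
    std::string attribute_;
    std::string value_;
};

class And : public Condition {
public:
    And(ConditionPtr left, ConditionPtr right)
        : left_(std::move(left)), right_(std::move(right)) {
        assert(left_ && right_);
    }
    std::string generate(const Generator& g) const override {
        return g.both(left_->generate(g), right_->generate(g));
    }

private:
    ConditionPtr left_;
    ConditionPtr right_;
};

// A fetch query: a source (class or table name) plus a where clause.
class FetchQuery {
public:
    FetchQuery(std::string source, ConditionPtr where)
        : source_(std::move(source)), where_(std::move(where)) {
        assert(where_);
    }
    std::string generate(const Generator& g) const {
        return g.fetch(source_, where_->generate(g));
    }

private:
    std::string source_;
    ConditionPtr where_;
};
```

The application builds the same FetchQuery regardless of the mechanism; only the Generator passed to generate changes, so a different generator could emit, say, an XPath expression instead of SQL.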
Figure 13: Class Structure Representing a Query
12
Development View
Here we describe the actual software module organisation of the persistence layer. This is a non-trivial question, since as a library the persistence layer must be able to adapt itself to different software configurations. In addition we must also briefly cover the application that uses our persistence layer, since the library is of little use by itself. Since one of the goals of our system is that it runs on different persistence mechanisms, it is itself a client of many libraries. There are different ODBC client libraries, native client libraries for different databases and XML parser libraries, just to mention a few. The persistence layer cannot require all of these libraries to be installed. Especially in embedded systems it is extremely important to be as independent of third party libraries as possible. The persistence solution designed by us is to be divided into several libraries to reduce dependencies. The idea is to split the abstraction of each persistence mechanism into a library of its own. These libraries depend on the third party libraries necessary to access the persistence mechanism in question and on the main persistence layer library. This way an application can be linked against only the persistence mechanisms it really needs. This is especially important on platforms where dynamic linking is not supported. The organisation of code into libraries is described in Figure 15. The arrows in the figure describe the dependencies among the different software components.
Figure 14: Class Structure Representing a Where Clause
Figure 15: Organisation of the software components of the persistence layer
13
Process View
Here we discuss the dynamic nature of the persistence layer. Questions like eciency in terms of bot hexecution speed and memory usage and concurrency 43
issues are to be addressed. References to the features listed in table 3 are mentioned when apropriate. Since our goal is to use persistence layer in an embedded environment eciency is an issue for us. Our design is kept as simple as possible to keep the library small. This simplicity also aims at faster execution. Since the database operations are probably to take most of the time so they should be optimized as well as possible. The persistence layer should not generate too much queries to avoid extra overhead. This is done by grouping the queries together as much as possible by storing prefetch information together with the metadata described in section 14. Since most of the application in our environment does not have to keep all the objects fetched from the database in memory at once, the persistence layer should not require this either. This is why the abstraction of database cursors is so vital to our solution, since is makes it possible to access large sets of objects in database without instantiating all of them simultanously which makes the memory usage of the application not dependent on the number of the objects processed. As can be seen in gure 10 the state of an object is hold in P ersistentObject abstract class. The state is modelled with classes with a common interface as stated in desing pattern state in [10]. The state dependant functionality is set in to the class representing the state in question. In our design this state class is used as a visitor class in ObjectM anager when persistence operations are executed. The main dierence in behaviour is for example, how object is stored. If object is new it has to be rst created to the persistence mechanism. If it is in state dirty is is simply updated. And if it is clean it does not have to be stored at all. A conservative aproach is used on the dirtyness of the objects. It can not be fully detected wheter an object is changed or not so it is assumed to be dirty when ever unsure. 
In simple implementation objects practically always are dirty. The abstraction for transactions as described in section 11 is used to keep database in consistent state. The persistence layer generates transactions for database operations, but in addition it should be possible for the application to generate transactions. The P ersistenceConnection class keeps track of the on going transactions on the persistence mechanism in question so that the persistence layer knows wheter a persistence operation is performed in the middle of an application initialized transaction or if it has to generate one implicitely. As mentioned in section 11 transaction interface class does automatic rollback for unnnished transactions in its destructor to allow transactions to interact with exception error handling. In addition to transactions also a higher level concurrency solution should be developed. Option for optimistic lockin scheme is taken into account in database desing as can be seen in section 14. The timestamp for objects is stored as an additional shadow attribute for each object, if an collision is detected, as described in section 3.5 and in [2], and exception is thrown to the application. For pessimistic locking scheme, application initialized transactions can be used. As discussed above, the database errors are all delivered to the application in form of exceptions to support native object oriented error handling as stated
in feature F20. Combined with proper object oriented transaction support, this reduces the effort that the application programmer has to spend on handling database related error conditions.
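The state dependent storing described in this section can be illustrated with a minimal sketch. It is not the thesis implementation: the names ObjectState, NewState, DirtyState, CleanState, PersistentThing and StatementLog are assumptions made for this example, and the persistence mechanism is replaced by a simple log of statements.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a persistence mechanism: it only records statements.
struct StatementLog {
    std::vector<std::string> statements;
};

// Common interface of the state classes (State design pattern).
class ObjectState {
public:
    virtual ~ObjectState() = default;
    // Performs the state specific store; returns true if something was written.
    virtual bool Store(const std::string& id, StatementLog& log) const = 0;
};

class NewState : public ObjectState {
public:
    bool Store(const std::string& id, StatementLog& log) const override {
        log.statements.push_back("INSERT " + id);  // a new object is created first
        return true;
    }
};

class DirtyState : public ObjectState {
public:
    bool Store(const std::string& id, StatementLog& log) const override {
        log.statements.push_back("UPDATE " + id);  // an existing object is updated
        return true;
    }
};

class CleanState : public ObjectState {
public:
    bool Store(const std::string&, StatementLog&) const override {
        return false;  // a clean object does not have to be stored at all
    }
};

// Simplified persistent object that holds its current state.
class PersistentThing {
public:
    explicit PersistentThing(std::string id)
        : id_(std::move(id)), state_(std::make_unique<NewState>()) {}

    // Conservative approach: any possible modification marks a stored object dirty.
    void Touch() {
        if (stored_) state_ = std::make_unique<DirtyState>();
    }

    bool Store(StatementLog& log) {
        const bool written = state_->Store(id_, log);
        stored_ = true;
        state_ = std::make_unique<CleanState>();
        return written;
    }

private:
    std::string id_;
    bool stored_ = false;
    std::unique_ptr<ObjectState> state_;
};
```

Storing the same object twice in a row writes only once in this sketch; a second write happens only after Touch() has marked the object dirty.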
14
Data View
In this section we describe the data model that the persistence layer uses to store data to a persistence mechanism. The main focus is on mapping a class structure into a relational database, but the same data model may be applied to different persistence mechanisms.
Figure 16: The meta data representing the class structure of an application
The persistence layer can generate a data model from the class structures in the application. This data model forms the basis of the metadata that describes the mapping of the classes to the relational database. The data structure used to store the metadata is illustrated in figure 16 (see footnote 3). The database structures were first defined in normal form, but then optimized for performance by using denormalisation as described in [2]. The idea of the metadata based mapping is to reduce coupling between the database and the application [1]. The metadata can be customized to adapt the mapping to changes in the database structure and to optimize application performance. As described in figure 16, the metadata describes classes, their attributes and the relationships between classes. A class carries information about how it is mapped
3 UML is used for data modelling here. There is no standard for this yet, but we use the syntax used in [2].
to tables. It can use a general mapping or it can be mapped to a table of its own, as stated in section 5.2. If a mapping of a single class to a single table is used, the table and columns are specified in the metadata. If the metadata model is compared to the generic structure presented in figure 7, it can be seen that the metadata is actually a subset of this general structure. This way a general structure representing objects can be converted into a one-table-to-one-class mapping without changes in the application, which allows the application development team to change the mapping when needed. The relationship table in figure 16 is used to capture the versatile nature of inter-object relationships. When the persistence layer generates the metadata to store objects, it gives default values for the relationship table based on the relationships between classes. As discussed in section 4.1, the type of the relationship between objects gives hints about the cascading operations on objects. For an inheritance relationship all cascading operations are used and multiplicities are fixed, but for an association no cascading operations are used by default. These attributes can be customized to optimize the memory requirements and execution speed of the application. The version table in figure 16 stores a running version number for the data model. This issue is not fully addressed in our work but is further discussed in section 19.3.
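The idea of metadata driven mapping can be sketched in a few lines. The structures below are hypothetical simplifications of the metadata in figure 16 (the names ClassMapping, AttributeMapping, RelationshipMapping, BuildInsert and DefaultRelationship are assumptions for this example). The sketch shows how a query can be generated from the mapping alone and how default cascading values could follow from the relationship type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical relationship kinds and their cascading attributes.
enum class RelationKind { Inheritance, Association };

struct RelationshipMapping {
    RelationKind kind;
    bool cascadeStore;
    bool cascadeDelete;
};

// Defaults as described in the text: inheritance cascades every operation,
// an association cascades nothing until the metadata is customized.
RelationshipMapping DefaultRelationship(RelationKind kind) {
    const bool cascade = (kind == RelationKind::Inheritance);
    return RelationshipMapping{kind, cascade, cascade};
}

// Mapping of one attribute to one column.
struct AttributeMapping {
    std::string attribute;
    std::string column;
};

// Mapping of one class to one table.
struct ClassMapping {
    std::string className;
    std::string table;
    std::vector<AttributeMapping> attributes;
};

// Generates a parameterized INSERT statement purely from the metadata.
std::string BuildInsert(const ClassMapping& mapping) {
    std::string columns;
    std::string placeholders;
    for (std::size_t i = 0; i < mapping.attributes.size(); ++i) {
        if (i > 0) {
            columns += ", ";
            placeholders += ", ";
        }
        columns += mapping.attributes[i].column;
        placeholders += "?";
    }
    return "INSERT INTO " + mapping.table + " (" + columns + ") VALUES (" +
           placeholders + ")";
}
```

Because the class only knows its attribute names, renaming a column in this sketch means changing one AttributeMapping entry, not the application class.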
15
Scenarios
In this section we describe a couple of scenarios showing how our persistence layer design performs its basic tasks, storing and restoring objects. When the application first creates a persistent object, it is like any other object in the application. It is created and initialized by its constructor. Its super class PersistentObject sets the state of the object to New. When the object is stored through the persistence layer, as in the StoreObject call of the code example in figure 19, the persistence facade is called. It calls the object manager with the default persistence connection of the application, which in this case would be a connection to a relational database. The object manager then uses the methods defined in the PersistentObject interface class to read the attribute values of the object to be stored. It uses the information stored in ClassManager to map these attributes to a set of query objects. After this it uses the persistence connection in question to start a transaction and then passes the query objects to the persistence connection through the interface class PersistenceConnection, which then generates the proper queries for the persistence mechanism in question. The object manager commits the transaction and returns, and the object is stored in the persistence mechanism in question.
In the code example in figure 19 an object is restored from the persistence mechanism using a simple query that defines a condition over the attributes of the object. As in the case of storing the object, the PersistenceFacade calls the ObjectManager to restore the objects from the default persistence mechanism. The object manager creates query objects to read objects and combines the condition provided by the application with this query. Again the mapping information is obtained from the ClassManager. The query is then passed to the PersistenceConnection interface, which returns an abstraction of a database cursor for the persistence mechanism in question. The information returned by this cursor is then used to generate objects in the iterator that the ObjectManager returns to the application through the persistence facade.
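The transaction handling of the store scenario can be sketched as follows. This is only an illustration under assumptions: FakeConnection, Transaction and StoreQueries are hypothetical names, and the connection merely records what is sent to it. The point is the automatic rollback in the destructor, which keeps the database consistent when a query throws.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical connection that records everything sent to it.
class FakeConnection {
public:
    void Begin()    { log.push_back("BEGIN"); }
    void Commit()   { log.push_back("COMMIT"); }
    void Rollback() { log.push_back("ROLLBACK"); }
    void Execute(const std::string& query) {
        if (query.empty()) throw std::runtime_error("empty query");
        log.push_back(query);
    }
    std::vector<std::string> log;
};

// Transaction object that rolls back in its destructor unless committed.
class Transaction {
public:
    explicit Transaction(FakeConnection& connection) : connection_(connection) {
        connection_.Begin();
    }
    ~Transaction() {
        if (!committed_) connection_.Rollback();
    }
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    void Commit() {
        connection_.Commit();
        committed_ = true;
    }

private:
    FakeConnection& connection_;
    bool committed_ = false;
};

// Store scenario: start a transaction, pass the queries, commit.
// If Execute throws, the Transaction destructor rolls back during unwinding.
void StoreQueries(FakeConnection& connection,
                  const std::vector<std::string>& queries) {
    Transaction transaction(connection);
    for (const std::string& query : queries) connection.Execute(query);
    transaction.Commit();
}
```

With this arrangement the caller only has to catch the exception; no explicit cleanup code is needed on the error path.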
16
Using the Persistence Layer from an Application
In this section I will briefly describe how an application uses the persistence layer at code level. I will use some code examples and class diagrams to illustrate this. Since C++ lacks reflective programming capabilities, the application classes must be coupled to the persistence framework. Figure 17 describes the organisation of classes using the persistence framework. The classes in the application domain to be stored using the persistence layer inherit this capability from the class PersistentObject provided by the persistence layer. When storing and retrieving objects through the persistence layer, the application calls services provided by PersistenceFacade, a singleton class acting as a portal to all the services provided by the persistence layer. The actual C++ code to define the Music class is shown in Figure 18 (see footnote 4).
Figure 17: Example of Persistence Layer Usage
As seen in the class declaration of the code example, class Music inherits its persistence capabilities from the PersistentObject template class, to which the type of the Music class itself is passed as a template argument. The default constructor of the Music class calls the constructor of PersistentObject, and the GetAttributes method implements an abstract method of PersistentObject to return the set of attributes to be stored to the persistence mechanism by the persistence layer. The reference to the Category class is defined in the same way.
4 Storing the artist as a string here is not a good example of object design, but it serves well in this simplified example.
    #include <string>

    /* Class that inherits persistence capabilities from PersistentObject */
    class Music : public Persistence::PersistentObject<Music>
    {
    public:
        /* constructor calling the constructor of the PersistentObject */
        Music() : PersistentObject<Music>() {}

        /* method to get attributes of this class. Used by persistence layer
           to set and get attribute values of the object */
        virtual Persistence::Attributes GetAttributes()
        {
            return Persistence::Attributes()
                .AddAttribute<std::string>("name", name)
                .AddAttribute<std::string>("artist", artist)
                .AddReference(category);
        }

        void SetName(std::string name);
        void SetArtist(std::string artist);

    private:
        std::string name;
        std::string artist;

        /* Reference to a category object that will be retrieved only when
           referenced to. */
        Persistence::LazyReference<Category> category;
    };
Figure 18: Code to Define a Class of Persistent Objects
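How a lazily retrieved reference such as the category member might behave can be sketched with a small template. This is an assumption made for illustration, not the LazyReference class of the thesis: the names LazyRef and Loader are hypothetical, and the database fetch is replaced by a callback supplied by the caller.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical lazy reference: the referenced object is loaded on first access.
template <typename T>
class LazyRef {
public:
    using Loader = std::function<std::unique_ptr<T>()>;

    explicit LazyRef(Loader loader) : loader_(std::move(loader)) {}

    bool IsLoaded() const { return static_cast<bool>(object_); }

    T& operator*() { return *Get(); }
    T* operator->() { return Get(); }

private:
    T* Get() {
        if (!object_) object_ = loader_();  // fetched only when first referenced
        return object_.get();
    }

    Loader loader_;
    std::unique_ptr<T> object_;
};

// Minimal stand-in for the Category class of the example.
struct Category {
    std::string name;
};
```

In this sketch an application that never dereferences the reference never triggers the load, which is the property that keeps memory usage and query counts low.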
The code to store and retrieve persistent objects through the persistence layer is shown in Figure 19.
    #include <iostream>

    int main()
    {
        PersistenceFacade* facade = PersistenceFacade::GetInstance();

        /* Connection object to connect persistence layer to a PostgreSQL database */
        facade->connect<PostgreSQLConnection>("username", "password", "127.0.0.1");

        /* Create new object using facade */
        Music* music = new Music;
        music->SetName("All You Need Is Love");
        music->SetArtist("Beatles, The");

        /* Store the object */
        facade->StoreObject(music);
        delete music;

        /* Get all music objects where name equals All You Need Is Love */
        for (Persistence::iterator i =
                 facade->RestoreObjects<Music>(Attr("name") == Value("All You Need Is Love"));
             i != i.end(); ++i)
            std::cout << i->GetArtist() << " : " << i->GetName() << std::endl;

        return 0;
    }
Figure 19: Code example of using persistent object in an application.
At the beginning of the code the singleton instance of class PersistenceFacade is retrieved and a connection to the persistence mechanism is created. This connection is used to store and retrieve objects. Next a new object of the Music class defined in figure 18 is created and some attributes of the newly created music object are set. To store the music object, the StoreObject method of the persistence facade is called. This stores the newly created object to the database, after which the object is deleted. Finally the persistence system is queried for all the music objects whose name attribute has the value All You Need Is Love. The request returns an iterator that is then iterated through, and all the returned objects are printed out. The query probably returns a single object.
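The query condition used in the example, Attr("name") == Value("..."), can be supported with ordinary operator overloading. The following is a hedged sketch of one way to do it; the Condition type and its ToSql method are assumptions for this example and are not taken from the thesis. A production implementation would more likely bind the value as a query parameter than embed it in the SQL text.

```cpp
#include <string>
#include <utility>

// Hypothetical wrappers matching the syntax used in the example.
struct Attr {
    explicit Attr(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct Value {
    explicit Value(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Condition object that the persistence layer could combine with its own query.
struct Condition {
    std::string column;
    std::string literal;

    // Renders an SQL fragment; single quotes are doubled to keep the literal valid.
    std::string ToSql() const {
        std::string quoted;
        for (char c : literal) {
            quoted += c;
            if (c == '\'') quoted += '\'';
        }
        return column + " = '" + quoted + "'";
    }
};

// Comparing an attribute with a value builds a condition instead of a bool.
inline Condition operator==(const Attr& attribute, const Value& value) {
    return Condition{attribute.name, value.text};
}
```

The application code stays close to a natural C++ expression, while the persistence layer decides how the condition is translated for the persistence mechanism in use.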
17
Analyzing Results
Here we evaluate our solution by comparing it to the goals we set in Section 9. We were able to develop a design that solves the most critical features stated in Table ??. Our solution could still be developed further into a general, full featured persistence layer that fully abstracts the persistence mechanisms and provides easy data exports and imports. The biggest problem in our simple design was that very much of the work is delegated to the persistence mechanism to implement, as can be seen in Figure 11. The main goal of this work was to answer the questions set in section 9.1. We have gathered information both from literature and by designing our own solution and doing some implementation tests. The test implementation itself is mainly used to test the design and to give an idea of the effort required.
Q1 Is it worth the effort spent to build a persistence layer between applications and relational databases?
The design of a full featured general purpose persistence layer turned out to be fairly complicated [1]. Based on our design, the number of requirements set and our experience of how much effort embedded application development really needs, developing a persistence layer is a nontrivial process that needs more effort than developing an average embedded application. Implementing one to be used in a single application may not be worth the effort spent, but if a persistence layer is reused in several applications it might very well pay the investment back. How soon depends of course on the projects. In [6] it is stated that 25 to 50 percent of the code is needed for object database integration. If a significant share of this can be reused in the form of a persistence layer, the effort spent might well be worth it.
Implementing our simplified implementation tests took less than one hundred hours, so based on this information, using this kind of simplified, application specific version tailored for a single application might still be cost-effective in the long term due to simplified maintenance programming. In [1] it is also noted that a persistence layer is so complicated that it should not be implemented just to reinvent the wheel. Instead, using an existing solution should be preferred if there is one available. In our case, this requires an existing solution with full source code, since it must be compiled for our embedded environment.
Q2 Does the usage of a persistence layer require extra effort from an application programmer?
Especially when programming in C++ the programmer has to do some extra work to define attributes for classes. On the other hand this gives the programmer more control than reflection based approaches in Java. Also very complicated data structures may turn out to be problematic, which forces the application programmer to think about persistency issues while writing business domain classes. This is bad especially if the persistence system forces the design to be inappropriate. So some effort is required from the application programmer, but still a lot less than that of manually storing objects.
Q3 Is it possible to provide extra flexibility for iterative development?
Since the persistence layer introduces a mapping between data structures in the relational database and class structures, the class structures are less bound to the database structures [2]. In addition, the usage of a general data structure makes changes in class structure quite easy to adopt. If a lot of existing data is involved, some data conversions may be needed. The persistence layer allows these conversions to be written at the object level instead of the database level, making the task easier.
Q4 Is it possible to provide flexible access to legacy data?
Thanks to the abstraction of different persistence mechanisms, access to legacy data can be made considerably simpler [2]. Different databases and other data sources can be accessed through a uniform interface and possible data conversions can be written at the object level. Also metadata based mapping provides some flexibility in accessing objects in legacy data sources [2]. Still there will probably be huge problems when accessing legacy data [?]. Database schemas may be outdated and may not reflect current requirements, data may be inconsistent etc. The persistence layer may be used to verify data, but inconsistencies should still be fixed. In [?] it is proposed that legacy data is exported for development and a new database structure is used for the new application. If the legacy data is ported to a format supported by the persistence layer, importing this data will be fairly easy.
Q5 Is it possible to provide easy transition over different databases?
Thanks to a high abstraction level, porting from one database to another is quite easy [1]. If existing data must be transferred between databases it can be done at the object level.
Q6 Is it possible to achieve both flexibility and efficiency?
In our test implementation we used a general database structure to maximize flexibility and optimized performance by denormalizing data structures and carefully designing queries (see section 14). In a full featured version changes in the database mapping should be allowed so that the performance could be tuned for application needs [1]. Still there is some compromise between performance and flexibility, but it is possible to gain a flexible solution with moderate performance. Metadata mapping can be used to allow performance tuning. Since we did not develop a full implementation it is hard to estimate the real overhead produced by a persistence layer, but based on our implementation tests it seems that the performance penalties are moderate.
Q7 Is it possible to use a persistence layer in our embedded systems?
It seems that a persistence layer could be implemented in a way that the performance penalties introduced by it are reasonable. This makes it well possible to use one on our embedded systems. Still it might turn out that in highly data oriented applications the performance penalty is somewhat problematic, but that is hard to estimate beforehand.
Q8 What kind of process changes does the introduction of a persistence layer require?
In our solution we try to reduce the data modelling effort by allowing the persistence layer to generate a data model from a class model. Even though this simplifies application design, it requires changes in the development process. The data model is no longer developed early in the project. This affects the ways people are used to working, which makes adoption of the persistence layer more complicated. Also the fact that, due to iterative development, data models are not static but change during the development process may be hard to integrate into the existing processes. As stated in [2], this kind of change to the processes requires extra effort when adopting new technologies. Generally speaking people are often not willing to change the way they work, especially if somebody tells them to do so.
It is a pity that we had no possibility to fully implement and test our design within the scope of this thesis. Only by fully implementing and testing it in practical projects would it be possible to say whether our design really reaches the goals we have set for it. In long term usage, some statistics could also be collected on the benefits of the persistence layer, but it would still be extremely hard to say how much effort it really saves and how much it eases maintenance programming.
18
Alternative Solutions
The solutions presented here are not the one and only silver bullet for object mapping. Here we describe some of the other possibilities.
18.1
Alternatives To A Persistence Layer
A persistence layer is not the only solution for storing objects to a database. In [2] different choices for database encapsulation are listed together with their advantages and disadvantages. These are briefly summarised in Table 5. As can be seen in Table 5, a persistence layer is not the only way to store objects to a relational database, but it provides a reusable solution and good encapsulation of the underlying database.
18.2
Object Oriented Data Bases
Object oriented and object relational databases are designed to store objects. This makes integrating them a lot easier than integrating relational databases. In an ideal world no objects would be stored in relational databases; object oriented databases would be used instead. But in practice relational databases do exist in organisations, and object oriented databases are a relatively new technology that is not as widely accepted as relational databases. Often there are legacy data and applications that are highly coupled with relational databases and expect data access through them. In our case only relational solutions for embedded databases exist.
| Solution | Description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Brute force | Add SQL queries directly to business objects. | Simple to implement. | Highly inflexible. Application developers must be familiar with SQL. |
| Data access objects | The database code is separated to objects of its own. | Business classes are not coupled with the database. Reuse of business objects might be possible. | Database and object schemas are still coupled. |
| Persistence framework | Separate layer mapping objects to a database schema. | Application programmers do not have to know anything about the persistence mechanism. A high degree of reuse. May also provide other services promoting very rapid application development. | Possible performance penalty. Data designs must be clean. |
| Services | Data is accessed through a service framework. | Platform independent application. Easy data exchange. Web services becoming widely adopted. | Standards and tools still evolving. Performance is quite limited. |
Table 5: Alternatives to Persistence Layer [2]
18.3
Alternatives to Persistence Interface to The Application
Our implementation of an application programming interface can surely be simplified and made more flexible. These things are still very much subjective questions with varying opinions. Interfaces that some programmers consider practical may seem like dirty tricks to others. A few examples of different interfaces to persistence solutions can be found in [26] and [28].
19
Future Developments
In this section we discuss some questions that we did not cover within the scope of this thesis, but that we think would be interesting questions for further examination of persistence solutions.
19.1
Management Utility
An application for managing the mapping metadata, as described in [1], could provide a graphical representation of the data model and its properties. In an application like this the different tunable attributes described in section 14 could be customized. The visual representation of database schemas is discussed in [23]. It might also be possible to develop schema conversion tools and to manage data model versions with a visual tool, but these questions are outside our scope.
19.2
Security
Many large applications include some kind of access control to limit users' access to different data entities [3]. These access control systems often involve grouping both users and objects and defining access control lists or capabilities between these. Instead of implementing these separately for each application, these services could be provided by the persistence layer [6, 1]. This would ease the development and increase security by sharing the already tested security related code among projects.
19.3
Database Schema Versioning
Applications tend to change in the course of time, simply because user needs and thus requirements tend to change. Together with the application, database schemas may also change when new classes or attributes are added. Updating an application is relatively simple compared to updating database schemas. A schema update must deal with existing data. When SQL is simply embedded in the application, every object must be aware of previous versions and handle versioning internally. This spreads version conversion code to every business class or data object class, turning the code into an error prone maintenance nightmare.
A better solution is to integrate schema conversions into the persistence layer [2]. There is still always a need for application specific converters, but using the persistence layer as a tool this conversion between versions can be done at a much higher level. Reading objects in from the database using a converter that maps from an old schema to the new class structure requires some policy, like default values, to fill in missing information. After this the new objects can be stored directly, making the conversion a one time event. This would give more flexibility, since changes in database schemas are not limited by the old schemas, thus giving the agile developer better tools to react to the changing requirements of the customer. All this requires the possibility to use multiple simultaneous connections in the persistence layer.
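A converter with a default value policy could look roughly like the following sketch. Everything in it is hypothetical: the Row type stands for a record read with the old schema, MusicV2 for a new class structure that has gained a category attribute, and ConversionPolicy for the policy that fills in missing information.

```cpp
#include <map>
#include <string>

// A record read with the old schema: column name mapped to value.
using Row = std::map<std::string, std::string>;

// Hypothetical new class structure; "category" did not exist in the old schema.
struct MusicV2 {
    std::string name;
    std::string artist;
    std::string category;
};

// Policy used to fill in information that the old schema cannot provide.
struct ConversionPolicy {
    std::string defaultCategory = "Unknown";
};

// Returns the value of a column, or the fallback if the column is missing.
std::string FieldOr(const Row& row, const std::string& column,
                    const std::string& fallback) {
    const auto it = row.find(column);
    return it != row.end() ? it->second : fallback;
}

// Object level conversion from an old schema record to the new structure.
MusicV2 ConvertFromV1(const Row& oldRow, const ConversionPolicy& policy) {
    return MusicV2{FieldOr(oldRow, "name", ""),
                   FieldOr(oldRow, "artist", ""),
                   FieldOr(oldRow, "category", policy.defaultCategory)};
}
```

After such a conversion the object can be stored through the connection using the new schema, so the conversion has to run only once per object.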
19.4
Fine grained Versioning of Data
In addition to database schemas, namely the structure of data, the data itself is also subject to change, and applications are designed to do operations on data. In many interactive applications some kind of versioning of the changes made to data between fetching it from and storing it back to the database is needed to provide the user with undo/redo functionality. This can be done in a couple of ways. One is to use the Command design pattern to capture operations made on business objects [10]. When the user selects undo, the application generates a reverse operation for the last operation done and executes it. Another way is to capture the object's state using the Memento design pattern [10]. When the user clicks undo the objects are returned to the previous state. Both methods require some extra memory to store previous attribute values, but within the limits of the available memory the user can be provided with almost unlimited undo capabilities.
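The memento based alternative can be sketched as follows. The classes Song and UndoHistory are assumptions made for this illustration; the history keeps a bounded number of snapshots to reflect the limited memory of an embedded system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical business object that can export and restore its state.
class Song {
public:
    // Memento: a snapshot of the attribute values.
    struct Memento {
        std::string name;
        std::string artist;
    };

    Memento Save() const { return Memento{name_, artist_}; }
    void Restore(const Memento& memento) {
        name_ = memento.name;
        artist_ = memento.artist;
    }

    void SetName(const std::string& name) { name_ = name; }
    const std::string& GetName() const { return name_; }

private:
    std::string name_;
    std::string artist_;
};

// Bounded undo history: the oldest snapshot is dropped when the limit is hit.
class UndoHistory {
public:
    explicit UndoHistory(std::size_t limit) : limit_(limit) {}

    // Call before modifying the object.
    void Record(const Song& song) {
        if (limit_ == 0) return;
        if (snapshots_.size() == limit_) snapshots_.erase(snapshots_.begin());
        snapshots_.push_back(song.Save());
    }

    // Returns false when there is nothing left to undo.
    bool Undo(Song& song) {
        if (snapshots_.empty()) return false;
        song.Restore(snapshots_.back());
        snapshots_.pop_back();
        return true;
    }

private:
    std::size_t limit_;
    std::vector<Song::Memento> snapshots_;
};
```

The limit makes the trade-off between memory and undo depth explicit: a larger limit allows more undo steps at the cost of more stored attribute values.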
19.5
Storing Temporary State Between Sessions
In addition to the issues discussed with fine grained data versioning, in some special cases different states between operations must also be stored to persistent storage. For example, in a browser based user interface application in embedded systems, the user operations consist of separate program runs and the state of the application must be stored somehow. This is common for traditional CGI programming methods, where every action done by the user generates a separate program run [34]. This generates a need for temporary persistency for session data. This can be done in many different ways, but it often makes the application programming extremely complicated and the application itself hard to maintain. This kind of service could be provided by the persistence layer, which could store objects to a temporary session table during the session and to the real database only when requested. This kind of service is provided for example by Java servlets and other advanced Internet programming techniques, but the limited resources of embedded systems usually prevent the usage of these technologies in our case. Enterprise Java Beans also provide this functionality in the form of session beans.
20
Summary
Relational databases and object oriented technology are both technologies that are good in their own fields. Relational databases are well studied, they have a solid mathematical background and they are already in use in organisations world wide. They provide ready made solutions to referential integrity, concurrency control and error recovery, to mention a few. Object oriented technology, in contrast, has risen from software development experience. It is a practical concept to ease the modelling of real world problems. Object technology is not only a programming technique. It affects every aspect of software development from implementation to process models. The wide usage of relational databases and the advances of object oriented technology often make it necessary to combine these two technologies. Object oriented programs have a need to store data, and often existing applications need to access the data stored in a relational database. But the fundamental differences between these two technologies make it hard to integrate them. Objects model both functionality and data and have clear responsibilities assigned to them. In relational databases the main focus is on effectively storing data. Relationships between objects model different concepts from those between relations in a database. Objects can inherit properties from other ones, an issue with no counterpart in relational databases. In addition to technological differences there are also differences in the processes in which these technologies are usually applied. When designing an application using relational databases, the database schema is often designed early in the process and then kept static. Object oriented applications are often developed in an iterative process where features are added little by little and thus data models keep on evolving. This adds its own process related political issues.
In this work we explored the possibilities of creating a persistence layer to abstract the relational database in our embedded system. Different designs found in the literature were examined and a design of our own was developed based on these. Some implementation tests were also done to test the design in practice. The main focus of our work was on the flexibility of the data model as well as on keeping the persistence layer efficient enough to be used in an embedded system. Within the scope of this thesis it was not possible to implement a full featured persistence layer, but we investigated different issues in the design and implementation of one as a foundation for the possible future development of a persistence solution. The goals of our project and the requirements set for the persistence system are discussed in further detail in Section 9, and the design we ended up with is described in Section 10. The major result of our work was that it is a demanding task to design and implement a full featured persistence system. There are a lot of special cases to take into account and efficiency issues to resolve. It was estimated that if a persistence layer is used in several applications and it effectively reduces the coupling between the database and the application, it is worth the effort spent. It pays itself back in terms of reduced modelling effort, more flexible implementation and easier maintenance. It was also discovered that a persistence layer can help to access legacy data and exchange information with third party applications.
In Section 19 a few hints are given on how a persistence layer could be extended to provide ready made solutions to many common problems in application development. Extended in this way into an effective application framework for embedded applications, it would really be a Swiss army knife for an application developer.
References
[1] Scott Ambler. The design of a robust persistence layer for relational databases, www.ambysoft.com. Software Development magazine, 1998. http://www.ambysoft.com/persistenceLayer.pdf,12.12.2003. [2] Scott W. Ambler. Agile Database Techniques. Number ISBN: 0471202835. John Wiley & Sons, 2004. [3] Ross Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems. Number 0471389226. John Wiley & Sons, 2001. [4] Kent Beck, John Brant, Martin Fowler, William Obdyke, and Don Roberts. Refactoring. Number ISBN: 0201485672. Addison-Wesley, 1999. [5] G. Booch. Object-oriented development. 12(2):211221, 1986. IEEE Trans. Softw. Eng.,
[6] Whitenack Brown. Crossing chasms - pattern language for object rdbms integration, 1995. http://members.aol.com/kgb1001001/Chasms.htm, 12.12. 2003. [7] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377387, 1970. [8] C. J. Date and Hugh Darwen. A guide to the SQL standard (4th ed.): a users guide to the standard database language SQL. Number ISBN: 0-20196426-0. Addison-Wesley Longman Publishing Co., Inc., 1997. [9] Michel Dubois and Fay` A. Briggs. Eects of cache coherency in mule tiprocessors. In Proceedings of the 9th annual symposium on Computer Architecture, pages 299308. IEEE Computer Society Press, 1982. [10] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns. Number ISBN: 0201633612. Addison-Wesley, 1995. [11] Ari Hovi. SQL-ohjelmointi Pro Training. Number ISBN 951-762-756-4. Talentum, 2000. [12] Xiaoping Jia. Object-Oriented Software Development Using Java. Number ISBN 020135084X. Addison-Wesley, 1999. [13] Alfons Kemper and Donald Kossmann. Adaptable pointer swizzling strategies in object bases: Design, realization, and quantitative analysis. VLDB J., 4(3):519566, 1995. [14] Philippe Kruchten. Architectural blueprints - the 4+1 view model of software architecture. IEEE Computer Society, 12(6):4250, 1995. [15] Craig Larman. Applying UML and patterns. Number ISBN: 0130479500. Prentice Hall, 2002. [16] Harry R. Lewis, Christos H. Papadimitriou, and Christos. Elements of the Theory of Computation, 2/e. Number ISBN: 0132624788. Prentice Hall, 1997. 58
[17] John Lombardo. Embedded Linux. Number ISBN: 073570998X. 2001. [18] Marcus Eduardo Markiewicz and Carlos J. P. de Lucena. Object oriented framework development. Crossroads, 7(4):39, 2001. [19] Scott Mayers. Eective C++, Second Edition. Number ISBN: 0201924889. Addison-Wesley, 1998. [20] The Institute of Electrical and Electronics Engineers. IEEE 1471 Recommended Practice for Architectural Description for Software-Intensive Systems. Number ISBN: 0-7381-2519-9. The Institute of Electrical And Electronics Engineers, 2000. [21] Jack A. Orenstein. Supporting retrievals and updates in an object/relational mapping system. IEEE Data Eng. Bull., 22(1):5054, 1999. [22] Platon. Teokset III, Pidot. suom. Marianna Tyni, Helsinki 1979. [23] Markku Rontu. Visaul queries for a student information systems, master thesis, 2004. [24] Stephen R. Schach. Object-Oriented and Classical Software Engineering. Number ISBN 007112263X. McGraw, 2001. [25] WWW site. Audio riders oy. URL, http://www.audioriders./, 2.5.2004. [26] WWW site. C++ boost. URL, http://www.boost.org/, 2.5.2004. [27] WWW site. Linux online. URL, http://www.linux.org/, 2.5.2004. [28] WWW site. s11n.net. URL, http://s11n.net/,2.5.2004. [29] WWW site. Xml query. URL, http://www.w3.org/XML/Query, 4.5. 2004. [30] WWW site. Etrax 100lx data sheet, 2001. http://developer.axis.com/products/etrax100lx/, 5.4. 2004. URL
[31] Bjarne Stroustrup. C++ Programming Language, Special Edition. Number ISBN: 0201700735. Addison-Wesley, 2000.
[32] Andrew S. Tanenbaum. Modern Operating Systems, 2/e. Number ISBN 0130926418. Prentice Hall, 2001.
[33] Jeffrey D. Ullman and Jennifer Widom. First Course in Database Systems. Number ISBN: 0138613370. Prentice Hall, 1997.
[34] William E. Weinman. CGI Book, The. Number ISBN: 1562055712. New Riders Publishing, 1996.
Glossary
Object: Instance of a class in object-oriented development.
Persistence: Ability of data to be stored over a period of time.
C++: Object-oriented programming language with advanced capabilities [31].
RDBMS: Relational DataBase Management System; software that stores structured data in relations.
Object Oriented (OO) Development: Programming paradigm where data and the functionality related to that data are grouped together into objects that represent ones in the real world.