

why, when, how?

Bertil Chapuis

Creative Commons Attribution 2.5 Switzerland License

This paper compares Java content repositories (JCR) and relational database management systems (RDBMS).
The choice between these technologies is often made arbitrarily. The aim is to clarify why this choice should be
discussed, when one technology should be selected instead of the other and how the selected technology should
be used. Four levels (data model, specification, project, product) are analyzed to show the impact of this choice
on different scopes. A discussion of the best choice depending on the context follows. This defines the
foundations of a decision framework.

Table of Contents

1 Introduction
  1.1 What is compared?
  1.2 Why is it comparable?
  1.3 What is the purpose of this comparison?
  1.4 How will it be compared?
2 State of the art
  2.1 Roles
  2.2 Domains of responsibility
  2.3 Data Models
3 Data model comparison
  3.1 Model Definitions
  3.2 Structure
  3.3 Integrity
  3.4 Operations and queries
  3.5 Navigation
  3.6 Synthesis
4 Specification comparison
  4.1 Use Case Definition
  4.2 Structure
  4.3 Integrity
  4.4 Operations and queries
  4.5 Navigation
  4.6 Transactions
  4.7 Inheritance
  4.8 Access Control
  4.9 Events
  4.10 Version control
  4.11 Synthesis
5 Development process comparison
  5.1 Data Understandability
  5.2 Coding Efficiency
  5.3 Application Changeability
  5.4 Synthesis
6 Product comparison
  6.1 Theoretical analysis
  6.2 Benchmark
  6.3 Synthesis
7 Scenario Analysis
  7.1 Survey
  7.2 Reservation
  7.3 Content management
  7.4 Workflow
8 Conclusion
9 Appendix – JCR and design
  9.1 Model
  9.2 Convention
  9.3 Methodology
  9.4 Application
10 Appendix – Going further
  10.1 Queries in semi-structured models
  10.2 Queries on transitive relationships
  10.3 Modular and configurable databases
11 Bibliography
University of Lausanne & Day Software AG 3

1 Introduction

Day Software AG (Day) led the development of a Java specification which defines a uniform
application programming interface (API) to manage content. This specification is called the Content
Repository API for Java (JCR) and is part of the Java Community Process. Implementations of this
specification are currently provided by well known companies such as Oracle, Day or Alfresco.

JCR implementations are often used to build high level content management systems and collaborative
applications. Day also provides an open source implementation of the specification which is called
Jackrabbit and which is used as a shell for some of its products.

This diploma thesis takes place in this context. Day wants to clarify some points which relate to the
data model promoted by their specification. The basic idea is to compare their approach to managing
content with the approach promoted by competitors at different levels. The following sections will
clarify the approach adopted to do this and give an overview of the content developed in this report.

1.1 What is compared?
As explained, the purpose is to locate JCR in the database world. This work will be done by comparing
the relational model and the model promoted by JCR. The relational model defined by Codd in the 1970s
is currently the most widely used data model. The unstructured or semi-structured model underlying
the JCR specification is enjoying growing success in the content management area. These two models
will be described and analyzed in this report.

1.2 Why is it comparable?
Each data model supports a philosophy for structuring and accessing data. On the one hand, the success
of the relational model comes in large part from the facilities it offers to describe clear data
structures. On the other hand, the success of the JCR specification relates essentially to the
facilities it offers to express flexible data structures.

These aspects show us that the discussion takes place at the same level. Thus, it makes sense to
compare them, and to clarify their respective possibilities and limits. It also makes sense to give a
clear picture of the respective philosophies which are promoted and used by each of the models.

1.3 What is the purpose of this comparison?
By making this comparison, Day wants to position more precisely the data model, the specification
and the products which relate to JCR. Doing this should help people to understand better the main
offers available on the market and show when it makes sense to use them.

More precisely, from an external perspective, the goal is to define and give clear advice which can
help people to choose the approach which best fits their needs. Some people are asking whether their
applications should be implemented with a relational database or a Java content repository. Thus,
clarifying the philosophies promoted by each model could help in making good decisions and
understanding the impact of a choice made at the data model level.

From an internal perspective, some questions relate to how a Java content repository should be
implemented. Some companies implement it over relational databases and others provide native
implementations of the model. Should JCR be seen as a data model or as an abstraction layer over
an existing data model? Answering this kind of question can have a strong effect on the future
implementation of the products and also on the best way to promote them.

1.4 How will it be compared?
First of all, the chapter “State of the art” will try to give a snapshot of the main data models which
have been described and used during the last four decades. This will be done with the purpose of
identifying the main influences which have led to the current market environment. The goal is also to
understand why some data models have encountered success and why others have not.

Then the comparison between the relational approach and the JCR approach will start. Because
the two approaches show big differences on four different levels, these are the ones we will examine
and compare, thus avoiding unnecessary discussion regarding incomparable aspects.

The chapter “Data model comparison” will be the first level of comparison. In this chapter, the two
models, the relational model and the model used by JCR, will be formally defined. This should help
the reader to understand the theoretical concepts hidden behind each model. The purpose of this
chapter is also to show the impact of these theoretical aspects on real world problems and help people
to understand more clearly why they should use one approach instead of the other to solve their problem.

The chapter “Specification comparison” will be the second level of comparison. This chapter will leave
the theoretical point of view for a more practical perspective. The SQL standard and the JCR
specification will be compared more precisely in this chapter. This will allow us to show practically in
which context the concepts described in the “Data model comparison” chapter make sense. Some
differences which relate more to the specification definition will also be pointed out.

The chapter “Development process comparison” will be the highest level undertaken in this report. On
the basis of the previous chapters, different aspects and notable advantages which can significantly
influence the development process will be discussed. This discussion will try to clarify parameters
such as the efficiency reached with each approach.

The chapter “Product comparison” will discuss the impact of data models on the products. The
performance question constantly occurs at the product level. This chapter will try to address this
question with a theoretical cost analysis and a practical benchmark.

The “Scenario analysis” chapter can be seen as a synthesis of the main aspects pointed out during the
whole comparison process. Four test cases characterized by different features will be analyzed with
regard to the significant aspects presented in this report. The purpose is to set the foundations of a
framework which helps in choosing the best approach through a quick requirement analysis.

Appendices are also included in this document. They contain aspects which are not directly linked to
the comparison but which are interesting for readers who would like to study the subject further.

2 State of the art

The necessity of separating information from applications became clear in the 1960s when many
applications had to access the same set of information. This segregation has given birth to new
concepts and new roles which relate to the activity of managing information. This chapter will clarify
the main roles and the main domains of responsibility linked to information management. Some of the
main approaches which are used to handle information will also be presented. Basically, the
idea is to build a common language for the following chapters.

2.1 Roles
People are generally involved in information systems and data management. Three main roles can almost
always be distinguished when data models and databases are mentioned:

- The database administrator (DBA) who maintains the database in a usable state.
- The application programmer who writes applications which may access databases.
- The user who uses applications to access, edit, and write data in the database.

Each role generally relates to certain responsibilities. Several domains of responsibility come from
disciplines such as design, development or security. Domain examples could be the structure, the
integrity, the availability or the confidentiality of data. Choosing a data model impacts these
different roles by attributing them more or less responsibility.

2.2 Domains of responsibility
Figure 2.2-1 shows four main domains of responsibility which will be mentioned regularly in
this report. This role/responsibility diagram tries to translate the classical distribution which is
generally made when relational databases or similar approaches are used to manage data.

Figure 2.2-1: Classical responsibility repartition

The WordNet semantic lexicon gives the following definitions of the concepts identified as domains of
responsibility in Figure 2.2-1:

- Content: everything that is included in a collection and that is held or included in something
- Structure: the manner of construction of something and the arrangement of its parts
- Integrity: an undivided or unbroken completeness
- Coherence: logical and orderly and consistent relation of parts

Content and structure are relatively clear concepts. However, in the context of this report, it makes
sense to be precise as to which meaning is given to integrity and to coherence. Integrity here
relates to the “state of completeness” of data which always has to be ensured in the database. This state
is preserved with integrity rules at the database level. Coherence relates to the logical organization
of data and the quality thereof. Coherence can be ensured with

constraints at the database level but also programmatically at the application level. For several
reasons, incoherence can be tolerated for a time in the database. This is not the case for integrity.

Choosing a data model has an impact on the distribution of responsibilities in different ways. This
report will try to detail this impact and show the consequences of these kinds of choices on the
different roles.

2.3 Data Models
A data model should be seen as a way to logically organize, link and access content. Since the 1960s,
some data models have appeared and disappeared for several reasons; this section will give a brief
overview of the history of the main data models. It will also give an overview of the respective
reasons for their success.

Hierarchical Model
In a hierarchical model (1), data is organized in tree structures. Each record has one and only one
parent and can have zero or more children. A pure hierarchical model allows only this kind of attribute
relationship. If an entry appears in several parts of the tree, it is simply replicated. A directed
graph without cycles, as depicted in Figure 2.3-1, probably gives the best representation of how
entries are organized in this model.

Figure 2.3-1: Tree graph

In general two types of entries are distinguished: the root record type and the dependent record type.
The first type characterizes a record at level zero of the hierarchy, which has no parental
relationship. The second type characterizes all the records which are located under the root record.
They are dependents in the sense that their lifetime will never be longer than the lifetime of their
parent. In the hierarchical model, each record can generally store an arbitrary number of fields which
hold data.

While some real problems have a tree-like structure, the assumption that only this kind of attribute
relationship governs the world is too strong. During its history, the hierarchical model has probably
suffered from this. Some people have probably abandoned it for models which seem to fit better with
reality.

The main implementation of the hierarchical model was developed in the 1960s by IBM. This database is
called IMS, which stands for Information Management System. Today, IMS is still used in the industry
for very large scale applications. IBM sold it as a solution for critical online applications. In fact,
IBM continues to invest in this product and to develop new releases.

Most directory services use concepts inherited from the hierarchical model. Moreover, traces of the
model are also visible on every system. Everybody uses hierarchical concepts to organize files and
folders. So every computer user is more or less familiar with the hierarchical organization of
information.

Furthermore, during the last decade, the hierarchical model has found a new popularity with the
increasing use of formats such as XML or YAML. In a web browser, the Document Object Model (DOM)
also uses a hierarchy of objects to organize the elements of a web page. Thus, this model is not in
jeopardy of disappearing. It will probably continue to encounter further success in the future as well.

Network Model
Instead of limiting the organization of data to a tree structure, the network model allows entries to
be linked to one another in any direction. A directed graph, as shown in Figure 2.3-2, is probably
the best representation which could be given to show how data is structured in a network model. The

other properties of this model are shared with the hierarchical model. Thus we can say that the
hierarchical model is a subset of the network model.

Figure 2.3-2: Network graph

Initially developed during the 1970s to overcome the lack of flexibility of the hierarchical model, the
network model encountered a lot of success during that decade. This model has found hundreds of
applications in different fields of computer science, such as the management of in-memory objects or
bioinformatics applications. However, it seems that currently not many people use it to organize
their data. It is still well regarded in embedded applications, whilst large scale applications built
on it are slowly disappearing.

Relational Model
Before its definition by Codd during the 1970s, the relational model (2) had not encountered a lot of
success. However, after this formal work based on set theory and first-order logic, some
companies chose to implement this model. IBM was one of the first companies to take the lead in the
market, with the DB2 database. Oracle is now the uncontested leader with its implementations of the
relational model.

The relational model defines the concepts of relations, domains, tuples and attributes, which are
more often called tables, columns, rows and fields. Interestingly, today this model is so widely taught
and used that its pertinence for solving specific cases is rarely questioned.

Figure 2.3-3: Relation, domain, tuple and attribute

Some people link the success of the relational model to its mathematical foundation. However, the
implementations currently in use are a far cry from the beautiful concepts defined at the beginning.
The main building blocks are now hidden by features which are provided to address practical problems.

Thus, the success of this data model should be linked to the practical answers which it gave to
problems encountered in the business world during the 1980s and the 1990s (3). The normalization
principle was used to save storage capacity. Furthermore, during this period, information systems
were widely used for automation and monitoring tasks. The relational model offered a very good canvas
to express and solve problems such as these.

3 Data model comparison

This chapter will define more clearly the JCR model and the relational model. Several aspects which
relate to the models' foundations will be presented and compared. The main purpose of this chapter is
to understand the philosophy or basis of each model.

The “Model definitions” section briefly presents the main ideas underlying the models. The “Structure”
and “Integrity” sections will mainly discuss the aspects which relate to the respective places of the
content, the structure and the semantic in both data models. The “Operations and queries” and
“Navigation” sections will show different ways used to retrieve and edit content. Throughout the whole
chapter, an important place will be given to the impacts of the choice made in terms of the data
model and the reasons which should drive this choice.

3.1 Model Definitions
Some works and references give definitions of the different data models currently in use (4) (5). Some
tools are also available to understand the main concepts of these models. The purpose of this
section is not to enrich these definitions; they are included simply to draw attention to some
theoretical aspects required in order to build a common language for the comparison.

JCR Model
To organize records, this model includes concepts inherited from the hierarchical and from the network
models. Thus, as shown in Figure 3.1-1, records stored with the JCR data model are primarily
organized in a tree structure. However, the limitations of the hierarchical model are avoided by giving
the ability to link each record horizontally. Attributes which point to other nodes can be stored at
each level to create network relationships. This type of model permits the creation of a network
within a sort of tree structure.

Figure 3.1-1: JCR graph

Currently, some explanation of the schema which relates to the data model definition can be found in
the specification (4) (5). Figure 3.1-2, based on this information, attempts to express more formally
the JCR data model. It is interesting to note that at this stage, no differentiation between the
content and the structure can be made. In fact the structure appears with the instantiation of items.

Figure 3.1-2: JCR class diagram
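The combination of a tree with horizontal links described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its `add_node` and `path` methods and the sample node names are hypothetical and are not part of the JCR API. A reference is modelled simply as a property whose value is another node.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical JCR-like item for illustration; this is not the javax.jcr API."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent                      # hierarchical part: one parent at most
        self.children: Dict[str, "Node"] = {}
        self.properties: Dict[str, object] = {}   # values, possibly other nodes

    def add_node(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children[name] = child
        return child

    def path(self) -> str:
        # The root is "/", every other node appends its name to its parent's path.
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


root = Node("")
alice = root.add_node("authors").add_node("alice")
intro = root.add_node("articles").add_node("intro")

intro.properties["title"] = "Why JCR?"
# Network part: a property that points to a node in another branch of the tree.
intro.properties["author"] = alice

print(intro.path())                       # /articles/intro
print(intro.properties["author"].path())  # /authors/alice
```

Note that no schema was declared before creating the nodes: the structure only exists through the items that were instantiated.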

Relational Model
The relational model, which was briefly introduced in the “State of the art” chapter, is based on set
theory. A relation as defined by Codd (2) refers to the mathematical concept of a relation. In
his paper, he gives the following definition of a relation:

R is a subset of the Cartesian product S1 × S2 × … × Sn

Practically, because all these sets have to be distinguished from one another, they are identified as
domains. Thus, assuming the domains of first names F, of last names L and of ages A, a Person relation
is a set of tuples (f, l, a) where f ∈ F, l ∈ L and a ∈ A. Figure 3.1-3 represents a table view of this
relation. In this representation, each domain corresponds to a column and each tuple to a row.

Figure 3.1-4: Relational class diagram
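As a minimal sketch of this definition, the Person relation can be written in Python as a set of tuples that is checked against the Cartesian product of its domains. The domain values below are invented for illustration only.

```python
from itertools import product

# Hypothetical, deliberately tiny domains.
F = {"Anna", "Marc"}        # first names
L = {"Muller", "Rochat"}    # last names
A = {28, 45}                # ages

# The Cartesian product F x L x A: every possible (f, l, a) combination.
cartesian = set(product(F, L, A))

# A relation is any subset of that product; each tuple is one row of the table view.
person = {("Anna", "Muller", 28), ("Marc", "Rochat", 45)}

is_relation = person <= cartesian                   # subset test
outside = ("Anna", "Muller", 99) in cartesian       # 99 is not in the age domain

print(len(cartesian), len(person))  # 8 2
print(is_relation, outside)         # True False
```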
Figure 3.1-3: Relation, domain, tuple and attribute

This basic definition does not mention the ability to create associations between relations. In fact
there is no link between the name of the model and associations. The ability to express associations
comes later with the join operations defined by relational algebra. These operations will be
introduced in the next sections.

Figure 3.1-4 shows a class diagram which could be used to express relations. While the pertinence of
this kind of diagram can be discussed, the purpose is to give a simple and visual basis for a relation.
Furthermore, parts derived from this diagram will be reused later to express the intersections between
the relational model and the JCR model.

3.2 Structure
A rich debate around the respective places of data and structure in data models has been ongoing for
several years both on the web (6) and in academic fields (3). This debate could be summarized as
follows: should data be driven by the structure or should the structure be driven by data?

These discussions come from the fact that some concepts do not really fit into a predefined canvas. A
predefined canvas can offer a lot of advantages and facilities. For example, it is easier to express
integrity constraints on a well known structure. Equally, indexing or query optimization (7) can
benefit from the assumption that a clear structure can always be found for a problem. However, in real
life situations, there is always an exception which does not conform to the canvas.

The following sections will situate the two models in this context. Both approaches will be
presented with the respective places of the data and the structure in each case. It will also be
clarified when each strategy could be considered logical or illogical.

JCR model
In Figure 3.1-2, a class diagram shows the main
aspects of the JCR data model. In this figure, the
instantiation of nodes, properties and values leads
to the creation of content. If we try to identify the

structure’s place in this diagram, it appears that no real differentiation is made between the content
itself and its structure.

Thus, the model proposed by JCR does not require the definition of a structure to instantiate content.
Instances of nodes, properties and values can be created before defining any kind of structure. In fact,
the structure appears with the content.

A parallel can be drawn between this approach and the semi-structured approach described at
the end of the 1990s (8). No separation was made between data and their structure. This provides two
possible advantages: firstly a dynamic schema, to store data which does not fit into a predefined
canvas, and secondly the ability to browse the content without knowing its structure.

Some modern programming languages such as Ruby or Python also give the ability to extend objects
on the fly with properties and functions (reflection). While a part of the structure appears at
runtime, it is possible to define a semantic which identifies the main concepts. In JCR this is done
with node-types. Basically, defining a semantic does not limit the capacity of a node to store an
unlimited combination of sub-nodes and properties. Proceeding in this manner allows for the creation
or evolution of records when and as required.

For example, if we want to define a semantic item for media, there is no real need to take into account
all the possible properties which could appear under this node during the application life cycle. Each
special case of media items, such as images, videos, etc., can have specific attributes which do not
impact the whole set of media instances and which do not necessarily have to be specified at
conception.

Relational model
Figure 3.1-4 represents a basic class diagram describing succinctly the main ideas proposed by the
relational model. We see in this diagram that the concept of record, which is represented by the
Element class, is separated from the structure.

Note that the paradigm of the relational model is completely different from the one proposed by JCR. A
structure made of relations and domains has to be instantiated. Then, tuples which fit into this
structure can be created. While the DBA can choose the level of flexibility in the initial structure,
it appears that this kind of model differentiates between the data and its schema.

Differentiating the structure from the data can bring some benefits. For example, this would be
appropriate for a problem solving approach rather than a data storage approach. This is evident as
many developers will create an entity relationship model during the early phases of defining data
requirements.

However, in real life situations the assumption that content and structure can be completely separated
is not always valid. For example, to handle expansion in the relational database, some artifacts or
miscellaneous fields are often created to allow for this expansion in the relational structure. These
can take the form of fields added to create hierarchies or fields added to define customized orders in
a set of tuples. These conceptual entities can become difficult to describe within the confines of the
structure. As the application evolves and new requirements are added, the management of the
additions can become difficult and dangerous. A change could even imply a rethink of the whole
structure of the implementation.

Content, structure and responsibility
As shown in the “State of the art” chapter, in classical situations, the DBA is generally responsible
for the data structure. The application programmer can influence decisions made in this area but he
does not have the final responsibility for the structure. Finally, the user clearly has nothing to say;
his scope is limited by the functionalities developed by the application programmer to create, remove
and update data.

As shown in Figure 3.2-1, choosing a content driven approach instead of a structure driven approach
significantly impacts the respective roles of the DBA, the application programmer and the user. In fact
the DBA loses his responsibility as main structure owner. If the structure is driven by data, this
ownership is shared with the application programmer and the user.

Figure 3.2-1: Responsibility repartition revisited

It is true that a clear separation between the content and the structure makes some aspects of data
management easier. Clearly splitting the structure and the content makes it easier to define roles and
to separate duties. The DBA has the ownership of the database and of all the structures which allow
records to be instantiated. In this context, the application programmer becomes a kind of super user
with extended rights, but the user may only access what is available in the application.

This kind of scenario gives a lot of responsibility to the DBA and places him at the centre of database
evolution. Unfortunately he is not necessarily tuned in to the real needs of the users. It would
therefore be advisable that the DBA be responsible more for aspects of data integrity, availability or
recoverability and not for the structure or the content. In general these should be left under joint
definition to the application programmer and the user.

Choosing the right approach
In a real working environment, some problems benefit from being driven by a structure whereas
others clearly do not fit into any predefined structure. A simple analogy may help to explain this
complicated situation. For example, houses are rarely built from scratch without blueprints. However,
if we take the scope of cities, there are generally no blueprints which plan their final states. So
which lessons can we learn from this simple example? Are complex problems driven by data instead of by
structure? Not necessarily.

In the example of the house and of the city, the problem could be seen as follows. For houses,
because the budgets and resources available are generally known in advance, the most effective way
to proceed is to define a structure before the construction. For cities, because the resources and
budgets available are generally not known in advance and are evolving, the most effective way to
proceed is to let their structure emerge. If necessary, guidelines can be defined to control their
growth.

Since information system problems involve a wide and growing community of stakeholders, and
providers cannot know what will be done with their applications, these kinds of questions should be
debated at the onset of the design:

- Are the users known or not?
- Is the behavior of the users known or not?
- Is the final usage of the application known or not?
- Are entities fitting in a canvas or not?

The response to these questions is probably one of the best indicators when deciding upon one of the
two approaches.

The JCR model clearly advocates a structure driven by data. By creating content, items, nodes and
properties, users are building the structure. Database administrators and application programmers are
just guiding this structure by defining rules and constraints. In implementations made with a
relational approach, a structure is first defined by the database administrator and the application
programmer. Then the users can register content items which fit this structure.

Depending on the use case, each data model could be useful. It basically depends on the perspective
from which we wish to view the data: a fixed structure or a more flexible data driven model. The choice
of model will be based on the certainty or uncertainty of the responses to the few decisive questions
stipulated above.
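The contrast between the two approaches can be made concrete with a small sketch. The structure driven side uses Python's standard `sqlite3` module. The content driven side is only a plain dictionary standing in for a repository, with a hypothetical `respects_guideline` rule playing the role of a guideline; it is not a JCR implementation, and the table, paths and property names are invented for illustration.

```python
import sqlite3

# Structure first: the table must be defined before any row can be stored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE media (title TEXT NOT NULL, mime_type TEXT NOT NULL)")
db.execute("INSERT INTO media VALUES (?, ?)", ("Logo", "image/png"))

try:
    # 'width' was never declared, so the structure rejects this content.
    db.execute(
        "INSERT INTO media (title, mime_type, width) VALUES (?, ?, ?)",
        ("Banner", "image/png", 800),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Content first: each item carries its own properties, and the structure is
# whatever the stored items happen to look like.
repository = {
    "/media/logo": {"type": "media", "title": "Logo"},
    "/media/banner": {"type": "media", "title": "Banner", "width": 800},
    "/media/clip": {"type": "media", "title": "Clip", "duration_s": 12},
}


def respects_guideline(item: dict) -> bool:
    # Hypothetical rule guiding the growth: every media item needs a title.
    return item.get("type") != "media" or "title" in item


emerged_properties = sorted({key for item in repository.values() for key in item})
all_valid = all(respects_guideline(item) for item in repository.values())

print(rejected)            # True
print(emerged_properties)  # ['duration_s', 'title', 'type', 'width']
print(all_valid)           # True
```

In the first half the unexpected property forces a schema change before the data can be stored; in the second half the set of properties simply grows with the content, while the guideline still constrains what every media item must provide.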

3.3 Integrity

A strong association between structure and data integrity is often made. Thus some people are afraid of letting their users take part in the definition of the structure. However, it’s more correct to say that data integrity belongs to semantics.

Generally, integrity definitions do not make any mention of the structure. A structure made of relations and domains is evidently an elegant way to express a semantic. It’s also a good basis on which to declare integrity constraints. Nonetheless, integrity constraints can be defined at a lower level, directly over a semantic. One advantage is, for example, that all the structures which respect the semantic constraints can be instantiated in the database, and not only the records which fit into the structure.

Furthermore, as mentioned in the “state of the art” chapter, integrity definitions generally do not make mention of coherency. In the database environment, the two concepts are often conflated. While data coherency can be preserved by integrity constraints, the integrity of a dataset is not necessarily lost if incoherent records are present in the database.

Unquestionably, data integrity means that no accidental or intentional destruction, alteration, or loss of data should ever occur. While data integrity should be ensured at all times during a database’s lifecycle, the assumption that data coherency should have the same property is probably too strong.

Some people have the habit of treating both aspects directly in the database: everything which relates to data coherency along with integrity constraints. This ensures that data coherency is preserved in all cases. However, this also has a cost in terms of performance, because checks have to be performed each time a write access is made on the database. Therefore a tradeoff has to be made between data integrity and data coherency.

A balanced approach which can result in a better user experience consists in identifying, sometimes arbitrarily, what relates to integrity and what relates to coherency. Data integrity will be treated with constraints at a database level. Data coherency will be treated programmatically at an application level in a way which alleviates the work load of the system.

JCR Model
An analogy can be made between the JCR model and a black list. The most generic node accepts any kind of children, any kind of properties and any kind of values. A mechanism is provided through the concept of node-type to let the DBA define integrity constraints.

In the JCR model, node-types are used to express a semantic. Declaring constraints on this semantic allows the declaration of restrictions on the nodes and on their content. Each node has a primary node-type and can have several mixin node-types which extend the primary node-type. Node-types allow for specifying constraints on the children of a node, on the properties of a node and on the values of the properties stored by a node.

Figure 3.3-1: JCR model and integrity

Using several node-types makes it possible to ensure the integrity of transitive relations in a hierarchy. For example, it is possible to define a node-type which supports only children with a specific type. The latter could also have node-types which declare constraints for their children. Proceeding in this fashion makes it possible to require, for instance, that the children of the children of a specific node have a certain type.

When integrity is mentioned, we often speak about entity integrity, referential integrity and domain integrity. These concepts relate closely to the relational model but, as shown in Figure 3.3-1, we can find similar ways to express constraints in the JCR model.

Entity integrity is ensured by the fact that basically each node is unique and identified by its location in the data model or by its UUID. Paths cannot really be considered as unique identifiers because same-name siblings are allowed for XML compatibility. Referential integrity is ensured by the fact that all the reference properties of a node have to point to a referenceable node. Furthermore, a referenceable node cannot be deleted while it is referenced. Domain integrity can be ensured by forcing nodes to have specific properties which contain values in predefined ranges.

Data coherence can be checked with integrity constraints, but the model does not provide all the tools to do a complete coherency check. This shows that separating the two areas is beneficial. Integrity should be ensured at the data model level and data coherency at the application level.
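
The constraints described for the JCR model can be illustrated with a small in-memory sketch. It is not the javax.jcr API: the classes `NodeType` and `Node`, the type names `demo:shelf` and `demo:book`, and the reference counter are all assumptions made for illustration. The sketch shows a generic type that accepts anything, a type that restricts the type of its children, a mandatory property, and a referenced node that cannot be removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory sketch only; this is NOT the javax.jcr API. All names are hypothetical. */
class NodeTypeSketch {

    /** A node-type: mandatory property names and permitted child types (empty set = any type). */
    static final class NodeType {
        final String name;
        final Set<String> mandatoryProps;
        final Set<String> allowedChildTypes;

        NodeType(String name, Set<String> mandatoryProps, Set<String> allowedChildTypes) {
            this.name = name;
            this.mandatoryProps = mandatoryProps;
            this.allowedChildTypes = allowedChildTypes;
        }
    }

    static final class Node {
        final String name;
        final NodeType type;
        final Map<String, Object> props = new HashMap<>();
        final List<Node> children = new ArrayList<>();
        int incomingReferences = 0; // number of reference properties pointing at this node

        Node(String name, NodeType type) {
            this.name = name;
            this.type = type;
        }

        /** Child constraint: rejected only when the parent's type restricts its children. */
        void addChild(Node child) {
            if (!type.allowedChildTypes.isEmpty() && !type.allowedChildTypes.contains(child.type.name)) {
                throw new IllegalStateException(child.type.name + " is not allowed under " + type.name);
            }
            children.add(child);
        }

        /** Property constraint: every mandatory property must be present. */
        boolean isValid() {
            return props.keySet().containsAll(type.mandatoryProps);
        }

        /** Stores a reference property and records it on the target node. */
        void setReference(String property, Node target) {
            props.put(property, target);
            target.incomingReferences++;
        }

        /** Referential integrity: a node cannot be removed while it is referenced. */
        void removeChild(Node child) {
            if (child.incomingReferences > 0) {
                throw new IllegalStateException(child.name + " is still referenced");
            }
            children.remove(child);
        }
    }

    // The generic type accepts anything; the two others narrow the hierarchy level by level.
    static final NodeType GENERIC = new NodeType("generic", Set.of(), Set.of());
    static final NodeType BOOK = new NodeType("demo:book", Set.of("title"), Set.of());
    static final NodeType SHELF = new NodeType("demo:shelf", Set.of(), Set.of("demo:book"));

    public static void main(String[] args) {
        Node shelf = new Node("shelf", SHELF);
        Node book = new Node("book", BOOK);
        shelf.addChild(book);
        System.out.println(book.isValid()); // false: the mandatory 'title' is missing
        book.props.put("title", "Set Theory");
        System.out.println(book.isValid()); // true
    }
}
```

In this sketch everything is permitted unless a node-type says otherwise, which is the black list behaviour described above; checks that go beyond these rules would remain at the application level.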
Relational Model
An analogy between the relational model and a white list is appropriate. As explained in the last section, the relational approach makes the assumption that structure and content have to be separated. Thus saving content is allowed only if a structure has been defined. Some integrity constraints are implicit to the relational structure. The domain constraints ensure, for example, that all the values stored in the same domain have the same type. The entity integrity constraints guarantee that, due to the primary key, all records in a table are unique.

Furthermore, the structure is generally taken as a base on which to declare other integrity constraints. Referential integrity ensures that a foreign key domain is a subset of the domain it points to. In the same way, other integrity constraints which make use of the operations proposed by the model can be described.

Figure 3.3-2: Relational model and integrity

A structure which is known in advance and whose evolution is controlled is an elegant base on which to ensure integrity. The syntaxes which permit the expression of integrity constraints are generally derived from first order logic. The fact that the main building blocks of the relational model are based on well known mathematical disciplines, namely set theory and first order logic, permits the expression of implementation models which share these mathematical properties.

In terms of data integrity, this provides advantages because the solidity of the implementation model can be mathematically proven. In its simplicity, this way of proceeding also makes it possible to declare rules and constraints for nearly everything with short statements. As a result, solid implementation models can be quickly declared with a high level of accuracy and a minimum of programming effort.

However, as mentioned before, the assumption that each problem can fit in a predefined structure is often too strong. Furthermore, while the relational model has the ability to express hierarchies and network
structures, first order logic is limited when they have to be declared with constraints. In conclusion, it’s often difficult to know what should be managed at a model level or at an application level.

Integrity, coherency and responsibility
In general, DBAs have the custom of declaring very strong structures. Their implementation models are thought of as white lists which preserve data integrity and data coherence. However, to build generalized and flexible implementation models, it is really only the data integrity level which should be constrained at model level.

Furthermore, the argument that data integrity and data coherency should be the responsibility of the DBA does not really reflect the reality or the ideal: all of the tests made at an application level to ensure that users do not inject invalid content into the data testify to this.

Figure 3.3-3: Responsibility repartition revisited

Therefore, clarifying the division of responsibility for such checks would be of enormous benefit to the overall functionality. It would help in defining the reasons for choosing a given model. Equally, it identifies any shortcuts taken on aspects of data integrity and helps to avoid this sort of pitfall. Furthermore, clearly dividing the responsibility for integrity and for coherence could make it easier to design applications which take into account the cost of the checks made at a data model level.

Choosing the right approach
The argument that the relational model has mathematical properties (2) which will ensure rock solid data integrity is often selected for the wrong reasons. In fact, these properties are only used for very specific applications, and the integrity of an implementation model as understood here is rarely proven mathematically because it is not a requirement.

The choice of the best approach should be made with regard to the responsibility given to the DBA and to the application programmer. The following two examples illustrate this idea. On one hand, a prison guard must control all the movements of the people in the prison during the day. In this case, a rock solid program conceived as a white list is ideal: people may only do the things that they are allowed to do. On the other hand, a tourist guide has to ensure that the travelers have a good trip by directing them and giving them the right information. In this case, a program conceived as a black list will probably give more satisfaction to the user.

Some functional cases do not benefit from being governed by a lot of constraints. Unfortunately, the relational model often leads DBAs and application programmers to design restrictive implementation models. This gives them the feeling that their application is well thought out, but often it only frustrates the users.

The following questions should be honestly asked:

Do users have to be guarded or guided?

Does data coherency have to be preserved at a database level or at an application level?

Therefore, choosing the right data model is not only a question of preference; it should always be based on an analysis of the use case.

3.4 Operations and queries

Query languages are close to fields such as relational algebra, first order logic or simply mathematics. Depending on the case, queries can be expressed with declarative calls or with procedural languages. In general, queries are composed of several
In general, queries are composed of several
operations which make use of the structure or of the data semantic.

Some operations can be used in queries. These operations, such as the selection, the projection, the rename or other set operations, are inherited from the disciplines mentioned at the beginning of the section. In addition to these operations, some query languages provide statements which allow creating, modifying or deleting data. This section clarifies the bounds of each model in terms of queries and operations.

JCR Model
An abstract query model is used as a basis to retrieve data in the JCR Model (4) (5). This query model makes a kind of mapping between the JCR model and the notions of relations, domains, tuples and attributes present in the relational model. Figure 3.4-1 is a modified version of Figure 3.1-4 which visually shows this mapping.

Figure 3.4-1: JCR model, operations and queries

It seems that, in the current state, node-tuples are seen as relations, properties as domains, nodes as tuples and values as attributes. Basically, node-tuples are arbitrary sets of nodes. However, node-types are used as the main source of node-tuples in queries. While this kind of mapping cannot be considered an application of the principles of set theory, it allows the running of some interesting queries which can satisfy nearly all requirements.

The operations provided by this query model are the selection and the set operations which permit joins between sets of node-tuples. The result of a query is composed of all the nodes which satisfy the selection condition and the join condition.

Basically, in the JCR model, queries are seen as a way to perform search requests. This provides a way of retrieving records, but a query cannot itself sequentially delete or update the records it selects. This limitation is not dictated by conceptual barriers; it could be lifted if required.

As mentioned before, the content and its structure are not separated in this model. Thus, some attributes of the records, such as their depth level or their hierarchical path, can be viewed as properties. This opens up the ability to easily perform queries on things which are generally not taken into account in other models, such as transitive relationships in hierarchies.

Relational Model
The relational algebra defines the primitive operations available in the relational model (9). These operations are mainly the selection, the projection, the rename, the Cartesian product, the union and the difference. The power of this query model lies in the fact that the input and the output of these operations are always relations. Thus, it’s possible to express complex statements and

In addition to these operations, some mathematical operators can be used. It’s also possible to specify additional domains for the output relation. Some domain operations are also provided to retrieve information, for example the number of attributes stored in a domain or the domain’s maximal value.

The query languages which are provided by relational database implementations generally propose statements which allow modifying, creating or deleting data (10). Used in conjunction with the previously presented operations, these statements become very useful. They provide a means of performing sequential changes on data sets which match precise conditions.

The possibilities given by the usage of these operations are huge. However, limitations are encountered when transitive relationships appear (11). This sort of query cannot be expressed with first order logic statements. For example, if it is not possible to define a query which retrieves all of the
descendants of an element, some other solutions are available (12). They do however often add complexity to the implementation models.

Choosing the right approach
While JCR provides a means of carrying out some operations and queries, the relational model is clearly more complete in this area. In some situations, this completeness can become a decision criterion if the use case implies that complex join operations may be required.

The ability, offered by most relational databases, to use operations in conjunction with update and delete statements is also a significant advantage of the relational model. For use cases which involve a lot of write access, this allows for quick creation, update and deletion of content. However, caution should be taken with this type of usage when complex hierarchies are present.

3.5 Navigation
During the 70’s, Charles W. Bachman described different ways of accessing records in databases (13). Focusing on the programmer’s role, he describes the programmer’s opportunities to access data as follows:

1. He can start at the beginning of the database, or at any known record, and sequentially access the "next" record in the database until he reaches a record of interest or reaches the end.
2. He can enter the database with a database key that provides direct access to the physical location of a record. (A database key is the permanent virtual memory address assigned to a record at the time that it was created.)
3. He can enter the database in accordance with the value of a primary data key. (Either the indexed sequential or randomized access techniques will yield the same result.)
4. He can enter the database with a secondary data key value and sequentially access all records having that particular data value for the field.
5. He can start from the owner of a set and sequentially access all the member records. (This is equivalent to converting a primary data key into a secondary data key.)
6. He can start with any member record of a set and access either the next or prior member of that set.
7. He can start from any member of a set and access the owner of the set, thus converting a secondary data key into a primary data key.

These rules give the programmer the ability to traverse datasets by following the references which structure the records. The interesting point of this approach is that the programmer can adopt access strategies without knowing the whole structure of the database. As a navigator, he explores the database.

Figure 3.5-1: Navigation path

Rules, as defined by Charles W. Bachman, can be implemented as procedural calls made over an API or as declarative statements. The main difference between the queries mentioned in the previous section and the navigation principles defined here is the following. Queries are built over the semantic or over the structure of the data model. Navigation is independent of the semantic or of the structure and directly uses the content. Thus, in our context, XQUERY and XPATH should be considered as navigational languages because they use the content to navigate in XML files.

JCR Model
In the JCR Model, each record stores properties which relate to the location of the item in the database. The level, the path and, under certain conditions, the unique identifier are good examples of these specific properties. The rules mentioned before are nearly all included in the model and allow
for the navigation through the database with different types of strategies.

The root node can be seen as the beginning of the database. As mentioned in the first rule, it gives the ability to sequentially access all the sub-nodes. The path and the unique identifier properties allow navigating in a way which respects the second, the third, and the fourth rules by giving specific entry points for specific situations. The node types and the parent nodes can be seen as set owners and thus allow for the navigation of the database in ways which respect the fifth, sixth and seventh rules.

These possibilities offered by the JCR Model (4) (5) give the programmer a lot of flexibility. He is really able to navigate through the data and adopt strategies which will allow him to find data in structures that are unfamiliar.

Relational Model
In the relational model (2), records are seen as basic tuples of values. Basically, these data structures do not know their location in the database and are not ordered in relations. To enter the database, a programmer must have a good knowledge of the schema and of the data organization.

In one sense, we could say that the fifth rule previously defined is fulfilled. However, because the records are not ordered, it is not really the case. Thus, the relational model does not take into account these rules at all. The relational model only defines a way to organize data and shifts the navigation problem to a higher level.

Choosing the right approach
In terms of navigation, the two models are not comparable. The meaning given to the units of content is really different. Thus choosing the right approach depending on the use case is not really hard. If the use case involves traversal access, exploration or navigation in data, a model which includes these concepts is always superior.

3.6 Synthesis

The two data models show fundamental differences. The choice of approach is closely related to the degree of flexibility which has to be given to the user. This choice also relates to the nature of the requirements, which involve clear or abstract entities. The choice of the data model should always be made on the basis of a good analysis of the use case.

The selection of an approach also affects the main roles and responsibilities which relate to data management. All of the people using a database should be clearly informed of their roles and given usage guidelines. Particular attention should be paid to previous data usage habits, as they would have to change or evolve if a new data model is chosen.

Some users could voice reticence concerning these factors, as conservative behavior is an obstacle when deep changes arise. The choice of data model should not be affected by this type of reasoning. The advantages engendered by good and coherent choices are enormous and can have a significant impact on the application and the development process.
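
The navigation rules of section 3.5 can be made concrete with a short sketch. It is an in-memory illustration and not the javax.jcr API: the class `NavigationSketch`, its `Node` type and the example paths are assumptions. It maps the first rule to a sequential traversal from the root, rules two to four to direct entry by path, and rules five to seven to child, sibling and parent access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory sketch of Bachman-style navigation over a node hierarchy; not the javax.jcr API. */
class NavigationSketch {

    static final class Node {
        final String name;
        final Node parent;                              // rule 7: member -> owner
        final List<Node> children = new ArrayList<>();  // rule 5: owner -> members

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** The path is derived from the node's location in the hierarchy. */
        String path() {
            if (parent == null) {
                return "/";
            }
            String p = parent.path();
            return (p.equals("/") ? "" : p) + "/" + name;
        }

        /** Rule 6: next member of the same set, or null at the end. */
        Node nextSibling() {
            if (parent == null) {
                return null;
            }
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }
    }

    /** Rule 1: start at a node and visit every node below it sequentially (depth first). */
    static List<String> traverse(Node start) {
        List<String> paths = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            paths.add(n.path());
            for (int i = n.children.size() - 1; i >= 0; i--) {
                stack.push(n.children.get(i));
            }
        }
        return paths;
    }

    /** Rules 2 to 4: enter the hierarchy directly at a known path; null if it does not exist. */
    static Node byPath(Node root, String path) {
        Node current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            Node next = null;
            for (Node c : current.children) {
                if (c.name.equals(segment)) {
                    next = c;
                }
            }
            if (next == null) {
                return null;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node science = new Node("science", new Node("collections", root));
        new Node("book-a", science);
        new Node("book-b", science);
        System.out.println(traverse(root));
        System.out.println(byPath(root, "/collections/science/book-a").nextSibling().path());
    }
}
```

None of these operations needs a schema: the caller only follows the content, which is the point made about navigation above.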

4 Specification comparison

Specifications describe the features that databases should support. The main specification for relational databases is without doubt SQL, which has been revised several times (SQL92, SQL99 and later revisions) since its first edition and which is more or less implemented by each relational database provider. The JCR specification was released in 2005 (JSR 170) and a second version of the specification is in incubation (JSR 283). Some companies such as Day, Alfresco or Oracle provide implementations of this specification with different levels of compliance.

We could discuss the many aspects of each specification at length, but the principal objective of this document is to highlight the philosophy behind the specifications, which provide practical answers to common problems. For this reason, the examples shown in the following sections are essentially based on the SQL92 specification and on version 1.0 of JCR.

The first section of this chapter presents a use case which demonstrates how each specification can give practical answers to everyday problems. Being well balanced, it shows the possibilities and limits of each model. The four following sections essentially show how the concepts presented in the “Data model comparison” chapter actually take form in the specifications. Finally, the last section turns to practicalities by presenting features which respond to the more common differences in requirements.

4.1 Use Case Definition

Consider an editor who sells books and wants to create a system to manage his book collection and his orders. A book collection is composed of books and sub collections. A book can be tagged with keywords. Through a website, the editor wants to let anonymous visitors navigate through the whole catalogue by collection. He also wants to provide a book preview for the authenticated customers and partners and let the partners show the whole digital copy of the books. In addition to the ability to navigate through collections, partners and customers should be able to search products by ISBN number, with full text criteria, or by asking for the most successful items.

Figure 4.1-1: Editor use case diagram

Figure 4.1-1 is a draft of the use case diagram of this application. It summarizes the main actors and the main features which have been identified during the design process. In the next sections, this use case will be used to point to some key aspects which differentiate relational databases from java content repositories.

4.2 Structure

In terms of structure, the two approaches are radically different. However, it makes sense to understand how each specification makes use of the basic concepts presented in the “Data Models” chapter.
This can assist people in developing implementation models and in solving practical problems.

JCR Specification
Like other unstructured and semi-structured models, the JCR Model does not make a separation between data and their structure. Thus, there is no specific need to identify entities and attributes as required by relational databases. It is nevertheless important and useful to identify the semantic beforehand or, in other words, to identify the concepts represented by nodes in the content repository. This can be done by defining a node-type or by specifying an attribute which declares the type of the node. The schema depicted in Figure 4.2-1 does not represent the structure of the repository. It simply shows how the main concepts which can be found in the structure should be organized.

Figure 4.2-1: Semantic diagram

The root can be seen as the editor system, which deals with persons, orders, order lines, collections, books and tags. This diagram does not take into account the additional artifacts which could be added in the content repository to organize data.

<editor = ''>
[editor:person] > nt:unstructured
[editor:order] > nt:unstructured
[editor:orderline] > nt:unstructured
[editor:collection] > nt:unstructured
[editor:book] > nt:unstructured
[editor:tag] > nt:unstructured
Table 4.2-1: Node-types

The most intuitive way to design this structure or organization is to think in terms of composition, that is, the manner in which one concept is always a component of another concept. If UML class diagrams are used during the design phase, it consists only of translating the composition relationships into hierarchies. The various other associations will be stored as references or as paths in properties. More tips on how to design JCR applications are available in the “Appendix – JCR and design” appendix.

Even when the environment is considered as structured, we are often unable to translate this structure clearly. Consequently, keeping the schema as weak as possible makes it easy to take new requirements into account at runtime by simply recording new data. If node-types are used as markers, it makes sense to simply let them extend the nt:unstructured node-type without adding more constraints.

Thus, at design time there is no real need to fix all the attributes and all the entities. In this example, some decisions can be taken later by the application programmer. The general idea is simply to leave room for new requirements.

SQL Specification
As explained in the previous chapter, the relational model implies that data and their schema are separate. In practice this means that all the tables and their respective columns have to be identified at design time. During the development process, entity relationship notations are often used for this purpose.
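
The structural difference between the two specifications can be sketched in a few lines. This is only an illustration under stated assumptions: `BookRow` stands in for a table whose columns are fixed at design time, `ContentNode` stands in for a node that carries a type marker and an open set of properties, and neither class belongs to JDBC or to the javax.jcr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch contrasting a design-time schema with runtime-defined properties; names are hypothetical. */
class StructureSketch {

    /** Relational style: the columns are fixed when the class (the table) is designed. */
    static final class BookRow {
        final String isbn;
        final String title;

        BookRow(String isbn, String title) {
            this.isbn = isbn;
            this.title = title;
        }
    }

    /** Content-repository style: a type marker plus an open set of properties. */
    static final class ContentNode {
        final String nodeType;
        final Map<String, Object> properties = new LinkedHashMap<>();

        ContentNode(String nodeType) {
            this.nodeType = nodeType;
        }

        /** Records a property; no schema has to declare it beforehand. */
        ContentNode set(String name, Object value) {
            properties.put(name, value);
            return this;
        }
    }

    public static void main(String[] args) {
        BookRow row = new BookRow("isbn-1", "Set Theory");
        // Storing a preview for 'row' would require changing BookRow itself (a schema change).

        ContentNode node = new ContentNode("editor:book")
                .set("title", "Set Theory")
                .set("preview", new byte[] {1, 2, 3}); // new requirement recorded as plain data
        System.out.println(row.title + " / " + node.properties.keySet());
    }
}
```

The node-type here acts purely as a marker, in the spirit of extending nt:unstructured without adding constraints, whereas every new attribute of `BookRow` has to be decided before any data is stored.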

Figure 4.2-2: Entity relationship diagram

For the editor’s use case, this means that some decisions need to be made which will strongly impact the future evolution of the application. Data security and save routines must make use of the predefined columns. Everything has to be clearly described beforehand. For example, identifying what an order is, what a book is and what a customer is becomes imperative. Hence the final application will reflect all these decisions, which are often arbitrary.

Figure 4.2-2 shows a database schema which reflects the decisions which have been taken during the design phase. In this use case, it is relatively easy to find relations and domains for the main entities such as person, order, order line and tag. At design time, their attributes can clearly be identified and it is quite easy to conceive a relational schema for them.

However, the book entity is difficult to fit into a table. For example, this schema only stores the title and the description of the book. However, there is also a requirement to store a digital copy and a preview of the book. The content of the book could be part of the database or it could be stored somewhere else in the file system. This kind of decision is completely arbitrary and has an enormous impact on the application’s life cycle.

4.3 Integrity

As mentioned, integrity can have different meanings. In the database vocabulary, integrity generally relates to the fact that accidental or intentional destruction, alteration, or loss of data should not happen. It also relates to the state of completeness of data which has to be preserved in all cases in the database. This section gives a quick roundup of the possibilities proposed by JCR and SQL to deal with integrity.

JCR Specification
Data integrity can be ensured in JCR with node-types. Some predefined node-types are specified by the JCR specification. These represent different concepts which are often encountered in repositories such as folders, files, links, unstructured nodes, etc. These node-types can be extended, and constraints which the nodes are forced to respect can be defined.

In our use case, the state of completeness of data which always has to be preserved in the database
which always has to be preserved in the database
does not require a lot of constraints. In a real-life situation, it could happen that a person places an order and comes to take direct delivery of the product, or that a special edition of a book has no ISBN. We often say that this kind of decision has to be taken into consideration. However, such decisions should not be taken at a level which is detrimental to future requirements.

The only integrity constraints we might choose to define concern the orders and the order lines. For legal compliance, an order would need to store a date and an order line would need to store a unit price and a quantity. This is shown in Table 4.3-1.

<editor = ''>
[editor:order] > nt:unstructured
- 'created' (Date)

[editor:orderline] > nt:unstructured
- 'quantity' (double)
- 'unitprice' (double)
Table 4.3-1: Node-type and integrity constraints

The fact that an order line can only be found under an order node cannot be expressed at a repository level. However, this constraint can be taken into account at an application level. We might also need to define a referential integrity constraint between the ordered product and the order line. The code shown in Table 4.3-2 demonstrates how this can be done.

[editor:orderline] > nt:unstructured
- 'product' (reference)
Table 4.3-2: Node-type and referential integrity

The meaning of this kind of association could be discussed at length, but keeping a strong reference between the product and the order line, which implies referential integrity, does not really make sense. A product can evolve and this sort of association would lose its meaning. Furthermore, the editor may want to sell a service instead of a book in the future. Therefore imposing referential integrity is probably excessive, and we can more realistically accept broken references between order line and product. The same comment can be made for the tags, which are attached through an association of a similar nature.

SQL Specification
The fact that, in the relational model, the structure is separated from the content and has to be described beforehand leads to data models which represent what the final usage of the application will be. Furthermore, because some integrity rules are implicit to the model, DBAs generally do not hesitate to define, at design time, all of the integrity rules which will preserve the entire data coherence.

In practice, for the editor’s use case, this means that some application logic can be translated into integrity constraints. With check constraints, we could ensure that the quantity attribute of an order line is always positive. With referential integrity, we can ensure that when a tag is deleted, all the links which concern this tag are also deleted. The statements in Table 4.3-3 and Table 4.3-4 show how this can be done.

(…)
  `Order_idOrder` INT NOT NULL ,
  `Book_isbn` VARCHAR(45) NOT NULL ,
  `unitprice` DECIMAL(11) NULL CHECK (unitprice > 0) ,
  `quantity` INT NULL CHECK (quantity > 0) ,
  PRIMARY KEY (`Order_idOrder`, `Book_isbn`))
Table 4.3-3: Table and integrity constraints

CREATE TABLE IF NOT EXISTS `mydb`.`Tag_has_Book` (
  `Tag_idTag` INT NOT NULL ,
  `Book_idBook` VARCHAR(45) NOT NULL ,
  PRIMARY KEY (`Tag_idTag`, `Book_idBook`) ,
  CONSTRAINT `fk_Tag_has_Book_Tag`
    FOREIGN KEY (`Tag_idTag` )
    REFERENCES `mydb`.`Tag` (`idTag` )
    ON DELETE CASCADE ,  -- deleting a tag also deletes its links
  CONSTRAINT `fk_Tag_has_Book_Book`
    FOREIGN KEY (`Book_idBook` )
    REFERENCES `mydb`.`Book` (`isbn` )
    ON UPDATE CASCADE)
Table 4.3-4: Table and referential integrity

The advantage of referential integrity constraints is not negligible. They minimize the effort made at application level to ensure the coherence of the data stored in the database. However, in the case of the tag, if the tag is attributed a thousand times, deleting one tag will imply a thousand and one write accesses. If tags change a lot, the system will probably not sustain these integrity checks. A better policy could be to allow incoherent tag attributions to
survive in the database and to delete them, if they are incoherent,
during the next read access.

Specifying all the integrity constraints at the model level can lead
to performance and scalability problems, but it also restricts
potential uses which have not been identified at design time.
Implementing a new requirement would impose a new development cycle
which starts with the definition of the implementation model and
finishes with the implementation of the user interface.

4.4 Operations and queries

In terms of operations and queries, we could consider the four
following requirements. The editor wants to identify the top 10 best
sellers. He also wants to change the status of all the orders which
satisfy some specific conditions. He wants to be able to retrieve all
the books which are under a specific collection and, finally, he wants
to perform full text search on all items stored in the system.

JCR Specification

The abstract query model of JCR is implemented in several ways for
different uses. Version 1.0 of JCR uses a common subset of XPath and
SQL, which opens up the opportunity for some interesting requests. The
draft of version 2.0 declares XPath as deprecated and replaces it with
a query language which uses Java objects.

The first requirement, identifying the best sellers, cannot easily be
expressed with JCR in one request. The reason is that domain
operations such as Max and Min are not included in the specification,
and joins only allow the retrieval of books which have been ordered at
least once (Table 4.4-1).

SELECT * FROM editor:book, editor:orderline
WHERE editor:book.jcr:path = editor:orderline.product

Table 4.4-1: simple JCR query

As shown in Table 4.4-2, the top 10 can be obtained by running, for
each book, a query which returns its number of related orders. The sum
of the results can then be used to create the top 10. This is fine for
simple queries, but if connections which include domain operations are
needed, the code becomes very complex.

The second requirement, changing the status of some orders, cannot be
expressed with a single query. However, the results can be accessed
and modified through the navigation API. If the selection criteria
involve domain conditions or many connections, this kind of query
becomes very complicated.

SELECT * FROM editor:order WHERE date < '+2008-11-

(…)

NodeIterator ni = queryresult.getNodes();
while (ni.hasNext()) {
  Node n = ni.nextNode();
  n.setProperty("status", "closed");
}

Table 4.4-2: JCR query and iteration on the result

Retrieving all the books which are stored under a collection is very
easy to implement (Table 4.4-3). Some properties which relate to the
record (path, uuid, etc.) are accessible through XPath and SQL. The
strengths of JCR and its features are very evident in this type of
situation.

SELECT * FROM editor:book WHERE jcr:path LIKE
'/collections/science/%'

Table 4.4-3: JCR query and hierarchy

JCR offers domain-independent functions which allow queries to be
executed on all the properties stored in nodes. As mentioned, the JCR
model is unstructured, and the nodes do not have to carry the same
properties. This is therefore a very powerful functionality for all
the use cases which require full text search. As illustrated in Table
4.4-4, retrieving the set of nodes which contain a specific sequence
of characters is very simple.

SELECT * FROM nt:base WHERE CONTAINS(*, '*computer*')

Table 4.4-4: JCR query and full-text search

In conclusion, the use cases which are characterized by a lot of join
and domain operations will not really benefit from the features
proposed by JCR. On the other hand, if the use cases characteristically
require hierarchical queries, full text search queries and search
queries in binary content, a java content repository would be
advisable.
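The per-book strategy described above for the top 10 moves the aggregation into application code. A minimal sketch of that access pattern is given below. It does not use the JCR API: plain Python data stands in for book and order-line nodes, `order_lines_for` is a hypothetical stand-in for the one query issued per book, and the sample paths and quantities are invented for illustration.

```python
import heapq

# Hypothetical in-memory stand-ins for repository content. These are not
# JCR objects, only plain data used to illustrate the access pattern.
BOOKS = ["/books/a", "/books/b", "/books/c"]
ORDER_LINES = [
    {"product": "/books/a", "quantity": 2},
    {"product": "/books/b", "quantity": 5},
    {"product": "/books/a", "quantity": 1},
]


def order_lines_for(book_path):
    """Stand-in for one repository query per book (assumed helper)."""
    return [line for line in ORDER_LINES if line["product"] == book_path]


def top_sellers(book_paths, limit=10):
    """Aggregate in application code: one lookup per book, then a sort."""
    totals = {}
    for path in book_paths:
        totals[path] = sum(line["quantity"] for line in order_lines_for(path))
    # Keep the `limit` books with the highest totals, best seller first.
    return heapq.nlargest(limit, totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for path, sold in top_sellers(BOOKS):
        print(path, sold)
```

The number of lookups grows with the number of books, which is the cost the text alludes to when the aggregation cannot be pushed into a single query.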

SQL Specification

As explained in the last chapter, the relational model shows all of
its power when the requirements call for connecting operations and
domain operations. Furthermore, if the requirements involve a high
volume of sequential changes to large volumes of records, the
possibilities offered by this model respond well to these needs.

The first requirement, retrieving the top 10 best-selling books, can
easily be expressed with SQL. Table 4.4-5 shows how this can be done
with a simple join and a group clause.

SELECT b.isbn, b.title, sum(o.quantity)
FROM … b JOIN editor.orderline o ON …
GROUP BY b.isbn
ORDER BY sum(o.quantity) DESC
LIMIT 10;

Table 4.4-5: SQL query and simple join operation

Updating the status of the orders is also quite easy to implement with
one query (Table 4.4-6). This kind of statement is very useful when
sequential modifications which depend on complex conditions have to be
performed on the dataset.

UPDATE editor.`order` o
SET o.`status` = ('closed')
WHERE o.`date` <= curdate() - INTERVAL 1 YEAR;

Table 4.4-6: SQL update query

The third requirement is more complicated to realize. In this case,
the depth of the hierarchy of collections is not known in advance, and
it is not possible to define a plain SQL query, without recursive
extensions, which takes this unknown parameter into account. Another
possible way to proceed is to recursively retrieve the collections
with a statement similar to the code found in Table 4.4-7, and then to
run a query on all the books stored under the retrieved collections.

SELECT … FROM collection AS c1 JOIN collection c2 ON
  c1.parentId = …
WHERE … = $categoryId;

SELECT * FROM book as b
WHERE b.collectionId = $categoryId[0]
OR b.collectionId = $categoryId[1]
OR b.collectionId = $categoryId[n];

Table 4.4-7: SQL query and recursion limitation

Nested sets can be used to avoid recursive calls. However, the
performance costs incurred when updating the hierarchy are hard to
predict. Nested intervals (12) partially solve this problem but, like
nested sets, they incur some maintenance complexity. While relational
databases permit the management of hierarchies, they do not exactly
provide the right or effective tools for this maintenance. Application
programmers tend to use frameworks to manage these requirements in a
more elegant manner.

Performing full text search queries on a relational database requires
a good knowledge of the structures. In fact, only the columns
specified in the statement will be considered in the result. For
complex models, alternative solutions with external indexes are often
used to perform this kind of request.

SELECT * FROM book as b
WHERE b.title LIKE '%computer%'
OR b.description LIKE '%computer%';
SELECT * FROM collection as c
WHERE c.title LIKE '%computer%'
OR c.description LIKE '%computer%';
SELECT * FROM tag as t
WHERE t.title LIKE '%computer%'
OR t.description LIKE '%computer%';

Table 4.4-8: SQL query and full-text search limitation

Table 4.4-9 presents the non-standardized syntax proposed by MySQL for
full text search. Unfortunately, the problem linked to the structure
is not really solved, and this solution does not support full text
search across multiple tables.

SELECT * FROM book as b
WHERE MATCH (
  b.description,
  …
) AGAINST ('word');

Table 4.4-9: MySQL and full-text search

The first requests in this section show the power that can be reached
by combining different operators in declarative statements. For
complex models which imply sequential data modification in conjunction
with domain operations, relational databases make more sense. However,
the strength given by a structure disappears when the use case
involves features linked to hierarchies, networks and search on
semi-structured data. A good knowledge of the whole use case is
therefore required before a choice can be made between the two
options.
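Where the database engine supports recursive common table expressions (introduced in SQL:1999, though not offered by every product), the unknown depth of the collection hierarchy can be handled inside one statement instead of through application-side recursion. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than the MySQL dialect of the tables above, the `id` column of `collection` and all sample rows are invented, and only `parentId` and `collectionId` are taken from Table 4.4-7.

```python
import sqlite3

# In-memory SQLite database with an assumed schema and invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (id INTEGER PRIMARY KEY, parentId INTEGER, title TEXT);
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, collectionId INTEGER);
    INSERT INTO collection VALUES (1, NULL, 'science'), (2, 1, 'computer'),
                                  (3, 2, 'databases'), (4, NULL, 'fiction');
    INSERT INTO book VALUES ('111', 'Physics', 1), ('222', 'Compilers', 2),
                            ('333', 'Relational Theory', 3), ('444', 'A Novel', 4);
""")


def books_under(conn, collection_id):
    """Return the titles of all books stored at any depth under a collection."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            -- Anchor member: the collection we start from.
            SELECT id FROM collection WHERE id = ?
            UNION ALL
            -- Recursive member: children of collections found so far.
            SELECT c.id FROM collection c JOIN subtree s ON c.parentId = s.id
        )
        SELECT b.title FROM book b
        WHERE b.collectionId IN (SELECT id FROM subtree)
        ORDER BY b.title
    """, (collection_id,))
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(books_under(conn, 1))
```

The statement stays declarative, but it relies on a feature whose availability and performance differ between products, so the maintenance concerns raised above for nested sets and nested intervals still deserve attention.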

Figure 4.4-1: Unstructured entity

4.5 Navigation

In our use case, the entity "book" has not been clearly defined. This
type of entity is difficult to pin down. It is made up of other
entities which are not known in advance, such as a title, paragraphs,
images, pages or covers. Furthermore, these entities can vary from one
book to the other. For the editor's use case, we could consider the
two following types of books saved in the system. The first is a
novel, essentially composed of ordered chapters, titles and
paragraphs. The second is a comic, composed of ordered cartoon boards
or plates.

JCR Specification

Without a doubt, navigation constitutes the main feature proposed by
the JCR specification. Creating and exploring a tree or a network
structure is not always easy. Navigation simplifies this.

The API proposed by the JCR specification allows navigation in and
through records with direct access or traversal access. A session is
the main entry point of the repository and provides traversal access
to the root node and direct access to each node by its uuid or path.
Each item of the repository also provides navigational functionalities
which make use of direct access through a relative path, or traversal
access through children, properties, references or parents. This API
also provides write features to the repository. Table 4.5-1 shows how
new nodes, properties and values can easily be created and saved.

session.getRootNode();

Node.getNodes();
Node.getProperties();

Table 4.5-1: JCR navigation API

As mentioned, in our use case the entity "book" cannot be completely
defined at design time. That is why the application programmer should
give the user the ability to decide what a book is at the entry point.
At the moment of creation, the application programmer will not be
concerned with the types of entities which are present in a book. He
will let the user define them at a later stage. The book can be
identified by displaying the configuration of its nodes.

public void displayBook(Node book) throws RepositoryException {
  // start the traversal at the book node
  traverse(book);
}
public void traverse(Node node) throws RepositoryException {
  NodeIterator nodeIterator = node.getNodes();
  displayNode(node);
  while(nodeIterator.hasNext()) {
    traverse(nodeIterator.nextNode());
  }
}
public void displayNode(Node node) {
  // display logic...
}

Table 4.5-2: JCR traversal access

The methods shown in Table 4.5-2 try to schematize the advantages that
can be obtained by using navigation. Several possibilities are now
open to the application programmer. He could provide tools to let the
user store the display logic

directly in the nodes, giving the maximum flexibility. This kind of
strategy can be adopted through features proposed by the JCR
specification. A framework such as Sling can facilitate this task.

SQL Specification

As mentioned, the relational model does not take navigation into
consideration and leaves the programmer responsible for implementing
these features. Furthermore, all the entities have to be defined at
design time, and semi-structured data is not catered for.

For the editor's use case, this implies that the application
programmer will face some problems if he is not able to define an
abstract entity for the content of the book. Figure 4.5-2 shows how
the application programmer could choose to design his relational model
to take into account that the structure of the book only becomes
concrete at the time of input.

Figure 4.5-2: SQL and unstructured entity

SQL does not standardize mechanisms which simplify the navigation
through records during a session. Furthermore, there is no real
context of position in the database which is preserved during a
session and which can simply be reused.

To navigate, the application programmer is obliged to build a
mechanism which is able to perform dynamic queries on the model.
Therefore, even if the model is extremely abstract and able to take
all the possible situations into account, the application programmer
is forced to develop all the application logic needed to navigate the
structure. This task is by no means trivial.

It is possible to make an implementation model which adds artifacts or
miscellaneous entities to the records in order to create hierarchies,
networks or explicit orders. However, this approach exposes the
application programmer to design flaws which are very difficult to
correct once the system is in production.

4.6 Transactions

In the current context, we can identify two levels of transaction.
Transactions which deal with one resource, and ensure that a sequence
of changes can be considered as a unit of work, can be considered as
local. The others, referred to as global transactions (14), deal with
several resources and require a coordinator or a transaction manager
to make sure that the changes can be committed to the pertinent
resources.

Figure 4.6-1: Global and local transaction
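The notion of a local transaction as a unit of work can be made concrete with a small sketch. It is written against SQLite through Python's sqlite3 module purely for illustration; the `orderline` table, its `CHECK` constraint and the helper names are assumptions introduced here, and global transactions across several resources are not covered.

```python
import sqlite3

# In-memory SQLite database; the schema is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orderline ("
    " id INTEGER PRIMARY KEY,"
    " isbn TEXT NOT NULL,"
    " quantity INTEGER NOT NULL CHECK (quantity > 0))"
)
conn.commit()


def add_order(conn, lines):
    """Insert all order lines of an order, or none of them."""
    try:
        # The connection used as a context manager commits when the block
        # succeeds and rolls back when an exception escapes from it.
        with conn:
            conn.executemany(
                "INSERT INTO orderline (isbn, quantity) VALUES (?, ?)", lines
            )
    except sqlite3.IntegrityError:
        return False
    return True


def count_lines(conn):
    """Number of order lines currently stored."""
    return conn.execute("SELECT COUNT(*) FROM orderline").fetchone()[0]


if __name__ == "__main__":
    print(add_order(conn, [("111", 1), ("222", 2)]), count_lines(conn))
    print(add_order(conn, [("333", 1), ("444", 0)]), count_lines(conn))
```

In the second call the first line is valid, but the rollback removes it together with the invalid one, which is the all-or-nothing behavior that distinguishes a unit of work from a plain sequence of statements.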

JCR Specification

The JCR specification includes both cases. Locally, if the application
programmer deals with only one repository instance, he can ensure that
a sequence of changes is treated as a unit of work. All the changes
made between two save calls can be considered as a unit of work.

Table 4.6-1: JCR and local transaction

In an application, a content repository can be used as a resource in
conjunction with other resources such as a relational database, a
messaging service or something else. The specification mentions that a
repository implementation can be used in conjunction with the Java
Transaction API (JTA). In a Java container, when the Transaction API
is used, the changes made on the JCR resource only become permanent at
the end of the transaction.

// Get user transaction (for example, through JNDI)
UserTransaction utx = ...
// Perform some changes in a java content repository
// Perform some changes in a relational database
// Commit the user transaction

Table 4.6-2: JCR and global transaction

SQL Specification

The SQL specification allows statements to be grouped as a unit of
work. These statements will only be made permanent in the database if
they all succeed. Local transactions such as the one shown in Table
4.6-3 are therefore part of the standard.

(Statement list…)

Table 4.6-3: SQL and local transaction

However, using the database in conjunction with other resources is not
taken into account by the specification. Some implementations provide
statements to manage this kind of scenario, such as the XA statement
of MySQL. All the same, this can be, and more often is, handled at a
higher and more standardized level. Some APIs provide these features,
and most JDBC drivers can therefore be used with JTA.

4.7 Inheritance

To enrich our use case with a wider panel of associations, we could
consider a new requirement which involves inheritance features. The
editor wants to differentiate between his collaborators, his partners
and his customers, but he also wants to take into consideration that
an individual can have several roles.

JCR Specification

For the inheritance requirement, node-types and mixin-types can be
used. For example, let us consider a Person node-type which has three
mixin-types: customer, collaborator and partner. By taking one or more
mixin-types, a node which has been defined as a person can take on all
the roles encountered in the system.

Figure 4.7-1: inheritance semantic

<editor = ''>
[editor:partner] > editor:person
  mixin
[editor:collaborator] > editor:person
  mixin
[editor:customer] > editor:person
  mixin

Table 4.7-1: node-types and inheritance

The primary advantage is that queries made on the person node-type
return all nodes of this type, including the nodes which inherit from
this node-type. All the properties of the returned nodes are
immediately accessible, and a node which was not considered as a
person can also acquire this status through the mixin-type.

SQL Specification

Inheritance tends to be handled at the application level. However,
some relational databases, for example PostgreSQL, can have extensions
which

manage inheritance. However, these tools are not standardized and tend
not to be used in practice.

A classical way to handle this requirement consists of creating a
table for each entity which inherits characteristics from the person
entity. The identifier of each of these sub-entities is a foreign key
which points to the person table. Figure 4.7-2 visually represents how
this could be implemented with SQL.

Figure 4.7-2: SQL and inheritance

It is quite easy to create a query which retrieves the entire set of
persons and all their inherited properties. The one depicted in Table
4.7-2 shows how this can be done with left outer joins. Additionally,
a view can be created to avoid having to rewrite the query.

SELECT * FROM person p
LEFT OUTER JOIN partner pa ON …
LEFT OUTER JOIN collaborator co ON …
LEFT OUTER JOIN customer cu ON …;

Table 4.7-2: SQL query and inheritance

While JCR seems to offer a more flexible way to express inheritance,
this comparison can lead to the conclusion that both approaches are
approximately equal in expressing this kind of association. In
reality, the advantage of JCR is that each node can inherit from
several mixin node-types. It should be noted, however, that this
advantage relates more to the semi-structured approach than to
inheritance problems.

4.8 Access Control

Access control can be defined as the action of authorizing or denying
access to, modification of and creation of records. While this is
nearly always a requirement in business applications, specifications
rarely respond to real-world situations.

In the editor's use case, it was mentioned that a person should be
able to see a digital preview of the book and, under certain
conditions, the whole book. This implies that the components of a book
can have different access policies.

JCR Specification

Since version 1.0 of JCR (4), access control has been one of the core
features. In its first release, the specification only declares how to
log in to the repository and how to check the permissions attributed
to the items of the repository. The hierarchical path of the items
stored in the repository is used as the basis for checking these
permissions. However, the specification does not specify how access
control should be implemented and managed.

Repository.login(Credentials cred);
Session.checkPermission(String absPath, String actions);

Table 4.8-1: JCR 1.0 and access control

Version 2.0 of the specification (5) defines how the concepts of
privileges and access control policies function in the repository.
Each item stores properties which relate to privileges. These
properties can be modified through the API. Thus the access control
feature can be delegated to the content repository, which is able to
manage the list of permissions at the item level.

UserManager.addUser(…);
Session.getAccessControlManager();
Policy.addEntry(…);
AccessControlManager.setPolicy(path, policy);

Table 4.8-2: JCR 2.0 and access control

In both cases, this means that for the editor's use case, the
application programmer will only have to define the structure and to
use the repository

features provided to manage access control. The access control
granularity proposed by the API is close enough to the data to address
all the potential use cases. Consequently, further access control
logic is not required.

SQL Specification

In SQL, access control is basically managed with the data stored in
the information schema (10). This provides the ability to grant and
deny privileges at the table or column level. However, while the base
functionalities provided by SQL allow the declaration of
implementation models which manage permissions at the record level, no
standard solution is provided. This comes from the fact that the
identifiers of the records in a relational database can be distributed
across several domains. Preserving this property makes it difficult to
specify a generic way to manage access control at the record level.

For the editor's use case, managing the readability of the information
of which a book is composed requires that access control be
administered at the record level. Because the SQL specification does
not provide this feature, the application programmer must include it
in his implementation model.
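What including record-level access control in the implementation model can mean is sketched below. This is a deliberately reduced illustration, not the model of the paper: it runs on SQLite through Python's sqlite3 module, and the `chapter` and `record_permission` tables, their columns and the sample rows are all assumptions introduced for the example.

```python
import sqlite3

# In-memory SQLite database with an assumed, simplified permission model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chapter (record_id INTEGER PRIMARY KEY, book TEXT, title TEXT);
    -- Hypothetical permission table: one row per (record, person) grant.
    CREATE TABLE record_permission (
        record_id INTEGER NOT NULL,
        person    TEXT NOT NULL,
        PRIMARY KEY (record_id, person)
    );
    INSERT INTO chapter VALUES (1, 'SQL Primer', 'Preview'),
                               (2, 'SQL Primer', 'Chapter 1'),
                               (3, 'SQL Primer', 'Chapter 2');
    INSERT INTO record_permission VALUES (1, 'visitor'), (1, 'customer'),
                                         (2, 'customer'), (3, 'customer');
""")


def readable_chapters(conn, person):
    """Return only the chapter titles the given person has been granted."""
    rows = conn.execute("""
        SELECT c.title FROM chapter c
        JOIN record_permission p ON p.record_id = c.record_id
        WHERE p.person = ?
        ORDER BY c.record_id
    """, (person,))
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(readable_chapters(conn, "visitor"))
    print(readable_chapters(conn, "customer"))
```

Every query that reads chapters has to go through such a join, so the enforcement lives in application code and conventions rather than in the database's own privilege system.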
Figure 4.8-1: JCR and access control

Figure 4.8-1 shows a solution where each record has a unique
identifier stored in a column. The record controller table identifies
the accessible resources within the database. The record_accessor
table identifies the persons accessing the database; these can then be
stored elsewhere in the database, in a user or a group table. With
this model, the application programmer must still manage and implement
the logic which will perform the privilege checks.

4.9 Events

Another requirement often encountered concerns the observation of the
changes which can be applied to a dataset. At the infrastructure
level, messaging services are common examples of components which make
use of these types of events. Some use cases benefit from being event
driven; one such case is the management of flows. The editor's use
case could also benefit from this approach. For example, the editor
may want to notify some clients each time a new book is added to a
specific collection.

JCR Specification

The JCR specification provides an EventListener interface which
defines the operations to be performed when a specific event

occurs. These listeners can be registered for different types of
event, for example:

when nodes are added or removed
for events which occur under a particular path, at a specific level
for events which occur on the instances of a node-type or on a single
node identified by a UUID

The code example presented in Table 4.9-1 shows how an event listener
can be registered for all the events which occur when a book is added
to the computer collection.

ObservationManager om =
    session.getWorkspace().getObservationManager();
EventListener el = new EventListener() {
  @Override
  public void onEvent(EventIterator ei) {
    System.out.println("A book has been added");
  }
};
String[] nt = { "editor:collection" };
om.addEventListener(
    el, Event.NODE_ADDED,
    "/collections/science/computer",
    true, null, nt, false);

Table 4.9-1: JCR and observation

This observation mechanism allows listening in on events with a fine
granularity. Furthermore, the fact that the observation mechanism is
provided directly through a Java API instead of a specific procedural
language allows a high level of interaction between the application
and the repository.

However, an important aspect is that the listeners are not permanent.
This means that if the repository is restarted, all the listeners have
to be registered again. In certain situations, especially when the
event listeners are registered at runtime, the recovery of the
application's state can be difficult.

SQL Specification

The SQL specification addresses the observation problem with triggers.
One of the main advantages of triggers is that they remain in the
information schema. This ensures that the state of the database,
including the triggers, can easily be recovered. Triggers can be
registered for insert, update or delete operations on specific tables.
The body of the trigger generally contains procedural calls which can
be launched before or after queries.

CREATE TRIGGER editor.book_insert AFTER INSERT ON …
FOR EACH ROW
(Statement list…)

Table 4.9-2: SQL and triggers

For the editor's use case, the trigger shown in Table 4.9-2 listens in
on the registration of new books. However, it is not possible to
listen in on only the events which occur in a subset of the table. In
addition, there is no standard way to propagate the event from the
procedural language to the application. Hence triggers are mainly used
to modify data in the database following inserts or updates.

4.10 Version control

Version control is often an issue when people are collaborating on the
same data. It is therefore prudent to keep the history of an object
and to give the user access to its evolution. For the case in
question, we could imagine that after a certain lapse of time, the
editor decides to manage the different versions and editions of the
books in the system.

JCR Specification

Version control is a characteristic of content repositories which are
fully compliant with the JCR specification. The JCR specification
includes versioning as a part of the standard. It can be supported for
individual items and for hierarchies of items. This simplifies the
life of application programmers, who normally have to deal with this
kind of need themselves. As shown in Table 4.10-1, managing versions
of a hierarchy does not require an enormous effort.

// mixin versioning type
book.addMixin("mix:versionable");
// version creation
book.checkout();
book.addNode("chapter1");
book.checkout();
book.setProperty("isbn", "0-85131-041-9");
book.checkin();
// get the second version
VersionIterator vi =
    book.getVersionHistory().getAllVersions();
Version v;
v = vi.nextVersion();
v = vi.nextVersion();
// restore the second version
book.restore(v, true);

Table 4.10-1: JCR and version control

SQL Specification

Some relational database implementations provide versioning
functionalities. However, versioning is not part of the SQL standard.
Anyone wishing to build an interoperable application has to include
versioning in the implementation model. Properly managing complex
graphs in relational databases is quite difficult. So while versioning
could be implemented, this task would not be undertaken with ease.

4.11 Synthesis

It seems that for both specifications the structural part and the
integrity part are well defined. However, while the relational model
provides very clear foundations for operations and queries, the JCR
specification seems to provide operations and queries on a relatively
obscure basis.

The same remark can be made for navigation. While the JCR
specification provides a strong navigational basis, the latest
versions of the SQL specification have difficulty providing a coherent
set of features which take this factor into consideration.
Improvements could be made in these areas for both models, with
recommendations and enhancements being shared mutually.

As an additional key aspect, the differences between the two
specifications are noteworthy. Generally, it appears that the JCR
specification is pragmatic in comparison with the SQL specification.
The features provided by JCR give practical answers to common and
recurrent problems.

Providing a standard way to solve common problems in a natural and
elegant manner is not obligatory, but doing so protects the
application programmer from design flaws, for example in the
management of versioning or access control.

While relational databases implemented on the SQL specification have
the potential to represent all types of use cases which could appear
in real life, they are often badly constructed because of the
constraints which govern a project's evolution or lifecycle. This does
not detract from the fact that the relational model does contain a
complete set of main building blocks for a database. At the
specification level, SQL makes extensive use of its base components to
express its various extensions.

Two conclusions can be drawn from this: firstly, that a
specification's foundation should be able to handle and manage all
kinds of use cases and, secondly, that a specification should evolve
by building onto its foundation and not away from it.
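The remark that the relational model supplies the building blocks while leaving features such as versioning to the implementation model can be illustrated by combining the triggers of section 4.9 with the versioning requirement of section 4.10. The sketch below is one possible arrangement among many and is not taken from the paper: it uses SQLite through Python's sqlite3 module, and the `book_history` table, the `book_update` trigger and the helper functions are assumptions.

```python
import sqlite3

# In-memory SQLite database; schema and trigger are assumptions of this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE book_history (
        version INTEGER PRIMARY KEY AUTOINCREMENT,
        isbn    TEXT NOT NULL,
        title   TEXT NOT NULL
    );
    -- Copy the previous state of a row into the history before it is lost.
    CREATE TRIGGER book_update AFTER UPDATE ON book
    FOR EACH ROW
    BEGIN
        INSERT INTO book_history (isbn, title) VALUES (OLD.isbn, OLD.title);
    END;
""")


def rename_book(conn, isbn, new_title):
    """Change a title; the trigger records the old one as a version."""
    with conn:
        conn.execute("UPDATE book SET title = ? WHERE isbn = ?", (new_title, isbn))


def title_history(conn, isbn):
    """Return the earlier titles of a book, oldest first."""
    rows = conn.execute(
        "SELECT title FROM book_history WHERE isbn = ? ORDER BY version", (isbn,)
    )
    return [title for (title,) in rows]
```

Even this small version only covers a single table. Versioning a hierarchy of chapters and paragraphs, or restoring an earlier state, would need further tables and logic, which is the effort the text describes.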

5 Development process comparison

Another perspective is taken in this chapter to compare relational
databases and java content repositories. The purpose is to show the
key differences between the data models which impact the application's
development process. These differences cannot really be measured but
are significant enough to be mentioned.

Agile development processes such as "Extreme Programming", "Rational
Unified Process" or "OpenUP" divide project life cycles into steps
such as inception, elaboration, construction and transition. These
phases can be executed iteratively. The process depicted in Figure
4.11-1 summarizes a possible segmentation of the time taken by the
OpenUP development process. The following sections will make reference
to these steps. The purpose is to show where and how both models, the
JCR one and the relational one, can impact this process.

Figure 4.11-1: Agile and iterative development process

5.1 Data Understandability

Making architectural and implementation models understandable is one
of the key aspects of the elaboration phase. A clear architecture
which can easily be communicated allows people to get into the project
more quickly. It is also easier to define tasks and duties if the
architecture is clear and made of separate modules.

Generally, the architecture is defined or refined by an architect or
an analyst during the elaboration stage. This actor takes the
requirements identified during the inception phase as input and
delivers blueprints which explain the behavior of the system at
different levels. At the application level, these blueprints generally
include use case diagrams, collaboration diagrams or class diagrams.
To show how the application's data persists, these schemas are often

translated into database schemas which take the Relational development

properties of the data model into account. Class diagrams can be used as input to generate
relational schemas. Entity-relationship diagrams (15)
JCR development or Crow's Foot diagrams are often used to represent
As mentioned, the structure and the content are them. Translation rules are generally needed to
indivisible in JCR. However it is possible to define a produce these schemas. Far from summarizing the
semantic which shows how data and structure will architecture, they enumerate to a high degree all the
be instantiated. In this semantic, some aspects of aspects of the final application.
the content can be omitted.

For example, if a semantic item has an unstructured

basis, all the possible and imaginable properties can
be saved under it. Thus, there is no need to mention
them if they are not mandatory or don’t have to
respect specific constraints. It is enough to declare
them in the application’s schemas as made in a
class diagram. Thus, the semantic diagram of a java
content repository says less than the other
architectural diagrams. This impacts its readability.
In fact, reading the semantic of a repository gives a
snapshot of the final application and helps to
understand its general behavior.
Figure 5.1-1: JCR translation

Another interesting aspect is that the complexity of the JCR semantic is not multiplied by many-to-many relationships. No intermediary nodes or artifacts are needed to represent these associations. Thus, these diagrams stay very close to the other architectural diagrams. No translation rules are needed to create them.

Figure 5.1-2: SQL translation

Everything has to be explicitly mentioned in these database schemas. Only the records which respect the data structure can be instantiated in a relational database. Thus, it is necessary to carefully define this structure and make it fit perfectly with the application architecture.

Many-to-many associations cannot be represented in relational database schemas without reification. This means that many-to-many associations will always require intermediary entities. Consequently, the internal complexity of a relational schema increases faster than the complexity of the other architectural diagrams. Thus, relational schemas do not really help to understand the application. They are more often used as implementation blueprints.

5.2 Coding Efficiency

The construction phase of a development process is highly influenced by efficiency. Coding requires time, resources and money. These parameters are very sensitive. Furthermore, if developers have to write code twice, there is a high probability that they will make more than double the programming errors. Thus, efficiency also impacts quality.

Measuring coding efficiency implies some soft parameters. The programmer's education and knowledge should be taken into account. Furthermore, the semantic and the readability of the code are also significant. These parameters make it difficult to judge a technology's efficiency. Without going too deep into these questions, the following sections contain useful information which can be taken into consideration when making a decision in this area.

JCR development
Programmers are not really familiar with the JCR API and do not really know the best practices linked to content repositories. However, the API is in large part self-explanatory and people generally have the habit of thinking in terms of hierarchies. These parameters should give JCR a good learning curve.

Some interactions are possible between the query part of the API and its navigational part. One of the big advantages of JCR is that these aspects are merged coherently and are not considered as different abstraction levels.

The code quantity highly relates to the use case. If complex joining operations are mainly required, JCR will not be an efficient choice. However, if navigation is required, the size of the code will be much smaller. If special requirements such as versioning or fine-grained access control are needed, it becomes clearly difficult to reach the same level as the one proposed by JCR.

Relational development
Nearly all programmers are familiar with the relational model and people have often used it in recent years. Thus, SQL and APIs such as JDBC are part of the common language. In real world situations, this general knowledge often favors the relational model. However, some problems need to be treated in a specific manner and the intuitive approach often gives bad results.

If complex operations are required by the use case, the relational model should not be bypassed. The completeness of the queries and the range of operations make it very efficient in terms of code quantity. However, if the use case implies requirements such as navigation or versioning, the developer will have to add some artifacts to the implementation model to manage parameters such as tree structure or order. The developer will also face the problem of having to implement a large amount of application logic. Thus, in terms of efficiency, the choice of model should be driven by an honest analysis of the use case's properties.

5.3 Application Changeability

Requirements which appear during the development process are often difficult to include in a previously defined architecture. Modern software development processes generally address this problem with iteration cycles (16). Well managed, iterations should allow new requirements to be included efficiently. However, because each logic level is generally impacted by architectural changes made after the elaboration phase, late iterations are more expensive than early iterations.

Clearly decoupling the logic levels can reduce this increasing cost. Thus, data models which can transparently accept changes are really appreciated. To make this point, we will consider how simple changes impact the data logic of a system.

JCR development
As mentioned in the "Schema understandability" section, a repository's schemas summarize the other architectural diagrams. While this could appear meaningless, it is really not the case. Keeping the repository as weakly structured as possible allows new requirements to be included without touching the data logic level. Only the application logic level is impacted. Thus, adding a property at the application level does not necessarily require touching the repository's organization.

Admittedly, deep changes still impact the data logic, and JCR does not provide a magic solution either. JCR allows for a decoupling of most of the data logic from the application and the interface levels. It is also interesting to note that frameworks like Sling allow, in a similar manner, the decoupling of the application logic from the interface logic. This approach is clearly an attractive one, especially in environments driven by changes and agility.

Relational development
Nearly every modification made to the overall architecture will impact the data logic level. This comes from the fact that relational databases do not allow the instantiation of elements which have not been previously defined in the structure. Thus, there is a great probability that a change made in a form of the interface or in the application logic will require changes to the data model logic.

Some frameworks provide tools to automate these changes. However, if the system has a production version, once executed the change will have a big footprint on all the database's items. Furthermore, classical model-view-controller frameworks do not really decouple the application level from the interface. For example, a change made to a controller will often impact views and models.

5.4 Synthesis

At a project level, people are often looking for solutions which will allow for the quick integration of changes into their environment. In situations where some changes have to be performed, the semi-structured nature of JCR will certainly be appreciated. Furthermore, the inclusion of features such as navigation, versioning or access control can save a lot of time.

Nevertheless, it is important to keep in mind that the efficiency of both solutions relates in a large way to the nature of the use case. The agility of JCR should not influence this aspect. Furthermore, agility is linked in no small way to the project team. Thus, saying that JCR is a way to achieve agility is too big a shortcut.

In all cases, the choice of a database technology should always be discussed during the inception and elaboration phases of the first iteration of the development process. This can be done by weighing the different parameters. Changing the persistence technology cannot easily be achieved after the first iteration. Consequently, this choice will have a strong impact on the rest of the project.
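
The difference in changeability discussed in this chapter can be made concrete with a minimal sketch. It assumes two hypothetical in-memory classes, `UnstructuredNode` and `TableRow`, which only mimic an unstructured repository node and a row of a fixed relational structure. Neither class is part of the JCR or JDBC APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of a fixed structure versus an unstructured node. */
class ChangeabilitySketch {

    /** Mimics an unstructured node: any property can be set without prior declaration. */
    static class UnstructuredNode {
        final Map<String, Object> properties = new HashMap<>();

        void setProperty(String name, Object value) {
            properties.put(name, value);
        }
    }

    /** Mimics a table row: only the columns declared in the structure are accepted. */
    static class TableRow {
        private final Set<String> columns;
        final Map<String, Object> values = new HashMap<>();

        TableRow(Set<String> columns) {
            this.columns = columns;
        }

        void set(String column, Object value) {
            if (!columns.contains(column)) {
                // In a real database the structure would have to be altered first.
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            values.put(column, value);
        }
    }
}
```

In this sketch a new property such as `subtitle` is accepted by the node as soon as the application sets it, while the row rejects it until the declared structure is changed.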

6 Product comparison

Choosing between database products implies the use of different criteria. We can mention the compliance with a standard, the additional features proposed by the provider, the support offered by a company or by a community, or the scalability of the solution. All these criteria are important. They should be weighed carefully and a choice made depending on the situation.

In our context, basic and significant differences distinguish java content repositories from relational databases. Thus, a decision to employ one technology instead of another should be taken at a lower level. However, in relation to the product, people often ask, in terms of performance, whether they should use a relational database or a java content repository to manage their hierarchical information.

This section tries to answer this question by recalling some basic theoretical concepts which relate to data structures and to the cost of associations. Then, at a more practical level, a benchmark of several database products will verify whether these assumptions hold.

6.1 Theoretical analysis

In general, database products use basic data structures to manage their data. This section recalls simple concepts which relate to these structures and to the cost of associations made between data items. The goal is to determine whether the product's performance will be significantly impacted by the underlying approach.

Hierarchical and network database

In the hierarchical and network models, associations are made by storing references or pointers between items. The advantage of this kind of structure is that, because each node stores direct references to other nodes, a constant number of read accesses is needed to go from one node to its target.

Creating an association between two nodes also has a constant cost because the number of operations needed to perform this is always the same.

Thus, the cost of crossing and creating associations is constant and can be noted as O(1) in big O notation. Some people say that these associations are pre-computed.

Some strategies allow the representation of directed graphs such as those needed by the hierarchical and the network models. The most classical representations are adjacency lists and adjacency matrices (17). Generally, the choice between one approach and the other is made simply by analyzing the density of the graph.

If the graph has a number of arcs which is close to the square of the number of nodes, selecting an adjacency matrix will give a better result. However, the JCR model is mainly driven by hierarchical associations. In this context, the number of arcs will not be much larger than the number of nodes. Thus, an adjacency list makes better use of memory by requiring only the space needed to store the associations. It is also interesting to note that this kind of organization makes it easy to give an order to the children of a node.
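
A minimal sketch of such a reference-based structure, assuming an in-memory adjacency list keyed by node name. The class and method names are hypothetical and do not belong to any JCR implementation. Expected constant-time behaviour relies on the usual properties of `HashMap` lookups and `ArrayList` appends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory hierarchy stored as an adjacency list. */
class AdjacencyListTree {
    // Each node name maps to the ordered list of its children.
    private final Map<String, List<String>> children = new HashMap<>();

    /** Creates an association: one map lookup and one append, whatever the tree size. */
    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    /** Crosses an association: one map lookup returns the children in insertion order. */
    List<String> childrenOf(String node) {
        return children.getOrDefault(node, Collections.emptyList());
    }

    /** Number of stored arcs; the memory used grows with this number only. */
    int arcCount() {
        int count = 0;
        for (List<String> list : children.values()) {
            count += list.size();
        }
        return count;
    }

    /** Number of known nodes. */
    int nodeCount() {
        return children.size();
    }
}
```

For a hierarchy, `arcCount()` is always `nodeCount() - 1`, which is far below the square of the number of nodes that an adjacency matrix would reserve.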

Figure 6.1-1: A hierarchy and its adjacency matrix

Implementing this with a programming language can be accomplished by using several data structures such as arrays, maps or hash tables. Some other solutions could also be presented, but the main idea is that crossing an association has a constant cost and that crossing a graph has a cost which is proportional to the number of nodes and arcs traversed. Thus, managing this kind of data is cost effective.

Relational database

In the relational model, associations are made between relations by computing the matching values stored in two domains. This allows for the expression of all imaginable associations between two or more data sets.

What is the cost of computing and creating associations in a relational database? To compute an association, a relational database has to cross the targeted set to find the matching values. In this case, the cost of the association equals O(n), with n the number of tuples stored in the source and in the target. However, most database products provide indexing facilities such as b-tree indexes. So, in most cases, finding the matching entries has a cost of O(log(n)). While b-tree indexes are good, some articles (18) argue that in the network models, because associations are pre-computed, it is possible to reach better performance.

However, in most cases there is no need to use comparison operators other than "=" or "≠" to express relationships as they are presented in a hierarchical or network model. Consequently, hash indexes can be used on the domains which constitute the association. If the relational database provides good hash index implementations, the cost of retrieving data through associations will be close to O(1). Adding new items to the targeted sets and to the index also has a constant cost of O(1). Thus, there are virtually no significant differences between the associations of the relational model and those of the hierarchical model.

6.2 Benchmark

The previous section has summarized a huge problem very succinctly and too quickly. However, the main point to keep in mind is that intolerable differences should not appear whether hierarchical data is managed with a content repository or with a relational database. The following benchmark has been done to verify this assumption.

Four products are included in this benchmark. CRX is a native implementation of the JCR specification. The persistence of the items is managed with a proprietary technology which is based on tar files (19) and implemented in java. H2 and Derby are two open source relational databases written in java. MySQL is one of the most widely used open source databases.

A simple wrapper has been defined for this benchmark. This wrapper proposes basic functions to create trees made of nodes and properties. The CRX wrapper directly uses the functionalities provided by the API. The SQL wrapper uses a simple database schema. One table stores the nodes and the other table stores the properties. The associations between items are managed with a parent foreign key and the default indexes of the database are used on all fields. JDBC is used to perform the queries, with prepared statements to avoid parsing the SQL statements each time.

The benchmark is composed of four parts which all measure the time required to perform an operation in hierarchies of different sizes. Each node of these base hierarchies has 5 sub-nodes and 5 properties, except leaves which only have 5 properties. The first hierarchy has one level. The following ones always include one more level. The tests have been launched 5 times on a Dell Latitude D820 running Windows XP (processor: Intel Core Duo 2.00 GHz, virtual memory: 2.00 GB). The average result is used in the following diagrams.
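
The sizes of the base hierarchies follow from this description. The sketch below assumes that the item counts shown on the chart axes (36, 186, 936, 4686, 23436) count both nodes and properties, which is the reading that reproduces these numbers. The class and method names are hypothetical.

```java
/** Counts the items (nodes plus properties) of a complete benchmark hierarchy. */
class HierarchySize {
    /**
     * @param levels     number of levels below the root
     * @param subNodes   sub-nodes of every non-leaf node
     * @param properties properties of every node, leaves included
     * @return total number of nodes and properties
     */
    static long items(int levels, int subNodes, int properties) {
        long nodes = 0;
        long nodesAtLevel = 1; // the root
        for (int level = 0; level <= levels; level++) {
            nodes += nodesAtLevel;
            nodesAtLevel *= subNodes;
        }
        // Every node counts once for itself and once per property.
        return nodes * (1 + properties);
    }
}
```

With 5 sub-nodes and 5 properties, `HierarchySize.items(1, 5, 5)` gives 36 and each additional level gives 186, 936, 4686 and 23436, matching the five hierarchy sizes used in the charts.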

Writing the hierarchy

This test measures the time required to create the base hierarchy. The throughputs correspond to the time needed to write one item of the hierarchy. While the differences seem huge, all the throughputs are constant. The assumption that native implementations of JCR and relational databases should be equivalent in terms of performance is true in this case. MySQL cannot be embedded in the application. This has a high impact on the result. H2 does not appear in the chart because its write performance is too good to be visible at this scale.

[Chart: time to write one item for crx, h2, mysql and derby; x-axis: items (36, 186, 936, 4686, 23436); y-axis: 0.00 to 18.00]

Reading the hierarchy

This test consists of reading once all the items of the base hierarchy from the root to the leaves. The throughputs displayed in the chart correspond to the average time needed to read one item of the hierarchy. For most databases the results seem to be constant. Derby is simply out of range. When recursive queries are performed on this database, the results are not tolerable.

[Chart: average time to read one item for crx, h2, mysql and derby; x-axis: items (36, 186, 936, 4686, 23436); y-axis: 0.04 to 0.20]
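
Reading a hierarchy stored with a parent foreign key amounts to a recursive walk with one child lookup per node. The sketch below is an in-memory stand-in for the nodes table, with a hash map playing the role of an index on the parent column. All names are hypothetical, the SQL statement in the comment is only an assumed equivalent, and nothing here claims to reproduce how the benchmark's SQL wrapper was written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a nodes table with a parent foreign key. */
class ParentKeyTable {
    // Plays the role of an index on the parent column: parent id -> child ids.
    private final Map<Integer, List<Integer>> byParent = new HashMap<>();
    private int lookups = 0;

    /** Inserts a row; a null parent marks the root. */
    void insert(int id, Integer parentId) {
        if (parentId != null) {
            byParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }

    /** Stands for a query such as "SELECT id FROM nodes WHERE parent = ?". */
    List<Integer> selectChildren(int parentId) {
        lookups++;
        return byParent.getOrDefault(parentId, Collections.emptyList());
    }

    /** Reads a node and all its descendants, depth first. */
    List<Integer> readSubtree(int id) {
        List<Integer> result = new ArrayList<>();
        result.add(id);
        for (int child : selectChildren(id)) {
            result.addAll(readSubtree(child));
        }
        return result;
    }

    /** Number of child lookups issued so far. */
    int lookups() {
        return lookups;
    }
}
```

In this sketch the number of lookups equals the number of nodes read, so the time per item stays constant only if each lookup is itself answered in roughly constant time.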

Randomly writing the hierarchy

The test consists of randomly writing 100 sub-hierarchies in the base hierarchy. Each sub-hierarchy has a depth of 2 levels. Each node has two properties and, except for the leaves, two sub-nodes. Thus, each sub-hierarchy is composed of 21 items. The throughputs relate to the average time required to create all the items of one sub-hierarchy. The results are quite similar to those of the first test. The good point is that all the databases have constant results.

[Chart: average time to write one sub-hierarchy for crx, h2, mysql and derby; x-axis: items (36, 186, 936, 4686, 23436); y-axis: up to 300.00]

Randomly reading the hierarchy

The test consists of randomly reading 100 nodes and their descendants on two levels in the base hierarchy. The throughput relates to the average time required to read one node and its descendants. As in the second test, Derby is simply out of range. The same problem is encountered with recursive queries. It appears that CRX is well optimized for these situations. To be really pertinent, this test should be launched on bigger hierarchies. However, the difference between the results is constant and relational databases do not show extremely bad performance for recursive queries.

[Chart: average time to read one node and its descendants for crx, h2, mysql and derby; x-axis: items (36, 186, 936, 4686, 23436); y-axis: 0.00 to 9.00]

6.3 Synthesis

As shown in this chapter, performance should not be used as the main argument to choose one technology over another. The aspects mentioned in the previous chapters are more important. The choice should relate to the nature of the problem which has to be solved and not to the nature of the product.

The assumption that relational databases are able to effectively manage hierarchical data is true. However, this does not mean that java content repositories should be implemented as a layer over relational databases. Some base concepts of the two specifications do not match, so a relational schema for JCR which includes all the aspects of the specification will look unsuitable. More modularity (3) in the database world could benefit both approaches. As long as this goal is not achieved, a native implementation of JCR is probably the better of the proposed solutions.

7 Scenario Analysis

The following diagram synthesizes the main aspects pointed out during the whole comparison process. Four use cases characterized by different features are then briefly analyzed with regard to their respective requirements and to the presented approaches.

                          JCR                                         RDBMS

Data Model Level
Structure                 Unstructured                                Structured
                          Semi structured
Integrity                 Entity integrity                            Entity integrity
                          Domain integrity                            Domain integrity
                          Referential integrity                       Referential integrity
                          Transitive integrity in hierarchies         Tools to manage data coherency
Operations and Queries    Selection                                   Selection
                          Equi-join operations                        Projection
                          Full text search operation                  Rename
                          Transitive queries on hierarchies           Join operations
                                                                      Domain operation
                                                                      Create, read, update, delete statements
Navigation                Navigation API                              Not supported
                          Traversal access
                          Direct access
                          Write access

Specification Level
Inheritance               Node types inheritance                      Not supported
                          Node inheritance
Access control            Record level                                Table and Column level
                                                                      Record level not supported
Observation               Record level                                Table level
                          Un-persisted event listeners                Persisted triggers
                          Application interaction supported           Application interaction not supported
Version control           Supported                                   Not supported

Project Level
Schema understandability  DataGuides or Graphs                        Entity Relationship
                          Summarize the architecture                  Represent the whole architecture
                          Not impacted by many-to-many associations   Impacted by many-to-many associations
Code complexity           Simple for Navigation                       Complex for Navigation
                          Complex for Operations                      Simple for Operations
Changeability             More agile                                  More rigid
                          Decoupled from the application              Coupled with the application

7.1 Survey

An agency wants to implement an application which is able to carry out surveys over the web. This tool should allow for the collection of data from questionnaires, the configuration of the type of answers, and the aggregation of the survey's results in a suitable form.

Main characteristics of the application:

- All the entities can easily be identified at design time. (Structure)
- Some verification has to be made on the data. (Integrity)
- The aggregation of the results implies complex operations. (Operations and Queries)
- Once in production the application will not evolve to a great degree. (Project)

The choice of a relational database for this kind of scenario is probably the best alternative. The features provided by a content repository will not really be used. Furthermore, programming the operations by hand would only add complexity to the application.

7.2 Reservation

An event organizer wants a portal which gives the opportunity to buy tickets for events. The event organizer should be able to create the events, characterized by a name and a short description. The customer should be able to browse and search the events catalogue and to order tickets. On the other hand, the event organizer wants to monitor his sales and manage his prices depending on the success of the event.

Main characteristics of the application:

- All the entities can easily be identified at design time. (Structure)
- Some verification has to be made on the data. (Integrity)
- Monitoring the sales can imply some operations on the dataset. (Operations and Queries)
- Browsing and searching the catalogue require traversal and direct access. (Navigation)
- As a strategic application, the application is subject to improvements. (Project)

This application has strong needs which relate to the relational database world. The clear structure linked to the management of orders and events could lead us to conclude that a relational database is the ideal candidate. However, the need for navigation and the potential extensions linked to the catalogue could benefit from the features of a content repository.

A balanced approach could consist of storing the orders in a relational database and using a content repository for the events catalogue. This also fits particularly well with the fact that the catalogue will mainly be subject to read access and the ticketing service to write access. This should not be a problem because complex interactions between the JCR and the RDBMS can be managed with the Java Transaction API. Making hybrid decisions can, in certain contexts, allow us to benefit from both technologies, thus having the best of both worlds.

7.3 Content management

A publisher wants an application able to manage all the content generated by its collaborators. The content will be composed of videos, photos, text or anything else. Several taxonomies should be available to organize the content. The main purpose of the publisher is to offer a coherent set of features which allow for the easy retrieval of resources of each type and enable their reuse in different contexts or in other publications.

Main characteristics of the application:

- The editor wants to take into consideration that new entities of content could appear.
- The main verifications on the data concern viruses. (Integrity)
- Searching requires full text indexing. (Operations and Queries)
- Taxonomies imply simple operations. (Operations and Queries)
- Exploration is needed everywhere. (Navigation)
- Future improvements could imply versioning, observation and access control. (Specification features)
- The system will continuously evolve with the enterprise. (Project)

The flexibility and the features provided by JCR are typically made for these types of scenarios. Content as understood here is difficult to store in a relational database. Furthermore, all the complex requirements such as versioning or access control can be included during the application life cycle without too much of a problem.

7.4 Workflow

An editor wants to manage the interactions of his collaborators. The situation could be the following: The editor in chief and the board decide which subjects have to be treated in the next edition of a publication. These subjects are communicated to the workforce (journalists and photographers). Once edited, the articles are sent for proofreading. Once they are corrected, the editor in chief is notified. He decides whether the article can be published or not. If the article will appear in the publication, it is sent to a typography service which produces a layout which includes pictures. Once the publication integrates all the articles and all the pictures, the editor in chief will read it once again and decide whether to publish it or not.

Main characteristics of the application:

- The entities are composite and difficult to design. (Structure)
- The structure mainly involves graphs. (Structure)
- Editing and exploring the process implies traversal access. (Navigation)
- Notifications imply the observation of local events. (Observation)
- Notifications imply interactions between the data model and the application. (Observation)

This kind of scenario involves semi-structured models in conjunction with good observation capabilities. While the other features proposed by JCR, such as versioning or access control, do not directly find an application, the foundations of the model will really be appreciated in this case. The workflow structure can be directly designed with nodes and items and, once instantiated, the workflow will clearly benefit from the observation mechanisms proposed by JCR.

8 Conclusion

The choice of a data model or of a database is often arbitrary. Sometimes, specific technologies are imposed by an enterprise policy or simply by irrational preferences. When the time comes to choose a technology, the good arguments are not often put forward. Furthermore, the myth of a general multi-purpose database is still ingrained in some minds and people are always looking for a magic solution which can be used in all imaginable circumstances.

Today, the cohabitation of several infrastructure components can be achieved with minimal effort. A platform such as J2EE provides tools to manage distributed resources. In this context, the choice of a data model or of a database should not be reduced to an arbitrary decision.

As shown in the "Scenario Analysis" chapter, a pragmatic analysis gives quick results. The technology which best fits the requirements can be identified and used to the greatest effect. In some cases, hybrid strategies can also be adopted. A coherent choice can lead to significant advantages and this question should always be discussed during the early phases of each project.

Relational databases have been successfully used for several years. However, the growing power of the user and the rigidity of the relational approach make it difficult to implement features which are actually required by some applications. It is possible to push the boundaries of the model, but the constraints of time and money make it difficult to do so correctly.

Some frameworks are partially effective in solving these problems. Depending on a middleware layer for features such as access control, navigation or versioning only pushes the hot potato to a higher level. This does not really solve the problem but adds complexity to the overall environment.

Java content repositories cannot replace relational databases in every situation. Actually, the features proposed by the API fit very well with all the requirements encountered in content management and collaborative applications.

Nevertheless, JCR enriches the debate around databases and data models in relation to two important aspects. Firstly, JCR includes some features at a data model and specification level. Secondly, the specification is aware of its environment and takes into account that java content repositories can be used in conjunction with other infrastructure components. This is not the case for a specification such as SQL.

This tendency seems relatively new but will probably be consolidated during the next few years. As a precursor, Day can play an important role in this debate and will gain in reputation. Some challenges will arise with the growing popularity of the JCR specification. Selecting good opportunities should allow JCR to make its mark in the database field. This in turn will create a footprint that will overflow into the world of infrastructure components.

9 Appendix – JCR and design

As mentioned in the "data model comparison" chapter, a Java Content Repository schema is dynamic and evolves with the content. The structure appears when nodes and properties are instantiated. However, during the development process the need to establish a semantic for the repository appears.

Several publications which treat semi-structured approaches propose solutions for representing these schemas (20) (21). These representations are called DataGuide (DG) or Approximate DataGuide (ADG). The less elaborate version can visually capture the organization of semi-structured databases. The JCR specification (4) uses graphs to represent examples of the structure which can be found in the content repository.

DataGuides and other graph notations fit particularly well with Java Content Repositories but are not expressive enough to be used as implementation blueprints. The goal of this appendix is to summarize the possibilities offered by JCR to organize content and to enrich the notation proposed in the specification, as needed to communicate the whole semantic of a repository.

9.1 Model

The most common relationship provided by the model is the composition. Semantic items can be instantiated as nodes and properties. A node can be composed of sub-nodes and properties. A property can only be composed of values. Except for the root node, all other nodes and properties are components.

Some properties, as seen, allow for the creation of horizontal relationships between the branches of a hierarchy. A common relationship is achieved by storing one or more path values in a node property. This method has an advantage because the hierarchical property of the target can be used in queries. Another relationship consists of storing one or more UUID values in a node property. This maintains the validity of the link even if the target is moved. Either of these approaches can be appropriate depending on the context.

9.2 Convention

Semantic items which will be instantiated as nodes or properties are respectively represented by circles and boxes. The circle's label refers to the node-type, the box's label to the property-type. Without a label, the circle or the box means that the node can be found. An empty circle means that everything which is not mentioned is allowed under the semantic item (black list). A barred circle means that everything which is not mentioned is not allowed (white list). An empty box means that the property is simple. A box which contains an "M" means that the property can store multiple values.

Composition associations are represented by filled arrows which link two semantic items. The arrow's label refers to the relative path which links the two semantic items. Only descendant relative paths are allowed. Stars (*) and variables (<variable>) can be used to express patterns in the path. Without a label, the arrow means that a semantic item like the one targeted can be found anywhere under the source. The arrow can end with a cardinality (1..N). Without cardinality the meaning is N.

Horizontal associations are represented by dotted arrows. They always start from a box and finish on a circle. No labels are put on these arrows. They are only used to give implementation information. The arrow can end with a cardinality (1..N). Without cardinality the meaning is N.

Inheritance associations between semantic items can be represented by empty arrows. They should always go from the bottom to the top. No labels are put on these arrows. The elements which are represented with a bold style are mandatory. If specific constraints have to be declared, they can be shown as comments in the diagram.
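
The two kinds of horizontal association, by path value and by UUID value, behave differently when the target moves. The sketch below illustrates this with a hypothetical in-memory repository. The `ReferenceSketch` class and its methods are assumptions made for illustration and are not the JCR API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory repository illustrating path and UUID references. */
class ReferenceSketch {
    private final Map<String, String> uuidByPath = new HashMap<>();
    private final Map<String, String> pathByUuid = new HashMap<>();

    /** Adds a node at the given path and returns its generated identifier. */
    String addNode(String path) {
        String uuid = UUID.randomUUID().toString();
        uuidByPath.put(path, uuid);
        pathByUuid.put(uuid, path);
        return uuid;
    }

    /** Moves a node: its path changes, its identifier does not. */
    void move(String oldPath, String newPath) {
        String uuid = uuidByPath.remove(oldPath);
        if (uuid == null) {
            throw new IllegalArgumentException("No node at " + oldPath);
        }
        uuidByPath.put(newPath, uuid);
        pathByUuid.put(uuid, newPath);
    }

    /** Checks a path-valued property; false once the target has been moved. */
    boolean resolvesByPath(String path) {
        return uuidByPath.containsKey(path);
    }

    /** Resolves a UUID-valued property to the current path of its target. */
    String resolveByUuid(String uuid) {
        return pathByUuid.get(uuid);
    }
}
```

After a move, a stored path no longer resolves while the stored UUID still leads to the target at its new location. The path variant keeps the advantage that the stored value itself carries the position of the target in the hierarchy.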

9.3 Methodology
Designing a JCR semantic can be done with different approaches. If a development process is used, the semantic will be obtained by translating the application's diagrams. The approach proposed here consists of six steps which can be executed iteratively and which result in a semantic blueprint that can be implemented in a repository.

Step 1: Identifying the semantic items
  Input:    Existing semantic; Requirement
  Output:   Semantic items
  Activity: Identifying the concepts which relate to the requirement and which have to be localized in the repository.

Step 2: Identifying the inheritance relationships
  Input:    Existing semantic; Requirement; Semantic items
  Output:   Inheritance semantic
  Activity: Identifying inheritance relationships between the semantic items.

Step 3: Identifying the hierarchical relationships
  Input:    Existing semantic; Requirement; Semantic items
  Output:   Hierarchical semantic
  Activity: Identifying hierarchical relationships between the semantic items. Thinking in term of composition.

Step 4: Identifying the horizontal relationships
  Input:    Existing semantic; Requirement; Semantic items
  Output:   Horizontal semantic
  Activity: Identifying horizontal relationships between semantic items. Identifying relationship's types. Thinking in term of association or

Step 5: Defining cool structure artifacts
  Input:    Existing semantic; Requirement; Hierarchical semantic; Horizontal semantic
  Output:   Organizational semantic
  Activity: Identifying the patterns which link hierarchical semantic items.

Step 6: Carefully defining the integrity rules
  Input:    Existing semantic; Requirement; Semantic items; Inheritance semantic; Hierarchical semantic; Horizontal semantic; Organizational semantic
  Output:   New semantic
  Activity: Only if necessary, declaring in the semantic the level of coherence which has to be preserved at a repository level.

9.4 Application
Based on a very simple use case, this section shows how the methodology and the notation previously defined can be applied. The purpose is to deliver a blueprint which shows how data is organized and all data aspects required to build the application.

The specifications of the case are as follows: A blog application deals with posts. A post always stores its creation date and should contain some information such as text, images, etc. A post can belong to zero or one category and can have zero to an infinite number of tags. A category can have subcategories. From any category it should be possible to find all the posts which relate to it and to its subcategories. When a category is deleted, the related posts are not deleted. Anonymous readers can respond to posts with comments. For navigation, it may be useful to organize posts by years, months and dates.

Step 1: Identifying the semantic items
    Output: [diagram]
    Comments: Properties do not have to be localized in the

Step 2: Identifying the inheritance relationships
    Output: [diagram]
    Comments: The requirement does not contain inheritance, but we could imagine this kind of relation.

Step 3: Identifying the hierarchical relationships
    Output: [diagram]
    Comments: Posts and categories are not linked with a composition relationship.

Step 4: Identifying the horizontal relationships
    Output: [diagram]
    Comments: To satisfy the requirement, posts are linked to categories with path values and with UUID values to

Step 5: Defining cool structure artifacts
    Output: [diagram]
    Comments: The year, month, day pattern is part of the hierarchical association.

Step 6: Carefully defining the integrity rules
    Output: [diagram]
    Comments: In our case we only have to ensure that a post always has a creation date.
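The blueprint resulting from these steps can be illustrated with a minimal sketch. It is written in plain Python and does not use the JCR API: the class names `Post` and `BlogRepository`, the `/blog/...` and `/categories/...` path layouts, and the choice of linking posts to categories by path value only are assumptions made for illustration. It shows the year/month/day organisational pattern, the category lookup including subcategories, the rule that deleting a category keeps its posts, and the single integrity rule on the creation date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Post:
    title: str
    created: date                    # mandatory: the only integrity rule (step 6)
    category: Optional[str] = None   # path of zero or one category (step 4)
    tags: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


class BlogRepository:
    """Hypothetical in-memory stand-in for a content repository."""

    def __init__(self) -> None:
        self.posts: Dict[str, Post] = {}   # hierarchical path -> post
        self.categories: Set[str] = set()  # e.g. "/categories/tech/java"

    def add_category(self, path: str) -> None:
        self.categories.add(path)

    def add_post(self, name: str, post: Post) -> str:
        if post.created is None:
            raise ValueError("a post always has a creation date")
        d = post.created
        # Year/month/day organisational pattern (step 5).
        path = f"/blog/{d.year:04d}/{d.month:02d}/{d.day:02d}/{name}"
        self.posts[path] = post
        return path

    def posts_in(self, category: str) -> List[str]:
        """Paths of posts related to a category or to any of its subcategories."""
        prefix = category.rstrip("/") + "/"
        return sorted(
            path for path, post in self.posts.items()
            if post.category is not None
            and (post.category == category or post.category.startswith(prefix))
        )

    def delete_category(self, category: str) -> None:
        """Remove a category and its subcategories; related posts are kept."""
        prefix = category.rstrip("/") + "/"
        removed = {c for c in self.categories
                   if c == category or c.startswith(prefix)}
        self.categories -= removed
        for post in self.posts.values():
            if post.category in removed:
                post.category = None   # link cleared, post itself survives


repo = BlogRepository()
repo.add_category("/categories/tech")
repo.add_category("/categories/tech/java")
path = repo.add_post(
    "hello-jcr",
    Post("Hello JCR", date(2008, 11, 11), category="/categories/tech/java"),
)
print(path, repo.posts_in("/categories/tech"))
```

Because the category link is stored as a path value, finding the posts of a category and of all its subcategories reduces to a prefix comparison. A real repository would more likely combine this with reference properties, which this sketch leaves out.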

10 Appendix – Going further

Only a few subjects have been mentioned in this report. This appendix presents three fields which relate to JCR and to databases in general. These fields could benefit from being studied in more depth. Furthermore, some existing products could be improved if these questions were addressed.

10.1 Queries in semi-structured models

In the JCR model, the notions of sets, relations and domains, which provide the means of expressing first order logic statements over the model, are present but currently not formally defined. It seems that, at present, node types are seen as relations, properties as domains, nodes as tuples and properties' values as attributes.

The fact that these notions are well defined in relational databases brings advantages. For example, on this basis, some database engines are able to analyze queries and to optimize them with regard to the structure. In semi-structured databases, query optimization is a known issue and research is still being conducted in this area (20).

It is currently not clear whether mapping as proposed by JCR could ensure more efficiency when queries are performed. Further work on this question and further improvements of JCR's query model could be a very interesting field of investigation.

10.2 Queries on transitive relationships

The model proposed by JCR stores the hierarchical path of each node. This allows queries on transitive relationships in hierarchies to be performed by using the path property. Assuming a tree structure limits the total number of paths to the number of leaves.

Doing the same for horizontal relationships is a bit more problematic. To summarize, in a network structure, the cost of pre-computing all the paths is proportional not to the number of leaves but to the square of the number of nodes (11). The storage capacity required to store the transitive paths between the nodes also grows in a similar manner.

Some use cases, such as those which involve social networks, need to store this kind of relationship. Defining a standardized way to manage this could be very useful in some situations. However, it demands research into the best algorithms and solutions for this problem.

Furthermore, query languages based on first order logic are limited when having to define queries on transitive closures and transitive relationships in general. It is in this area that improvements still have to be made.

10.3 Modular and configurable databases

As shown in the "product comparison" chapter, the relational model is able to manage hierarchical relationships efficiently. Is it therefore really necessary, or sensible, to implement from the ground up a data model which can be constructed from another, with approximately the same results?

Some reasons could lead to this conclusion. However, the basic differences between JCR and SQL cannot be ignored. For example, does it make sense to create a procedural API over a declarative
query language which will be retranslated into declarative calls in the database? Although the cost of parsing a query is insignificant, it is still a reason to prefer not proceeding in this manner.

In reality, databases are presently used for many different purposes in many different contexts. Some applications embed databases to manage small data sets in single-client applications, while others deal with thousands of connections and scalability problems. In this context, a multipurpose monolithic database is unimaginable, even mythological. Margo Seltzer promotes a more modular and configurable approach to building databases (3). These recommendations lead developers to use database components at different levels depending on their needs.

JCR and SQL are two high-level back-end solutions which have possibilities but also limits. Their significant differences do not mean they have no common denominators. More modularity in their architecture could give a better understanding of their behavior. This could also allow them to share components and to be adapted more easily to specific requirements and contexts.
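The cost argument made in section 10.2 can be illustrated with a small sketch, under stated assumptions: the graphs are plain Python dictionaries, the function names are hypothetical, and the ring-shaped network is chosen only because it is a worst case in which every node reaches every node. It contrasts a prefix match on stored hierarchical paths with an explicit transitive closure computed by one breadth-first search per node.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def root_to_leaf_paths(children: Dict[str, List[str]], root: str) -> List[str]:
    """Enumerate the root-to-leaf paths of a tree: exactly one path per leaf."""
    paths: List[str] = []
    stack = [(root, "/" + root)]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        for kid in kids:
            stack.append((kid, path + "/" + kid))
    return sorted(paths)


def descendants_by_path(paths: List[str], prefix: str) -> List[str]:
    """Transitive query in a hierarchy: a prefix match on the stored paths."""
    wanted = prefix.rstrip("/") + "/"
    return [p for p in paths if p.startswith(wanted)]


def transitive_closure(edges: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """All (source, target) pairs joined by a directed path, one BFS per node."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    closure: Set[Tuple[str, str]] = set()
    for start in nodes:
        seen: Set[str] = set()
        queue = deque(edges.get(start, []))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            queue.extend(edges.get(node, []))
    return closure


tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
paths = root_to_leaf_paths(tree, "root")
print(len(paths), descendants_by_path(paths, "/root/a"))

ring = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
print(len(transitive_closure(ring)))
```

In the tree, three leaves give three stored paths and the descendant query needs no traversal. In the four-node ring, the closure holds sixteen pairs, the square of the number of nodes, which is the growth the text attributes to network structures.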

11 Bibliography

1. Tsichritzis, D. C. and Lochovsky, F. H. Hierarchical Data-Base Management: A Survey. New York, New York : ACM, 1976.

2. Codd, E. F. A Relational Model of Data for Large Shared Data Banks. San Jose, California : ACM, 1970.

3. Seltzer, Margo. Beyond Relational Databases. ACM Queue. New York, New York : s.n., 2005.

4. Nuescheler, David and Piegaze, Peeter. Content Repository API for Java™ Technology Specification. s.l. : Java Community Process, 11 May 2005. version 1.0.

5. Nuescheler, David and Piegaze, Peeter. Content Repository API for Java™ Technology Specification. s.l. : Java Community Process, 2 July 2007. version 2.0 Public Review.

6. Mazzocchi, Stefano. Data First vs. Structure First. Stefano’s Linotype. [Online] July 28, 2005.

7. Chaudhuri, Surajit. An Overview of Query Optimization in Relational Systems. Redmond, Washington : ACM, 1998.

8. Buneman, Peter. Semistructured Data. Tucson, Arizona : ACM, 1997.

9. Aho, Alfred V. and Ullman, Jeffrey D. Universality of data retrieval languages. San Antonio, Texas : ACM, 1979.

10. Database Language SQL. Information Technology. [Online] July 30, 1992. xt.

11. Li, Zhe and Ross, Kenneth A. On the cost of Transitive Closures in Relational Databases. New York, New York : Columbia University Press, 1993.

12. Tropashko, Vadim. Trees in SQL: Nested Sets and Materialized Path. [Online] April 13, 2005.

13. Bachman, Charles W. The Programmer as Navigator. Waltham, Massachusetts : ACM, 1973.

14. Distributed Transaction Processing: The XA Specification. s.l. : The Open Group for distributed transaction processing, 1991.

15. Chen, Peter Pin-Shan. The Entity-Relationship Model-Toward a Unified View of Data. Cambridge, Massachusetts : ACM, 1976.

16. Introduction to OpenUP. OpenUp. [Online] October 27, 2008.

17. Cormen, Thomas H., et al. Introduction to Algorithms, Second Edition. Cambridge, Massachusetts : The MIT Press, 2001.

18. Bates, Duncan. Embedded databases: Why not to use the relational data model. Embedded Computing Design. [Online] January 01, 2008. http://www.embedded-

19. Müller, Thomas. CRX Tar PM. [Online] Day Software AG, November 11, 2008. ml.

20. Goldman, Roy and Widom, Jennifer. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Palo Alto, California : Stanford University Press, 1997.

21. Goldman, Roy and Widom, Jennifer. Approximate DataGuides. Palo Alto, California : Stanford University Press, 1999.

22. Nuescheler, David. David's Model: A guide for blissful content modeling. Jackrabbit Wiki. [Online] August 22, 2007.

23. Mishra, Priti and Eich, Margaret H. Join Processing in Relational Databases. Dallas, Texas : ACM, 1992.