
Master Thesis

WebData
Definition of a Middleware for Exposing and Accessing
Object-oriented Domain Models as Web Resources

Jan Schulz-Hofen

Supervisors
Shel Finkelstein, SAP Research, Palo Alto, U.S.A.
Prof. Mathias Weske, Hasso-Plattner-Institute, Potsdam, Germany

25 September 2007
Abstract
The term ‘Representational State Transfer’ (REST) has gained a lot of attention
over the past years. However, a lot of people still consider it mainly as a lightweight
approach to what has been known as Web Services in the past. REST is in fact
an architectural style for application interaction which induces a focus shift from
behavior (i.e. services) to state (i.e. data).
“WebData” describes a framework which leverages REST as an architectural style
and HTTP as a protocol to allow (a) exposure of business objects in arbitrary domain
models as resources on the World Wide Web and (b) integration of Web resources
into arbitrary application environments for access through an object-oriented API.

Summary (Zusammenfassung)
Although the term ‘Representational State Transfer’ (REST) has attracted considerable
attention in recent years, REST is still widely regarded mainly as a lightweight
approach to what used to be known as Web services. REST is, however, rather a
software architectural style intended for communication and interaction between
applications and systems. Beyond that, REST initiates a paradigm shift which aims to
move the focus away from pure function calls and towards data in distributed
systems.
“WebData” defines a middleware which makes use of the concepts of REST as an
architectural style and of HTTP as a protocol in order to (a) make business objects
from arbitrary application domains available as resources on the World Wide Web
and (b) enable access to such resources in arbitrary application environments
through an object-oriented interface.
I hereby declare that I have written this thesis independently and have used no
sources or aids other than those stated.
Berlin, 25 September 2007
Acknowledgments
The research work presented in this thesis has been carried out from August 2006
to March 2007 at the research centers of SAP Labs in Montréal, Canada and Palo
Alto, U.S.A. While being a student at Hasso-Plattner-Institute (HPI) the author
was employed as a research assistant by SAP Research and worked within the
Advanced Web Technologies team for his Master’s thesis.

I would like to thank Shel Finkelstein (SAP) for many fruitful discussions and very
inspiring collaboration. I would also like to thank Mathias Weske and the members
of his Business Process Technology chair at HPI, especially Hagen Overdick, for
their continuous advice. Furthermore, I would like to thank Cedric Ulmer (SAP)
and Bernd Schäufele (HPI) for their valuable feedback during the application of
my work in their research projects, Nolwen Mahé and Anne Hardy (both SAP) for
steadily supporting me during my time at SAP Research, and Gero Decker, Johannes
Nicolai, and Volker Gersabeck (all HPI) for their comments on my thesis. Thanks
also go to James Hogg et Monsieur Brian Bauer for spell checking and general
corrections.
Contents

1 Introduction
1.1 Web services and service-oriented architectures
1.2 Web resources and representational state transfer (REST)

2 Preliminaries
2.1 Object-oriented domain models
2.1.1 The Active Record pattern
2.2 The Hypertext Transfer Protocol (HTTP)
2.2.1 Uniform Resource Identifiers
2.2.2 Messages
2.2.3 Request methods
2.2.4 Headers
2.2.5 Status codes
2.2.6 Caching
2.3 Atom Publishing Protocol (Atompub)
2.3.1 Resources and representations
2.3.2 Operation semantics

3 WebData
3.1 URI references for entities in object-oriented domain models
3.2 Representation types for Web resources exposing entities in a domain model
3.2.1 Collection representations
3.2.2 Member representations
3.2.3 Value representations
3.2.4 Schema representations
3.2.5 Mixed collection representations
3.3 Exposure of object-oriented domain models as Web resources (server part)
3.3.1 Operation semantics for Web resources standing for application data
3.3.2 Query mechanisms for Web resources
3.3.3 Authentication and authorization for secure resource access
3.3.4 Prefetching support
3.3.5 Expiration information and cache validation
3.4 Object representation and mapping for REST Resources (client part)
3.4.1 Discovering, creating, reading, updating, and deleting Web resources like objects
3.4.2 Caching
3.4.3 Authentication support
3.5 Request-based content negotiation for representation formats
3.6 Concurrency and transactional behavior
3.6.1 Lost updates and optimistic concurrency
3.6.2 Transactions involving multiple resources

4 WebData for the SAP® Cross Application Timesheet architecture
4.1 SAP® Cross Application Timesheet and the Business Application Programming Interface
4.1.1 Personnel time management and Cross Application Timesheet
4.1.2 mySAP™ integration through the Business Application Programming Interface
4.2 A widget for time entry through CATS
4.2.1 Domain model
4.2.2 Authorization
4.2.3 Widget application

5 Conclusion and related work
5.1 Related Work
5.1.1 Google Data API
5.1.2 Queso
5.1.3 Service Data Objects
5.1.4 Others
5.2 Conclusion

Bibliography
List of Figures

1.1 The SOA triangle
2.1 Sample object-oriented domain model in UML
2.2 Sample Active Record
2.3 Basic HTTP Architecture
2.4 Basic HTTP Caching
3.1 Sample architecture using WebData
3.2 Entities in an object-oriented domain model and their mapping to resource types
3.3 Basic representation types
3.4 Sample request-response interaction using HTTP authentication
3.5 Sample request-response interaction using optimistic concurrency
4.1 SAP® transaction CATSXT for time entry
4.2 CATS BAPI®
4.3 CATS widget system architecture
4.4 Domain model for time entry using CATS in mySAP™
4.5 CATS widget screenshot
Listings

2.1 Uniform Resource Locator
2.2 Request message
2.3 Response message
2.4 Header line
2.5 Collection document
2.6 Member document
3.1 URI Scheme extension for WebData
3.2 Collection representation in extended Atom
3.3 Collection representation in JSON
3.4 Member representation in extended Atom
3.5 Member representation in JSON
3.6 Value representation in XML
3.7 Value representation in JSON
3.8 Sample Relax NG schema representation
3.9 Sample Relax NG schema representation (generic part)
3.10 Sample Relax NG schema representation (specific part)
3.11 Mixed collection representation in extended Atom
3.12 Usual form for URI query parameter
3.13 Free URI queries
3.14 Sample queries
3.15 Sample authorization rule set embedded in the definition of an ActiveRecord class
3.16 Sample class collection specific class definition (Ruby)
3.17 Sample discovery with prefetch call (Ruby)
3.18 Sample class collection specific class definition with look-ahead configuration (Ruby)
1. Introduction

Enterprise software has been built following traditional client-server architecture
patterns for a long time, while consumer software has been deployed on client
machines, running autonomously. However, with the rise of networking technologies,
the Internet, and rich browser-based user interface techniques, a shift in paradigms
can be identified over the past few years, even in consumer software: more and
more applications are provided using the World Wide Web, relieving the burden of
installation and deployment on client machines. Applications are hosted on central
servers and users interact with them using rich Web applications running in a Web
browser. User interfaces are built using technologies like the Hypertext Markup
Language (HTML, [W3C99a]), JavaScript [ECM99] and the Hypertext Transfer
Protocol (HTTP, [FGM+99]), imitating native desktop applications and their well-known
interaction patterns. This paradigm is widely known under the name
software-as-a-service. A vast number of businesses are evolving in this area at the
time of writing; they are referred to by a fuzzy term: Web 2.0. Revenue models shift
from one-time fees or license agreements to usage-based billing or alternative
opportunities like advertisement.
This move of applications towards central servers entails a move of data and func-
tionality outside of each user’s realm of control. Interaction and integration issues
between different applications are no longer resolved on the client machine but exist
outside of the user’s reach. Nonetheless users are expecting seamless integration of
functionality and data across the boundaries of different systems. There is even a
distinct kind of software which does nothing but integrate data and functionality
from different applications: Mashups.
As mentioned before, these integration problems have been known in the enterprise
space for a long time and a number of concepts and technologies have constantly
populated the solution space, some of which have become extremely sophisticated
yet highly complex. The different WS-* (i.e. Web services) specifications and the
service-oriented architectural style are the most popular representatives.
Protagonists in consumer-oriented software, however, have been seeking an easier
and more pragmatic approach to interaction and have reconsidered the basic principles
which made the Web itself successful, namely Representational State Transfer
(REST) [Fie00]. REST is an architectural style which takes the service-oriented
approach of application integration to another, more state-centric level. In REST,
interaction types are uniform rather than manifold, and resources and the
representations of their state are key concepts. REST rests on the following
assumptions: (a) all named entities are resources, (b) resources can be represented and
representations act as the primary means of communication with a resource, and (c)
all resources offer a uniform interface consisting of a defined and closed set of
operations and of returned status information.
A lot of enablers and frameworks are being developed at the time of writing¹, and the
number of concrete applications which offer and use REST interactions is growing
at a rapid pace. However, most of these approaches target behavioral aspects of
systems rather than state. In many approaches to defining a framework or API
for applications to provide RESTful interactions, application developers still need to
implement procedures, methods or some other type of action to implement a resource’s
uniform interface. In other words, when reconsidering the basic model-view-controller
pattern (MVC, [Ree79]), most current frameworks treat controllers as first-class
citizens for RESTful interactions.
The main proposal of this thesis is that behavior which implements the uniform
interface should also be uniform, i.e. not subject to change by application developers.
Instead, entities within existing domain models (i.e. the models in MVC) will be
treated as resources and enriched in a way such that they can send and receive
representations to interact using the uniform interface.
The thesis is structured as follows: Chapter 1 gives an introduction and aims at
positioning this thesis within the context of service-oriented architectures and the Web
of resources. Chapter 2 outlines the technologies and concepts which the work
presented in this thesis is built upon or which are leveraged and enhanced to achieve the
described goals. Chapter 3 discusses necessary requirements and specifies a domain
and programming language independent middleware which allows applications (a) to
expose business objects in arbitrary domain models as resources on the World Wide
Web and (b) to integrate Web resources into arbitrary application environments for
access through an object-oriented² programming interface. Chapter 4 describes a
short case study where the elaborated concepts are put to work in a concrete
scenario. Finally, chapter 5 concludes the work presented in this thesis and alludes to
related technologies, approaches and research.

1.1 Web services and service-oriented architectures
The term service-oriented architecture (SOA, [MLM+06]) has been emphasized a lot
in recent years. There is, however, no such thing as one particular service-oriented
architecture; SOA refers to an architectural style for software systems.

¹ E.g. the definition efforts for a Java API for RESTful Web Services (JSR 311, [JSRb]), in
which the author is involved as an expert group member.
² When using the term object-oriented, the author emphasizes the concepts of types, classes,
objects, attributes and their associations. The work described here does not require all aspects of
object-orientation (e.g. inheritance, polymorphism) in domain models.

Architectural Style
SOA is meant to be used for interoperation between different software systems which
can potentially be operated by entirely different organizations. A commonly cited
use case is business-to-business (B2B) interaction. The SOA style is, however, also
promoted by a large number of professionals and companies for integrating
functionality performed by existing systems within the same organization (i.e.
Enterprise Application Integration, EAI).
According to the SOA style, interfaces between different and potentially heterogeneous
components should be designed as coarse-grained, loosely coupled and highly
interoperable functions, i.e. services [CHET03]. SOA is independent of concrete
service protocols and can theoretically be implemented using techniques such as
CORBA, SOAP Web services, RMI, etc.
The concept of calling procedures remotely dates back to 1984 [BN84] and defines
the very foundation of current Web services: a procedure (i.e. method, function) is
called on a remote machine using networking techniques. In order to perform the
call, the calling system can provide several input parameters. The called system
will perform actions depending on the received parameters and potentially on the
state of other systems (i.e. “world-state”) and may consequently return a number of
output parameters and cause changes to the state of other systems (i.e. “side effects”).
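
The call pattern just described can be made concrete with a small sketch. The following Python example is an illustration only and is not taken from the thesis: the procedure name book_hours, its parameters and the world_state dictionary are hypothetical, and the network is replaced by a plain function call that passes JSON text.

```python
import json

# Hypothetical state held by the called system (the "world-state").
world_state = {"hours_booked": 0}

def book_hours(employee, hours):
    """Procedure on the called system: reads inputs, changes state, returns output."""
    world_state["hours_booked"] += hours          # side effect on the callee
    return {"employee": employee, "total": world_state["hours_booked"]}

PROCEDURES = {"book_hours": book_hours}

def handle_call(message):
    """Callee side: decode the request, run the procedure, encode the result."""
    request = json.loads(message)
    result = PROCEDURES[request["procedure"]](*request["params"])
    return json.dumps({"result": result})

def remote_call(procedure, *params):
    """Caller side: marshal the call; the 'network' is a function call here."""
    message = json.dumps({"procedure": procedure, "params": list(params)})
    return json.loads(handle_call(message))["result"]

first = remote_call("book_hours", "jan", 4)
second = remote_call("book_hours", "jan", 4)
print(first["total"], second["total"])
```

Two identical calls return different totals because the result depends on state held by the callee. Nothing in the call signature tells the caller that this is the case.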

Dynamic Invocation
According to Burbeck [Bur00], a main aspect of service-oriented architectures is the
description and organization of services such that they can be discovered and used
dynamically and in a semi-automated or even entirely automated manner. Service
description is realized in dedicated documents which usually follow a standardized
scheme (e.g. WSDL [CCMW01]). Thus, the same type of service could theoretically
be carried out by a number of different providers, which should be entirely
transparent to the invoking entity. Services are organized in hierarchical
categories or taxonomies which are managed by dedicated entities within an SOA
landscape, called brokers.
Consequently, any entity within an SOA can play one of the following roles: service
requestor, service provider, or service broker. Figure 1.1 (commonly also referred
to as the “SOA triangle”) illustrates these basic types of entities and their
interactions.

Figure 1.1: The SOA triangle

Service providers register their services with the broker in order to be discoverable by
requestors. Subsequently, requestors can query services and acquire a service
description including a reference to a concrete provider. Communication between
requestor and provider is then established directly, and the provider carries out
the requested service, potentially according to specified parameters, and, again
potentially, returns data.
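
The three roles and their interactions can be sketched as follows. This is a minimal in-process illustration, not part of the thesis: the class names, the "currency" category and the convert operation are hypothetical, and a dictionary stands in for a real service description document such as WSDL.

```python
class Broker:
    """Keeps service descriptions, organized by category."""
    def __init__(self):
        self._registry = {}

    def register(self, category, description):
        self._registry.setdefault(category, []).append(description)

    def find(self, category):
        return list(self._registry.get(category, []))

class Provider:
    def __init__(self, name):
        self.name = name

    def describe(self):
        # A dict stands in for a description document; it references the provider.
        return {"provider": self, "operation": "convert", "params": ["amount", "rate"]}

    def convert(self, amount, rate):
        return amount * rate

broker = Broker()
provider = Provider("fx-service")
broker.register("currency", provider.describe())       # provider registers with broker

description = broker.find("currency")[0]                # requestor queries the broker
target = description["provider"]                        # reference to a concrete provider
result = getattr(target, description["operation"])(100, 1.5)  # direct communication
print(target.name, result)
```

The requestor never names the provider in its own code. It only knows the category it asked for and the description it received.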

Data hiding in current Web services


While many currently deployed services deal with getting or updating some sort
of data, data structures and actual data are internal to services and the service
interface acts as a facade without clearly defined semantics regarding its effects
on data. Data on the service is hidden. While many protagonists in the SOA space
value this as an advantage, the author argues that data hiding entails considerable
disadvantages.
Data which lives “inside” a service cannot be accessed directly, as no coherent
addressing scheme is defined. This has a number of implications.

Link passing
Web services do not offer a mechanism for passing data by reference. Subsequent
actions have to work on values which can quickly become outdated or be very large
and expensive in terms of network transmission.

Indexes and search


Data hiding makes it impossible to index and search data; search functionality has
to be provided by service providers and search results cannot easily spread across
boundaries of providers and their data stores.

Caching
Caching can only be done on the service provider’s side. It has to be carried out
behind the scenes of the service interface and with knowledge of the service’s
behavior. External caches on the client side or within the transport infrastructure
would very likely break a service, as functionality cannot be stored or imitated by
the cache without concrete knowledge of the side effects which service invocations may
have. Even though many service invocations which perform mere read operations
could be cached, the lack of protocol-level information on side effects prevents their
environment from doing so effectively.³ As a consequence, network
outages or problems at the provider’s side cause definite service unavailability unless
redundant providers are defined (and consistency is taken care of).

Data structures in current Web services


The vast majority of current programming languages centers around the concept
of object orientation [DN66], whereas Web services focus merely on the procedural
aspects of it as a least common denominator. While the SOA paradigm describes
suggestions for the definition and tailoring of services, a majority of Web service
implementations still depend on the basic RPC principle which dates back to 1984 [BN84],
which was long before object orientation had become widely accepted.

³ The author acknowledges that recent Web service description techniques such as WSDL 2.0
[RMCW07] will include information on side-effects to overcome the mentioned problems on
application level.

Nevertheless, most service requestor and provider implementations will very likely
use object orientation and will have their native application data organized in an
object-oriented (or at least entity-relationship model [Che76] based) structure.
As a consequence, it is commonplace to strip down those data objects or records to
their atomic values and reorganize them according to the signature defined by the
service description document. Usually, this leads to replication of (subsets of) an
application’s data schema and metadata in several places throughout the application,
which opens the floodgates to all types of malfunction due to inconsistencies.
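
The flattening step described above can be illustrated with a short sketch. Everything in it is hypothetical and chosen only for illustration: the Customer and Order classes, the field names, and the CREATE_ORDER_SIGNATURE list, which stands in for the parameter list a service description document might define.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    city: str

@dataclass
class Order:
    number: int
    customer: Customer
    total: float

# Hypothetical service signature. It repeats part of the domain schema
# in a second place, which then has to be kept consistent by hand.
CREATE_ORDER_SIGNATURE = ["order_number", "customer_name", "customer_city", "total"]

def to_parameters(order):
    """Strip the object graph down to atomic values in signature order."""
    flat = {
        "order_number": order.number,
        "customer_name": order.customer.name,
        "customer_city": order.customer.city,
        "total": order.total,
    }
    return [flat[name] for name in CREATE_ORDER_SIGNATURE]

params = to_parameters(Order(17, Customer("Ada", "Potsdam"), 99.5))
print(params)
```

If an attribute is added to Customer, the class, the mapping in to_parameters and the signature all have to change together. That is the kind of replicated schema knowledge the paragraph warns about.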

1.2 Web resources and representational state transfer (REST)
Roy Fielding identified Representational State Transfer (REST, [Fie00]) as one of the
main architectural styles which underpin the Hypertext Transfer Protocol (HTTP,
[FGM+99]) and thus make the World Wide Web of today a successful platform for
communication and interaction.
While the concepts behind Web services and SOA (as in 1.1) can be grasped
relatively quickly, REST tends not to become completely meaningful without a
concrete definition of operations and semantics. REST will consequently be introduced
by mentioning its four core principles, their application in the Hypertext Transfer
Protocol, and related specifications. The principles are: resources, representations,
uniform interface, and stateless communication.

Resources
According to Fielding, a resource can be any piece of information which can have a
name. Given examples are documents like hypertext or images, but also temporal
services (e.g. “today’s weather in San Francisco”), real-world objects (e.g. a person),
or concepts. A resource can also be a collection of other resources. A resource name
does not imply the existence of that particular resource. However, if the resource
exists, other entities may be allowed to interact with it. Resources are thus active
components which have behavior and state. They expose interaction mechanisms to
allow for state transition and state representation.
On the Web, resources are identified by Uniform Resource Identifiers (URIs, [BLFM05])
which comprise the more commonly known Uniform Resource Locators (URLs). For
example, a valid (and suitable) resource identifier for the resource Jan Schulz-Hofen
could be http://jpgmag.com/people/yeah while http://jpgmag.com/people would
be the identifier for the collection of person resources.
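
The relationship between the two example identifiers can be shown with Python's standard URI tooling. This is a sketch under one assumption: deriving the collection URI by dropping the last path segment happens to work for this example, but it is a convention of the example and not a general property of URIs.

```python
from urllib.parse import urlsplit
import posixpath

member_uri = "http://jpgmag.com/people/yeah"

parts = urlsplit(member_uri)
# Assumption for this example only: the collection is the parent path.
collection_path = posixpath.dirname(parts.path)
collection_uri = f"{parts.scheme}://{parts.netloc}{collection_path}"

print(parts.scheme, parts.netloc, parts.path)
print(collection_uri)
```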

Representations
While the resource itself is the specific concept identified by its name (cf. “today’s
weather in San Francisco”), a representation is an actual (“tangible”) but
volatile piece of data describing the state of a resource or parts of it. Furthermore,
representations contain metadata describing the representation and possibly the
resource itself. Representations can contain names of other resources which can be
interpreted as references to those resources. When interacting with a resource,
representations describing the current state of the resource can be retrieved. Moreover,
representations can be sent to a resource to request a state transition; in this case
the representation describes the intended state of the resource. While resources
are active components, representations are passive as they only describe a resource’s
state. They do not have behavior or state of their own.
Data which could potentially be retrieved using a Web browser and the URI
http://jpgmag.com/people/yeah, for instance, would represent the resource Jan
Schulz-Hofen or certain aspects of it⁴. The representation would possibly contain links to
other resources (e.g. http://photos.jpgmag.com/25082_17727_a0745f8095_m.jpg⁵).
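
The distinction between an active resource and its passive representations can be sketched as follows. The example is hypothetical throughout: the PersonResource class, the use of JSON as the representation format and the field names self, name and links are assumptions made for illustration and are not the formats defined later in the thesis.

```python
import json

class PersonResource:
    """A resource: it owns state and hands out or accepts representations."""
    def __init__(self, uri, name, photo_uris):
        self.uri = uri
        self._state = {"name": name, "photos": list(photo_uris)}

    def represent(self):
        # A representation is a passive snapshot of state plus links.
        return json.dumps({
            "self": self.uri,
            "name": self._state["name"],
            "links": self._state["photos"],
        })

    def accept(self, representation):
        # A received representation describes the intended state.
        intended = json.loads(representation)
        self._state["name"] = intended["name"]
        self._state["photos"] = list(intended["links"])

person = PersonResource("http://jpgmag.com/people/yeah", "Jan Schulz-Hofen", [])
snapshot = person.represent()

edited = json.loads(snapshot)
edited["name"] = "Jan S.-H."
person.accept(json.dumps(edited))

print(json.loads(snapshot)["name"])
print(json.loads(person.represent())["name"])
```

The first snapshot keeps the old name after the resource has changed. A representation is a volatile copy, and only a fresh retrieval reflects the current state.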

Uniform Interface
As opposed to SOAP Web services, where every port type can define a number of
custom operations [CCMW01], resources in a REST style landscape share a common,
well-defined and limited set of operations and of returned status information, which
together compose their interface. Entities wishing to interact with a resource are
required to use one of these operations. A type of operation can be (a) safe, which
means that it does not have an effect on the resource, (b) idempotent, meaning that
carrying out the operation once has the same effect on the resource as carrying it out
several times, or (c) neither (a) nor (b). For some operations it is allowed
to supply a representation which describes the intended state of the resource after
transfer. In most cases, a response will include a representation as well. Returned
status information can encompass information on whether the operation succeeded
or failed, on reasons and responsibility for failure, and on consequences.
The HTTP protocol, for instance, defines eight different operations, their properties
and semantics, as well as a set of 41 status codes classifying returned status
information. Four operations are commonly known and will be explained in 2.2.3.
Those operations are: GET, POST, PUT, and DELETE. Relevant status codes for this
work will be explained in 2.2.5.
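
The safe and idempotent properties can be observed on a small in-memory model. The sketch below is an illustration only: the Collection class and its keys are hypothetical, and the table records the properties of the four commonly known methods as given in RFC 2616, section 9.1.

```python
# Properties of the four commonly known HTTP methods (RFC 2616, section 9.1).
METHODS = {
    "GET":    {"safe": True,  "idempotent": True},
    "PUT":    {"safe": False, "idempotent": True},
    "DELETE": {"safe": False, "idempotent": True},
    "POST":   {"safe": False, "idempotent": False},
}

class Collection:
    """In-memory stand-in for a collection resource and its members."""
    def __init__(self):
        self.members = {}
        self._next_id = 1

    def get(self, key):
        return self.members.get(key)           # no effect on the resource

    def put(self, key, representation):
        self.members[key] = representation     # same end state however often repeated

    def delete(self, key):
        self.members.pop(key, None)            # deleting twice leaves the same state

    def post(self, representation):
        key = str(self._next_id)               # every call creates a new member
        self._next_id += 1
        self.members[key] = representation
        return key

people = Collection()
people.put("yeah", {"name": "Jan"})
people.put("yeah", {"name": "Jan"})
after_put = len(people.members)

people.post({"name": "Ann"})
people.post({"name": "Ann"})
after_post = len(people.members)

print(after_put, after_post)
```

Repeating the PUT leaves one member, while repeating the POST adds a member each time. An intermediary that knows only the method name can therefore decide whether a request may safely be retried.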

Stateless Communication
While resources have a certain state and potentially change state over time, all
interactions with a resource are stateless: session state (if any) is not maintained
by the resource, but rather by the entities interacting with it. As a consequence,
each request can be understood by itself and without knowledge of the preceding
operations. This entails the requirement for representations to include all necessary
data for the intended interaction.
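
A minimal sketch of what statelessness means for a request follows. It is hypothetical and not taken from the thesis: the handle function, the offset and limit fields and the photo list are invented to show a client that carries its own position in a listing instead of relying on a server-side session.

```python
PHOTOS = [f"photo-{n}" for n in range(1, 8)]

def handle(request):
    """Answer a request using only what the request itself contains."""
    offset = request["offset"]
    limit = request["limit"]
    page = PHOTOS[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(PHOTOS) else None
    return {"items": page, "next_offset": next_offset}

# The client, not the server, remembers where it is in the listing.
first = handle({"offset": 0, "limit": 3})
second = handle({"offset": first["next_offset"], "limit": 3})

# Replaying a request in isolation gives the same answer: no hidden session.
replayed = handle({"offset": 3, "limit": 3})
print(second["items"], replayed == second)
```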

⁴ Some ego information, as of January 2007 to be precise.
⁵ This URI identifies a photograph which has been taken by Jan Schulz-Hofen.

Protagonists from industry and academia sometimes refer to SOA and REST
as two entirely different and competing concepts these days. However, after closer
examination, one could conclude that REST fits into the larger scheme of SOA quite
nicely: it could be argued that resources as they are defined in REST are indeed
service providers which have a name, offer a number of services (i.e. operations),
expect parameters to operate on (i.e. representations) and return resulting data
(again representations). Reconsidering the principles of REST, the limitation to
a defined set of generally interpretable operations including an agreement on their
semantics and the consistent use of URI references throughout representations can
be considered a restriction of SOA. This makes REST systems a more concrete
subset of the vast number of service-oriented landscapes, but does not contradict the
fundamental idea of SOA.
Hence, we believe that the latter restrictions lead to a clearer understanding of and
simpler interaction with providers, operations, and messages, which makes REST a
promising new approach to service-orientation⁶. This is underpinned by the fact
that large companies such as Amazon⁷, eBay⁸ and Google⁹ are already providing
REST interfaces to their data. Moreover, architecting systems according to REST
concepts induces a change in thought: while traditional Web services tailoring is
centered merely around functionality, the definition of REST resources, operations
and representations involves considerations about structured data access, the state of
system components and their relations with each other. While in traditional SOA
landscapes many services consist only of data access, a lot of business functionality
can be refactored using a restricted set of operations and a well-tailored definition of
resources and relations [Cra07]. In consequence, a REST style SOA would offer
openly accessible, semantically well-defined but restricted data-driven interfaces to
functionality. This contrasts with traditional SOA landscapes, where a wealth of
different services with heterogeneous parameter structures sometimes renders access
to encapsulated data complicated and error-prone.

⁶ Interestingly, in recent days one was able to observe a number of applications which offer their
services using the HTTP protocol and expose their functions as URI endpoints while misleadingly
calling this REST. Those interfaces have been entitled “Service-trampled REST” by Duncan Cragg
[Cra06].
⁷ http://docs.amazonwebservices.com/AmazonS3/2006-03-01/RESTAPI.html
⁸ http://developer.ebay.com/developercenter/rest/
⁹ http://code.google.com/apis/
2. Preliminaries

This chapter describes the technologies and concepts which the work presented in
this thesis is built upon or which are leveraged and enhanced to achieve the described
goals. Those are object-oriented domain models, the Hypertext Transfer Protocol, and
the Atom Publishing Protocol.

2.1 Object-oriented domain models


The term domain model has been coined by Martin Fowler and is described in
[Fow02]. At its very basis, the domain model architecture pattern describes an
architectural layer within an application that replicates concepts of the real world (i.e.
the domain) which an application has to deal with. While its main purpose is data
storage and the handling of persistent data records which represent domain
entities, enforcement of business rules and modeling of behavior are also key objectives.
Fowler describes it thus:

At its worst business logic can be very complex. Rules and logic
describe many different cases and slants of behavior, and it’s this
complexity that objects were designed to work with. A Domain Model creates a
web of interconnected objects, where each object represents some
meaningful individual, whether as large as a corporation or as small as a single
line on an order form. [Fow02, p. 116]

In a domain model, every aspect of the application domain which is relevant to the
application is modeled and represented close to reality. Moreover, domain models
are meant to be as independent as possible from the actual software system they
are used in. In terms of object-orientation [DN66], one will usually find a class
for every type of entity in the domain and instances of that class are meant to
represent concrete incarnations of that type. The task of those domain objects is to
capture aspects of the real world entities and to support the application in its task
of handling (i.e. managing, storing, interacting with) them. Following concepts of
object-orientation, those classes will specify attributes for themselves (class
attributes) and for their instances (instance attributes) which allow data related to
the respective entity to be stored. Furthermore, they will define methods (again on
class and on instance level) which represent behavior that the corresponding entities
can perform.
Hence, a class definition ties together process and data structure which belong to
conceptually related entities. Object instances of the same class share the same set of
attributes and behavior – they will, however, have different values and state during
their life cycle. Associations are another important characteristic of classes in a
domain model (and of their instances). Classes define associations to other classes
which are usually named according to the role they play in the context of the
associated classes and quantified according to their possible multiplicity.
Figure 2.1 illustrates a sample domain model in a UML class diagram.

Figure 2.1: Sample object-oriented domain model in UML

In figure 2.1, three classes of objects define a domain model: customer, order
and product. The classes define attributes for their respective object instances (e.g.
firstname, lastname, etc.). Furthermore, the domain model defines a behavior called
calculate total price which is bound to order instances, meaning that its execution
only makes sense in the context of a concrete order object instance. When investigating
domain models, the following basic distinction between method types can be made:
business methods and accessors. Business methods define behavior which
is carried out within the scope of an object or class and which realizes some sort
of business functionality or process. Business methods are explicitly disregarded for
the work in this thesis, and the author believes that much of this functionality
can be refactored [Cra07] into standard behavior of entities in domain models (see
below). Accessors, however, are relevant to the work presented here. Accessors are
methods which expose read (i.e. getters) and write (i.e. setters) functionality
for attributes defined within an object instance. One should differentiate between
basic accessors, which exclusively perform the described read and write behavior,
and augmented accessors, which may perform value-based transformations or
calculations before or after attribute access. Augmented accessors are also used to
realize virtual attributes. Virtual attributes are attributes which do not actually exist
on the respective instance. However, accessors are in place and perform a certain
behavior which emulates the existence of the respective attribute (e.g. while working
on other attributes internally). Ideally, calculate total price would be refactored to
a virtual attribute called total price that would be available through an accessor
which sums up the prices of all associated products and thus returns a total price.
In figure 2.1, the classes customer and order as well as order and product are
associated, which is expressed using an edge. Cardinalities are annotated and express
that (a1) one customer object can have associations to zero or more order objects,
(a2) an order object has exactly one association to a customer object, (b1) an order
object can have associations to zero or more product objects and (b2) a product object
can have associations to zero or more orders. Note that the domain model shown
in figure 2.1 will serve as an example case for many concepts presented in chapter 3.
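
The domain model described above can be sketched in a few lines of Python. This is
only an illustration under assumptions: the attribute names name and price on product
are not taken from figure 2.1, and only the associations (a1), (a2) and (b1) are
modeled. The property total_price shows how the business method calculate total price
could become a virtual attribute behind an augmented accessor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str      # assumed attribute, not taken from figure 2.1
    price: float   # assumed attribute, needed to compute a total price


@dataclass(eq=False)
class Customer:
    firstname: str
    lastname: str
    # (a1) a customer is associated with zero or more orders
    orders: List["Order"] = field(default_factory=list)


@dataclass(eq=False)
class Order:
    # (a2) an order is associated with exactly one customer
    customer: Customer
    # (b1) an order is associated with zero or more products
    products: List[Product] = field(default_factory=list)

    def __post_init__(self):
        # Keep both ends of the customer/order association in sync.
        self.customer.orders.append(self)

    @property
    def total_price(self) -> float:
        """Virtual attribute: never stored, derived from associated products."""
        return sum(product.price for product in self.products)
```

Reading order.total_price looks like plain attribute access to the caller, although
no such attribute is stored on the instance.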
As mentioned before, one main use of domain models is their capability of providing an
interface to persistent storage. In many cases, entities in domain models can easily be
mapped to database structures, and the Active Record pattern (see 2.1.1 below) can
be used to perform this mapping in an automated manner. Fowler differentiates between
simple and rich domain models. Simple models are very similar to the database design,
so that mapping is straightforward. They encompass classes and their instances,
attributes, behavior and associations. Rich domain models add inheritance and
strategies as well as a non-trivial mapping to actual entities in the database design.
The work described in this thesis relies on simple domain models which exclude
inheritance and strategies and focus on a straightforward database mapping
mechanism, such as the Active Record pattern.

2.1.1 The Active Record pattern


The Active Record pattern has also been described by Fowler [Fow02]. It is a pattern
which refines the domain model pattern. Active Record defines
special behavior for classes and instances in a domain model which is responsible for
interaction with persistent storage (e.g. a database). Fowler describes it as follows:

An object carries both data and behavior. Much of this data is persis-
tent and needs to be stored in a database. Active Record uses the most
obvious approach, putting data access logic in the domain object. This
way all people know how to read and write to and from the database.
[Fow02, p. 160]

The obvious mapping at the heart of Active Record is that of database
concepts to object-oriented principles: a table maps to a class, a row maps to an
instance, a column maps to an instance attribute, and a foreign key relationship maps
to an association. Figure 2.2 illustrates the concept.

Figure 2.2: Sample Active Record

In order to realize the mapping in a generic and automated way for a domain model,
Active Record basically adds the following behavior to classes and instances:

1. Discovery of one or more data records in persistent storage and construction
of a domain model object instance to represent the record. Realized through
find behavior on class level.

2. Creation of a new domain model object instance which will be able to carry
data to be mapped to a new record in persistent storage. Realized through
create behavior on class level.

3. Storage of data carried in the domain model object instance into the corresponding
cells of the database. Realized through save behavior on instance level.

4. Destruction of a domain model object instance and removal of the corresponding
database row. Realized through destroy behavior on instance level.

In concrete Active Record implementations (such as Ruby on Rails’ ActiveRecord,
[H+ 04a]), discovery behavior can actually be provided by a number of different finder
methods per class, where generic finders only query the database based on
basic SQL SELECT statements supplying columns and values while custom finders can
encapsulate more complex custom SQL queries. Creation of a domain model object
instance is accompanied by an INSERT statement against the table which corresponds
to the current model class. Storage carries out an UPDATE statement affecting the
row which corresponds to the current model object instance, and destruction issues
a DELETE statement.
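
The four behaviors and their SQL counterparts can be illustrated with a minimal
sketch. It is not the Rails implementation: the class layout, the customers table
and the use of an in-memory SQLite database are assumptions made for this example
only.

```python
import sqlite3


class Customer:
    """Active Record sketch: table <-> class, row <-> instance, column <-> attribute."""

    table = "customers"                  # hypothetical table name
    columns = ("firstname", "lastname")  # hypothetical columns
    db = sqlite3.connect(":memory:")

    def __init__(self, id=None, **attrs):
        self.id = id
        for column in self.columns:
            setattr(self, column, attrs.get(column))

    @classmethod
    def find(cls, **conditions):
        """Discovery: generic finder built from column/value pairs (SELECT)."""
        unknown = set(conditions) - {"id", *cls.columns}
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        where = " AND ".join(f"{column} = ?" for column in conditions) or "1 = 1"
        sql = f"SELECT id, {', '.join(cls.columns)} FROM {cls.table} WHERE {where}"
        rows = cls.db.execute(sql, tuple(conditions.values())).fetchall()
        return [cls(row[0], **dict(zip(cls.columns, row[1:]))) for row in rows]

    @classmethod
    def create(cls, **attrs):
        """Creation: build an instance and INSERT the corresponding row."""
        record = cls(**attrs)
        placeholders = ", ".join("?" for _ in cls.columns)
        cursor = cls.db.execute(
            f"INSERT INTO {cls.table} ({', '.join(cls.columns)}) VALUES ({placeholders})",
            [getattr(record, column) for column in cls.columns],
        )
        record.id = cursor.lastrowid
        return record

    def save(self):
        """Storage: UPDATE the row which belongs to this instance."""
        assignments = ", ".join(f"{column} = ?" for column in self.columns)
        self.db.execute(
            f"UPDATE {self.table} SET {assignments} WHERE id = ?",
            [getattr(self, column) for column in self.columns] + [self.id],
        )

    def destroy(self):
        """Destruction: DELETE the row which belongs to this instance."""
        self.db.execute(f"DELETE FROM {self.table} WHERE id = ?", (self.id,))


Customer.db.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
```

A typical round trip would be Customer.create(...), a change to an attribute followed
by save(), a lookup with Customer.find(lastname=...), and finally destroy().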

2.2 The Hypertext Transfer Protocol (HTTP)


The Hypertext Transfer Protocol is the concrete specification of the interactions and
messages which form the foundation of today’s World-Wide Web. Definition efforts
started with version HTTP/0.9 back in 1990 and have led to the latest
and widely adopted HTTP/1.1 specification which is published in [FGM+ 99]¹. This
section presents the basic considerations behind HTTP/1.1 and introduces some
of the more advanced concepts which are relevant for the thesis work in chapter 3.

Figure 2.3: Basic HTTP Architecture

HTTP is an application layer protocol which acts on top of the Transmission Control
Protocol (TCP, [Pos81]) according to the OSI Reference Model [ISO84]. HTTP
follows the traditional client-server pattern, where servers usually run on dedicated
server machines serving documents which describe the hosted resources. HTTP
usually uses TCP port 80 for unencrypted communication and port 443 for communication
encrypted with Transport Layer Security (TLS, [DR06]). Clients
send requests to servers using a unique address which specifies the concrete machine
and port and the concrete resource or resources. Clients can transmit a message with
a request and will receive a response from the server which may in turn contain a
message. Every HTTP communication is initiated by the client. Servers cannot
contact or notify clients². Proxy servers may cache HTTP messages in order to
improve performance or to bridge outages.

¹ When mentioning HTTP in this document, the author refers to HTTP/1.1 as defined in
[FGM+ 99].

2.2.1 Uniform Resource Identifiers


The basic principle of names identifying resources as presented in 1.2 is realized
by Uniform Resource Identifiers (URIs) which are defined in [BLFM05]. URIs comprise
two sub-concepts, namely Uniform Resource Locators (URLs) and Uniform Resource
Names (URNs). URLs can be differentiated from URNs by the fact that they provide
information about the primary location of the referenced resource in addition to
identifying it. Obviously, locating a resource is one of the relevant mechanisms used
in HTTP.
According to [BLFM05, p. 16], an HTTP URL is generally³ defined as in listing 2.1⁴.
Listing 2.1: Uniform Resource Locator
URL = scheme "://" authority path-abempty [ "?" query ] [ "#" fragment ]

In HTTP, scheme can have the values "http" and "https" which refer to HTTP and
HTTP over TLS respectively. authority is used to locate a specific machine on the
Web and thus consists of an IP address or any resolvable name [Moc87a], [Moc87b],
optionally accompanied by authentication credentials and port information.
path-abempty can be empty or a hierarchical path expression to identify a specific
resource on the machine. query can be used to further identify a resource using
non-hierarchical data and fragment is a means of indirect identification of a
secondary resource (e.g. subset of the primary resource, specific view on
representations of the primary resource).
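
As an illustration of this decomposition, the following sketch splits a URL into the
components named above using Python's standard urllib.parse module. The example URL
and the helper name describe_url are made up for this purpose.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the components of listing 2.1."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # credentials, host and port
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,          # corresponds to path-abempty
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical URL, chosen only to exercise every component.
components = describe_url("https://jan@example.org:8443/people/yeah?format=atom#photos")
for name, value in components.items():
    print(f"{name:10} {value}")
```

For the sample URL, the authority is jan@example.org:8443, the path is /people/yeah,
the query is format=atom and the fragment is photos.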

2.2.2 Messages
HTTP messages are used for client to server (request) and server to client (response)
communication.
A request message contains at least a request line which specifies the request method,
identifies the resource (using the path-abempty [ "?" query ] part of the
URI, [BLFM05]; the fragment is evaluated by the client and not transmitted) and states
the HTTP version used (i.e. "HTTP/1.1"). Furthermore, it can
contain an entity (i.e. the representation of the resource). Both message and entity
can carry a number of headers which describe either the request itself or the entity
body. Listing 2.2 shows a simplified example of a request message.
Listing 2.2: Request message
GET /people/yeah HTTP/1.1
Accept: text/html

² “Asynchronous” behavior, however, can be emulated using threading and continuous
polling on the client (e.g. XMLHttpRequest [vK07], sometimes imprecisely referred to
as “Ajax”).
³ Refer to [BLFM05, p. 16] for more detailed information on URI grammar.
⁴ Unless otherwise stated, URI schemes are defined in Augmented Backus-Naur Form (ABNF,
[CO97]) throughout this thesis.

Each request yields a server response which is expressed using a response message.
A response is composed of a status line, a number of headers which describe either
the response itself or the entity body, and optionally an entity body. A status line
is composed of the HTTP version the server operates with, a status code and a reason
phrase. The reason phrase is a human-readable text which briefly explains the
status code. Listing 2.3 shows a simplified example of a response message.

Listing 2.3: Response message

HTTP/1.x 200 OK
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>JPG Magazine: People: Jan Schulz-Hofen</title>
</head>
<body>
...
</body>
</html>
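
The message structure described in this section can be made concrete with a small
parsing sketch. The function names are hypothetical, and the parser is deliberately
naive: it assumes CRLF line endings, an empty line between headers and entity body,
and at most one header line per field name.

```python
def parse_message(raw: str):
    """Split a raw HTTP message into start line, headers and entity body."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return start_line, headers, body


def parse_request_line(line: str):
    """Request line: method, resource identifier, HTTP version."""
    method, target, version = line.split(" ", 2)
    return method, target, version


def parse_status_line(line: str):
    """Status line: HTTP version, status code, reason phrase."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


request = "GET /people/yeah HTTP/1.1\r\nAccept: text/html\r\n\r\n"
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<html>...</html>"
)

print(parse_request_line(parse_message(request)[0]))
print(parse_status_line(parse_message(response)[0]))
```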

2.2.3 Request methods


The request methods implement the Uniform Interface concept described in 1.2 for
HTTP. In addition to the aforementioned methods GET, POST, PUT, and DELETE which
are considered most important for the work presented in this thesis, HTTP defines
the methods OPTIONS, HEAD, TRACE, and CONNECT. The specified semantics for each of the
eight methods are as follows:

GET is employed to obtain a representation of the identified resource. It is defined
to be safe [FGM+ 99, 9.1.1] which means that it should not have any effect
other than retrieval. Furthermore, it is defined to be idempotent [FGM+ 99,
9.1.2], i.e. the side-effects of multiple identical requests are the same as for a
single request. Obviously, the actual compliance with safety and idempotence is at
the discretion of the respective server and resource. However, the definition
of those properties along with the uniform interface ensures that clients can
explicitly request operations with or without side-effects.

POST is used to request a state change on the server. A POST operation can contain
a representation. The semantics of POST are that the represented resource will
be stored and appended as a subordinate of the resource which is referenced by
the URI that the POST operation is directed to. It is very likely that the resource
identified by the latter URI is in fact a collection of resources. Usually (but
not necessarily), the URI of the newly created resource and its representation
will be advertised back to the initiator of the operation. POST is neither safe
nor idempotent.

PUT operations request that either an existing resource referenced by the request
URI is updated or a non-existent one is created according to the enclosed
representation. Consequently, PUT is idempotent, but not safe.

DELETE requests that the resource referenced by the given URI is removed. The
requesting entity is not expected to send a representation along. As DELETE
yields the removal of the resource, it is not safe, but idempotent.

OPTIONS is used to discover information about the communication options available
for a specific resource without actually initiating an interaction (e.g. retrieval)
with it. OPTIONS is safe and idempotent.

HEAD behaves identically to the GET method except that no
entity body (i.e. no resource representation) is returned by the server. It is
consequently safe and idempotent as well.

TRACE is used for testing purposes. It initiates an application layer loop-back of
the request message, i.e. the server receiving the message returns the received
message as entity body back to the client. TRACE is safe and idempotent, as no
specific resource is involved in this testing request.

CONNECT has been reserved in the HTTP specification for use with proxies
which are able to switch to tunneling mode [Luo98] dynamically.
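
The difference between idempotent and non-idempotent methods can be demonstrated
with a toy model of server state. The ResourceStore class below is purely
hypothetical and ignores representations, headers and status codes; it only shows
that repeating PUT or DELETE leaves the state unchanged while repeating POST keeps
creating subordinate resources.

```python
import itertools

# Classification as given in the method descriptions above.
SAFE = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}


class ResourceStore:
    """Hypothetical in-memory server state: URI path -> representation."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def get(self, uri):
        # Safe: reads state, never changes it.
        return self.resources.get(uri)

    def post(self, collection_uri, representation):
        # Neither safe nor idempotent: every call appends a new subordinate.
        uri = f"{collection_uri}/{next(self._ids)}"
        self.resources[uri] = representation
        return uri  # would be advertised in a Location header

    def put(self, uri, representation):
        # Idempotent: create or update exactly the resource named by the URI.
        self.resources[uri] = representation

    def delete(self, uri):
        # Idempotent: deleting twice leaves the same state as deleting once.
        self.resources.pop(uri, None)
```

Calling put("/orders/7", ...) twice results in a single resource, whereas calling
post("/orders", ...) twice results in two distinct resources with different URIs.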

2.2.4 Headers
HTTP specifies a number of headers which can be used in either request or response
messages or both. An HTTP header is expressed as a single line containing a colon-
separated key value pair as shown in listing 2.4. The following paragraph will briefly
mention those headers which are relevant to the work presented in this thesis. Their
applicability to message types (i.e. request, response or both) is denoted in brackets.

Listing 2.4: Header line

message-header = field-name ":" [ field-value ]

Accept (request) can be used to specify a number of media types or ranges [FGM+ 99,
3.7] which are acceptable as responses and their respective options. HTTP
specifies a sophisticated tuning mechanism which allows clients to gradually
specify their preference. As a basis for the work presented in this thesis, it
is sufficient to understand that the Accept header can be used to specify the
desired media type.

Content-Type (request, response) is used to specify the media type of the entity
enclosed with the message.

Authorization (request) is used to authenticate the client towards the server. The
field value carries the user’s credentials, whose validity is then checked by the
server. Credentials can be expressed according to a number of different
authentication schemes [FHBH+ 99].

WWW-Authenticate (response) is used to inform the client about the authentication
scheme supported by the server [FHBH+ 99].

Last-Modified (response) is used as meta information about the entity enclosed
in a response. It specifies the date and time at which the origin server believes
that the resource was last modified.

If-Modified-Since (request) The set of If-* headers is used to make an operation
conditional. The actual behavior and response for a given request depend on
whether the specified conditional expression evaluates to true or false. This is used
for caching and version control purposes and is further explained in 2.2.6.
If-Modified-Since is used to specify a timestamp. The server will perform the
requested operation only if the entity has been modified after the given point in
time.
If-Unmodified-Since conversely requests that the operation is carried out only
if the entity has not been changed since the given timestamp.
ETag (response) is used as meta information about the entity enclosed in a response.
Conceptually, an ETag (i.e. entity tag) specifies the current version of a
representation (i.e. entity). ETag, If-Match, and If-None-Match (see below) can be used
as an alternative to timestamps in cases where time differences or inaccuracies
in timestamps may occur.
If-Match (request) is used to specify one or more entity tags or the * operator to
match the current version (expressed as ETag) of a resource.
If-None-Match (request) is similar to If-Match but evaluates to true if the current
entity tag of a representation does not match any of the specified tags.
Location (response) is used to either redirect the client to another location or to
identify a newly created resource. For the work presented in this thesis, the
latter is more important. A Location header is usually received after requesting
the creation of a resource using POST or PUT.
Retry-After (response) is used when the requested resource is not (yet) available.
The server will specify a point in time or interval that it expects the client to
wait before it tries again.
Expires (response) is used to inform the client of the expected expiration date
and time of the current representation. Clients are expected to not use the
representation after the given timestamp.

2.2.5 Status codes


Status codes are employed to inform the client about the outcome of the server’s
attempt to understand and satisfy the request. The following paragraphs list the
status codes which are relevant to the work in this thesis along with a brief
description of their meaning. For the full specification refer to [FGM+ 99, 10].

200 OK indicates that the request has been successful.


201 Created indicates that a new resource has been created as requested by the
client. The response can contain a Location: header indicating the URI of the
newly created resource.
204 No Content indicates that the request has been successful but no entity-body
has to be returned. However, new meta-data (in headers) may be available for
the resource.
304 Not Modified indicates that the resource has not been modified according to
the version or timestamp the client has used in its conditional request.
401 Unauthorized indicates that the request cannot be performed unless the client
authenticates with the resource.
403 Forbidden indicates that the request is not allowed. The reason can be either
that the request is generally forbidden or that it is forbidden for the currently
authenticated client.
404 Not Found indicates that no resource can be located using the given request
URI.
405 Method Not Allowed indicates that the resource is available but does not
allow the used request method.
406 Not Acceptable indicates that the resource cannot supply representations
in a content type which the client can accept (as stated in an Accept:
header).
409 Conflict indicates that the requested operation could not be performed be-
cause the client expected the resource to be in a state different from its current
one. Refer to 3.6.1.
500 Internal Server Error indicates that the request did not succeed for a rea-
son which the client can not account for. The response should include an
explanation of the cause for the failure if possible.

2.2.6 Caching
One of the main features of HTTP/1.1 is its precise specification of caching
functionality, control, and algorithms. This constitutes one of its main advantages
over Web service based approaches to system interoperability, because (multiple)
caches can be established on the line between clients and servers.
The semantics defined by the uniform interface and the caching directives in HTTP
headers allow far more flexibility than such approaches. As other approaches would
have to specify caching semantics themselves and negotiate them with clients, service
providers are likely to limit themselves to caching behind the invocation boundary
(i.e. the interface) of a service. Obviously, this results in more load on the
provider’s side and reduced dependability, because this architecture introduces a
single point of failure.
Figure 2.4 illustrates the basic architecture of HTTP servers and proxies and denotes
that (a) caches can be on the line between clients and servers, (b) clients can interact
with caches or servers directly while using an identical uniform interface, (c) caches
can interact with original servers or cascaded caches, and (d) clients can keep their
own cache of representations in order to allow performant local operations.
Caching is usually carried out by retrieving response messages from origin servers
and by storing them for future communication with clients. The goal of caching in
HTTP is to reduce the number of network roundtrips and bandwidth requirements
while maintaining a high level of semantic transparency for end users and client
applications.

Figure 2.4: Basic HTTP Caching

Semantic transparency
The term semantic transparency refers to the ideal cache behavior in which end
users and applications do not perceive any semantic difference in interactions with
remote resources due to caching. However, caching entails a number of problems
regarding detection of validity of cached data and – in case of modifying operations
– concurrent access and conflicts.
In general, semantic transparency decreases as the performance gain achieved through
caching increases. For that reason, HTTP allows for relaxed transparency
and defines mechanisms to establish an equilibrium between performance
and semantic transparency. Relaxation can be requested or denied by end users and
origin servers, and warnings are defined by the protocol in order to notify users and
client applications about relaxed transparency.

Expiration model
One mechanism for relaxed transparency is the expiration model which is defined
in HTTP: A cached response message is considered expired once the origin server would
return a different response to an identical request.
Ideally, caches would know when this moment occurs in order to acquire a fresh
response message to store. HTTP defines an expiration model (refer to [FGM+ 99,
13.2] for full specification) which attempts to approximate this behavior by a number
of different mechanisms. Expiry detection can be based either on client-side
calculations and heuristics or on server-specified predictions which are attached to
messages using the Expires: header (cf. 2.2.4).
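
A server-specified expiry check can be sketched as follows. The sketch covers only
the Expires: header and ignores the client-side heuristics and other directives of
the full expiration model; the function name is_fresh and the dictionary used for
headers are assumptions of this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, now: datetime) -> bool:
    """Return True if a cached response may still be used at time `now`.

    `now` is expected to be a timezone-aware datetime.
    """
    expires = headers.get("Expires")
    if expires is None:
        return False  # no server prediction: this sketch never guesses
    try:
        return now < parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # Unparseable dates (e.g. "0") are treated as already expired.
        return False


cached_headers = {"Expires": "Thu, 01 Dec 1994 16:00:00 GMT"}
print(is_fresh(cached_headers, datetime(1994, 12, 1, 15, 0, tzinfo=timezone.utc)))
print(is_fresh(cached_headers, datetime.now(timezone.utc)))
```

The first call evaluates the cache entry one hour before its expiry, the second one
at the current time, long after it.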

Validation model
Technically, a client application or cache which stores representations according to
the expiration model, would have to refresh its cached messages after they have been
considered expired. However, the calculated or predicted expiration of a message
does not imply that the actual representation has indeed become invalid. Thus, a
reload of the entire representation could yield unnecessary network usage in some
cases.
It therefore seems pertinent for a client or cache to check with the respective
origin server whether or not a fresh representation should be acquired before the
actual transfer is started. HTTP defines conditional methods and validators to combine
those two actions into one. Both are expressed using special headers mentioned in
2.2.4. The Last-Modified header combined with the respective If-Modified-Since and
If-Unmodified-Since headers operates on a point-in-time basis whereas the ETag,
If-Match, and If-None-Match headers work with concrete version identifiers (e.g.
consecutive integers).
Once one of the If-* request headers is given, the requested operation (e.g. GET,
POST, ...) is only carried out by the respective server if the condition holds.
Otherwise, in the case of a conditional GET, the server will respond using the status
code 304 Not Modified (cf. 2.2.5), indicating that the cached representation is
still fresh.
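
The validation model for GET can be sketched from the server's point of view. The
following example is an assumption-laden illustration: it derives the entity tag
from a content hash (consecutive integers would work as well), handles only
If-None-Match, and ignores weak validators.

```python
import hashlib


def etag_for(representation: bytes) -> str:
    # Assumption of this sketch: the entity tag is a quoted content hash.
    return '"' + hashlib.sha1(representation).hexdigest() + '"'


def conditional_get(representation: bytes, request_headers: dict):
    """Return (status code, headers, entity body) for a GET request."""
    current = etag_for(representation)
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]
    if "*" in candidates or current in candidates:
        # The client's cached copy is still valid: no entity body is sent.
        return 304, {"ETag": current}, b""
    return 200, {"ETag": current}, representation


# First request: full transfer. Second request: validation only.
status, headers, body = conditional_get(b"<entry>v1</entry>", {})
revalidated = conditional_get(b"<entry>v1</entry>", {"If-None-Match": headers["ETag"]})
changed = conditional_get(b"<entry>v2</entry>", {"If-None-Match": headers["ETag"]})
```

Here revalidated carries status 304 and an empty body, while changed carries
status 200 together with the new representation and a new entity tag.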

Side effects and disconnected operation

Caching in HTTP is not possible or pertinent in all cases. In particular, it is not
possible for operations which have side effects. However, the clearly defined semantics
regarding side-effects (cf. safety and idempotence in 2.2.3) in HTTP allow for
accurate and automated decisions on whether caching is appropriate or not.
Moreover, in some cases it is required that even operations with side-effects are
carried out on locally stored or cached data and that modifications are written back
to origin servers at a later point in time. This is specifically useful for disconnected
clients (e.g. on mobile devices) and applications which offer a large number of
modification options to users where synchronous modifications would yield an
unacceptable decrease in performance. However, such deferred modifications immediately
lead to the Lost Update Problem described in [FL99]. While the validation model is
designed to help detect these kinds of problems, section 3.6.1 will discuss a similar
approach to the problem which has been chosen for the work presented in this thesis.

2.3 Atom Publishing Protocol (Atompub)


While REST (cf. 1.2) and HTTP (cf. 2.2) define the basic foundation for “distributed,
collaborative, hypermedia information systems” [FGM+ 99, 1.1], the Atom
Publishing Protocol (Atompub, [GdH06]) aims at defining enhanced semantics for
HTTP operations, URIs and representations when dealing with collections and members,
such as feeds and entries in publishing.
Atompub is being defined⁵ by the Internet Engineering Task Force’s (IETF) Atom
Publishing Format and Protocol Working Group, which is led by Paul Hoffman
and Tim Bray. The working group’s goal is to define a

feed format for representing and a protocol for editing Web resources
such as Weblogs, online journals, Wikis, and similar content. The feed
format enables syndication; that is, provision of a channel of information
by representing multiple resources in a single document. The editing
protocol enables agents to interact with resources by nominating a way
of using existing Web standards in a pattern. [HB07]
⁵ At the time of writing, the Atompub protocol specification is still in the RFC
Editor Queue. The work in this thesis is based on draft 17 which was submitted on
Jul 11, 2007. The parts of this thesis which are based on Atompub may become
inconsistent with future versions of Atompub. However, the author believes that the
concepts remain valid and that changes in Atompub can easily be transferred to the
work presented here.

Atompub messages are based on the Atom Syndication Format (Atom, [NS05]) which
Atompub employs as its standard data format for representations.
While this section introduces Atompub and the parts which are relevant to the
work presented in this thesis, chapter 3 describes how concepts of Atompub are
interpreted, extended, mapped to other serialization formats, and applied to more
generic data.

2.3.1 Resources and representations


Atompub basically defines four types of representations: collections, members,
service documents, and category documents. For the work presented in this thesis, only
collection and member representations are relevant. Please refer to [GdH06] to learn
more about service and category documents.

Collections
A collection resource can be associated with one or more member resources. Thus, a
collection document mentions a number of URIs referencing member resources and
may give a brief or full representation of each of them. A collection resource can be
interpreted as the set of member resources it is associated with.
Collection documents are serialized as XML [XML98] files following Atom. The
root node of this serialization is atom:feed⁶ and must contain at least the nodes
atom:id mentioning a unique identifier for the collection, atom:title mentioning its
title in a human-readable language, and atom:updated mentioning the point in time when
the resource had its last significant update. A feed can contain one or more atom:entry
nodes which represent member resources belonging to this collection.
An atom:entry node in a collection document must, again, contain at least the fields
atom:id, atom:title, and atom:updated containing information as outlined above but
describing the entry instead of the feed. Usually, entry representations
within a collection representation only list the non-optional properties and a URI
reference which points to the entry resource itself, such that a full representation
can be retrieved using a separate GET request.
Listing 2.5 shows a sample collection document which is minimal with respect to
required and optional nodes as specified by Atompub and Atom.
Listing 2.5: Collection document
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>A feed of complete nonsense</title>
  <id>urn:uuid:43gadc91-7624</id>
  <updated>2004-01-14T19:31:03Z</updated>
  <entry>
    <title>Amok-Powered Robots Run Atom</title>
    <link href="http://example.org/articles/atom04"/>
    <id>urn:uuid:9fa59bc1-75d4</id>
    <updated>2004-01-14T19:31:03Z</updated>
  </entry>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/articles/atom03"/>
    <id>urn:uuid:1225c695-cfb8</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>
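
To illustrate how a client might consume such a collection document, the following
sketch lists the members of a feed using Python's standard xml.etree.ElementTree
module. The function name members and the shortened sample feed are assumptions of
this example; a real client would obtain the document through a GET request.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>A feed of complete nonsense</title>
  <id>urn:uuid:43gadc91-7624</id>
  <updated>2004-01-14T19:31:03Z</updated>
  <entry>
    <title>Amok-Powered Robots Run Atom</title>
    <link href="http://example.org/articles/atom04"/>
    <id>urn:uuid:9fa59bc1-75d4</id>
    <updated>2004-01-14T19:31:03Z</updated>
  </entry>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/articles/atom03"/>
    <id>urn:uuid:1225c695-cfb8</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>"""


def members(feed_xml: str) -> list:
    """Return the required fields and the member URI of every entry in a feed."""
    feed = ET.fromstring(feed_xml)
    result = []
    for entry in feed.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        result.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
            # URI to dereference for the full member representation
            "href": link.get("href") if link is not None else None,
        })
    return result


for member in members(FEED):
    print(member["updated"], member["href"])
```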

⁶ XML elements and attributes which are defined for Atom in the
http://www.w3.org/2005/Atom namespace are denoted with the prefix atom:

Members
A member resource can belong to one or more collections and thus be referenced by
them. It stands for a single entity of information, such as a news article or a blog
entry.
Member documents are serialized as XML files following Atom. The root node is
atom:entry. It must contain at least the fields atom:id, atom:title, and atom:updated.
An entry can contain an atom:content node which should be used to transport the
actual content information that this entry represents. It can contain XHTML [IM07],
foreign markup in a different XML namespace [HTBL06] or just plain text.
Furthermore, Atom member representations can contain several links which have to be
denoted as atom:link elements. A link must have an atom:href attribute mentioning a
URI identifying the linked resource. Furthermore, links can have an atom:rel attribute
which defines the link relation type, i.e. how the respective resource is related to
the linked resource. Values for the rel attribute are defined in [NS05, 4.2.7.2] and
[GdH06, 11]. The value range can be extended. The values used throughout
this thesis are related, meaning that the linked resource is related to the current
one; self, meaning that the linked resource is equivalent to the current resource; and
edit, meaning that the linked resource is an editable equivalent of the current one
and that this URI must be used for editing (i.e. using a PUT request).
Listing 2.6 shows a sample member document containing a plain text content node
and links as specified by Atompub and Atom.
Listing 2.6: Member document
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/articles/atom03" rel="self"/>
  <link href="http://example.org/articles/edit/atom03" rel="edit"/>
  <id>urn:uuid:1225c695-cfb8</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <content type="text/plain">
    Lorem ipsum quod eros cu pro.
    Vel accumsan invenire appellantur eu.
    At insolens efficiendi conclusionemque eos,
    ad amet splendide democritum per.
  </content>
</entry>

2.3.2 Operation semantics

Atompub defines semantics for the four important HTTP operations GET, POST, PUT,
and DELETE which are more specific than their definitions in the HTTP protocol.
They are as follows:
In combination with collection resources, Atompub only defines semantics for the
operations GET and POST. GET yields a representation of the collection resource,
i.e. a collection document in the form of an Atom feed. Upon a POST request, the
server expects to receive the representation of a member resource. It will
subsequently attempt to create a member resource according to the received
representation and associate it with the referenced collection. Upon successful
creation, the response will include a member document representing the resource
which was just created.⁷ Furthermore, the response will contain a Location: header
(as defined in HTTP, refer to 2.2.4) specifying the URI which the server assigned to
the new resource for future reference and a status code of 201 Created (cf. 2.2.5).
⁷ Note that the returned representation does not need to be equivalent to the one
that was received. Enclosed representations in requests only describe the state that
the client intends for the resource.
In combination with member resources, Atompub defines semantics for GET, PUT, and
DELETE. A GET request results in a message which represents the member: an entry
document. A PUT request must contain an enclosed entry document and requests that
the specified resource be altered accordingly. As with the creation of resources, a
successful PUT request will return a response enclosing a representation of the
modified resource according to its current state. A DELETE request does not contain
a representation and requires the resource to be removed.
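
The operation semantics described above can be summarized in a small in-memory
model. The sketch below is written in Python and does not involve a real HTTP
server; the class InMemoryCollection, its method names, and the way member URIs are
assigned are assumptions made for this illustration. Only the 201 Created status and
the Location: header for POST are taken from the text above; the remaining status
codes (200 and 404) are plausible choices, not requirements stated here.

```python
import itertools


class InMemoryCollection:
    """Toy stand-in for an Atompub collection and its members."""

    def __init__(self, collection_uri):
        self.collection_uri = collection_uri
        self.members = {}                  # member URI -> representation
        self._ids = itertools.count(1)     # assumed URI assignment strategy

    def get_collection(self):
        """GET on the collection: list the URIs of all members."""
        return 200, {}, sorted(self.members)

    def post(self, representation):
        """POST on the collection: create a member, return 201 and Location."""
        member_uri = f"{self.collection_uri}/{next(self._ids)}"
        self.members[member_uri] = dict(representation)
        return 201, {"Location": member_uri}, self.members[member_uri]

    def get(self, member_uri):
        """GET on a member: return its current representation."""
        if member_uri not in self.members:
            return 404, {}, None
        return 200, {}, self.members[member_uri]

    def put(self, member_uri, representation):
        """PUT on a member: replace its state, return the new representation."""
        if member_uri not in self.members:
            return 404, {}, None
        self.members[member_uri] = dict(representation)
        return 200, {}, self.members[member_uri]

    def delete(self, member_uri):
        """DELETE on a member: remove it; no representation is involved."""
        if self.members.pop(member_uri, None) is None:
            return 404, {}, None
        return 200, {}, None
```

A client would first POST a representation to the collection, read the Location
header of the response, and use that URI for all subsequent GET, PUT, and DELETE
requests against the new member.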
3. WebData

This chapter defines WebData, a middleware for exposing and accessing object-
oriented application data as Web resources. WebData is the central part of the
thesis. Its concept, design, and prototypical implementation are the author’s con-
tribution. The WebData middleware is split into two main types of components:
server connectors, which expose object-oriented application data as Web resources,
and client connectors, which access Web resources and provide an object-oriented
programming interface to interact with them. The following is a specification rather
than the documentation of a concrete software component or a product.
Both server-side and client-side connectors are defined as generic software
components, which means that they do not depend on a specific application domain
with respect to resource types. Furthermore, the server connector specification is
independent of the programming language and of the underlying framework realizing
the domain model. Concrete server connector implementations will require that
(a) the domain model is made available to the connector through an object-oriented
programming interface and (b) there are means for the connector to gather
information (e.g. using reflection) on the actual structure of application data
within the domain model in terms of classes, instances, methods, attributes, and
associations. Likewise, the client connector specification is independent of the
programming language and technology in which implementations are realized. Concrete
implementations, however, will only be useful if the embedding software components
are built upon object-oriented concepts. Client and server connectors which are
realized in different programming languages or are embedded in technologically
different software systems are compatible and can interoperate as long as they
conform to the specification. Implementations of server and client connectors that
contain modifications or enhancements must still respect this specification, so that
they can interoperate with their respective counterparts as specified even if the
latter are not part of the implementation in question.
Figure 3.1 illustrates a sample architecture where client and server connectors are
employed on different tiers within multiple layers. In this example, the domain
model, which could be made available using an object-relational mapper implementing
the Active Record pattern (cf. 2.1.1), is accessed by the server connector using
the same object-oriented programming interface that other components (denoted as
“application logic”) within the business tier use as well. This enables the server
connector to expose the domain model as resources on the Web which, in essence,
means that the connector is able to dereference requested URIs and perform the
requested operations on the domain model. The client-side connectors on the Web
tier and client tier can consequently access the domain model by performing requests
against the server’s URIs. The client-side connectors will do so when triggered by
their environment, i.e. the components realizing the presentation logic on both
upper tiers in figure 3.1. The WebData connectors thus provide end-to-end
interaction between the layer(s) embedding the domain logic and its application data
and other parts of the system, while exposing an object-oriented programming
interface to their environment.
Note that in other application scenarios, the architecture may be completely
different. Tiers and system structure may differ and applications do not necessarily
need to use a server-side or client-side connector to expose or interact with a
domain model. In some cases, direct interaction using HTTP operations might be more
pertinent.

Figure 3.1: Sample architecture using WebData

Sections in this chapter are structured as follows: For each aspect of WebData,
basic considerations and a description of the problem or requirement are given.
Where applicable, different approaches are discussed and a final solution is
specified. Those aspects are: URI references for entities in object-oriented domain
models, Representation types for Web resources exposing entities in a domain model,
Exposure of object-oriented domain models as Web resources, Object representation
and mapping for REST Resources, Request-based content negotiation for representation
formats, and Concurrency and transactional behavior.
Where pertinent, implementation alternatives and interoperability with existing
components in neighboring layers of the overall architecture are discussed.

3.1 URI references for entities in object-oriented domain models
In order to access and interact with data object instances, the latter need to be
made available as resources and, therefore, need to be referenceable by a URI.
Theoretically, any arbitrary URI could be assigned to an instance. However, it seems
pertinent to establish a general mapping mechanism which infers URIs from instance
properties, rather than maintaining some sort of assignment table.
In many cases, identification of single instances is not sufficient when interacting with
domain models: sometimes, collections of instances (e.g. classes, search results) will
need to be referred to at once, whereas in other cases, only single attributes of an
object instance may need to be referenced.
WebData defines three main resource types: collections, members and values, where
the definition of collections and members is adopted from Atompub and enhanced
and values are newly introduced. A collection resource is defined to be a set of
member resources which in turn can have several named value resources.
When accessing entities in object-oriented domain models, the mapping between
those entities and resource types is as shown in figure 3.2.

Figure 3.2: Entities in an object-oriented domain model and their mapping to
resource types

Figure 3.2 reveals the following facts about resource types and domain models:

1. A class maps to a collection resource referencing all its existing instances. This
type of resource is more specifically referred to as class collection resource.
2. An array or set of instances maps to a collection resource referencing all
contained instances. This type of resource is more specifically referred to as
arbitrary collection resource.
3. An instance maps to a member resource.
4. An instance attribute which consists of an atomic value maps to a value resource.
5. An instance attribute which references an associated instance with respect to
the structure of the domain model maps to a member resource. This type of
resource is more specifically referred to as associated member resource.
6. An instance attribute which references a set of associated instances with respect
to the structure of the domain model maps to a collection resource referencing all
associated instances. This type of resource is more specifically referred to as
associated collection resource.
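
The six mapping rules can be condensed into a single classification function. The
following Python sketch is only an illustration under simplifying assumptions: the
names ResourceType, ATOMIC_TYPES, resource_type_for, and the sample class Order are
hypothetical, the set of atomic types is chosen arbitrarily, and the distinction
between the specific subtypes (class, arbitrary, or associated collection; plain or
associated member) is left out because it depends on the context in which an entity
is reached.

```python
from enum import Enum


class ResourceType(Enum):
    COLLECTION = "collection"
    MEMBER = "member"
    VALUE = "value"


# Assumption: these Python types count as atomic values of the domain model.
ATOMIC_TYPES = (str, int, float, bool)


def resource_type_for(entity):
    """Classify a domain model entity according to rules 1 to 6."""
    if isinstance(entity, type):
        return ResourceType.COLLECTION      # rule 1: a class
    if isinstance(entity, (list, tuple, set, frozenset)):
        return ResourceType.COLLECTION      # rules 2 and 6: sets of instances
    if isinstance(entity, ATOMIC_TYPES):
        return ResourceType.VALUE           # rule 4: atomic attribute value
    return ResourceType.MEMBER              # rules 3 and 5: a single instance


class Order:
    """Hypothetical domain class used only for this example."""

    def __init__(self, total_price):
        self.total_price = total_price
```

With these definitions, the class Order maps to a collection, an Order instance to a
member, and its total_price attribute to a value.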

WebData defines a URI scheme for entities in object-oriented application data in
order to reference the aforementioned resources while reflecting the structure of
the referenced data. The scheme specializes the general scheme for URIs by further
specifying the path-abempty and query parts with respect to 2.2.1.¹

Listing 3.1: URI Scheme extension for WebData

path-abempty            = prefix [ "/" class [ ( "/" instance1 [ "/" attribute
                          [ "/" instance2 ] ] ) / findername ] ]
class                   = [ "complete-" ] classname [ prefetch ]
instance1               = instancename1 [ prefetch ] [ "/" version1 ]
attribute               = [ "complete-" ] attributename [ prefetch ]
instance2               = instancename2 [ prefetch ] [ "/" version2 ]
prefetch                = "-with-" prefetch_attributename
                          *( [ "-and-" ] prefetch_attributename )
query                   = ( key "=" value *( "&" key "=" value ) ) / free_query
classname               = pchar
instancename1           = pchar
attributename           = pchar
instancename2           = pchar
version1                = DIGIT
version2                = DIGIT
prefetch_attributename  = pchar
findername              = pchar
key                     = pchar
value                   = pchar
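
To illustrate how a server-side connector might take such a path apart, the following
Python sketch splits a path into the components named in listing 3.1. It is not part
of the specification and resolves several ambiguities of the grammar by assumption:
the prefix is passed in explicitly, finder names must be known in advance (a
findername cannot be told apart from an instancename1 syntactically), a purely
numeric segment directly after an instance name is read as a version number of
arbitrary length, and prefetch attribute names must be separated by "-and-". All
class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    name: str
    complete: bool = False                  # "complete-" prefix was present
    prefetch: List[str] = field(default_factory=list)


@dataclass
class WebDataPath:
    class_: Optional[Segment] = None
    finder: Optional[str] = None
    instance1: Optional[Segment] = None
    version1: Optional[int] = None
    attribute: Optional[Segment] = None
    instance2: Optional[Segment] = None
    version2: Optional[int] = None


def parse_segment(text, allow_complete):
    """Split one path segment into name, complete flag, and prefetch names."""
    complete = allow_complete and text.startswith("complete-")
    if complete:
        text = text[len("complete-"):]
    name, separator, rest = text.partition("-with-")
    prefetch = rest.split("-and-") if separator else []
    return Segment(name, complete, prefetch)


def take_version(segments):
    """Consume a leading all-digit segment as a version number, if present."""
    if segments and re.fullmatch(r"[0-9]+", segments[0]):
        return int(segments.pop(0))
    return None


def parse_path(path, prefix="webdata", finders=frozenset()):
    base = "/" + prefix if prefix else ""
    remainder = path[len(base):]
    if not path.startswith(base) or (remainder and not remainder.startswith("/")):
        raise ValueError(f"not a WebData path: {path!r}")
    segments = [s for s in remainder.split("/") if s]
    result = WebDataPath()
    if not segments:
        return result                       # base URI
    result.class_ = parse_segment(segments.pop(0), allow_complete=True)
    if segments and segments[0] in finders:
        result.finder = segments.pop(0)
    elif segments:
        result.instance1 = parse_segment(segments.pop(0), allow_complete=False)
        result.version1 = take_version(segments)
        if segments:
            result.attribute = parse_segment(segments.pop(0), allow_complete=True)
        if segments:
            result.instance2 = parse_segment(segments.pop(0), allow_complete=False)
            result.version2 = take_version(segments)
    if segments:
        raise ValueError(f"unexpected trailing segments: {segments}")
    return result
```

For example, parse_path("/webdata/customers/42/orders/2") yields the class customers,
the instance 42, the attribute orders, and the associated instance 2.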

Base URI

In listing 3.1, prefix can be an arbitrary path expression as defined by path-abempty
(which includes "") in order to differentiate WebData URIs from other URIs on a
server. In WebData server-side connector implementations, prefix should be
configurable by the application developer using the connector and should default to
"webdata". The base URI does not reference a specific resource but nevertheless
exposes a limited interface. See 3.3.1 for more information.

Class URI

If classname is given it must be a pchar expression which is unique within the given
prefix and authority. It is recommended that classname is the name of the
programmatic class within the domain model whose instances are to be referenced. The
prefix "complete-" is used before a classname in order to receive a complete
collection representation (see 3.3.1). The postfix expression prefetch can be used to
reference a collection representation containing associated instances (see 3.3.1)
where prefetch_attributename must be a pchar expression as in attributename. A URI
ending on class references a collection resource as in item 1 of the list following
figure 3.2 and is called class URI.

¹ Listing 3.1 refers to elements from [BLFM05] unless specified in this document.

Instance URI
If instancename1 is given it must be a pchar expression which is unique within the
given classname, prefix, and authority. It is recommended that instancename1 is the
key attribute value which is used within the domain model to identify instances
(e.g. the value of the primary key column in a database table row). A URI ending on
instance1 is called instance URI. If an instance URI does not contain a prefetch
expression it references a member resource exposing the identified instance,
otherwise it references a collection resource as in item 2 of the list following
figure 3.2, containing member resources exposing the identified instance and its
associated instances (see 3.3.1 and 3.4.2).

Versioned instance URI


If version1 is given it must be a DIGIT expression representing a version number of the
referenced resource. Please refer to 3.6.1 to learn more about versioned resources
and optimistic concurrency. A URI ending on version1 is called versioned instance
URI. Versioned instance URIs are instance URIs.

Attribute URI
If attributename is given it must be a pchar expression which is unique within the
given instancename1, classname, prefix, and authority. It is recommended that
attributename is the name of a valid attribute or association as defined by the
class which is referred to by classname. A URI ending on attribute references a
value, member, or associated collection resource, according to items 4, 5, and 6 of
the list following figure 3.2, with respect to the structure of the domain model. It
is called attribute URI. The prefix "complete-" is used before an attributename in
order to receive a complete collection resource (see 3.3.1). The postfix expression
prefetch can be used to reference a collection containing associated instances (see
3.3.1) where prefetch_attributename must be a pchar expression as in attributename.
An attribute URI ending with a prefetch expression references a collection resource
such as described in item 2.

Associated instance URI


If instancename2 is given it must be a pchar expression which is unique within the
given attributename, instancename1, classname, prefix, and authority. It is
recommended that instancename2 is the key attribute value which is used within the
domain model to identify instances (e.g. the value of the primary key column in a
database table row). A URI ending on instance2 is called associated instance URI. If
an associated instance URI does not contain a prefetch expression it references the
respective member in an associated collection, according to item 3 of the list
following figure 3.2, otherwise it references a collection resource as in item 2
containing the aforementioned member and its associated instances (see 3.3.1 and
3.4.2).

Versioned associated instance URI


If version2 is given it must be a DIGIT expression representing a version number of the
referenced resource. Please refer to 3.6.1 to learn more about versioned resources
and optimistic concurrency. A URI ending on version2 is called versioned associated
instance URI. Versioned associated instance URIs are associated instance URIs.

Finder URI
If findername is given instead of instancename1 and the following path components,
it must be a pchar expression which is unique within the given classname, prefix,
and authority. It is recommended that findername matches the name of a finder method
(cf. 2.1.1) within the class referenced by classname. A URI containing findername
references a collection resource containing the member resources which the finder
method returned, according to item 2 of the list following figure 3.2. In case the
finder method expects parameters, they must be passed as subsequent pairs of key and
value within the query part (cf. 2.2.1) of the URI. If the employed programming
language supports named parameters, keys must match the finder’s parameter names and
values will be passed accordingly; otherwise values will be passed in order of
occurrence, disregarding keys. Refer to 2.1.1 for more information on finder
methods. This type of URI is called finder URI.
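
The two ways of passing query parameters to a finder method can be sketched as
follows. Python supports named parameters, so the named variant applies directly; the
positional variant is included for comparison. The class Order, its finder
find_by_status, and the function call_finder are hypothetical and only serve this
illustration. A real connector would also have to restrict the callable names to
methods that are declared as finders, which the sketch hints at with an explicit set
of allowed names.

```python
from urllib.parse import parse_qsl


class Order:
    """Hypothetical domain class with one finder method."""

    @classmethod
    def find_by_status(cls, status, limit="10"):
        # Stand-in result; a real finder would query the domain model.
        return [f"{status}-order-{n}" for n in range(int(limit))]


def call_finder(domain_class, findername, query, allowed, named=True):
    """Invoke a finder with the key/value pairs from the URI query part."""
    if findername not in allowed:
        raise LookupError(f"{findername!r} is not an exposed finder")
    pairs = parse_qsl(query, keep_blank_values=True)
    finder = getattr(domain_class, findername)
    if named:
        # Keys must match the finder's parameter names.
        return finder(**dict(pairs))
    # Keys are disregarded; values are passed in order of occurrence.
    return finder(*(value for _key, value in pairs))
```

Note that all values arrive as strings; converting them to the parameter types which
the finder expects is left to the finder in this sketch.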

Query URI
If path-abempty ends on classname and a query is given, it must be a free_query
expression according to the rules in listing 3.13 in section 3.3.2. Such a URI
references a collection resource as in item 2 of the list following figure 3.2,
referencing the member resources which match the search criteria expressed in
free_query. Refer to 3.3.2 for more information on queries. This type of URI is
called query URI.

3.2 Representation types for Web resources exposing entities in a domain model
Communication with resources is done by exchanging representations, which realize
the main goal: interaction. As stated in 1.2, a representation is a description of
the state of a resource which can be valid for a specific timeframe (cf. 2.2.6).
Representations can be enclosed both in requests and responses. When sent with a
request, a representation describes the intended state of a resource to be updated or
created; when received in a response, it describes the current state of a resource.
For each of the three resource types that are defined for WebData, a corresponding
representation type is defined.
WebData representations are inspired by the Atom Publishing Protocol (cf. 2.3).
While representation semantics will be defined regardless of actual serialization for-
mats in order to support different client requirements and leave format determina-
tion to content negotiation (see 3.5), WebData connector implementations must at
least support the WebData Format which is based on the Atom Syndication Format
[NS05] and extended by XML constructs in an additional namespace [HTBL06].
WebData Format constructs and examples are mentioned together with the ba-
sic concepts in the following paragraphs. Sample realizations of those concepts in
WebData Format and JSON (application/json, [Cro06]) are shown at the end of the
respective sections to serve as examples for future definition of additional formats.

3.2.1 Collection representations


The main purpose of a collection representation is to reference URIs of member
resources and a set of (minimal) information describing the member resources.
Furthermore, information about the collection itself must be included in the
representation. The corresponding WebData Format root element for a collection
representation is atom:feed.² The actual entities of information which must be
included in a collection representation are as follows:

Identifier Every collection representation must mention its identifier which has to
be globally unique. It is required that this identifier is a URI (cf. 2.2.1).
Corresponding WebData Format element: atom:id.

Title A title must be given for each collection and should be a human-readable text
introducing the purpose of this collection.
Corresponding WebData Format element: atom:title.

Update information A collection representation must carry an information entity
mentioning the date and time of its own or its entries’ last significant update.
Corresponding WebData Format element: atom:updated.

Members A collection representation must at least include minimal member
representations (i.e. member representations excluding value representations, as
opposed to complete member representations which contain all value
representations) of all enclosed members and may include more information on
members as described in the next section on member representations.
If the represented collection is an associated collection resource as defined in
3.1, every member of the collection must be represented using its canonical
and its edit link (see the next section on member representations). Otherwise,
canonical links are sufficient.
Corresponding WebData Format elements: atom:entry.

Listing 3.2 shows a sample collection representation serialized to Atom (i.e. an Atom
feed containing Atom entries) which is extended by a WebData Format schema (see
below) for the Order class as in 2.1.
Listing 3.2: Collection representation in extended Atom
1  <feed xmlns:wd="http://example.org/webdata/orders" xmlns="http://www.w3.org/2005/Atom">
2    <id>urn:example.org:webdata:customers:42:orders</id>
3    <title>Orders for customer #42</title>
4    <updated>2007-01-16T04:12:26Z</updated>
5    <entry>
6      <id>urn:example.org:webdata:orders:0815</id>
7      <title>Order #0815 (#2 of orders for customer #42)</title>
8      <updated>2007-01-16T04:12:26Z</updated>
9      <link href="http://example.org/webdata/customers/42/orders/2" rel="edit"/>
10     <link href="http://example.org/webdata/orders/0815" rel="self"/>
11   </entry>
12   <entry>
13     <id>urn:example.org:webdata:orders:4711</id>
14     <title>Order #4711 (#3 of orders for customer #42)</title>
15     <updated>2006-02-17T01:49:48Z</updated>
16     <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
17     <link href="http://example.org/webdata/orders/4711" rel="self"/>
18   </entry>
19 </feed>
² XML elements and attributes which are defined in the http://www.w3.org/2005/Atom
namespace are denoted with the prefix atom: while the prefix wd: denotes elements and
attributes from the respective WebData namespace, refer to 3.2.4 Schema
representations.

The described concepts are represented by the lines in listing 3.2 as follows: identifier
– line 2; title – line 3; update information – line 4; edit links for enclosed member
representations – lines 9,16; canonical links for enclosed member representations –
lines 10,17.
Listing 3.3 shows a collection representation serialized to JSON.

Listing 3.3: Collection representation in JSON

1  {
2    "id": "urn:example.org:webdata:customers:42:orders",
3    "title": "Orders for customer #42",
4    "updated": "2007-01-16T04:12:26Z",
5    "members": [
6      { "id": "urn:example.org:webdata:orders:0815",
7        "title": "Order #0815 (#2 of orders for customer #42)",
8        "updated": "2007-01-16T04:12:26Z",
9        "links": [
10         { "type": "edit",
11           "uri": "http://example.org/webdata/customers/42/orders/2" },
12         { "type": "canonical",
13           "uri": "http://example.org/webdata/orders/0815" } ]
14     },
15     { "id": "urn:example.org:webdata:orders:4711",
16       "title": "Order #4711 (#3 of orders for customer #42)",
17       "updated": "2006-02-17T01:49:48Z",
18       "links": [
19         { "type": "edit",
20           "uri": "http://example.org/webdata/customers/42/orders/3" },
21         { "type": "canonical",
22           "uri": "http://example.org/webdata/orders/4711" } ]
23     } ]
24 }

The described concepts are represented by the lines in listing 3.3 as follows: identifier
– line 2; title – line 3; update information – line 4; edit links for enclosed member rep-
resentations – lines 10,11,19,20; canonical links for enclosed member representations
– lines 12,13,21,22.
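
As an illustration of how a server-side connector could assemble such a JSON
collection representation for an associated collection, consider the following
Python sketch. The function collection_representation, the constant BASE, and the
tuple layout of its input are assumptions made for this example; the key names follow
listing 3.3. The update information of the collection is taken to be the most recent
update of its members, which relies on all timestamps sharing the same UTC format
and on the collection being non-empty.

```python
import json

BASE = "http://example.org/webdata"   # assumed authority and prefix


def collection_representation(customer_id, orders):
    """Build the orders collection of one customer.

    orders is a non-empty list of (order_id, position, updated) tuples.
    """
    members = []
    for order_id, position, updated in orders:
        members.append({
            "id": f"urn:example.org:webdata:orders:{order_id}",
            "title": f"Order #{order_id} (#{position} of orders "
                     f"for customer #{customer_id})",
            "updated": updated,
            "links": [
                # Associated collections require edit and canonical links.
                {"type": "edit",
                 "uri": f"{BASE}/customers/{customer_id}/orders/{position}"},
                {"type": "canonical",
                 "uri": f"{BASE}/orders/{order_id}"},
            ],
        })
    return {
        "id": f"urn:example.org:webdata:customers:{customer_id}:orders",
        "title": f"Orders for customer #{customer_id}",
        # Same-format UTC timestamps compare correctly as strings.
        "updated": max(updated for _id, _pos, updated in orders),
        "members": members,
    }


document = json.dumps(
    collection_representation("42", [("0815", 2, "2007-01-16T04:12:26Z"),
                                     ("4711", 3, "2006-02-17T01:49:48Z")]),
    indent=2)
```

Serializing the timestamps as JSON strings keeps the document valid JSON, since JSON
has no native date type.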

3.2.2 Member representations


The purpose of member representations is to describe the actual object instance
which the member resource exposes. There are common information entities, which
are relevant to all member representations, and information entities, which are spe-
cific with respect to the class that the respective instance belongs to in terms of
object orientation. The latter are called attributes expressed using value representa-
tions as described in the next paragraph. The corresponding WebData Format root
element for a member representation is: atom:entry. Common information entities for
all member representations are:

Identifier Every member representation must mention its identifier which has to
be globally unique. It is required that this identifier is a URI (cf. 2.2.1).
Corresponding WebData Format element: atom:id.

Title A title must be given for each member and should be a human-readable
text introducing the purpose of this member or (a combination of) its main
attribute(s).
Corresponding WebData Format element: atom:title.

Update information A member representation must carry an information entity
mentioning the date and time of its last significant update.
Corresponding WebData Format element: atom:updated.

Edit link A member representation can contain an edit link (i.e. a URI). Edit
links will be used by clients to perform edit (i.e. PUT, DELETE, see 3.3.1) opera-
tions on a member resource. If no edit link is included in the representation,
the canonical link will be used for editing. WebData server-side connectors
should construct edit links following the definition of instance URIs or associ-
ated instance URIs (cf. 3.1), respectively. If the member resource is referenced
(either by a request to one of its associated instance URIs or by a collection
of associated instances) in the context of an associated instance, the member
representation must include an edit link being an associated instance URI. See
3.6.1 for more information on edit links in the context of optimistic concur-
rency. Clients must use the edit link for subsequent PUT and DELETE requests
against the represented resource.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI and atom:rel with value edit.
Canonical link A member representation must contain a canonical link which must
be constructed according to the definition of instance URIs in 3.1.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI and atom:rel with value self.
Associated resource links A member representation must include links to all as-
sociated resources according to the structure of associations of the currently
represented instance’s class in the domain model. If the represented instance
i has an association through attribute a to another instance or a collection
of instances, the associated resource link must be an attribute URI where
instancename1 references i and attributename references a. Furthermore, the infor-
mation entity representing the association has to reveal whether the referenced
resource is a collection or a member and the name of a as defined by the class
of i.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI, atom:rel with value related, wd:type with value association,
atom:title mentioning the name of attribute a, and wd:cardinality with value *
for a referenced collection or 1 for a referenced member.
Value links A member representation must include links to all value resources
whose values are accessible through augmented accessor methods of the repre-
sented instance. If the represented instance i has augmented accessor methods
for the virtual attribute a the value link must be an attribute URI where
instancename1 references i and attributename references a. Furthermore, the in-
formation entity representing the value link has to reveal that the referenced
resource is a value and the name of a as defined by the class of i. (Refer to 2.1
to learn about virtual attributes and method types.)
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI, atom:rel with value related, wd:type with value accessor,
and atom:title mentioning the name of attribute a.

Values A member representation must include value representations for all its value
resources whose values are accessible through standard accessor methods. (Re-
fer to 2.1 to learn about method types.)
Corresponding WebData Format elements as defined in ’Value representations’
enclosed in element atom:content with attribute atom:type having application/xml
as value.

Listing 3.4 shows a member representation serialized to Atom (i.e. an Atom entry)
which is extended by a WebData schema for the order class (see below).
Listing 3.4: Member representation in extended Atom
1  <entry xmlns:wd="http://example.org/webdata/orders" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/2005/Atom">
2    <id>urn:example.org:webdata:orders:4711</id>
3    <title>Order #4711 (#3 of orders for customer #42)</title>
4    <updated>2007-01-16T04:12:26Z</updated>
5    <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
6    <link href="http://example.org/webdata/orders/4711" rel="self"/>
7    <link href="http://example.org/webdata/orders/4711/products" rel="related" title="products" wd:type="association" wd:cardinality="*"/>
8    <link href="http://example.org/webdata/orders/4711/customer" rel="related" title="customer" wd:type="association" wd:cardinality="1"/>
9    <link href="http://example.org/webdata/orders/4711/total_price" rel="related" title="total-price" wd:type="accessor"/>
10   <content type="application/xml">
11     <wd:express-shipping wd:type="xsd:boolean">true</wd:express-shipping>
12     <wd:gift-wrap wd:type="xsd:boolean">false</wd:gift-wrap>
13   </content>
14 </entry>

The described concepts are represented by the lines in listing 3.4 as follows: identifier
– line 2; title – line 3; update information – line 4; edit link – line 5; canonical link
– line 6; associated resource link for a collection of associated instances – line 7;
associated resource link for a single associated instance – line 8; value link – line 9;
values – lines 11,12.
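
On the client side, two of the rules above translate into simple lookups: the URI to
be used for PUT and DELETE is the edit link if present and the canonical link
otherwise, and associations are recognized by their wd:type attribute. The Python
sketch below demonstrates this for a shortened version of listing 3.4. The function
names are hypothetical, and the WebData namespace URI is simply the one used in the
listing.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
WD = "{http://example.org/webdata/orders}"   # namespace as used in listing 3.4

MEMBER = """\
<entry xmlns:wd="http://example.org/webdata/orders"
       xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example.org:webdata:orders:4711</id>
  <title>Order #4711 (#3 of orders for customer #42)</title>
  <updated>2007-01-16T04:12:26Z</updated>
  <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
  <link href="http://example.org/webdata/orders/4711" rel="self"/>
  <link href="http://example.org/webdata/orders/4711/products" rel="related"
        title="products" wd:type="association" wd:cardinality="*"/>
  <link href="http://example.org/webdata/orders/4711/customer" rel="related"
        title="customer" wd:type="association" wd:cardinality="1"/>
</entry>
"""


def uri_for_editing(entry_xml):
    """Return the edit link, falling back to the canonical (self) link."""
    entry = ET.fromstring(entry_xml)
    by_rel = {}
    for link in entry.findall(ATOM + "link"):
        # Keep the first link of each relation type.
        by_rel.setdefault(link.get("rel"), link.get("href"))
    if "edit" in by_rel:
        return by_rel["edit"]
    return by_rel["self"]   # the canonical link is mandatory


def associations(entry_xml):
    """Map association names to (URI, cardinality) pairs."""
    entry = ET.fromstring(entry_xml)
    found = {}
    for link in entry.findall(ATOM + "link"):
        if link.get(WD + "type") == "association":
            found[link.get("title")] = (link.get("href"),
                                        link.get(WD + "cardinality"))
    return found
```

A client connector could use the cardinality returned by associations to decide
whether following a link yields a collection ("*") or a single member ("1").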
Listing 3.5 shows a sample member representation serialized to JSON.
Listing 3.5: Member representation in JSON
1  {
2    "id": "urn:example.org:webdata:orders:4711",
3    "title": "Order #4711 (#3 of orders for customer #42)",
4    "updated": "2007-01-16T04:12:26Z",
5    "express_shipping": true,
6    "gift_wrap": false,
7    "links": [
8      { "type": "edit",
9        "uri": "http://example.org/webdata/customers/42/orders/3" },
10     { "type": "canonical",
11       "uri": "http://example.org/webdata/orders/4711" },
12     { "type": "association",
13       "uri": "http://example.org/webdata/orders/4711/products",
14       "cardinality": "many",
15       "name": "products" },
16     { "type": "association",
17       "uri": "http://example.org/webdata/orders/4711/customer",
18       "cardinality": "one",
19       "name": "customer" },
20     { "type": "accessor",
21       "uri": "http://example.org/webdata/orders/4711/total_price",
22       "name": "total_price" } ]
23 }

The described concepts are represented by the lines in listing 3.5 as follows:
identifier – line 2; title – line 3; update information – line 4; edit link –
lines 8,9; canonical link – lines 10,11; associated resource link for a collection of
associated instances – lines 12-15; associated resource link for a single associated
instance – lines 16-19; value link – lines 20-22; values – lines 5,6.

3.2.3 Value representations

Value representations describe value resources that stand for object attributes and
their respective values. Value representations can represent virtual and non-virtual
attributes (refer to 2.1). A value representation consists of information describing
the name of the attribute and its value. In order to support different type systems
in different programming languages, value representations can include information
about the value’s data type.

The corresponding WebData Format elements are named wd:a, where a is the name of
the represented attribute. Each element carries an XML attribute wd:type which states
the corresponding data type as defined by [BM04].

Listing 3.6 shows a value representation serialized to XML using a WebData schema
for the order class (see below).

Listing 3.6: Value representation in XML


<wd:total-price xmlns:wd="http://example.org/webdata/orders"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                wd:type="xsd:float">
  998.98
</wd:total-price>

Listing 3.7 shows a value representation serialized to JSON.

Listing 3.7: Value representation in JSON


{ "total_price": 998.98 }
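
The following Python sketch shows one way a client could turn an XML value representation into a typed native value by evaluating the wd:type attribute. The CONVERTERS table and the function parse_value are hypothetical, and the sketch compares the literal prefix xsd: instead of resolving it as a qualified name, which is a simplification.

```python
import xml.etree.ElementTree as ET

VALUE_XML = (
    '<wd:total-price xmlns:wd="http://example.org/webdata/orders" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema" wd:type="xsd:float">'
    "998.98</wd:total-price>"
)

# Hypothetical mapping from XML Schema type names to Python converters.
CONVERTERS = {
    "xsd:float": float,
    "xsd:boolean": lambda text: text in ("true", "1"),
    "xsd:string": str,
}


def parse_value(document):
    """Return (attribute name, typed value) for a value representation."""
    element = ET.fromstring(document)
    # ElementTree expands the tag to "{namespace}local-name".
    namespace, _, local_name = element.tag[1:].partition("}")
    # wd:type lives in the same namespace as the element itself.
    type_name = element.get("{%s}type" % namespace, "xsd:string")
    convert = CONVERTERS.get(type_name, str)
    return local_name.replace("-", "_"), convert(element.text.strip())


name, value = parse_value(VALUE_XML)
```

Unknown type names fall back to plain strings here; a stricter client might reject them instead.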

Figure 3.3 illustrates the three basic representation types which are described above.

Figure 3.3: Basic representation types

In figure 3.3, collection representations are composed of member representations, which
can be either minimal or complete. Minimal member representations do not contain
additional information, while complete member representations may have associated
resource links. Furthermore, complete member representations may contain value
representations directly (hence the composition links to value representations) or
may contain links to value representations that are not directly contained themselves
(hence the aggregation).

3.2.4 Schema representations


In order to support interaction with resources via the means of representations, both
client and server must have common understanding of the schema of representations
to be exchanged. For each supported serialization format WebData server-side con-
nectors should provide schema representations serving as metadata information for
resources and their representations. Schema representations could use any format
for exchanging metadata and schemas; however, in order to support a large number
of clients, standardized formats (e.g. Relax NG [Cla01], XML Schema [XML01])
should be preferred over custom or proprietary formats.
Every resource which exposes a class within a domain model should provide its
schema representation if requested using content negotiation (see 3.5). The supplied
schema representations should contain a specific part, which is dependent on the
particular class within the domain model, i.e. mentioning all possible attributes and
their respective data types. Schema representations must at least contain a generic
(i.e. domain-independent) part, which specifies how the basic concepts of associated
resource links and value links as described in 3.2.2 are expressed.
Listing 3.8 shows a sample WebData Format schema representation realized in com-
pact Relax NG which extends the Atom Syndication Format. The generic part of
this schema describes the complete extension which WebData Format defines with
respect to Atom. Representations relating to the order resource exposing the order
class in listings 3.2, 3.4, and 3.6 comply with the schema shown in listing 3.8.
Listing 3.8: Sample Relax NG schema representation
1 # Relax NG for WebData representations
2 #
3 # Resource local name: Orders
4 # Resource URI: http://example.org/webdata/orders
5
6 namespace atom = "http://www.w3.org/2005/Atom"
7 namespace wd = "http://example.org/webdata/orders"
8 datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
9
10 start = anyElement
11
12 anyElement = element * - atom:* {
13   (attribute * { text }
14    | text
15    | anyElement)*
16 } | anyAtom
17
18 anyAtom = element atom:* - (atom:link | atom:content) {
19   (attribute atom:* { text }
20    | text
21    | anyAtom)*
22 } | atomLink | atomContent
23
24 atomLink = element atom:link {
25   attribute wd:cardinality { "*" | "1" }?,
26   attribute wd:type { "association" | "accessor" }?,
27   anyNonWebData*
28 }
29
30 atomContent = element atom:content {
31   element wd:express-shipping { attribute wd:type { "xsd:boolean" }, xsd:boolean }?,
32   element wd:gift-wrap { attribute wd:type { "xsd:boolean" }, xsd:boolean }?,
33   element wd:total-price { attribute wd:type { "xsd:float" }, xsd:float }?,
34   anyNonWebData*
35 }
36
37 anyNonWebData = element * - wd:* {
38   (attribute * { text }
39    | text
40    | anyNonWebData)*
41 } | attribute * - wd:* { text }

In listing 3.8, lines 6-28 and 37-41 represent the generic part of the schema
representation: lines 10-22 interweave the definition with Atom documents, and
lines 24-28 define extra attributes for the atom:link element which realize associated
resource links and value links. Lines 30-35, however, define the part of the
schema which is specific to the order class and resource. They constrain value
representations to attributes that exist on the respective class and to their data
types. WebData server-side connector implementations will need information on the
underlying data schemas (e.g. through reflection) in order to provide those
representations. WebData server-side connectors must provide at least WebData Format
schema representations for every resource that stands for a class. WebData Format
schema representations must contain the exact generic part, as mentioned above,
and a specific part with respect to the underlying domain model.
It is also possible for WebData server-side connector implementations to supply
generic and specific parts of schema representations separately. In this case, the
generic schema representation would have to be provided by the base resource as
described in 3.1 while specific schema representations would be provided by those
resources standing for a class as described above. The benefit of splitting up schema
representations is that all resource representations can be validated as WebData
representations using only one (generic) schema representation that acts as “least
common denominator”3 in a first pass while specific collection, member, and value
representations can be validated against the specific schema representation in a
second pass. Obviously, this increases complexity and might not be appropriate in all
cases.
Listings 3.9 and 3.10 show a split-up schema representation which represents the
same schema as listing 3.8 which was discussed above.

Listing 3.9: Sample Relax NG schema representation (generic part)


# Relax NG for WebData representations
#
# Generic schema representation

namespace atom = "http://www.w3.org/2005/Atom"
namespace wd = "http://example.org/webdata"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = anyElement

anyElement = element * - atom:* {
  (attribute * { text }
   | text
   | anyElement)*
} | anyAtom

anyAtom = element atom:* - (atom:link | atom:content) {
  (attribute atom:* { text }
   | text
   | anyAtom)*
} | atomLink | atomContent

anyNonWebData = element * - wd:* {
  (attribute * { text }
   | text
   | anyNonWebData)*
} | attribute * - wd:* { text }

atomContent = element atom:content {
  anyElement*,
  anyNonWebData*
}

atomLink = element atom:link {
  attribute wd:cardinality { "*" | "1" }?,
  attribute wd:type { "association" | "accessor" }?,
  anyNonWebData*
}

3 For instance, WebData domain models could be represented graphically using XSLT
transformations.

Listing 3.10: Sample Relax NG schema representation (specific part)


# Relax NG for WebData representations
#
# Resource local name: Orders
# Resource URI: http://example.org/webdata/orders

namespace atom = "http://www.w3.org/2005/Atom"
namespace wd = "http://example.org/webdata/orders"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = anyElement

anyElement = element * - (atom:* | wd:*) {
  (attribute * { text }
   | text
   | anyElement)*
} | anyAtom | webDataContent

anyAtom = element atom:* - (atom:content) {
  (attribute * { text }
   | text
   | anyAtom)*
} | atomContent

anyNonWebData = element * - wd:* {
  (attribute * { text }
   | text
   | anyNonWebData)*
} | attribute * - wd:* { text }

atomContent = element atom:content {
  (webDataContent | anyNonWebData)*
}

webDataContent = element wd:express-shipping { attribute wd:type { "xsd:boolean" }, xsd:boolean }
  | element wd:gift-wrap { attribute wd:type { "xsd:boolean" }, xsd:boolean }
  | element wd:total-price { attribute wd:type { "xsd:float" }, xsd:float }

3.2.5 Mixed collection representations


In some cases, collection representations can be mixed with respect to the type of
the included members. While in standard collection representations all member rep-
resentations represent resources that expose instances of the same class, in mixed
representations, resources exposing instances of different classes can be represented.
This is particularly required for prefetching as described in 2.2.6. Member represen-
tations contained in mixed collection representations should mention the resource
that exposes the class that their represented instances belong to with respect to the
used representation format. In other words, schema representations for members in
mixed collection representations should be retrieved using references obtained from
enclosed member representations and the collection representation itself must not
mention any such references outside member representations.
Listing 3.11 illustrates a mixed collection representation serialized to XML using
WebData schemas for the order and customer resources.

Listing 3.11: Mixed collection representation in extended Atom


1 <feed xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/2005/Atom">
2   <id>urn:example.org:webdata:customers:42-with-orders</id>
3   <title>Customer #42, and orders for customer #42,</title>
4   <updated>2007-01-16T04:12:26Z</updated>
5   <entry xmlns:wd="http://example.org/webdata/customers">
6     <id>urn:example.org:webdata:customer:42</id>
7     <title>Customer #42</title>
8     <updated>2007-01-16T04:12:26Z</updated>
9     <link href="http://example.org/webdata/customers/42" rel="self"/>
10     <link href="http://example.org/webdata/customers/42/orders" rel="related" title="orders" wd:type="association" wd:cardinality="*"/>
11   </entry>
12   <entry xmlns:wd="http://example.org/webdata/orders">
13     <id>urn:example.org:webdata:orders:0815</id>
14     <title>Order #0815</title>
15     <updated>2007-01-16T04:12:26Z</updated>
16     <link href="http://example.org/webdata/customers/42/orders/2" rel="edit"/>
17     <link href="http://example.org/webdata/orders/0815" rel="self"/>
18     <link href="http://example.org/webdata/orders/4711/customer" rel="related" title="customer" wd:type="association" wd:cardinality="1"/>
19   </entry>
20   <entry xmlns:wd="http://example.org/webdata/orders">
21     <id>urn:example.org:webdata:orders:4711</id>
22     <title>Order #4711</title>
23     <updated>2006-02-17T01:49:48Z</updated>
24     <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
25     <link href="http://example.org/webdata/orders/4711" rel="self"/>
26     <link href="http://example.org/webdata/orders/4711/customer" rel="related" title="customer" wd:type="association" wd:cardinality="1"/>
27   </entry>
28 </feed>

Here, lines 5-11 represent a customer resource, while lines 12-19 and 20-27 each
represent an order resource. Note that the atom:feed element no longer mentions the
resource which stands for the class of represented instances. Instead, each atom:entry
element mentions the respective resource.
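
A client that receives a mixed collection representation has to find out, per entry, which class resource (and therefore which schema representation) applies. The Python sketch below groups entries by the wd namespace declared on each atom:entry element, under the assumption that this namespace URI is how an entry mentions its class resource. The function name group_entries_by_schema and the shortened feed are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Shortened version of listing 3.11 (titles, dates, and links omitted).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example.org:webdata:customers:42-with-orders</id>
  <entry xmlns:wd="http://example.org/webdata/customers">
    <id>urn:example.org:webdata:customer:42</id>
  </entry>
  <entry xmlns:wd="http://example.org/webdata/orders">
    <id>urn:example.org:webdata:orders:0815</id>
  </entry>
  <entry xmlns:wd="http://example.org/webdata/orders">
    <id>urn:example.org:webdata:orders:4711</id>
  </entry>
</feed>"""


def group_entries_by_schema(feed_xml):
    """Map each wd namespace URI to the ids of the entries declaring it."""
    groups = {}
    pending_wd = None  # wd namespace declared by the element about to open
    stack = []         # wd namespace in scope for every open element
    source = io.BytesIO(feed_xml.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            if prefix == "wd":
                pending_wd = uri
        elif event == "start":
            inherited = stack[-1] if stack else None
            stack.append(pending_wd or inherited)
            pending_wd = None
        else:  # "end": the element and its children are complete
            wd_uri = stack.pop()
            if payload.tag == ATOM + "entry":
                entry_id = payload.findtext(ATOM + "id").strip()
                groups.setdefault(wd_uri, []).append(entry_id)
    return groups


groups = group_entries_by_schema(FEED)
```

Each key of the resulting dictionary names a resource from which the matching schema representation could then be requested.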

3.3 Exposure of object-oriented domain models as Web resources (server part)

The WebData server side connector accesses an object-oriented domain model and
exposes its entities as resources on the Web. The basic principle consists of trans-
ferring the object-oriented concepts of classes, (sets of) instances, methods and at-
tributes to Web concepts like URIs, resources, and representations. The following
sections explain how this is achieved.

3.3.1 Operation semantics for Web resources standing for application data

The three different resource types as defined in 3.1 must expose an HTTP based
interface to clients on the Web. It is pertinent to use a subset of the eight operations
(cf. 2.2.3) as defined in HTTP to define interactions with WebData resources.

The following paragraphs define the valid operations for all three resource types
respectively and specify the behavior clients must expect when requesting.

Collection resources
As mentioned before, collection resources expose a set of object instances which
can be a whole class, an arbitrary set or an associated set of instances. Collection
resources support the four methods GET, POST, PUT, and DELETE with the following
semantics.

GET requests should return a representation describing all resources that are mem-
bers of the collection. Following the mapping of resource types, this means: all
object instances that belong to the identified set (e.g. class, search result, as-
sociation). In many cases, it will be appropriate to limit description of actual
collection members to a bare minimum including references to the member
resources (cf. minimal member representations in 3.2).
Programmatically, this involves discovery of object instances according to crite-
ria inferred from the request URI and serialization according to the requested
content-type (cf. 3.5). Generally, discovery breaks down into four different
types of behavior that must be carried out by a server connector on GET:

1. If the collection is referenced by a class URI, the resource is the referenced
class (cf. 3.1 1). All existing object instances of that class must be
discovered (e.g. using a generic finder method). If a prefetch postfix is
used in the URI, all resources that are associated through the mentioned
attributes must be discovered as well (preferably using the same finder
call).
2. If the collection is referenced by an attribute URI, the resource is the
attribute of another instance a consisting of a set of associated instances
(cf. 3.1 6). All instances with respect to instance a must be discovered.
This might involve discovering a first and evaluating the respective set
attribute of a.
3. If the collection is referenced by a finder URI, the resource is an arbitrary
set of object instances (cf. 3.1 2) as returned by a custom finder method
belonging to the referenced class. The finder method of that class has to
be called, which will subsequently perform discovery.
4. If the collection is referenced by a query URI, the resource is an arbitrary
set of object instances (cf. 3.1 2) as returned by an object query. The
query URI has to be transformed to an object query accordingly and
performed within the scope of the referenced class in order to discover
object instances. See 3.3.2 for more information on queries.
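
The four discovery behaviors can be sketched as a simple dispatch. The Python example below assumes that the request URI has already been classified into one of four kinds; the enum UriKind, the in-memory CUSTOMERS list, the FINDERS registry, and the finder in_potsdam are hypothetical stand-ins for a real domain model and persistence layer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class UriKind(Enum):
    CLASS = auto()
    ATTRIBUTE = auto()
    FINDER = auto()
    QUERY = auto()


@dataclass
class Customer:
    id: int
    city: str
    orders: list = field(default_factory=list)


# Hypothetical in-memory domain model.
CUSTOMERS = [Customer(1, "Potsdam", ["o-1", "o-2"]), Customer(2, "Palo Alto")]


def in_potsdam():
    """Hypothetical custom finder method of the customer class."""
    return [c for c in CUSTOMERS if c.city == "Potsdam"]


FINDERS = {"in_potsdam": in_potsdam}


def discover(kind, instance_id=None, attribute=None, finder=None, predicate=None):
    """Return the object instances identified by a collection URI."""
    if kind is UriKind.CLASS:
        # 1. Class URI: all existing instances of the class.
        return list(CUSTOMERS)
    if kind is UriKind.ATTRIBUTE:
        # 2. Attribute URI: discover the owner first, then read its set attribute.
        owner = next(c for c in CUSTOMERS if c.id == instance_id)
        return list(getattr(owner, attribute))
    if kind is UriKind.FINDER:
        # 3. Finder URI: delegate discovery to the custom finder method.
        return FINDERS[finder]()
    if kind is UriKind.QUERY:
        # 4. Query URI: evaluate the translated query within the class scope.
        return [c for c in CUSTOMERS if predicate(c)]
    raise ValueError(kind)
```

How a URI is classified and how a query URI becomes a predicate depend on the URI syntax of 3.1 and the query mechanism of 3.3.2 and are left out of this sketch.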

For serialization, a collection representation has to be produced. If the collection
resource has been requested using the "complete-" prefix in its URI (cf.
3.1 1), complete member representations (i.e. representations containing all
information) have to be included in the collection representation; otherwise,
minimal member representations can be used instead (see 3.2).
A GET request must yield a 200 OK status code upon successful discovery of
objects or if the set of objects is empty but could have contained instances.
If the date and time of the next potential change can be determined (see
3.3.5) for all included instances, the response must include an Expires: header
mentioning the earliest point in time with respect to expiration information of
all resources. It must be formatted as defined in [Bra89].
A 400 Bad Request status code must be returned if either the query URI is
malformed or the execution of the query on the domain model produces
an error that can be identified as being caused by the client (e.g. through the
given URI).
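
The rule for the Expires: header of a collection can be sketched as follows. The example assumes that [Bra89] denotes the RFC 1123 date format used by HTTP and that each instance reports its next potential change as a timezone-aware datetime, or None if it cannot be determined. The function name expires_header is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_header(expirations):
    """Return the Expires value for a collection, or None if undetermined.

    `expirations` holds one timezone-aware datetime (or None) per instance.
    """
    if not expirations or any(e is None for e in expirations):
        # The header is only defined if every instance reports an expiration.
        return None
    earliest = min(expirations)
    # usegmt=True requires a UTC datetime and emits the "GMT" suffix.
    return format_datetime(earliest.astimezone(timezone.utc), usegmt=True)


example = expires_header([
    datetime(2007, 1, 17, 0, 0, 0, tzinfo=timezone.utc),
    datetime(2007, 1, 16, 4, 12, 26, tzinfo=timezone.utc),
])
# example == "Tue, 16 Jan 2007 04:12:26 GMT"
```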

POST requests yield the creation of a new member resource in or the addition of
an existing member resource to the referenced collection. A POST request has
to include a member representation which is either describing the intended
state for a new member resource or mentioning the URI of an existing member
resource as its identifier (see 3.2). Programmatically, behavior for POST differs
according to the following conditions:

1. If the collection is referenced by a class URI, it is assumed that a new
instance j has to be created. The new object instance will have the
type of the class referenced by the class URI. If the given representation
mentions an identifier for the new resource and this identifier is not yet
taken, the server-side connector can choose to use the instancename part
(cf. 3.1) of that identifier for instance creation. If creation was successful,
a 201 Created status code must be returned.
2. If the collection is referenced by an attribute URI referring to attribute
a of instance i and the representation either does not mention its iden-
tifier or mentions the identifier of a non-existing resource, a new object
instance j will be created. Instance j will have the type of the class which
is associated by a to the class of i. If the given representation mentions
an identifier for the new resource and this identifier is not yet taken, the
server-side connector can choose to use the instancename part (cf. 3.1) of
that name for instance creation. Subsequently, instance j has to be asso-
ciated to instance i using a. If creation and association were successful,
a 201 Created status code must be returned.
3. If the collection is referenced by an attribute URI referring to attribute a
of instance i and the representation does mention its existing identifier,
that existing object instance j has to be associated to instance i using a.
In this case, j has to have the type of the class that is associated by a to
the class of i. Otherwise, a 400 Bad Request status code must be returned
and the association must not take place. On successful association, a
200 OK status code must be returned.
4. If the collection is referenced by a finder URI or a query URI, a 400 Bad
Request status code must be returned.

Generally, WebData server-side connectors can ignore parts within given
representations that are not applicable to the type of object to be created or
associated. After creation and/or association, the instance j must be serialized
and returned. The response must contain a Location: header mentioning
the instance URI (for POST requests on class URIs) or associated instance URI
(for POST requests on attribute URIs) of j. Furthermore, the response must
contain a status code and headers as with a GET request on a member resource.
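
The four POST conditions amount to a small decision table. The following Python sketch returns the action and status code for the success path of each case; the enum UriKind, the function post_outcome, and its boolean parameters are names introduced for this illustration, and failures during creation or association are not modelled.

```python
from enum import Enum, auto


class UriKind(Enum):
    CLASS = auto()
    ATTRIBUTE = auto()
    FINDER = auto()
    QUERY = auto()


def post_outcome(kind, identifier_exists=False, type_matches=True):
    """Return (action, status code) for a POST request on a collection.

    identifier_exists: the enclosed representation names an existing resource.
    type_matches: that existing instance has the type expected by the attribute.
    """
    if kind in (UriKind.FINDER, UriKind.QUERY):
        return ("reject", 400)                # condition 4
    if kind is UriKind.CLASS:
        return ("create", 201)                # condition 1
    if not identifier_exists:
        return ("create and associate", 201)  # condition 2
    if not type_matches:
        return ("reject", 400)                # condition 3, wrong type
    return ("associate", 200)                 # condition 3
```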

PUT requests ask resources to change their state according to the provided repre-
sentation. For collection resources, this yields the respective change of state
for its member resources as one atomic operation.
Programmatically, a WebData server-side connector must discover the identi-
fied instances as it would do in order to perform a GET request and subsequently
perform the required updates on them. WebData server-side connectors must
expect a member representation (see 3.2) and must apply the attribute values it
describes to each identified instance as a single update. In order to
achieve atomicity, server-side connectors can rely on transaction mechanisms
of the underlying persistence framework or account for exclusive access and
potential rollbacks themselves. If all updates have been successful, server-side
connectors must return a status code, headers, and a collection representation
as with GET requests. If one of the updates was unsuccessful (and thus the whole
transaction has been rolled back), the connector can return a 400 Bad Request
status code if it identified the cause of the failure within the request.
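
Where the persistence framework offers no transactions, a connector has to provide the rollback itself. The Python sketch below illustrates the idea with an in-memory snapshot; instances are plain dictionaries, the function put_collection is hypothetical, and exclusive access (locking) is not shown.

```python
import copy


def put_collection(instances, new_values):
    """Apply new_values to every instance atomically; return the status code.

    `instances` is a list of dicts standing in for the discovered objects.
    """
    snapshot = copy.deepcopy(instances)
    try:
        for instance in instances:
            for name, value in new_values.items():
                if name not in instance:
                    # The request names an attribute this instance lacks.
                    raise KeyError(name)
                instance[name] = value
    except KeyError:
        # Roll back every update performed so far.
        instances[:] = snapshot
        return 400
    return 200


orders = [{"gift_wrap": False}, {"gift_wrap": False}]
status_ok = put_collection(orders, {"gift_wrap": True})

mixed = [{"gift_wrap": False}, {"express_shipping": True}]
status_failed = put_collection(mixed, {"gift_wrap": True})
```

In the second call the first instance is updated before the second one fails, so the snapshot is restored and no partial update remains visible.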

DELETE requests yield the removal of resources. For collection resources, this
means the removal of all its members as one atomic operation rather than the
removal of the collection itself.
Programmatically and similar to the GET request for collections, a WebData
server-side connector must discover all object instances enclosed by the set that
is identified by the collection and then remove them subsequently. Removal
must be carried out according to the following conditions:

1. If the collection is referenced by a class, finder, or query URI, the enclosed
object instances must be destroyed. Depending on the underlying persistence
technology, this may include execution of designated destructor
functionality on the respective instances.
2. If the collection is referenced by an attribute URI referring to attribute
a of instance i the enclosed object instances must not be destroyed or
deleted. Instead they must be disassociated from i with respect to a.

In order to achieve atomicity, server-side connectors can rely on transaction
mechanisms of the underlying persistence framework or account for exclusive
access and potential rollbacks themselves. If removal has been successful,
server-side connectors must return a 204 No Content status code and no repre-
sentation.
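
The difference between destroying and disassociating instances can be sketched as follows. The store dictionary, the owner dictionary, and the function delete_collection are hypothetical; the sketch receives the already discovered member identifiers and omits the transaction handling described above.

```python
def delete_collection(store, member_ids, owner=None, attribute=None):
    """Remove the discovered members and return the HTTP status code.

    store: dict mapping instance ids to instances (stand-in for persistence).
    owner/attribute: set only when the collection was named by an attribute URI.
    """
    if owner is not None:
        # Attribute URI: dissolve the association, keep the instances.
        owner[attribute] = [m for m in owner[attribute] if m not in member_ids]
    else:
        # Class, finder, or query URI: destroy the instances themselves.
        for member_id in member_ids:
            del store[member_id]
    return 204


store = {1: "order 1", 2: "order 2", 3: "order 3"}
customer = {"orders": [1, 2]}

status_a = delete_collection(store, [1, 2], owner=customer, attribute="orders")
remaining_after_disassociation = len(store)
status_b = delete_collection(store, [3])
```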

Generally, a 404 Not Found status code must be returned if the referenced class,
attribute or finder does not exist.

Member resources
Member resources expose concrete object instances which belong to one class and
can be members in one or more sets. Member resources support the methods GET,
PUT, and DELETE with the following semantics. If another operation is requested
from a member resource, a 405 Method Not Allowed status code must be returned and
the response must include an Allow: header mentioning GET, PUT, DELETE as allowed
operations.

GET requests must return a representation describing the identified resource. Fol-
lowing the mapping for resource types, this means: a concrete object instance.
If no prefetch postfix is used in the request URI, member resources must return
member representations on GET that are structured as defined in 3.2, otherwise,
they must return mixed collection representations.
Programmatically, this means discovery and serialization of the identified in-
stance from the domain model. If a prefetch postfix is used in the URI, all re-
sources that are associated through the attributes mentioned in prefetch must
be discovered as well (preferably using the same finder call). A GET request must
yield a 200 OK status code upon successful discovery of the object instances.
If the identified instance supports versioning (see 3.3.5) and the request message
included an If-None-Match: header mentioning the current version of the
instance, the request must yield a 304 Not Modified status code and no
representation instead. In any case, the response must include an ETag: header
mentioning the current version if versioning is supported for the current
instance.
If the date and time of the next potential change to this instance can be
determined (see 3.3.5), the response must include an Expires: header mentioning
this point in time formatted as defined in [Bra89].
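
The conditional GET behavior for member resources can be sketched like this. The instance is modelled as a dictionary with an optional "version" entry and a "state" entry; both keys and the function get_member are assumptions of this example, and serialization of the state is omitted.

```python
def get_member(instance, if_none_match=None):
    """Return (status code, headers, body) for a GET on a member resource."""
    headers = {}
    version = instance.get("version")
    if version is not None:
        # Entity tags are quoted strings in HTTP.
        headers["ETag"] = '"%s"' % version
        if if_none_match == headers["ETag"]:
            # The client already holds the current version.
            return 304, headers, None
    return 200, headers, instance["state"]


order = {"version": "7", "state": {"gift_wrap": False}}
fresh = get_member(order)
cached = get_member(order, if_none_match='"7"')
unversioned = get_member({"state": {"gift_wrap": True}}, if_none_match='"7"')
```

Instances without versioning support never produce a 304 response in this sketch, because no ETag: header can be compared.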

PUT requests yield a state change on the identified resource and must include a
representation describing the intended state for the resource.
Programmatically, a WebData server-side connector must discover the identi-
fied instance as it would do in order to perform a GET request and subsequently
perform the required update on it. WebData server-side connectors must ex-
pect a member representation (see 3.2) and must update attributes of the
identified instance. If the member is referenced by an attribute URI refer-
ring to attribute a of instance i and the representation contains an identifier
(see 3.2.2) that is different from the identifier of the resource which currently
exposes the model instance being associated to i through a, a has to be up-
dated in a way that the new instance becomes associated to i. If the update
has been successful, server-side connectors must return a status code, headers,
and a member representation, as with GET requests. If the update was unsuc-
cessful, the connector can return a 400 Bad Request status code if it identified
the cause of the failure within the request (e.g. the enclosed representation).

DELETE requests yield the removal of resources. For a member resource, this
means the removal or deletion of this member with respect to the collection
that it is a member in.
Programmatically and similar to the GET request for members, a WebData
server-side connector must discover the object instance that is identified by the
request and remove it subsequently. Removal must be carried out according
to the following conditions:

1. If the member is referenced by an instance URI, the respective object
instance must be destroyed. Depending on the underlying persistence
technology, this may include execution of designated destructor function-
ality on the respective instance.
2. If the member is referenced by an associated instance URI referring to
associated instance i through attribute a of instance j, the associated
instance i must not be destroyed or deleted. Instead, it must be disasso-
ciated from j with respect to a.

If removal has been successful, server-side connectors must return a 204 No
Content status code and no representation.

Generally, a 404 Not Found status code must be returned if the referenced class,
instance, attribute or associated instance does not exist.

Value resources
Value resources stand for single values that instance attributes may have. Value
resources support the methods GET, PUT, and DELETE with the following semantics. If
another operation is requested from a value resource, a 405 Method Not Allowed status
code must be returned and the response must include an Allow: header mentioning
GET, PUT, DELETE as allowed operations.

GET requests must return a representation describing the identified resource. Fol-
lowing the mapping for resource types, this means: the value of an instance
attribute. Value resources must return value representations on GET which are
structured as defined in 3.2.
Programmatically, this means discovery of the instance identified by the at-
tribute URI within the domain model, readout of the attribute value identified
by the attribute URI, and its serialization. Note, that readout may entail the
execution of an augmented getter method as described in 2.1. A GET request
must yield a 200 OK status code upon successful discovery of the object instance.

PUT requests yield a state change on the identified resource and have to include
a representation describing the intended state for the resource.
Programmatically, a WebData server-side connector must discover the identi-
fied instance and read the identified attribute as it would do in order to perform
a GET request and subsequently perform the required update on the attribute.
WebData server-side connectors must expect a value representation (see 3.2)
and must update the identified attribute of the identified instance. This may
entail the execution of an augmented setter method as described in 2.1. If the
update has been successful, server-side connectors must return a 200 OK status
code and a value representation as with GET requests. If the update was un-
successful, the connector can return a 400 Bad Request status code if it identified
the cause of the failure within the request (e.g. the enclosed representation).

DELETE requests yield the removal of resources. For a value resource, this means
the reinitialization of the identified attribute value rather than the removal of
the attribute itself as it is defined by the data structure and not subject to
change.
Programmatically and similar to the PUT request for values, a WebData server-side
connector must discover the object instance and attribute which are identified
by the request and set the attribute value to its initial state (e.g. NULL
or a default value) accordingly. If reinitialization has been successful, server-side
connectors must return a 200 OK status code and a value representation
describing the new initial value.

Generally, a 404 Not Found status code must be returned if the referenced class,
instance or attribute does not exist.

The “base resource”


As mentioned in 3.1, a request can be made directly against the base URI. Semantically,
the base URI does not identify a particular resource but can be used to
perform multiple operations on a number of resources at a time atomically. The base
resource only exposes the PUT operation. If another operation is requested from the
base resource, a 405 Method Not Allowed status code must be returned and the response
must include an Allow: header mentioning PUT as the only valid operation.
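
The operations supported by the four kinds of resources can be collected in a single table from which the 405 response and its Allow: header are derived. The table ALLOWED_METHODS and the function check_method in the following Python sketch are illustrative names, not part of the WebData definition.

```python
# Operations per resource type as described in this section.
ALLOWED_METHODS = {
    "collection": ("GET", "POST", "PUT", "DELETE"),
    "member": ("GET", "PUT", "DELETE"),
    "value": ("GET", "PUT", "DELETE"),
    "base": ("PUT",),
}


def check_method(resource_type, method):
    """Return None if the method is allowed, else (405, headers)."""
    allowed = ALLOWED_METHODS[resource_type]
    if method in allowed:
        return None
    return 405, {"Allow": ", ".join(allowed)}
```

A connector would call check_method before any discovery takes place, so that unsupported operations are rejected without touching the domain model.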
Programmatically, a PUT request against the base resource is similar to a PUT request
against a collection resource, except that instead of a member representation, a col-
lection representation must be enclosed. The collection representation is expected
to include a number of member representations which must include an edit link (see
3.2) and value representations describing the intended state for the member resource.
Subsequently, the object instances referenced by the edit links are discovered and
the requested update operations are performed respectively. In order to achieve
atomicity, server-side connectors can rely on transaction mechanisms of the underlying
persistence framework or account for exclusive access and potential rollbacks
themselves. If all updates have been successful, server-side connectors must return
a 200 OK status code and a collection representation which references all resources
which have been affected by the update. If one of the updates was unsuccessful
(and thus the whole transaction has been rolled back), the connector can return a
400 Bad Request status code if it identified the cause of the failure within the request.

If neither the WebData server-side connector implementation nor the underlying
platform can support transactional behavior and atomicity, the base resource must
yield a 405 Method Not Allowed status code on PUT operations to indicate the lack of
transactional capabilities.

Generally, all operations on all resource types must yield a 500 Internal Server Error
status code if an error occurred while performing the request and the failure cannot
be identified with certainty as caused by the client. According to HTTP, any response
with a 500 Internal Server Error status code should include a representation explaining
the error. This explanation may reveal original error codes and messages of underlying
components if this is not considered critical for security.
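
One possible way to separate client-caused failures from all others is a dedicated exception type, as in the following Python sketch. The class ClientError, the function handle, and the flag expose_details are hypothetical; a real connector would map the exceptions of its persistence framework accordingly.

```python
class ClientError(Exception):
    """Hypothetical exception for failures attributable to the request."""


def handle(operation, expose_details=False):
    """Run an operation and map failures to (status code, body)."""
    try:
        return 200, operation()
    except ClientError as error:
        # The cause was identified within the request.
        return 400, str(error)
    except Exception as error:
        # Everything else counts as a server-side failure.
        body = repr(error) if expose_details else "Internal error"
        return 500, body


def malformed():
    raise ClientError("malformed query")


def broken_backend():
    raise RuntimeError("db down")
```

The expose_details flag reflects the security consideration above: original error messages of underlying components are only passed on if this is considered uncritical.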

3.3.2 Query mechanisms for Web resources


A major requirement for facilities which provide access to data is search. Section
3.1 reveals how concrete object instances and collections of instances can be located
and requested if a concrete name (i.e. URI) is known by the client. However, a large
class of use-cases does not satisfy this assumption: in many situations, clients only
know aspects of objects or need to select a subset of objects from a larger set using
conditional expressions. In relational database management systems, for example,
the Structured Query Language (SQL, [BC74]) has been a prominent and widely
accepted means for expressing and executing queries for many years.

Query URIs
WebData proposes a URI-based approach to expressing queries against collections of
resources which is conceptually similar to aspects of SQL. It enables clients to specify
conditions in a URI. Conceptually, such a query URI (cf. 3.1) names a particular
query result and thus makes the respective collection of matching resources itself
a resource which in turn can be accessed using the uniform interface defined for
collection resources (cf. 3.3.1).
The URI specification [BLFM05] provides for queries in URIs through its definition
of the query part, which can follow the hierarchical part of any URI using ? as a
delimiter. However, neither the URI nor the HTTP specification defines a more
concrete syntax, format, or semantics for those queries. In existing Web
applications, the query part is widely used to specify any kind of parameters which
should be appended to a URI in order to retrieve a modified response. A common
pattern for the query part is shown in listing 3.12 below:
Listing 3.12: Usual form for URI query parameter
query = condition *( "&" condition )
condition = key "=" value

In listing 3.12, simple conditions based on key-value pairs can be stated and
combined using the logical AND operator expressed by &. The expressiveness of
queries using this pattern is quite limited. It is sufficient in many cases where
keys and values carry additional, domain-specific semantics, but it falls short when
a global query approach for all kinds of applications and domain models is required.
Listing 3.13 shows the grammar which WebData defines for free queries which are
designed to be globally applicable to Web resources.
Listing 3.13: Free URI queries
free_query = ["!"] ( free_query logop free_query / "(" free_query ")" /
             expression compop expression )
expression = attribute / value
attribute  = string ["::" string]
value      = number / "'" *( string / "*" / "?" ) "'"
logop      = "," / "&"
compop     = "=" / "+=" / "-=" / "~=" / "!="
string     = 1*unreserved
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
number     = 1*DIGIT [ "." 1*DIGIT ]

A free query is a construct consisting of one or more conditions which can be com-
bined using logical AND (using &) and OR (using ,), nested (using ( and )) and
negated (using !). Every condition is composed of two operands and a comparison
operator. Available operators are equal (=), not equal (!=), less than (-=), greater
than (+=) and like (~=, to be used for pattern matching as in SQL [BC74]). Operator
precedence is as follows (higher levels first): =, +=, -=, !=, !, &, ~=, ,.
Every operand can be a concrete value, a pattern (using the single-character wildcard ?
and the multi-character wildcard *) or an instance attribute as defined by the object
classes which the Web resources stand for. Note that attributes of associated classes
can be included in a query using :: as the dereference operator.
Listing 3.14 shows some sample query URIs for illustration purposes.

Listing 3.14: Sample queries
http://example.org/webdata/products?price-=500.00
http://example.org/webdata/customers?order::price+=100000.00&email~='*@sap.com'
http://example.org/webdata/orders?(product::gross_weight+=200,express_shipping=true)&customer::country_code=US
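
To illustrate the semantics of the comparison operators and wildcards, the
following Ruby sketch evaluates a single condition whose operands have already
been resolved to values. It is a simplified illustration only: the helper names
pattern_to_regexp and condition_holds? are hypothetical, and parsing of the
query part, operator precedence and attribute dereferencing are deliberately
left out.

```ruby
# Hypothetical helper: translate a WebData pattern ('?' = one character,
# '*' = any number of characters) into an anchored Ruby regular expression.
def pattern_to_regexp(pattern)
  source = pattern.chars.map do |char|
    case char
    when "?" then "."
    when "*" then ".*"
    else Regexp.escape(char)
    end
  end.join
  Regexp.new("\\A#{source}\\z")
end

# Evaluate one condition (left operand, comparison operator, right operand).
def condition_holds?(left, compop, right)
  case compop
  when "="  then left == right
  when "!=" then left != right
  when "-=" then left < right    # "less than" in free queries
  when "+=" then left > right    # "greater than" in free queries
  when "~=" then pattern_to_regexp(right).match?(left.to_s)
  else raise ArgumentError, "unknown comparison operator: #{compop}"
  end
end
```

For instance, condition_holds?("jan@sap.com", "~=", "*@sap.com") mirrors the
email condition of the second sample query.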

WebData server-side connectors must implement a query mechanism which is able to
interpret and perform queries which are formulated according to the provided rules.
A query URI must yield a collection resource which stands for the set of objects
that match the query criteria. Request-based interaction with those resources is
described in 3.3.1. WebData server-side connectors must check the soundness of a
query URI and, upon failure, return a response including a 400 Bad Request status
code and a description of the reason.

Finder methods
The proposed search mechanism using query URIs provides a certain amount of
expressiveness. However, query URIs are not suited for more sophisticated queries
which include complex joins from different sets of objects or special projections.
Commonly used queries should be wrapped into static finder methods on class level.
Hence, assuming that complex queries are built into object classes, those finders
can be accessed using finder URIs on WebData server-side connectors. Server-side
connector implementations must provide finder URIs for all existing finder methods
and expose a collection interface as defined in 3.3.1. Note that in order to distinguish
finder methods from other methods, special configuration for the connector may be
needed. However, it may be more appropriate to use conventions, e.g. starting finder
method names with the imperative find if possible. Parameters which finder methods
may expect must be read from finder URIs as defined in 3.3.1. Again, a mapping of
parameter names to keys in the URI query part is most appropriate.
WebData server-side connectors must expose each finder with its own finder URI.
Connectors should use naming conventions based on a mapping between finder
method names and the findername part of the URI. The proposed mapping consists
of naming finder methods like find_name and deriving name as the findername
URI part. As for parameters, key query parts should be derived from parameter
names. For example, a finder method find_relevant_for_intl_vat_refund(year) defined
on a class order would be made available via the URI
http://example.org/webdata/orders/relevant_for_intl_vat_refund?year=x where x would
be passed as the value for the year parameter to the finder method.
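
The naming convention can be sketched in a few lines of Ruby. The helper
finder_uri below is hypothetical and only illustrates the proposed mapping from
a finder method name and its parameters to a finder URI.

```ruby
require "uri"

# Hypothetical helper: derive a finder URI from a finder method name.
# class_uri:   URI of the class collection, e.g. ".../webdata/orders"
# method_name: finder method following the find_<name> convention
# params:      Hash of parameter names to values, mapped to query keys
def finder_uri(class_uri, method_name, params = {})
  name = method_name.to_s[/\Afind_(.+)\z/, 1]
  raise ArgumentError, "not a finder method: #{method_name}" if name.nil?

  uri = "#{class_uri}/#{name}"
  uri += "?" + URI.encode_www_form(params) unless params.empty?
  uri
end
```

Called with the class URI of orders, :find_relevant_for_intl_vat_refund and a
year parameter, the helper yields a URI of the form shown in the example above.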

Client-based queries


As a “last resort” for query cases where appropriate finder methods are not available
and query logic is too complex to be expressed using a query URI, clients can request
sets of resources using class, attribute, finder or query URIs and perform selections on
the retrieved collections manually. From a performance point of view, this solution
will likely be quite expensive and should be avoided, but may be pertinent in cases
where collections contain only a limited number of members.

3.3.3 Authentication and authorization for secure resource access
When exposing entire collections of model objects from a domain model as resources
on the public Web, authorization for access is an important matter with respect to
security and privacy. Reasonable authorization management in turn requires that
authentication mechanisms are established first.

Authentication
HTTP – which all WebData interactions are based upon – provides for authen-
tication mechanisms using the Authorization: and WWW-Authenticate: headers and the
401 Unauthorized status code.

Conceptually, requests to resources can be sent with or without an Authorization:
header which carries appropriate credentials for the requested resource and specifies
the authentication mechanism used (i.e. possible schemes for credentials). If
the resource requires authentication and incorrect or no credentials are supplied,
it will return a response containing the 401 Unauthorized status code and a
WWW-Authenticate: header indicating the supported authentication scheme(s). The client
can subsequently choose to supply credentials and perform the request once again.
If the resource does not require authentication or correct credentials are supplied,
it will perform the requested operation with respect to authorization rules and the
identity of the client. Figure 3.4 illustrates authentication as defined in HTTP.

Figure 3.4: Sample request-response interaction using HTTP authentication
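
The server side of this interaction can be sketched for the Basic scheme as
follows. The method name authenticate, the in-memory credentials hash and the
realm name are assumptions made for the sake of the example; a real connector
would delegate credential checking to the application or a directory service.

```ruby
# Hypothetical sketch of the server-side authentication decision.
# headers:     Hash of request headers
# credentials: Hash of user names to passwords (assumed in-memory stand-in)
# Returns [status, response_headers, user].
def authenticate(headers, credentials, realm = "WebData")
  challenge = { "WWW-Authenticate" => %(Basic realm="#{realm}") }
  scheme, payload = headers["Authorization"].to_s.split(" ", 2)
  return [401, challenge, nil] unless scheme == "Basic" && payload

  # Basic credentials are "user:password", Base64-encoded.
  user, password = payload.unpack1("m").split(":", 2)
  return [401, challenge, nil] unless password && credentials[user] == password

  [200, {}, user]
end
```

The sketch compares passwords in plain text for brevity; as noted for Basic
authentication in general, it should only be used over an encrypted connection.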

WebData server-side connectors must at least support the Basic Authentication
Scheme as described in [FHBH+99]. Note that basic authentication should only
be used in combination with HTTP over an encrypted connection [DR06] in order
to ensure password security. Concrete implementations may offer additional
authentication schemes as defined in [FHBH+99]. Furthermore, they may connect to
other frameworks (e.g. directory servers via LDAP) to check validity of credentials.
Also, implementations should offer a mechanism to delegate credential checking
to the actual application using the connector, e.g. using callback mechanisms.

Authorization
Conceptually, servers can use the 403 Forbidden status code to indicate that a request
is not allowed with respect to the requested resource, the request method and the
supplied credentials. Concrete server-side connector implementations must use the
403 Forbidden status code if access to a resource is denied. In order to determine
whether or not access should be allowed or denied, connectors must be configurable
by application developers using them in target applications. In order to support this,
connector implementations can use different techniques such as reading configuration
files or using annotations in the implementation of the domain model.
Regardless of the actual configuration technique and serialization format, the actual
language describing authorization must be composed of authorization rules which
contain the following types of information:

Model class An identifier which uniquely specifies a model class within the exposed
domain model or a wildcard which identifies any class.

Resource type An identifier which references one of the three resource types in
WebData (cf. 3.1), namely collection, member, and value or a wildcard.

Request method An identifier which references one of the four request methods
which WebData specifies semantics for (cf. 3.3.1), namely GET, POST, PUT, and
DELETE or a wildcard.

Outcome Either a boolean value or a conditional expression which evaluates to a
boolean value.

When receiving a request, WebData server-side connectors must evaluate a concrete
rule set sequentially in order to determine whether or not the operation can be
performed. If no rule applies, authorization must be denied. Evaluation must be
performed by matching request information (i.e. model class, resource type,
request method) against the rules. Subsequently, the respective outcome determines
whether or not access is granted and the operation will be performed. WebData
server-side connectors must provide a mechanism (e.g. through callbacks) to ask
the application which is using the connector whether or not the operation can be
performed and must supply information on the currently authenticated user as well
as on the requested resource with each such call.
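
Sequential rule evaluation can be sketched as follows. The Rule struct, the :any
wildcard symbol and the method names are hypothetical, and the sketch assumes
that the first applicable rule determines the outcome, which the text above does
not state explicitly.

```ruby
# Hypothetical rule representation: a Struct carrying the four kinds of
# information listed above; :any serves as the wildcard.
Rule = Struct.new(:model_class, :resource_type, :request_method, :outcome)

def rule_applies?(rule, request)
  [:model_class, :resource_type, :request_method].all? do |field|
    rule[field] == :any || rule[field] == request[field]
  end
end

# Returns true if the request is authorized; false corresponds to 403 Forbidden.
def authorized?(rules, request, resource, user)
  # Assumption: the first applicable rule decides.
  rule = rules.find { |candidate| rule_applies?(candidate, request) }
  return false if rule.nil? # no rule applies: deny

  outcome = rule.outcome
  # A callable outcome asks the application, passing resource and user.
  outcome.respond_to?(:call) ? !!outcome.call(resource, user) : outcome == true
end
```

Because evaluation stops at the first applicable rule in this sketch, more
specific rules have to be listed before wildcard rules.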
For illustration purposes, listing 3.15 shows a sample configuration embedded in the
definition of an ActiveRecord [H+04a] class which would be suitable for a Ruby on
Rails [H+04b] implementation of a WebData server-side connector.
Listing 3.15: Sample authorization rule set embedded in the definition of an
ActiveRecord class.
class Order < ActiveRecord::Base
  belongs_to :customer
  has_and_belongs_to_many :products

  # Outcomes per resource type and request method: either a boolean or a
  # proc which receives the requested resource and the authenticated user.
  acts_as_webdata_resource :operations_allowed => {
    :collection => {
      :get    => true,
      :post   => proc { |res, user| res.customer == user },
      :put    => false,
      :delete => false
    },
    :member => {
      :get    => proc { |res, user| res.customer == user },
      :put    => false,
      :delete => false
    }
  }
end

3.3.4 Prefetching support


In order to support prefetching (i.e. the retrieval of associated resources along with
the requested one), WebData server-side connectors must supply a mechanism to
discover and deliver associated resources upon request. Whenever a request uses a
URI containing a prefetch part, WebData server-side connectors must return collec-
tion representations including the originally requested resource(s) and those which
are associated through the attributes mentioned in the prefetch part.
More specifically, while the classname, instancename1, attributename, or instancename2
part of the respective URI (whichever comes last) determines the main resources for
the request, the prefetch_attributename parts in the prefetch part of the URI specify
those associations which have to be considered for prefetching. The respective
associations must be evaluated for each of the main resources, and the associated
resources have to be discovered as well.
The resulting collection of resources must be represented using a mixed collection
representation as described in 3.2.5 and subsequently returned.
For example, the URI http://example.org/webdata/customers/42-with-orders would yield
the discovery of the customer instance which is identified by 42 and all order
resources which are associated with that instance. Listing 3.11 in 3.2 shows a sample
representation which would represent a mixed collection according to these criteria.

3.3.5 Expiration information and cache validation


In order to support client-side caching of representations for instances in object-
oriented domain models as described in 2.2.6 and 3.4.2, WebData server-side con-
nectors should supply expiration and version information with member resources
whenever possible.

Expiration information
There are a number of ways in which expiration dates and times can be determined;
the choice is left to concrete server-side connector implementations.
In a number of cases, expiration dates could be determined using heuristics based on
an instance’s change history or the change history of similar instances (e.g. those be-
longing to the same class or set of instances). This is particularly useful in situations
where instances have proven to have very similar change intervals over time.
Another workable approach is to infer an instance’s expiration date and time from the
current state of the resource. Depending on the domain model, concrete instances
may contain information which clearly defines that the instance will never change
state again (e.g. a closed case, a resigned employee) or will likely change according
to a defined schedule (e.g. deadlines, publication data). WebData server-side
connectors should provide a means for configuration (e.g. via callbacks) such that
expiration dates can be derived from concrete instances of objects within the domain
model where possible.
WebData server-side connectors may determine expiration information using the
above-mentioned approaches or others. However, implementers must be aware that
wrong expiration information (i.e. dates too far in the future) can lead to
inconsistencies and erroneous behavior on the respective clients.
If available, expiration information for an instance must be supplied with every
response including a member resource using the HTTP Expires: header. Expiration
must then be formatted as defined in [Bra89]. As specified in HTTP, the expiration
date for an instance should be approximately one year from the time the response
is sent if the instance will never change state again.
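
A minimal Ruby sketch of how a connector might produce the header value is given
below. The helper expires_header and its parameters are hypothetical; the only
library call used is Time#httpdate from Ruby's standard time library, which
produces the date format HTTP expects.

```ruby
require "time"

ONE_YEAR = 365 * 24 * 60 * 60 # seconds, i.e. "approximately one year"

# Hypothetical helper: compute the Expires: header value for a member resource.
# expires_at: Time determined by the connector, or nil if unknown
# final:      true if the instance will never change state again
# now:        time the response is sent (injectable for testing)
# Returns a formatted date string, or nil if no header should be sent.
def expires_header(expires_at, final: false, now: Time.now)
  expires_at = now + ONE_YEAR if final
  expires_at && expires_at.httpdate
end
```

Returning nil when no expiration information is available reflects that the
header is only supplied "if available".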

Cache validation
Furthermore, WebData server-side connectors should supply version information
with every representation in order to support cache validation using conditional
GET operations as described in 2.2 and 3.4.2. Version information in this case must
be a symbol which uniquely identifies the current version of the resource with respect
to its change history. This is most likely to be implemented using an integer value
which is incremented at every state change the resource undergoes.
Versioning requires the connector to store extra persistent information for every
resource, which may not be realizable in the underlying technology. However,
versioning should be implemented by server-side connectors whenever possible. Version
information must then be supplied with every response to requests which were
issued against a member resource using the HTTP ETag: header to support subsequent
conditional GET requests.
Whenever a server-side connector receives a conditional GET request (i.e. in this case
a request directed against a member resource including an If-None-Match: header), the
current version of the identified instance has to be compared to the one mentioned in
the header. If both versions match, a 304 Not Modified status code must be returned
and the response must not include a message body; otherwise behavior is as
described in 3.3.1. When used by clients according to 3.4.2, this behavior results in
less transferred data if no changes have been made to the requested resources (cf.
2.2).
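
The comparison can be sketched as follows. The method name conditional_get and
the [status, headers, body] return shape are assumptions; the sketch also only
handles a single entity tag in the If-None-Match: header, whereas HTTP allows
lists of tags and the * wildcard.

```ruby
# Hypothetical sketch of conditional GET handling for a member resource.
# request_headers: Hash of request headers
# version:         current version of the instance (e.g. an integer counter)
# The block is expected to build the member representation on demand.
# Returns [status, headers, body].
def conditional_get(request_headers, version)
  etag = %("#{version}") # entity tags are quoted strings
  if request_headers["If-None-Match"] == etag
    [304, { "ETag" => etag }, nil]   # versions match: no message body
  else
    [200, { "ETag" => etag }, yield] # build the body only when it is needed
  end
end
```

Passing the representation as a block avoids serializing the instance when the
client's copy is still current.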

3.4 Object representation and mapping for REST Resources (client part)
The WebData client-side connector is designed to be embedded in any target
application in order to perform the task of resource access⁴. Conceptually, WebData
client-side connectors interact with WebData resources as specified in 3.3 and expose
them as entities within a domain model to the embedding target application. Thus,
server and client connectors together provide a transparent access channel to domain
models which are located on remote and potentially distributed servers. Roughly
speaking, WebData is able to “lift” the persistence layer to upper layers within the
architecture to give application developers native and direct access to the domain
model.

⁴ The WebData client-side connector is inspired by a component of the Ruby on Rails
framework [H+04b] called ActiveResource, which is still under development at the
time of writing.

3.4.1 Discovering, creating, reading, updating, and deleting Web resources like objects
In order to supply an object-oriented programming interface to Web resources which
are available over HTTP and addressable via URIs (cf. 3.1), resource concepts must
be mapped back to classes, instances, methods, attributes and associations. While
the initial “entry point” into a remote domain model must be defined using a URI,
the overall goal for subsequent interactions is “URI-less navigation”. URI-less means
that URIs of associated values and instances are not exposed to the embedding
application but automatically retrieved and dereferenced by the client-side connector
on demand.
In order for resources to behave like entities in a domain model, the basic concepts
of such a model have to be reconstructed. Concrete implementations may differ in a number of
aspects with respect to the features of the target programming language. Mainly,
a client-side connector should supply a class definition for WebData resources, such
that concrete instances can be created and stand for member resources which are
consumed from the Web. At least a default class for all objects standing for WebData
resources must be supplied. Programmatically, this class will most likely contain all
the standard behavior on class and instance level (e.g. finders, creators, save and
destroy methods, cf. 2.1.1).
It is recommended that client-side connector implementations provide a means for
application developers to obtain one class definition per class collection they intend
to use. Those class definitions can easily be inferred from the specific part of a
schema definition (cf. 3.2) and will most likely inherit from the default WebData
class. While WebData client-side connector implementations are not required to
offer specific class definitions, the resulting lack of type awareness may confuse
application developers and become a considerable source of errors. Listing
3.16 shows sample Ruby code which application developers could use to define a
member type specific client-side class. The class Order in the listing inherits from the
default client-side class which is provided by the connector and is configured using
(a) a class URI (cf. 3.1) which can be used for creation (see below) and schema
representation retrieval and (b) credentials for HTTP access (cf. 3.3.3).
Listing 3.16: Sample class collection specific class definition (Ruby)
class Order < WebDataResource::Base
  self.uri = "http://example.org/webdata/orders"
  self.credentials = {:name => "yeah", :password => "secret"}
end

The following sections describe the basic behavior of the client-side connector
default class⁵ with respect to the main aspects of resource access: discovery,
creation, reading, updating and destruction.

⁵ It is assumed that specific client-side classes inherit the default class’s basic
behavior.

Discovery

In order to locate resources on the Web and have them represented as objects to the
local environment, the client-side default class must provide a mechanism to perform
HTTP GET requests and instantiate local instances which stand for the retrieved
resource representations. The class must expose a class method which should be
called find() and expect a combination of the following parameters which can be
used to construct a URI:

URI The URI parameter accepts a class, finder, query or instance URI (cf. 3.1)
which must directly be used to request the resource if no further parameters are
given. The URI parameter is optional if a URI is given in the respective class
definition; if both are given, the parameter overrides the URI from the class
definition.

Complete The complete parameter accepts a boolean value. If the complete pa-
rameter is set to true, the given URI (or the URI which was derived from the
class definition, see above) must be modified such that the classname part is
preceded by "complete-" according to 3.1.

Condition The condition parameter accepts a free_query URI part as defined in
3.3.2. If a condition parameter is given, it must be appended to the request
URI preceded by the ? character.

Prefetch The prefetch parameter accepts a number of attribute names identifying
associations within the requested resource. If prefetch attribute names are
given, the request URI must be modified such that the classname is followed by
a prefetch part mentioning the attribute names as prefetch_attributename parts
with respect to 3.1. See 3.4.2 to learn about prefetching as a caching strategy.

Credentials The credentials parameter can be used to supply credentials for au-
thentication with the respective resource. In common scenarios, this will be
a username and password pair. However, implementations can offer more so-
phisticated mechanisms as mentioned in 3.3.3. If credentials are given in the
respective class definition, the credentials parameter overrides them. Refer to
3.3.3 and 3.4.3 for more information.
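
How find() might turn some of these parameters into a request URI is sketched
below. The helper build_find_uri is hypothetical and covers only the complete
and condition parameters; it further assumes that a class URI is passed, so that
the last path segment is the classname part. The prefetch part is omitted because
its exact syntax is defined in 3.1.

```ruby
# Hypothetical helper: build the request URI for find() from a class URI.
# Assumption: the last path segment of class_uri is the classname part.
def build_find_uri(class_uri, complete: false, condition: nil)
  base, _, classname = class_uri.rpartition("/")
  classname = "complete-#{classname}" if complete
  uri = "#{base}/#{classname}"
  uri += "?#{condition}" if condition # free_query part as defined in 3.3.2
  uri
end
```

With complete set to true and the condition price-=500.00, the class URI
http://example.org/webdata/orders becomes
http://example.org/webdata/complete-orders?price-=500.00.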

Upon a call to the find() method, a GET request to the respective URI must be
issued. Subsequently, a local instance of the respective client-side class must be
instantiated and returned for each member representation which is included in the
respective response. Loading must be carried out as described below (see Reading).
Note that the member representations included in the response may not include
value representations. Values will then be acquired when they are accessed for the
first time, following the lazy loading design pattern [GHJV00].

Discovery must return information about its success or failure using the common
error reporting mechanism of the target programming language or framework. The
type of the error should be indicated. Possible error types are defined in 3.3.1.

Creation
Creation of resources is achieved through creation of local model object instances
and must be supported through a class method on the client-side default class which
should be called create(). It must expect a class URI and optionally a number of
initial attribute-value pairs for the new resource as parameters. Note that if the
client-side connector implementation offers a mechanism to obtain specific class
definitions per member type, the class URI may be retrieved from the respective
class definition.
Upon method call, the client-side class must create a member representation (cf.
3.2) which reflects all given attribute-value pairs as value representations and request
resource creation using an HTTP POST on the class URI. Presumably, the request will
yield a member representation as defined in 3.3.1. To complete the creation of the
local instance, this representation must be loaded as described below (see Reading).
Subsequently, the new instance must be returned.
A create operation must return information about its success or failure using the
common error reporting mechanism of the target programming language or framework.
The type of the error should be indicated. Possible error types are defined
in 3.3.1.

Reading
Local model object instances which stand for member resources on the Web must
expose an interface to access representation data and provide for object naviga-
tion. In order for an application to interact with data carried by the instance, the
representation has to be loaded. Loading must be carried out as follows:

1. General information passed with the member representation (i.e. identifier,
title, update information, edit link, canonical link) must be stored locally and
privately to the new instance. The retrieved identifier should be used as primary
key whenever referring to the instance is necessary (e.g. for caching).

2. Expiration and version information must be read from the Expires: and ETag:
headers of collection or member representations respectively and stored locally
and privately to the new instance.

3. An attribute⁶ must be initialized within the new instance with the respective
value which was retrieved for each value representation which is included in
the member representation.

4. If there is a schema representation (cf. 3.2.4) available through the class of this
instance, an instance attribute can be created for each attribute defined
in the specific part of the schema, even if it is not included in the member
representation. However, the attribute must be marked as non-initialized and
loaded if needed as described below.

5. An accessor must be initialized for each value link.

6. An accessor must be initialized for each associated resource link.

⁶ Instance attributes can be realized through private instance data structures and
public accessor methods, public attributes, or similar concepts with respect to the
target programming language.

The loading mechanism should be provided by a private instance method named load()
which expects a member representation as input. Furthermore, a public instance
method named reload() should be provided which retrieves a new representation using
a GET request to the instance’s canonical URI and then calls the loading mechanism.

Upon attribute read access on a local instance by the application, three different
situations are possible:

1. If the attribute is a simple value as represented by a value representation
within a member representation, the value must just be returned. It is
possible that a concrete value for the attribute is not yet available locally, due
to the fact that the initial representation did not contain the respective value
representation (see Discovery). In this case, the instance must be reloaded as
described above.

2. If the attribute is represented by a value link within the respective member
representation, a GET request to the value link must be performed and the retrieved
value must be returned.

3. If the attribute is represented by an associated resource link, a GET request must
be performed and the retrieved representation must be used to instantiate new
local instances as described above (see Discovery). The new instance(s) must
be returned. Note that depending on the cardinality of the association, the
return value may be a single instance or a set (e.g. Array) of instances.
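
The three situations can be sketched in Ruby as follows. The class LocalInstance,
its constructor arguments and the injected get callable are hypothetical; the
callable stands in for an HTTP GET so that the sketch does not depend on a
concrete HTTP library. Reloading of missing values and instantiation of
associated instances are omitted.

```ruby
# Hypothetical sketch of attribute read access on a local instance.
class LocalInstance
  # values:       attribute name => value delivered with the representation
  # value_links:  attribute name => attribute URI
  # associations: attribute name => URI of the associated resource(s)
  # get:          callable performing a GET request and returning the result
  def initialize(values:, value_links: {}, associations: {}, get:)
    @values = values.dup
    @value_links = value_links
    @associations = associations
    @get = get
  end

  def read(name)
    if @values.key?(name)
      @values[name]                  # situation 1: value is available locally
    elsif @value_links.key?(name)
      @get.call(@value_links[name])  # situation 2: follow the value link
    elsif @associations.key?(name)
      @get.call(@associations[name]) # situation 3: follow the association
    else
      raise KeyError, "unknown attribute: #{name}"
    end
  end
end
```

In an implementation for a dynamic language such as Ruby, read would typically
be hidden behind generated accessor methods so that applications never see URIs.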

The load mechanism must return information about its success or failure using the
common error reporting mechanism of the target programming language or framework.
The type of the error should be indicated. Possible error types are defined
in 3.3.1.

Updating

Upon attribute write access on a local instance by the application, again three dif-
ferent situations are possible:

1. If the attribute is a simple value as represented by a value representation within
a member representation, the new value must be stored with the instance
instead of the old one and must be used when a save operation (see save) is
carried out as described below.

2. If the attribute is represented by a value link within the respective member
representation, a PUT request to the attribute URI in the value link must be
performed using a value representation which carries the new value for the
respective attribute.

3. If the attribute is represented by an associated resource link, four situations
are possible:
(a) If the update operation is directed against a single instance (either because
the association has a cardinality of one or through identification of a single
instance within the set of associated resources), the associated instance(s)
must be acquired as described above (see Reading) and updates must be
performed on those instances accordingly.
(b) If the update operation is directed against an associated set of instances
and yields the addition of a new instance (e.g. through a push operation on
an array), a POST request has to be performed to the attribute URI in the
associated resource link as described in 3.3.1.
(c) If the update operation is directed against an associated set of instances
and yields the removal of an existing instance (e.g. through a remove operation
on an array), a DELETE request has to be performed to the associated instance
URI as described in 3.3.1.
(d) If the update operation is directed against an associated set of instances
and yields the replacement of an existing instance, a PUT request to the
associated instance URI containing a member representation with the new
instance’s identifier has to be performed as described in 3.3.1.
Note that if multiple arrays or more complex array operations need to be
implemented, PUT requests to the base resource may be suitable; refer to 3.1,
3.3.1 and 3.6.

While in situations 2-3 immediate HTTP requests are to be performed by the client-
side connector, operations like in situation 1 are only applied to the local instance
data and no request is issued. As mentioned before, the actual modifications to the
resource which the instance stands for must be requested explicitly. This storing
mechanism must be provided by a public instance method which should be named
save(). Upon call of this method, a member representation must be constructed
which contains all values which have changed with respect to the last retrieved
representation. Subsequently, this representation has to be included in a PUT request
against the edit URI that has been attributed to the current instance.
An update operation must return information about its success or failure using the
common error reporting mechanism of the target programming language or frame-
work. The type of the error should be indicated. Possible errors types are defined
in 3.3.1.
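
The deferred storing mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name LocalInstance, the hash-based member representation and the transport callable which stands in for the HTTP layer are hypothetical and not defined by WebData.

```ruby
# Hypothetical local model instance which records attribute changes locally
# and sends only the changed values when save() is called.
class LocalInstance
  attr_reader :edit_uri

  # transport: any callable taking (http_method, uri, body) and returning a
  # status code. It stands in for the real HTTP layer (assumption).
  def initialize(edit_uri, values, transport)
    @edit_uri = edit_uri
    @loaded = values.dup    # values of the last retrieved representation
    @current = values.dup   # local working copy
    @transport = transport
  end

  def [](name)
    @current[name]
  end

  # Plain attribute changes stay local; no request is issued here.
  def []=(name, value)
    @current[name] = value
  end

  # Values which differ from the last retrieved representation.
  def changes
    @current.reject { |name, value| @loaded[name] == value }
  end

  # Sends the changed values in a PUT request against the edit URI and
  # reports failure through an exception.
  def save
    delta = changes
    return true if delta.empty? # design choice of this sketch: nothing to send
    status = @transport.call(:put, @edit_uri, delta)
    raise "update failed with status #{status}" unless (200..299).cover?(status)
    @loaded = @current.dup
    true
  end
end
```

In this sketch, the exception raised by save() plays the role of the common error reporting mechanism of the target language; a real connector would map the status code to the error types defined in 3.3.1.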

Destruction

For local model object instances which stand for member resources on the Web,
there must be an explicit mechanism for destruction. While local objects aren’t
usually destructed explicitly in modern programming languages due to the existence
of garbage collectors, Web resources stand for objects in remote domain models
whose life cycle does foresee their destruction.
Local model object instances must provide a mechanism for destruction to be called
explicitly. The method performing this operation should be called destroy() and
perform the destruction using a DELETE request against the edit URI which is stored
with the local instance. If the target programming language for the client-side
connector implementation supports the concept of destructors, it might be pertinent
to combine this functionality with an instance’s destructor.

The destroy mechanism must return information about its success or failure us-
ing the common error reporting mechanism of the target programming language or
framework. The type of the error should be indicated. Possible error types are
defined in 3.3.1.

3.4.2 Caching
Obviously, dealing with entities in remote domain models entails constraints re-
garding application performance. It is therefore pertinent for WebData client-side
connector implementations to provide caching mechanisms in order to limit actual
remote interactions and perform a maximum of operations locally. As discussed
in 2.2.6, caching represents an essential element within HTTP. WebData client-side
connectors can make use of HTTP’s caching features to realize effective performance
improvements.

Caching local instances


The basic principle for caching of local instances is reducing the number of issued
GET requests. This is achieved by maintaining copies of instances in a central local
store which is consulted before every remote access. If a local copy is available for a
given identifier, the local copy can be used and the request can be omitted.
The central store should be realized using a data structure which permits fast random
access (e.g. a hash table) where instances⁷ can be stored and identified by URIs. In
addition to the behavior of client-side connectors, the following must be implemented
to support caching:

1. After loading of an instance from a complete member representation (i.e. a
member representation which contains value representations) has been completed,
it should be stored in the cache so that it can be found using both
the URIs mentioned in its canonical and its edit link (cf. 3.2), if applicable.
Note that any instance in the cache having the same identifier as the currently
cached instance has to be replaced by this operation.

2. Before implicit loading (i.e. loading which is not triggered by the explicit
reload mechanism as described in 3.4.1) of an instance from a member repre-
sentation the cache has to be queried using the URI which would be used for
the respective HTTP GET request. If the instance can be found in the cache,
the cached instance has to be inspected according to the cache replacement
strategy (see below). Depending on the outcome of that inspection, the cached
instance may be used and the load operation may be omitted.
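
The two rules above can be illustrated with a small central store. This is only a sketch: the names InstanceCache and Instance as well as the shape of the edit URI are assumptions made for this example, and the inspection according to the cache replacement strategy is deliberately left out here.

```ruby
# Hypothetical central store for loaded instances, keyed by URI.
class InstanceCache
  def initialize
    @store = {} # hash table: URI string => instance
  end

  # Rule 1: register an instance under its canonical URI and, if present,
  # its edit URI. An entry already stored under one of these URIs is replaced.
  def store(instance)
    [instance.canonical_uri, instance.edit_uri].compact.each do |uri|
      @store[uri] = instance
    end
    instance
  end

  def lookup(uri)
    @store[uri]
  end

  # Rule 2: before an implicit load, the cache is queried with the URI which
  # would be used for the GET request. The block (which would perform the
  # request) is only called on a cache miss.
  def fetch(uri)
    cached = lookup(uri)
    return cached if cached
    store(yield(uri))
  end
end

# Minimal stand-in for a loaded local instance (assumption).
Instance = Struct.new(:canonical_uri, :edit_uri, :values)
```

Because Ruby stores references in a hash, both URIs lead to the very same object, so a change made through one lookup is visible through the other.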

Cache replacement strategy


The cache replacement strategy for WebData client-side connectors follows the basic
caching principles as defined in HTTP: expiration and validation (cf. 2.2).
⁷ Most current programming languages (e.g. Java) work with references implicitly, so that the
placement of an instance in a hash table actually references the instance from within it rather
than creating a copy. If the programming language in use handles references explicitly (e.g.
pointers in C++), references should be cached rather than the actual instances.

The client-side cache has to evaluate the following conditions in order to decide
whether an instance is actually returned for use by the embedding application when-
ever it has been found in the cache:

1. If the instance carries expiration information (obtained via an Expires: header)
and the instance is not yet expired (i.e. the current date and time is not yet
past the ones mentioned in the expiration information) the cached instance
must be used.
2. Otherwise and if the instance carries a version number (obtained via an ETag:
header), the instance’s canonical URI must be requested using a conditional
GET request mentioning the current known version in an If-None-Match: header
(cf. 3.3.5). If the conditional GET operation does not yield a representation (i.e.
the resource has not changed) but a 304 Not Modified status code, the cached
instance must be used; otherwise, the instance must be loaded (see Reading
above) from the representation obtained with the response.

While the replacement strategy will be appropriate in most cases, application devel-
opers must be able to express explicit reloads. Thus, WebData client-side connectors
must provide the reload mechanism (see Reading above) regardless of expiration and
version information.
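
The decision procedure formed by the two conditions above could look roughly like the following sketch. The names CachedEntry and resolve, the callables conditional_get and loader, and the nil return value for entries which carry neither expiration nor version information are assumptions of this example; the specification does not prescribe them.

```ruby
# Hypothetical cache entry: the instance plus the metadata obtained from the
# Expires: and ETag: headers of the response it was loaded from.
CachedEntry = Struct.new(:instance, :expires_at, :etag)

# conditional_get: callable taking (uri, etag), expected to send the ETag in
#                  an If-None-Match: header and to return [status, representation].
# loader:          callable turning a representation into an instance.
def resolve(entry, uri, now, conditional_get, loader)
  # Condition 1: expiration information is present and not yet passed.
  return entry.instance if entry.expires_at && now <= entry.expires_at

  # Condition 2: validate the known version with a conditional GET.
  if entry.etag
    status, representation = conditional_get.call(uri, entry.etag)
    return entry.instance if status == 304 # 304 Not Modified
    return loader.call(representation)
  end

  # Neither expiration nor version information: this sketch signals that a
  # regular load is required (assumption).
  nil
end
```

Passing the current time and the two callables as parameters keeps the sketch free of network access and makes each branch easy to exercise in isolation.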

Prefetch and look-ahead as caching strategies


Caching only improves the performance of an application if cached versions
are available for the instances which are requested. The standard
caching concepts described above only contribute to a fast user experience
in situations where instances are accessed a second or subsequent time; they do not
improve application performance when instances are accessed initially.
In order to address this problem, it is pertinent to retrieve and load representations
in advance which may be needed by the application at a later point in time. Obvi-
ously, this solution can – if not used properly – just move the point in time where
application latency due to loading occurs rather than eliminate it. However, the
overall latency can in fact be reduced by exploiting two basic types of situations:
Firstly, if instances are retrieved, it can be less time-consuming to use the same
server round-trip to retrieve more instances. Secondly, if the application is idle (e.g.
due to wait time for user interaction), more representations can be retrieved and
loaded in the background without affecting the end user experience. These two con-
cepts are manifested in two caching strategies which WebData client-side connectors
should implement: prefetch and look-ahead.
An important overall aspect of both concepts is the selection of resources which
should be retrieved and loaded in advance. A very powerful yet simple selection
criterion comes with the structure of WebData: associations. If a resource is associated
with another resource and thus referenced in its representation (cf. 3.2) it is likely
that the associated resource may be needed at a later point in time. While Web-
Data client-side connectors should consider associated resources as main targets for
prefetch and look-ahead, selection can be implemented based on more sophisticated
heuristics or explicit configuration.
The following paragraphs describe the two concepts in detail.

Discovery with prefetch In order to realize prefetch, the request used for discov-
ery in the client-side connector has to be modified in a way that it requests
more resources from the server. While WebData servers could theoretically
return all associated resources at once without an explicitly modified request
from the client, the overall length of the message is very likely to become
much larger and not all representations may be needed. WebData client-side
connectors must therefore provide a mechanism which enables end application
developers to specify which resources should be prefetched. More specifically,
developers must be able to name attributes of the instances which are cur-
rently requested. This should be incorporated in the find() method’s signature
as an optional parameter called prefetch accepting an array or hash-like data
structure mentioning symbols for associations to include in prefetch. WebData
server-side connectors will respond with representations for the requested re-
source and representations for associated resources as specified (cf. 3.3.4). As
described above (see Discovery and Caching local instances), the representa-
tions have to be loaded subsequently and instances have to be stored in the
local cache. The discovery method however, must only return those instances
which were initially requested, excluding all prefetched instances. Note that it
may be pertinent for application developers to request representations which
contain complete member representations (cf. 3.2) using the complete parameter
(cf. Discovery above) for the find() method.
Listing 3.17 illustrates how the discovery functionality with prefetch could be
called from within a target application.
Listing 3.17: Sample discovery with prefetch call (Ruby)
Order.find(:all, :conditions => "total_price+=100000", :prefetch => [:customer],
           :complete => true)

Supposing that the Order class has been defined as in listing 3.16 and the
identified resource defines an attribute customer which is an association, the find()
method as in listing 3.17 would subsequently issue a GET request to the URI
http://example.org/webdata/complete-orders-with-customer?total_price+=100000 and retrieve
a mixed collection representation (cf. 3.2.5) including complete order and
customer member representations. All representations would be loaded and
instances would be cached accordingly while only order instances would ulti-
mately be returned by the find() call.
Look-ahead on idle The look-ahead on idle caching strategy takes advantage of
the fact that the target application may not be using the network connection at
certain points in time. During that time, client-side connectors may issue GET
requests in the background (i.e. in a different thread of execution). While the
basic selection mechanism for resources which should be requested is again the
association to already loaded resources, more sophisticated heuristics can be
implemented, e.g. access counters on instances can determine relevance of an
instance and thus request resources first which are associated to the resource
which this instance stands for. Subsequently, those usage statistics could be
used to define heuristics for similar instances (e.g. those belonging to the same
class).
The realization of look-ahead on idle is straight-forward. In idle periods, client-
side connectors must collect URIs leading to associated resources from already
cached instances and discover them (see Discovery above) subsequently. Note
that for every URI, the cache must be checked first. The retrieved
representations must be loaded as described above (see Reading) and placed in the
cache. Connector implementers should pay attention to the fact that look-
ahead may produce a lot of network traffic and should consider providing lim-
ited bandwidth for look-ahead. Furthermore, look-ahead should abort imme-
diately when other resources are requested explicitly by the target application.
Look-ahead should be implemented recursively, meaning that after an associ-
ated resource has been loaded and cached, it can again be inspected for links
to its associated resources. However, look-ahead on already cached instances
should be completed before more levels of recursion are entered. Client-side
connectors should provide a mechanism to application developers to configure
whether look-ahead should be used, on which classes and attributes it should be
effective and up to which level of recursion look-ahead should operate.
It may be pertinent to integrate these configuration options in class definitions
of classes which are derived from the default WebData client class. Listing 3.18
illustrates how the order class could be configured.
Listing 3.18: Sample class collection specific class definition with look-ahead
configuration (Ruby)
class Order < WebDataResource::Base
  self.uri = "http://example.org/webdata/orders"
  self.credentials = {:name => "yeah", :password => "secret"}
  self.lookahead = [:products, :customer]
end

Line 4 in listing 3.18 defines that look-ahead should be carried out on order
instances following the products and customer associations.
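
The level-by-level traversal which look-ahead on idle performs can be sketched as follows. The function look_ahead, the Node structure and the fetch callable are hypothetical names for this example; background threads, bandwidth limits and aborting on explicit requests are omitted. The sketch only shows that the cache is checked first and that one level of associations is completed before the next level is entered.

```ruby
require "set"

# cache:      Hash mapping URI => instance; every instance is assumed to
#             respond to associated_uris (assumption of this sketch).
# start_uris: URIs of already cached instances at which look-ahead starts.
# max_depth:  configured level of recursion.
# fetch:      callable performing discovery and loading for one URI.
def look_ahead(cache, start_uris, max_depth, fetch)
  visited = Set.new(start_uris)
  frontier = start_uris
  max_depth.times do
    next_frontier = []
    frontier.each do |uri|
      instance = cache[uri]
      next unless instance
      instance.associated_uris.each do |associated|
        next if visited.include?(associated) # guards against cyclic associations
        visited << associated
        # The cache is checked first; a request is only issued on a miss.
        cache[associated] = fetch.call(associated) unless cache.key?(associated)
        next_frontier << associated
      end
    end
    break if next_frontier.empty?
    frontier = next_frontier # the current level is complete, enter the next one
  end
  cache
end

# Minimal stand-in for a cached instance which knows its association URIs.
Node = Struct.new(:associated_uris)
```

A breadth-first formulation was chosen here instead of literal recursion because it makes the requirement to finish look-ahead on already cached instances before descending further explicit.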

WebData client-side connectors must be designed to operate on local instances and
synchronize potential modifications at specified points in time as defined above (see
Updating). Inconsistency problems which may arise due to this behavior will not be
addressed by the caching mechanisms described here. Conflict detection for these
cases is described separately in 3.6.1.
3.4.3 Authentication support
Most WebData servers will require clients to authenticate before actual access to re-
sources is granted. Refer to 3.3.3 for more information on how server-side connectors
will handle authentication.
In order to authenticate with servers, client-side connectors must be able to supply
credentials with a request if the server expects it. Therefore, they must be able
to acquire credentials from the target application. This can be realized in class
definitions or directly as a parameter during discovery (see above). WebData client-side
connectors must at least support the Basic Authentication Scheme as described
in [FHBH+99]. Note that basic authentication should only be used in combination
with secure HTTP [DR06] to ensure password security. In order to supply credentials, class
definitions and the discovery mechanism must at least allow the target application
to pass a username and password parameter. Both must be transformed and sent
using an Authorization: header if authentication is needed, refer to [FHBH+ 99] for
more details.
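
The transformation of username and password for the Basic Authentication Scheme can be sketched in a few lines. The function name basic_authorization is an assumption of this example; the encoding itself (Base64 of "username:password") follows the scheme referenced above.

```ruby
# Builds the value of the Authorization: header for basic authentication.
# pack("m0") produces Base64 output without line breaks and needs no
# additional library.
def basic_authorization(username, password)
  "Basic " + ["#{username}:#{password}"].pack("m0")
end

# Example: header value for the well-known sample credentials of the scheme.
header_value = basic_authorization("Aladdin", "open sesame")
# header_value is "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Since Base64 is an encoding and not encryption, the resulting header value exposes the password to anyone who can read the request, which is why the combination with secure HTTP is recommended above.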

3.5 Request-based content negotiation for representation formats
In order to support different types of clients and servers and their specific needs and
requirements, HTTP provides a mechanism for content negotiation. As mentioned in
2.2.4, clients can specify the content formats (i.e. the format of representations) they
can operate on in a prioritized order and both clients and servers have to label the
supplied representations with such a content type. WebData implementations can
make use of this mechanism to achieve extensibility with respect to representation
formats. While the WebData Format must be understood by all implementations
and should be used as a standard means of communication, other formats can be
defined and used. In some environments it may be pertinent to use representations
which are not based on XML but rather on JSON [Cro06] or YAML [BKEI02] simply
because serialization and deserialization functionality is more easily
available. Furthermore, it may be advantageous to support existing REST-based
servers or clients.
Content negotiation should be based on the standard media types as defined in
[Pos94] and must be carried out using the defined HTTP header fields: every request
or response which includes a representation must specify its media type using Content-
Type:. Additionally, requests should specify which media types they expect to receive
using Accept: in a prioritized order and separated by a ’,’ sign. WebData server-side
connectors must try to satisfy the requested media type preferences in their
respective order of priority. If none of the requested media types can be supplied
by a WebData server-side connector implementation, a 406 Not Acceptable status code
must be returned. Even though HTTP specifies a more sophisticated mechanism
for relative qualities, those do not have to be implemented by WebData connectors.
Please refer to [FGM+ 99, 14.1] for more information.
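
A server-side selection of the representation format in order of priority could be sketched as follows. The function negotiate, the constant SUPPORTED_MEDIA_TYPES and its contents are assumptions of this example. As permitted above, relative quality values are ignored; wildcard media ranges are not handled either.

```ruby
# Media types which this hypothetical connector is able to produce.
SUPPORTED_MEDIA_TYPES = ["application/atom+xml", "application/json"]

# accept_header: value of the Accept: header, media types separated by ','
#                and listed in order of priority.
# Returns [status_code, chosen_media_type].
def negotiate(accept_header, supported)
  requested = accept_header.split(",").map do |entry|
    entry.split(";").first.to_s.strip # drop parameters such as ";q=0.5"
  end
  chosen = requested.find { |media_type| supported.include?(media_type) }
  chosen ? [200, chosen] : [406, nil] # 406 Not Acceptable
end
```

For instance, a request carrying the header value "application/json, application/atom+xml" would be answered with a JSON representation by this sketch, whereas a request which only accepts text/html would receive a 406 Not Acceptable status code.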
As stated before, WebData connectors must be able to serialize and deserialize rep-
resentations in WebData Format which must be expressed using the media type
application/atom+xml in Accept: and Content-Type: headers. In order to receive the Web-
Data schema representation for a particular resource exposing a class (cf. 3.2.4), the
media type application/relax-ng-compact-syntax must be used⁸.

⁸ Please note that the application/relax-ng-compact-syntax media type is not yet registered as
a standard MIME media type, but a request for registration has been approved at the time of this
writing as announced in http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg02054.html.

3.6 Concurrency and transactional behavior

3.6.1 Lost updates and optimistic concurrency
Exposing a domain model in a distributed and heterogeneous environment such
as the Web poses a number of different problems, especially when write access is
permitted to clients. One prominent example is called the lost update problem as
described by [FL99]. It is manifested in situations where two clients are trying to
update a resource based on an older representation which they received earlier. Both
clients may have performed different updates on the representation and may want
to request a corresponding state change from the resource using two requests. As a
result, the first request will change the resource’s state at first, but that state change
will be overwritten by the one which follows the first request. The problem here is
that the second change is not based on the current state of the resource but rather
on an old representation of it and may thus differ from the actual state change which
the second client intended.
In order to overcome this issue, basically two different approaches are possible which
are widely referred to as pessimistic and optimistic. When using a pessimistic ap-
proach the client locks the resource (i.e. transfers it into a state where the client has
exclusive access) on fetching of a representation for editing purposes and does not
release the lock before the actual update operation has been performed. As a con-
sequence, resources may be blocked for quite a long time and the lock may span the
whole period of editing which will most likely include user interaction. Obviously,
this approach will not result in a very pleasant user experience for highly distributed
landscapes such as the Web, where the number of concurrent users is potentially
infinite. The optimistic approach however does not involve locking mechanisms.
Instead, it uses versioning which allows resources to detect a possible lost-update
problem before actually performing an update operation: With every representation
that a resource sends to describe its current state, it includes a symbol identifying
its current version. Update operations will only be performed by the resource if the
update request includes a symbol which identifies the exact current version of the re-
source. Subsequently, the resource updates its version information. Thus, if another
update operation has changed the resource in between, subsequent update operations
based on older representations will fail. Note that in this case, the client application
is only notified of the conflict and that the update operation has not been performed.
Hence, the client should get a new representation and re-request the state change
based on the new representation. This will likely involve re-interrogating the respective
user. Furthermore, it is not guaranteed that the next update request will succeed.
HTTP provides means to help servers detect the lost update problem using the
optimistic approach. According to HTTP, servers can use the ETag: and the Last-Modified:
header to propagate a resource’s current version or timestamp respectively.
Subsequently, clients can request updates which are bound to a condition using
If-Match: and If-Unmodified-Since: as headers. However, because
in the aforementioned solutions clients can decide whether or not they want to
perform a conditional request, WebData uses a slightly different realization of the
optimistic approach which has previously been described by [Goo07] as optimistic
concurrency control: a resource’s version information⁹ is expressed as a single
non-negative integer in every edit link (cf. ’versioned instance URIs’ in 3.1 and ’edit
links’ in 3.2.2). This version number is incremented by 1 at every successful update
operation on a resource and update operations (i.e. PUT requests) can only be directed
against those versioned instance URIs.

⁹ Identical version numbers can be used for cache validation (cf. 3.3.5) and optimistic concurrency.

Concrete WebData server-side connector implementations should implement optimistic
concurrency for their resources. However, it is recognized that this requires
the connector to store extra persistent information for every resource which may not
be realizable in the underlying technology. Furthermore, connector implementations
can leave the choice on whether or not to use optimistic concurrency to the respective
developers who use the connector. Optimistic concurrency may then be turned on and off
on a per-model-class basis. If optimistic concurrency is implemented by a connector
and used for a model class a, it must be used consistently, which has a number of
implications:

1. Version numbers must be stored by the underlying technology on a per-instance
basis.

2. All representations for resources exposing model objects of a must contain a
versioned instance URI or versioned associated instance URI as edit link (cf.
3.2) instead of a non-versioned instance URI or associated instance URI.

3. All representations for resources standing for model objects of any other class
b must contain version1 (cf. 3.1) in their edit link if b has an association to a
and the representation is referenced in the context of a as described in 3.2.

4. In representations and requests the latest known version numbers must be used
for version1 and version2 URI parts respectively. Thus, clients must use the last
version number they received for that particular resource and servers must use
the last version number they assigned to that particular resource during an
update operation.

5. If a server-side connector implementation receives a request which is directed
to a URI which is referencing a resource standing for a model object of a and
containing a non-current or no version number at all, it must not perform the
requested action as described in 3.3.1 and must return a 409 Conflict status
code instead.

Figure 3.5 illustrates optimistic concurrency as defined for WebData using a basic
example with two clients and a resource.

Figure 3.5: Sample request-response interaction using optimistic concurrency
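
The server-side conflict check behind this interaction can be sketched as follows. The class VersionedResource, the initial version number 0 and the status code 200 for a successful update are assumptions of this example; the 409 Conflict response and the increment by 1 follow the rules given above.

```ruby
# Hypothetical resource which accepts updates only against its current version.
class VersionedResource
  attr_reader :state, :version

  def initialize(state)
    @state = state
    @version = 0 # starting value chosen for this sketch
  end

  # Handles a PUT request directed at a versioned instance URI.
  # request_version is the version number taken from that URI, or nil if
  # the URI did not contain one.
  def put(request_version, new_state)
    return 409 unless request_version == @version # 409 Conflict
    @state = new_state
    @version += 1 # incremented by 1 at every successful update
    200
  end
end
```

If two clients both hold version 0, the first PUT succeeds and moves the resource to version 1, while the second PUT is rejected with 409 Conflict. The second client then has to retrieve the new representation and repeat its request against the new versioned URI.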



3.6.2 Transactions involving multiple resources


When referring to data access where several clients are accessing resources potentially
concurrently such as in WebData, mechanisms for concurrency control cannot be
limited to scenarios where only a single resource is involved at a time. Moreover,
the classical problem of transactions involving two or more resources has to be taken
into account. The problem is outlined very easily: in a number of cases clients may
want to perform a number of operations on different resources. While each of those
operations can fail for a particular non-foreseeable reason (e.g. violated constraints,
network outages) the overall operation as a whole (i.e. the transaction) must either
be carried out completely (i.e. on all participants) or not at all (i.e. atomicity).
Furthermore, concurrent operations from a third party on involved resources should
not occur during the course of the transaction. The latter issue is similar to the
lost-update problem and could be resolved using the proposed optimistic approach.
However, an operation which is aborted due to a detected conflict on a particular
resource must as well lead to the abortion of the complete transaction.
In order to realize such behavior for resources, a number of different mechanisms
have evolved, in particular in the scope of database management systems. 2-Phase-Commit
(2PC) is arguably the best-known algorithm for distributed transactions.
As indicated by its name, 2PC ensures atomicity in two phases where a central
transaction coordinator requests update operations from all participating resources
and expects them to enter a locked state, remember their current state, perform
the operation and return either a success or failure message with respect to the
outcome of the operation. If all participants report success, the coordinator sends a
finalization (i.e. commit) request to all participants who subsequently release their
locks. If any of the participants reports failure, the coordinator sends an undo (i.e.
rollback) request to all participants who subsequently return to their previous state
as remembered before and release their locks. While an algorithm like this could
theoretically be used for WebData resources in order to support atomic transactions,
this specification does not define a concrete mechanism for distributed transactions.
This is due to the fact that all techniques under consideration rely on locking of resources
which is considered harmful [Hel07] on very large and decentralized systems such as the
Web for a number of reasons, but mainly the huge risk of vulnerability to
denial-of-service attacks.¹⁰
However, there are cases where operations on multiple resources can be carried out
in an atomic way: if all resources stand for model instances which are managed by
the same infrastructure (i.e. persistence framework with RDBMS) beyond a sin-
gle WebData server-side connector instance and the infrastructure supports atomic
transactions, atomicity can be delegated to it. Section 3.3.1 specifies this behavior.
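
Purely to illustrate the two phases of the 2PC algorithm outlined earlier in this section, the following in-memory sketch may be helpful. It is not part of WebData, which deliberately defines no mechanism for distributed transactions; the names Participant and two_phase_commit as well as the simulated failure flag are assumptions of this example.

```ruby
# Hypothetical participant which can lock itself, remember its state and
# either finalize or undo a requested operation.
class Participant
  attr_reader :state

  def initialize(state, fail_on_prepare = false)
    @state = state
    @fail_on_prepare = fail_on_prepare # simulates e.g. a violated constraint
    @locked = false
  end

  def locked?
    @locked
  end

  # Phase 1: enter the locked state, remember the current state, perform
  # the operation and report success or failure.
  def prepare(new_state)
    @locked = true
    @previous = @state
    return false if @fail_on_prepare
    @state = new_state
    true
  end

  # Phase 2 on success: release the lock.
  def commit
    @locked = false
  end

  # Phase 2 on failure: return to the remembered state and release the lock.
  def rollback
    @state = @previous
    @locked = false
  end
end

# updates: Hash mapping each participant to its requested new state.
# Acts as the central transaction coordinator.
def two_phase_commit(updates)
  votes = updates.map { |participant, new_state| participant.prepare(new_state) }
  if votes.all?
    updates.each_key(&:commit)
    true
  else
    updates.each_key(&:rollback)
    false
  end
end
```

The sketch also makes the drawback discussed above visible: every participant remains locked from its prepare call until the coordinator has collected all votes, which is exactly the kind of locking that is considered problematic on the Web.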

¹⁰ The author does not dispute that a workable and suitable approach to atomic transactions in
the context of WebData (e.g. 3-Phase-Commit, [SS83]) can be established but explicitly regards
this as out of scope for this thesis.
4. WebData for the SAP® Cross Application Timesheet architecture

This chapter describes how WebData can be employed in a concrete scenario which
has been provided by SAP®¹, the world market leader in enterprise software.
First, the general Cross Application Timesheet (CATS, [SAP01b]) component is
briefly introduced from an end-user’s point of view and the use of widgets as an
alternative for a richer and more focused user experience is motivated. One of
SAP®’s interaction mechanisms for ABAP™-based mySAP™ Business Suite components,
namely the Business Application Programming Interface (BAPI®, [SAP01a]),
is described in general and the specific operations for interactions with CATS are
outlined. Then, the system architecture for a CATS widget scenario is described and
a possible realization for a server-side WebData-enabled wrapper and a client-side
WebData-enabled widget is presented.

4.1 SAP® Cross Application Timesheet and the Business Application Programming Interface
This section introduces CATS and BAPI®. CATS is SAP™’s solution for personnel
time management which integrates with a number of worktime-related components.
BAPI® is SAP™’s business application programming interface which developers can
use to interact with SAP™ components.

4.1.1 Personnel time management and Cross Application Timesheet
The CATS component serves as a single point of entry for all worktime-related
data. Employees as well as employers can use CATS to plan, record, audit and
manage time-related information. CATS is a cross-application component within
the mySAP™ Business Suite. It integrates with other mySAP™ modules, e.g. HR,
Financial Accounting, Project System and Plant Maintenance. A specific use case
which is relevant to employees who report their own working time data is
time entry. CATS offers transactions and dialogs within the SAP® GUI, SAP®’s
general user interface application, for this purpose. However, time entry is
viewed as a repetitive and cumbersome task by many users. Too many steps
have to be performed to fill out a time sheet, which leads to the utilization of external
recording mechanisms (i.e. pen and paper) and periodical synchronization with the
SAP® system. Figure 4.1 displays a screenshot of the user transaction CATSXT which
can be used to report time data.

¹ SAP, ABAP, BAPI, and mySAP are the trademarks or registered trademarks of SAP AG in
Germany and in several other countries.

Figure 4.1: SAP® transaction CATSXT for time entry

To enter time information using CATSXT, several steps have to be performed for each
activity type after the user has logged on to mySAP™ using the SAP® GUI: first,
working dates have to be selected from the calendar on the left-hand side, then start
and end times must be entered using number keys for each day separately and
short text descriptions can be given on the right-hand side. Next, the entries have
to be checked and copied to the time entry clipboard using the clock symbol above
the time entry lines. Then, the transaction can be saved and closed using the save
button in the toolbar at the top of the screen.
While the presented transaction allows for detailed entry and promises to allow
users to record data for every possible situation including business trips, missing
days, split recordings for different cost centers, etc., it is obviously desirable to have
an at-hand tool whose functionality is limited to and focused on the respective
day-to-day use case. Widgets have emerged as a light-weight approach to handling very
focused, similar user tasks on a regular basis. Widgets can be realized as always-on
mini applications which reside on web pages or an end user’s desktop. On demand,
they can be brought up and offer instant interaction with local data or remote
back-end systems. A time-entry widget would reside on the desktop, offer real-time time
entry and thus replace pen-and-paper recording while saving synchronization time.
As widgets are light-weight user space applications which are often realized in
client-side scripting languages such as JavaScript, interaction with the mySAP™ backend
should be realized in a suitable way, i.e. WebData. However, the mySAP™ Business
Suite currently does not offer ways of interaction which are suited for those settings.
Instead, the standard way of interacting with CATS is its business API (BAPI®).
The concept of BAPI®s in general is described in the next section.

4.1.2 mySAP™ integration through the Business Application Programming Interface
The Business Application Programming Interfaces (BAPI®s) are the designated
standard interfaces to mySAP™ components and modules which are written in ABAP,
SAP®’s programming language for business applications. They are used for
interaction between a number of SAP® components and are supposed to serve as a single
point of entry for third-party solutions and applications. Interaction with BAPI®s
is designed to be network-enabled, which means that the TCP/IP protocol suite is
used to realize communication, such that BAPI®s can be used in local area
networks and the global Internet. BAPI®s are meant to allow for integration at the
business rather than at a technical level with respect to granularity of tailoring of
the BAPI® functions.
Entities within the mySAP™ business suite are designed around the principles of
object-orientation, meaning that autonomous entities in terms of functionality and
data are bundled together to reduce complexity. SAP®’s Business Object Types
define the different types for those objects. However, in order to support both
object-oriented and non-object-oriented environments on a BAPI®’s client-side,
BAPI®s are designed as methods on Business Object Types and can be used without
the notion of actual classes or instances. Translated into the terminology of
object-oriented programming languages, this means that BAPI®s are defined as class
methods on object classes rather than on instances. When operating on concrete
instances, primary keys or other identifiers have to be used to perform
instance-related functionality which would have been defined as instance methods
in classical object-orientation. Furthermore, associations between objects of
different types are not defined explicitly but have to be reconstructed by BAPI®
clients by acquiring primary keys and using the respective BAPI® for the object
type of the associated objects.
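
The difference between class methods addressed by primary key and instance
methods can be illustrated with a minimal Python sketch. All names in it
(TimeSheetRecordBapi, get_hours, get_employee_key, the in-memory _RECORDS store)
are hypothetical and chosen for illustration only; they are not part of any SAP®
interface.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for back-end data; not an SAP API.
_RECORDS = {"0001": {"employee_key": "1000", "hours": 8.0}}


class TimeSheetRecordBapi:
    """BAPI style: class-level methods, records addressed by primary key."""

    @classmethod
    def get_hours(cls, key: str) -> float:
        return _RECORDS[key]["hours"]

    @classmethod
    def change(cls, key: str, hours: float) -> None:
        _RECORDS[key]["hours"] = hours

    @classmethod
    def get_employee_key(cls, key: str) -> str:
        # The association is only available as a key value; the client
        # must pass it to the Employee BAPI to reach the employee.
        return _RECORDS[key]["employee_key"]


@dataclass
class TimeSheetRecord:
    """Classical object orientation: behavior is defined on the instance."""

    key: str

    @property
    def hours(self) -> float:
        return TimeSheetRecordBapi.get_hours(self.key)

    def change(self, hours: float) -> None:
        TimeSheetRecordBapi.change(self.key, hours)
```

In the first class every call carries the primary key explicitly, whereas the
second class hides the key inside the instance, which is the style an
object-oriented domain model would be expected to offer.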
In order to execute BAPI® methods, SAP®’s remote function call (RFC) mechanism
has to be used. RFCs are basically remote ABAP procedure calls which can
have multiple parameters. Following the principles of ABAP, a parameter is either
an import, export, changing or table parameter. Import and export are those
concepts which are referred to as input parameters and return values in many other
programming languages. A changing parameter is a parameter which serves both
for importing data into and exporting data from a function, and a table parameter
accepts two-dimensionally structured records (i.e. tables) of data both for input
and output.
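
The four parameter kinds can be sketched as follows. This is a local Python
illustration under stated assumptions, not an actual RFC client: the
RfcParameters container, the sum_hours function and the field names EMPLOYEE,
HOURS, TOTAL_HOURS, CALL_COUNT and RECORDS are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class RfcParameters:
    """Hypothetical container for the four ABAP parameter kinds."""

    importing: Dict[str, Any] = field(default_factory=dict)     # input only
    exporting: Dict[str, Any] = field(default_factory=dict)     # output only
    changing: Dict[str, Any] = field(default_factory=dict)      # input and output
    tables: Dict[str, List[Row]] = field(default_factory=dict)  # rows, in and out


def sum_hours(params: RfcParameters) -> RfcParameters:
    """Hypothetical remote function, executed locally for illustration."""
    wanted = params.importing["EMPLOYEE"]
    matching = [r for r in params.tables["RECORDS"] if r["EMPLOYEE"] == wanted]
    # Export parameter: a plain return value.
    params.exporting["TOTAL_HOURS"] = sum(r["HOURS"] for r in matching)
    # Changing parameter: read on entry, modified on exit.
    params.changing["CALL_COUNT"] = params.changing.get("CALL_COUNT", 0) + 1
    # Table parameter: passed in and handed back, here filtered.
    params.tables["RECORDS"] = matching
    return params
```

A caller would fill importing, changing and tables before the call and read
exporting, changing and tables afterwards.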

Figure 4.2: CATS BAPI®

The CATS BAPI® is defined as shown in figure 4.2.
The two classes in figure 4.2 define the basic methods for interaction on domain
models as described in 2.1.1. Note that there are no attributes defined in figure
4.2 due to the fact that interaction is based on the class methods and their
parameters rather than on actual object instances. Each of the methods requires a
number of rather complex parameters which are referenced in [SAPb], [SAPa] and
used to acquire and update information on actual time sheet data. Essentially,
change() and delete() both require a primary key value identifying a time sheet
record, and insert() and change() both require a number of properties describing a
time sheet record (e.g. employee number, cost center, start time, end time,
activity type, etc.). Some of these are keys for other objects within mySAP™;
however, associations are not expressed explicitly, as mentioned before. Instead,
a number of different BAPI®s have to be identified and used with respect to their
interface definition. For this case study, the Employee and CostCenter BAPI®s,
which are referenced in [SAPc], [SAPd], will have to be used in addition to the
CATS BAPI®.

4.2 A widget for time entry through CATS


The WebData-enabled CATS widget solution components are structured as depicted
in figure 4.3.

Figure 4.3: CATS widget system architecture

Here, end users can use their respective widget instances to access the CATS widget
server which connects to the mySAP™ servers running the CATS application. The
CATS BAPI® wrapper is in charge of providing an object-oriented domain model
while accessing the BAPI® and restructuring data. The WebData server-side
connector interacts with the domain model and exposes its entities as resources on
the Web. Client-side connectors, in turn, access resources from within the widget
instances on the end users’ desktops and reconstruct the domain model for the
application code which realizes the widget logic that builds up the actual widget.

4.2.1 Domain model


As outlined in the previous section, the CATS BAPI® does not provide an
object-oriented domain model as would be necessary to serve as a foundation for
WebData. However, the necessary information can be obtained from the BAPI®s in
order to construct the required classes, instances, attributes and associations
accordingly. Hence, this case study includes the definition of a domain model which
is not realized using Active Record as described in 2.1.1 but as a wrapper for the
respective BAPI®s according to the Data Mapper pattern, also described by Fowler
in [Fow02, p. 165].
The resulting domain model which is exposed to a WebData server-side connector
is illustrated as a class diagram in figure 4.4.

Figure 4.4: Domain model for time entry using CATS in mySAP™

A time sheet entry as depicted in figure 4.4 has to be recorded with the times and
dates when work started and finished and the activity which has been performed.
Furthermore, a time sheet entry must be associated with the employee who performed
the work and the cost center which the activity has to be billed to. An employee
record as exposed to the WebData connector contains the employee’s real name
and associations to the time entries for that employee. A cost center model object
exposes its description as acquired from the BAPI® and its associations to
employees and reported time entries. The behavior carried out by the find(),
create(), save(), and destroy() methods is subject to custom implementation which
accesses the respective BAPI®s accordingly.
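
How such a wrapper could translate domain model operations into BAPI® calls is
sketched below in Python. This is a minimal illustration under several
assumptions: FakeCatsBapi is a hypothetical in-memory stand-in offering insert(),
change() and delete() plus an invented read() method, the attribute names of
TimeSheetEntry are made up, and the persistence operations are placed on a
separate mapper class, which is one possible reading of the Data Mapper pattern
rather than the design shown in figure 4.4.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


class FakeCatsBapi:
    """Hypothetical in-memory stand-in for the CATS BAPI; not an SAP API."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}
        self._next = 1

    def insert(self, fields: dict) -> str:
        key = f"{self._next:04d}"
        self._next += 1
        self._rows[key] = dict(fields)
        return key

    def change(self, key: str, fields: dict) -> None:
        self._rows[key].update(fields)

    def delete(self, key: str) -> None:
        del self._rows[key]

    def read(self, key: str) -> Optional[dict]:
        # Invented for this sketch; returns a copy or None if unknown.
        row = self._rows.get(key)
        return dict(row) if row is not None else None


@dataclass
class TimeSheetEntry:
    employee_id: str
    cost_center_id: str
    activity: str
    start: str
    end: str
    key: Optional[str] = None  # None until the entry has been saved


class TimeSheetEntryMapper:
    """Moves entries between the domain model and the BAPI stand-in."""

    def __init__(self, bapi: FakeCatsBapi) -> None:
        self.bapi = bapi

    def find(self, key: str) -> Optional[TimeSheetEntry]:
        row = self.bapi.read(key)
        return TimeSheetEntry(key=key, **row) if row is not None else None

    def save(self, entry: TimeSheetEntry) -> TimeSheetEntry:
        fields = {k: v for k, v in asdict(entry).items() if k != "key"}
        if entry.key is None:
            entry.key = self.bapi.insert(fields)   # new record
        else:
            self.bapi.change(entry.key, fields)    # existing record
        return entry

    def destroy(self, entry: TimeSheetEntry) -> None:
        self.bapi.delete(entry.key)
        entry.key = None
```

The domain object itself stays free of primary key handling towards its users;
only the mapper knows which BAPI® method corresponds to which operation.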

4.2.2 Authorization
In order to establish secure interaction with CATS and to prevent misuse, the fol-
lowing access rules must be used to configure the respective WebData server-side
connector with respect to the definitions in 3.3.3:

1. POST is allowed on the timesheet entry collection.


2. GET is allowed on the timesheet entry collection and its members if they are
associated with the authenticated user.
3. PUT, DELETE are allowed on timesheet entry members if they are associated with
the authenticated user.

4. GET is allowed on the employee entry collection and its members if they are
representing the authenticated user.

5. GET is allowed on the cost center entry collection and its members if they are
associated with the authenticated user.

6. All other operations on all resources are not allowed.
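
The six rules above can be read as a single decision function. The following
Python sketch is an illustration only and does not reproduce the configuration
format defined in 3.3.3; the Request type, the resource names and the
owned_by_user flag are assumptions. For a collection, the flag is taken to mean
that the collection is scoped to the authenticated user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str          # HTTP method, e.g. "GET"
    resource: str        # "timesheet_entries", "employees" or "cost_centers"
    is_member: bool      # False if the collection itself is addressed
    owned_by_user: bool  # associated with / representing the authenticated user


def is_allowed(req: Request) -> bool:
    if req.resource == "timesheet_entries":
        if req.method == "POST":
            return not req.is_member                    # rule 1
        if req.method == "GET":
            return req.owned_by_user                    # rule 2
        if req.method in ("PUT", "DELETE"):
            return req.is_member and req.owned_by_user  # rule 3
    elif req.resource in ("employees", "cost_centers"):
        if req.method == "GET":
            return req.owned_by_user                    # rules 4 and 5
    return False                                        # rule 6
```

Placing the default denial last mirrors rule 6: anything not explicitly permitted
by an earlier rule is rejected.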

4.2.3 Widget application


The actual widget is mainly composed of a user-friendly human interface and
presentation logic which manages interaction with the end user. The interaction with
CATS data is significantly simplified by the WebData client-side connector which
provides the exact domain model as depicted in figure 4.4. The connector provides
the necessary interaction behavior as described in 3.4 (i.e. discovery, creation,
memorization and destruction). All code implementing the widget can thus focus
on presentation and widget-related matters and can completely ignore any kind of
network interaction, value serialization or function calls.

Figure 4.5: CATS widget screenshot

Figure 4.5 shows a time entry widget which has been developed throughout this case
study.
5. Conclusion and related work

This chapter presents related work and draws conclusions from the thesis.

5.1 Related Work


The following sections give a non-exhaustive enumeration of concepts, products
and research work related to this thesis, including a brief description and positioning
with respect to the work presented here.

5.1.1 Google Data API


Google, which also offers a large number of Web based end-user applications that
are not directly related to search, has defined the Google Data API (GData)
[Goo07]. In order to provide a unified way for third party application developers
to use and leverage the vast amount of data which is stored by Google’s
applications, GData provides an API for querying and updating data in a large
number of Google’s services.
GData is based on both the Atom 1.0 [NS05] and RSS 2.0 [Win03] syndication for-
mats and the Atom publishing protocol (Atompub, [GdH06]) in a way that embraces
and extends the standards. GData feeds are either valid RSS 2.0 or valid Atom 1.0
messages and the publishing model behind GData conforms to the specifications in
Atompub. Extensions to the existing standards are fulltext query support, optimistic
concurrency and authentication.
While GData offers fulltext queries for its feeds, it does not provide a means to
perform structured queries as WebData does. GData defines a number of different
fields which can be queried against (e.g. author, update date/time), yet these are
domain-agnostic, meaning that fields in the represented data objects cannot be
included in or excluded from queries explicitly. Furthermore, queries in GData
cannot take associations between data objects into account. The optimistic
concurrency approach which has been chosen for GData is the same as in WebData,
as WebData has in fact been inspired by GData in this respect. For authentication,
GData chooses a different approach which is based on a one-time login procedure
and the generation of a login token plus session state kept on the (Google)
servers. As outlined in 3.3.3, WebData’s authentication concept is closer to the
authentication mechanisms provided by HTTP. For the sake of scalability, it adheres
to the stateless communication requirement [Fie00, 5.1.3] of representational state
transfer, which GData violates for authentication.
In addition to the above-mentioned aspects, WebData addresses a number of aspects
which have not been considered for GData. GData only defines a syndication
format and a publishing protocol; no mapping to domain models is given, and the
first class citizens in GData are pure XML messages.
GData is not open in the sense that serving GData is completely governed
by Google. However, a number of client-side libraries are supplied which allow
applications to consume GData.

5.1.2 Queso
Queso is a semantic Web/Web 2.0 server which is being developed as a research test
bed by Elias Torres, et al. at IBM. Queso implements Atompub, is coupled to
an RDF [W3C99b] server for persistent storage, and uses the AtomOwl Vocabulary
Specification [AS06]. In addition to Atompub as a means of communication, Queso
offers a SPARQL [SP06] endpoint for querying the data triples stored on the server.
While Queso offers an Atompub implementation which is supposed to adhere strictly
to the standard, it does not consider object-oriented domain models as a layer under-
neath, but RDF. It offers a Web based generic user interface which can be employed
to query and browse data which is stored on the server. Client applications can
access the server through Atompub and SPARQL which enables interaction with
client-side technologies such as JavaScript through a SPARQL Javascript Library
which was conceived by Lee Feigenbaum, et al.

5.1.3 Service Data Objects


Service Data Objects (SDO) started as a specification for Java and C++ based
software systems which aims at unifying data access from different source
technologies and platforms. SDO has been initially specified as a joint effort of
IBM and BEA Systems and is currently being specified under the name JSR 235 using
the Java Community Process (JCP, [JSRa]). The SDO architecture defines a structure
of clients,
data mediator services, and data sources whose interplay leads to seamless access
to heterogeneous data sources from within a Java client. SDO clients are supposed
to use a client-side programming interface to work on abstract data graphs which
are provided by data mediator services. Mediator services have to be developed as
“adapters” for different data source technologies. They take care of providing data
graphs as representations of data to clients and they will synchronize modifications
to those graphs back to the data sources. While specific data mediator services
are not defined by the SDO specification, concrete implementations are needed to
connect to data sources such as JDBC, EJB, XML, or Web services.
SDOs carefully consider the specifics of object-oriented domain models and provide
for associations between object instances (i.e. data graphs) while virtualizing the
source from which data originates. In contrast to WebData, however, the SDO
specifications target only the Java programming language (specifically in J2EE
environments) and C++. Furthermore, SDO does not employ the Web as a means
of transportation for resource representations but rather leaves remoting and data
transportation to mediator service implementations. Consequently, all other
inter-application communication issues such as authentication, concurrency, etc.
are not addressed.

5.1.4 Others
Web Application Description Language (WADL)

The Web Application Description Language (WADL, [Had06]) as proposed by Marc
Hadley aims at describing the interfaces which Web applications offer. It is focused
on interfaces that follow the REST principles and use HTTP and XML. Thomas
Steiner has released the REST Describe toolkit [Ste07] which enables the user to
generate WADL descriptions based on existing applications. This approach is
fundamentally different from WebData where interfaces are dynamically created
based on existing domain models. However, describing WebData interfaces using
WADL may be possible and pertinent if WADL gains broader acceptance.

acts as resource

acts as resource is a plugin for the Ruby on Rails Web application framework that
enables automated URI matching for “nested resources” which are similar to asso-
ciated URIs as described in 3.1. It does not offer interactions with resources using
representations.

5.2 Conclusion
The work presented in this thesis defines a middleware for RESTful application
integration which is composed of two main types of system components.
WebData server-side connectors are language and domain independent components
which are meant to be plugged into any application which is built around some sort of
domain model that is persistently stored. The most prominent and straightforward
example for domain model implementations is Active Record (as described in 2.1.1),
but domain models can be realized using more exotic technologies and concepts.
All of these can be accessed by WebData connectors as long as they provide a
consistent and coherent object-oriented interface to their environment. WebData
server-side connectors are subsequently able to access domain models using
object-oriented concepts and, by inspecting their structure (i.e. classes,
instances, attributes, and associations), provide an HTTP based access point for
clients on the World Wide Web which is based on the concepts of representational
state transfer (REST). By providing URI references for domain classes and objects (cf.
3.1) and by implementing the defined operation semantics (cf. 3.3.1) connectors
expose these entities as Web resources which will receive and send representations
of current and intended state (cf. 3.2) respectively. Connectors have to adhere to
access restrictions as required by the application and its authorization model which
depends on the domain and types and roles of users (cf. 3.3.3).

WebData client-side connectors have been defined to provide arbitrary applications
with a means for accessing resources on the World Wide Web in the most natural and
non-disruptive manner: through object-oriented concepts. Client-side connectors are
meant to be employed by application developers who face the task of integrating Web
resources exposed by other applications on the Internet. The provided programming
interface is designed to replicate the well-known concepts of a domain model by
encapsulating HTTP based resource interaction. Hence, client-side connectors
span an arc between persistency layers on distributed and remote systems and
the local application by utilizing the Web and its very foundation (i.e. the REST
principles) as a virtually invisible means of transport. Client-side connectors
provide a mesh of classes, instances, attributes and associations just as the domain
model would on the originating application. This is achieved by leveraging the object
interaction concepts as defined in Active Record (i.e. discovery, creation, reading,
updating, destruction, cf. 3.4.1) for the WebData connectors, and by embracing
HTTP’s caching strategies (cf. 3.4.2).
In order for WebData servers and clients to be able to produce and understand
the exchanged representations, this thesis defines an XML/Atom based serialization
format (the WebData Format) which acts as a default “language” of communication
between WebData partners. Other ways of establishing message formats between
WebData clients and servers have been outlined (cf. 3.2) as well as how clients and
servers can negotiate which formats they can produce and consume.
Aspects of concurrency control and transactionality for WebData interactions have
been discussed. For each of the presented scenarios, solutions have been described
which have to be implemented in connectors in order to handle situations where
multiple clients access resources concurrently while depending on atomicity.
The defined concepts have been applied to a real-world scenario provided by SAP®,
the market leader for many aspects of large enterprise software. SAP®’s cross
application timesheet (CATS) architecture was investigated for applicability
of the presented middleware. A case study showed how an object-oriented domain
model has to be defined to wrap around the CATS data structures and to provide a
self-contained comprehensive data model. A prototypical WebData server-side
connector has been plugged into the application and accessed the object-oriented
CATS data while exposing it as resources on the Web. Moreover, it was outlined
how a client-side time reporting widget can access the CATS Web resources using
a prototypical client-side connector in a JavaScript environment in order to record
live timesheet data from an end user’s desktop.
Bibliography

[AS06] Danny Ayers and Henry Story. AtomOwl vocabulary specification.
http://bblfish.net/work/atom-owl/2006-06-06/, June 2006.

[BC74] R. F. Boyce and D. D. Chamberlin. SEQUEL: A structured english
query language. In ACM SIGMOD, May 1974.

[BKEI02] Oren Ben-Kiki, Clark Evans, and Brian Ingerson. YAML ain’t markup
language (YAML) (tm) 1.0. Working draft, YAML.org, July 2002.

[BLFM05] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform resource
identifier (URI): generic syntax. RFC 3986, Internet Engineering Task Force,
January 2005.

[BM04] Paul V. Biron and Ashok Malhotra. XML schema part 2:
Datatypes second edition. W3C recommendation, W3C, October 2004.
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/.

[BN84] Andrew D. Birrell and Bruce Jay Nelson. Implementing remote
procedure calls. ACM Transactions on Computer Systems, 2(1):39–59,
February 1984.

[Bra89] R. Braden. Requirements for internet hosts - application and support.
RFC 1123, Internet Engineering Task Force, October 1989.

[Bur00] S. Burbeck. The tao of e-business services. IBM Corporation,
http://www-4.ibm.com/software/developer/library/ws-tao/index.html, 2000.

[CCMW01] E. Christensen, F. Curbera, G. Meredith, and S. Weerawarana. Web
services description language (WSDL) 1.1. http://www.w3.org/TR/wsdl,
2001.

[Che76] P. Chen. The entity-relationship model - toward a unified view of data.
ACM Transactions on Database Systems, 1(1):9–36, 1976.

[CHET03] Kishore Channabasavaiah, Kerrie Holley, and Edward Tuggle, Jr.
Migrating to a service-oriented architecture. Technical report, IBM Inc.,
December 2003.

[Cla01] James Clark. RELAX NG specification. Organization for the
Advancement of Structured Information Standards (OASIS), Committee
Specification, December 2001.

[CO97] D. Crocker and P. Overell. Augmented BNF for syntax specifications:
ABNF. RFC 2234, Internet Engineering Task Force, November 1997.

[Cra06] Duncan Cragg. Strest (service-trampled rest) will break web 2.0.
http://duncan-cragg.org/blog/post/strest-service-trampled-rest-will-break-web-20/,
January 2006.

[Cra07] Duncan Cragg. Business functions | the rest dialogues.
http://duncan-cragg.org/blog/post/business-functions-rest-dialogues/, January 2007.

[Cro06] Douglas Crockford. The application/json media type for javascript
object notation (JSON). Internet informational RFC 4627, July 2006.

[DN66] O. J. Dahl and K. Nygaard. SIMULA – an ALGOL-based simulation
language. Communications of the ACM, 9(9):671–682, September 1966.

[DR06] T. Dierks and E. Rescorla. The transport layer security (TLS) protocol
version 1.1. RFC 4346, Internet Engineering Task Force, April 2006.

[ECM99] ECMA. ECMAScript language specification, December 1999. ECMA
Standard 262, 3rd Edition.

[FGM+ 99] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and
T. Berners-Lee. Hypertext transfer protocol – HTTP/1.1. RFC 2616,
Internet Engineering Task Force, June 1999.

[FHBH+ 99] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach,
A. Luotonen, and L. Stewart. HTTP authentication: Basic and digest access
authentication. RFC 2617, Internet Engineering Task Force, June 1999.

[Fie00] R. Fielding. Architectural Styles and the Design of Network-based
Software Architectures. PhD thesis, University of California, Irvine, USA,
2000.

[FL99] Henrik Frystyk Nielsen and Daniel LaLiberte. Editing the web:
Detecting the lost update problem using unreserved checkout. World Wide
Web Consortium Note, May 1999.

[Fow02] Martin Fowler. Patterns of Enterprise Application Architecture.
Addison Wesley, Reading, Massachusetts, November 2002.

[GdH06] Joe Gregorio and Bill de Hóra. The atom publishing protocol. Internet
Draft draft-ietf-atompub-protocol-12, December 2006.

[GHJV00] Gamma, Helm, Johnson, and Vlissides. Design Patterns: Elements of
Reusable Object-Oriented Software. Addison-Wesley, Massachusetts,
2000.

[Goo07] Google Inc. Google Data APIs Protocol Reference, 2007.

[H+ 04a] David Heinemeier Hansson et al. Active record - object-relation
mapping put on rails. http://ar.rubyonrails.com/, 2004.

[H+ 04b] David Heinemeier Hansson et al. Ruby on rails - web development that
doesn’t hurt. http://www.rubyonrails.org/, 2004.

[Had06] Marc J. Hadley. Web Application Description Language (WADL).
Technical report, SUN Microsystems, April 2006.

[HB07] Paul Hoffman and Tim Bray. Atom publishing format and protocol
working group charter. http://www.ietf.org/html.charters/atompub-charter.html,
2007.

[Hel07] Pat Helland. Life beyond distributed transactions: an apostate’s
opinion. In CIDR, pages 132–141. www.crdrdb.org, 2007.

[HTBL06] Dave Hollander, Richard Tobin, Tim Bray, and Andrew Layman.
Namespaces in XML 1.0 (second edition). W3C recommendation,
W3C, August 2006. http://www.w3.org/TR/2006/REC-xml-names-20060816.

[IM07] Masayasu Ishikawa and Shane McCarron. XHTML™ 1.1 - module-based
XHTML - second edition. W3C working draft, W3C, February
2007. http://www.w3.org/TR/2007/WD-xhtml11-20070216.

[ISO84] ISO. Information processing systems – OSI reference model,
international standards organization. Technical Report 7498, ISO, October
1984.

[JSRa] JSR 235: Service Data Objects. http://jcp.org/en/jsr/detail?id=235.

[JSRb] JSR 311: Java API for RESTful Web Services.
http://jcp.org/en/jsr/detail?id=311.

[Luo98] Ari Luotonen. Tunneling TCP based protocols through web proxy
servers. Internet Draft, August 1998.

[MLM+ 06] C. Matthew MacKenzie, Ken Laskey, Francis McCabe, Peter F Brown,
and Rebekah Metz. Reference model for service oriented architecture
1.0. Technical report, OASIS, 2006.

[Moc87a] P. Mockapetris. Domain names - concepts and facilities. RFC 1034,
Internet Engineering Task Force, November 1987.

[Moc87b] P. Mockapetris. Domain names - implementation and specification.
RFC 1035, Internet Engineering Task Force, November 1987.

[NS05] Mark Nottingham and Robert Sayre. The atom syndication format.
Internet proposed standard RFC 4287, December 2005.

[Pos81] J. Postel. Transmission control protocol. RFC 793, Internet Engineering
Task Force, September 1981.

[Pos94] J. Postel. Media type registration procedure. RFC 1590, Internet
Engineering Task Force, March 1994.

[Ree79] Trygve M. H. Reenskaug. Models - views - controllers, December 1979.
heim.ifi.uio.no/~trygver/1979/mvc-2/1979-12-MVC.pdf.

[RMCW07] Arthur Ryman, Jean-Jacques Moreau, Roberto Chinnici, and Sanjiva
Weerawarana. Web services description language (WSDL) version
2.0 part 1: Core language. W3C recommendation, W3C, June 2007.
http://www.w3.org/TR/2007/REC-wsdl20-20070626.

[SAPa] SAP AG. Business Object CATimeSheetManager Documentation. SAP
AG. Available on mySAP systems through transaction BAPI.

[SAPb] SAP AG. Business Object CATimeSheetRecord Documentation. SAP
AG. Available on mySAP systems through transaction BAPI.

[SAPc] SAP AG. Business Object CostCenter Documentation. SAP AG.
Available on mySAP systems through transaction BAPI.

[SAPd] SAP AG. Business Object Employee Documentation. SAP AG.
Available on mySAP systems through transaction BAPI.

[SAP01a] SAP AG. BAPI Programming Guide (CA-BFA), 2001.

[SAP01b] SAP AG. Cross-Application Time Sheet (CA-TS), 2001.

[SP06] Andy Seaborne and Eric Prud’hommeaux. SPARQL query language
for RDF. W3C working draft, W3C, October 2006.
http://www.w3.org/TR/2006/WD-rdf-sparql-query-20061004/.

[SS83] Dale Skeen and Michael Stonebraker. A formal model of crash recovery
in a distributed system. IEEE Transactions on Software Engineering,
9(3):219–228, May 1983.

[Ste07] Thomas Steiner. Automatic multi language program library generation
for rest apis. Technical report, Google Inc., 2007.

[vK07] Anne van Kesteren. The XMLHttpRequest object. Working draft in
last call, W3C, February 2007.
http://www.w3.org/TR/2007/WD-XMLHttpRequest-20070227/.

[W3C99a] World Wide Web Consortium. HTML 4.01 Specification, December
1999. http://www.w3.org/TR/html4/.

[W3C99b] World Wide Web Consortium. Resource Description Framework
(RDF) Model and Syntax Specification, February 1999.
W3C Recommendation REC-rdf-syntax-19990222,
http://www.w3.org/TR/REC-rdf-syntax/.

[Win03] Dave Winer. RSS 2.0 specification, July 2003.

[XML98] Extensible Markup Language (XML™), February 1998. XML 1.0, W3C
Recommendation, http://www.w3.org/XML/.

[XML01] XML schema part 1: Structures, W3C recommendation, May 2001.