WebData
Definition of a Middleware for Exposing and Accessing
Object-oriented Domain Models as Web Resources
Jan Schulz-Hofen
Supervisors
Shel Finkelstein, SAP Research, Palo Alto, U.S.A.
Prof. Mathias Weske, Hasso-Plattner-Institute, Potsdam, Germany
25 September 2007
Abstract
The term ‘Representational State Transfer’ (REST) has gained a lot of attention
over the past years. However, many people still consider it mainly a lightweight
approach to what has been known as Web services in the past. REST is in fact
an architectural style for application interaction which induces a focus shift from
behavior (i.e. services) to state (i.e. data).
“WebData” describes a framework which leverages REST as an architectural style
and HTTP as a protocol to allow (a) exposure of business objects in arbitrary domain
models as resources on the World Wide Web and (b) integration of Web resources
into arbitrary application environments for access through an object-oriented API.
Summary
Although the term ‘Representational State Transfer’ (REST) has attracted some
attention in recent years, REST is still widely regarded mainly as a lightweight
approach to what used to go by the name of Web services. REST is, however, rather
a software architectural style intended for communication and interaction between
applications and systems. Beyond that, REST initiates a paradigm shift which aims
to move the focus away from pure function calls and towards data in distributed
systems.
“WebData” defines a middleware which makes use of the concepts of REST as an
architectural style and of HTTP as a protocol in order to (a) make business objects
from arbitrary application domains available as resources on the World Wide Web
and (b) enable access to such resources in arbitrary application environments
through an object-oriented interface.
I hereby declare that I have written this thesis independently and have used no
sources or aids other than those indicated.
Berlin, 25 September 2007
Acknowledgments
The research work presented in this thesis has been carried out from August 2006
to March 2007 at the research centers of SAP Labs in Montréal, Canada and Palo
Alto, U.S.A. While being a student at Hasso-Plattner-Institute (HPI) the author
was employed as a research assistant by SAP Research and worked within the Ad-
vanced Web Technologies team for his Master’s thesis.
I would like to thank Shel Finkelstein (SAP) for many fruitful discussions and very
inspiring collaboration. I would also like to thank Mathias Weske and the members
of his Business Process Technology chair at HPI, especially Hagen Overdick, for
their continuous advice. Furthermore, I would like to thank Cedric Ulmer (SAP)
and Bernd Schäufele (HPI) for their valuable feedback during the application of
my work in their research projects, Nolwen Mahé and Anne Hardy (both SAP) for
steadily supporting me during my time at SAP Research, and Gero Decker, Johannes
Nicolai, and Volker Gersabeck (all HPI) for their comments on my thesis. Thanks
also go to James Hogg and Monsieur Brian Bauer for spell checking and general
corrections.
Contents
1 Introduction
1.1 Web services and service-oriented architectures
1.2 Web resources and representational state transfer (REST)
2 Preliminaries
2.1 Object-oriented domain models
2.1.1 The Active Record pattern
2.2 The Hypertext Transfer Protocol (HTTP)
2.2.1 Uniform Resource Identifiers
2.2.2 Messages
2.2.3 Request methods
2.2.4 Headers
2.2.5 Status codes
2.2.6 Caching
2.3 Atom Publishing Protocol (Atompub)
2.3.1 Resources and representations
2.3.2 Operation semantics
3 WebData
3.1 URI references for entities in object-oriented domain models
3.2 Representation types for Web resources exposing entities in a domain model
3.2.1 Collection representations
3.2.2 Member representations
3.2.3 Value representations
3.2.4 Schema representations
Bibliography
List of Figures
4.1 SAP® transaction CATSXT for time entry
4.2 CATS BAPI®
4.3 CATS widget system architecture
4.4 Domain model for time entry using CATS in mySAP™
4.5 CATS widget screenshot
Listings
Architectural Style
SOA is meant to be used for interoperation between different software systems which
can potentially be operated by entirely different organizations. A commonly cited
use case is business-to-business (B2B) interaction. The SOA style is also promoted
by a large number of professionals and companies for integrating functionality
performed by existing systems within the same organization (i.e. Enterprise
Application Integration, EAI).
According to the SOA style, interfaces between different and potentially
heterogeneous components should be designed as coarse-grained, loosely coupled and
highly interoperable functions, i.e. services [CHET03]. SOA is independent of concrete
service protocols and can theoretically be implemented using techniques such as
CORBA, SOAP Web services, RMI, etc.
The concept of calling procedures remotely dates back to 1984 [BN84] and defines
the very foundation of current Web services: a procedure (i.e. method, function) is
called on a remote machine using networking techniques. In order to perform the
call, the calling system can provide several input parameters. The called system
will perform actions depending on received parameters and potentially the state of
other systems (i.e. “world-state”) and may consequently return a number of output
parameters and invoke changes to the state of other systems (i.e. “side effects”).
Dynamic Invocation
According to Burbeck [Bur00], a main aspect of service-oriented architectures is the
description and organization of services such that they can be discovered and used
dynamically and in a semi-automated or even entirely automated manner. Service
description is realized in dedicated documents which usually follow a standardized
scheme (e.g. WSDL [CCMW01]). Thus, the same type of service could theoretically
be carried out by a number of different providers which should be entirely trans-
parent for the invoking entity. Organization of services is to be done in hierarchical
categories or taxonomies which are managed by dedicated entities within an SOA
landscape, called brokers.
Consequently, any entity within an SOA can play one of the following roles: service
requestor, service provider, or service broker. Figure 1.1 (commonly also referred
to as the “SOA triangle”) illustrates these basic types of entities and their
interactions.
Service providers register their services with the broker in order to be discoverable by
requestors. Subsequently, requestors can query services and would acquire a service
Link passing
Web services do not offer a mechanism for passing data by reference. Subsequent
actions have to work on values which can quickly become outdated or be very large
and expensive in terms of network transmission.
Caching
Caching can only be done on the service provider’s side. It has to be carried out
behind the scenes of the service interface and with knowledge of the service’s be-
havior. External caches on the client-side or within the transport infrastructure
would very likely break a service as functionality cannot be stored or imitated by
the cache without concrete knowledge of side-effects which service invocations may
have. Even though many service invocations which perform mere read operations
could be cached, the lack of protocol level information on side-effects prevents their
environment from performing the action effectively. As a consequence, network
outages or problems at the provider’s side cause definite service unavailability unless
redundant providers are defined (and consistency is taken care of).
suggestions for definition and tailoring of services, a majority of Web service imple-
mentations still depend on the basic RPC principle which dates back to 1984 [BN84],
which was long before object orientation had become widely accepted.
Nevertheless, most service requestor and provider implementations will very likely
use object orientation and will have their native application data organized in an
object-oriented (or at least entity-relationship model [Che76] based) structure.
As a consequence, it is commonplace to strip down those data objects or records to
their atomic values and reorganize them according to the signature defined by the
service description document. Usually, this leads to replication of (subsets of) an
application’s data schema and metadata in several places throughout the application,
which opens the floodgates to all types of malfunction due to inconsistencies.
Resources
According to Fielding, a resource can be any piece of information which can have a
name. Given examples are documents like hypertext or images, but also temporal
services (e.g. “today’s weather in San Francisco”), real-world objects (e.g. a person),
or concepts. A resource can also be a collection of other resources. A resource name
does not imply the existence of that particular resource. However, if the resource
exists, other entities may be allowed to interact with it. Resources are thus active
components which have behavior and state. They expose interaction mechanisms to
allow for state transition and state representation.
On the Web, resources are identified by Uniform Resource Identifiers (URIs, [BLFM05])
which comprise the more commonly known Uniform Resource Locators (URLs). For
example, a valid (and suitable) resource identifier for the resource Jan Schulz-Hofen
could be http://jpgmag.com/people/yeah while http://jpgmag.com/people would
be the identifier for the collection of person resources.
Representations
While the resource itself is the specific concept identified by its name (cf.
“today’s weather in San Francisco”), a representation is an actual (“tangible”) but
volatile piece of data describing the state of a resource or parts of it. Furthermore,
Uniform Interface
As opposed to SOAP Web services where every port type can define a number of
custom operations [CCMW01], resources in a REST style landscape share a common
and well-defined limited set of operations and returning status information which
compose their interface. Entities wishing to interact with a resource are required to
use one of these operations. A type of operation can be (a) safe, which means that it
does not have an effect on the resource, (b) idempotent, meaning that an operation
being carried out once has an identical effect on the resource as it would have if
carried out more often, or (c) neither (a) nor (b). For some operations it is allowed
to supply a representation which describes the intended state of the resource after
transfer. In most cases, a response will include a representation as well. Returning
status information can encompass information on whether the operation succeeded
or failed, on reasons and responsibility for failure, and on consequences.
HTTP, for instance, defines eight different operations, their properties and
semantics, as well as a set of 41 status codes classifying returning status
information. Four operations are commonly known and will be explained in 2.2.3:
GET, POST, PUT, and DELETE. Status codes relevant to this work will be explained
in 2.2.5.
Stateless Communication
While resources have a certain state and potentially change state over time, all
interactions with a resource are stateless: session state (if any) is not maintained
by the resource, but rather by the entities interacting with it. As a consequence,
each request can be understood by itself and without knowledge of the preceding
operations. This entails the requirement for representations to include all necessary
data for the intended interaction.
Protagonists from industry and academia sometimes refer to SOA and REST as two
entirely different and competing concepts these days. However, after closer
examination, one could conclude that REST fits into the larger scheme of SOA quite
nicely: It could be argued that resources as they are defined in REST are indeed
service providers which have a name, propose a number of services (i.e. operations),
expect parameters to operate on (i.e. representations) and return resulting data
(again representations). Reconsidering the principles of REST, the limitation to
a defined set of generally interpretable operations including an agreement on their
semantics and the consistent use of URI references throughout representations can
be considered a restriction of SOA. This makes REST systems a more concrete
subset of the vast number of service-oriented landscapes, but does not contradict
the fundamental idea of SOA.
4
Some ego information, as of January 2007 to be precise.
5
This URI identifies a photograph which has been taken by Jan Schulz-Hofen.
Hence, we believe that the latter restrictions lead to a clearer understanding of and
simpler interaction with providers, operations, and messages, which makes REST a
promising new approach to service-orientation.⁶ This is underpinned by the fact
that large companies such as Amazon,⁷ eBay⁸ and Google⁹ are already providing
REST interfaces to their data. Moreover, architecting systems according to REST
concepts induces a change in thought: while traditional Web service tailoring is
centered merely around functionality, the definition of REST resources, operations
and representations involves considerations about structured data access, the state of
system components and their relations with each other. While in traditional SOA
landscapes many services consist only of data access, a lot of business functionality
can be refactored using a restricted set of operations and a well-tailored definition of
resources and relations [Cra07]. In consequence, a REST style SOA would propose
openly accessible, semantically well-defined but restricted data-driven interfaces to
functionality. This contrasts with traditional SOA landscapes, where a wealth of
different services with heterogeneous parameter structures sometimes renders access
to encapsulated data complicated and error-prone.
6
Interestingly, in recent days one was able to observe a number of applications which offer their
services using the HTTP protocol and expose their functions as URI endpoints while misleadingly
calling this REST. Those interfaces have been entitled “Service-trampled REST” by Duncan Cragg
[Cra06].
7
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/RESTAPI.html
8
http://developer.ebay.com/developercenter/rest/
9
http://code.google.com/apis/
2. Preliminaries
This chapter describes the technologies and concepts which the work presented in
this thesis is built upon or which are leveraged and enhanced to achieve the described
goals. Those are object-oriented domain models, the Hypertext Transfer Protocol, and
the Atom Publishing Protocol.
At its worst business logic can be very complex. Rules and logic de-
scribe many different cases and slants of behavior, and it’s this complex-
ity that objects were designed to work with. A Domain Model creates a
web of interconnected objects, where each object represents some mean-
ingful individual, whether as large as a corporation or as small as a single
line on an order form. [Fow02, p. 116]
In a domain model, every aspect of the application domain which is relevant to the
application is modeled and represented close to reality. Moreover, domain models
are meant to be as independent as possible from the actual software system they
are used in. In terms of object-orientation [DN66], one will usually find a class
for every type of entity in the domain and instances of that class are meant to
represent concrete incarnations of that type. The task of those domain objects is to
capture aspects of the real world entities and support the application with its task
to handle (i.e. manage, store, interact with) them. Following concepts of object-
orientation, those classes will specify attributes for themselves (class attributes) and
for their instances (instance attributes) which allow storing data related to
the respective entity. Furthermore, they will define methods (again on class and on
instance level) which represent behavior that the corresponding entities can perform.
Hence, a class definition ties together process and data structure which belong to
conceptually related entities. Object instances of the same class share the same set of
attributes and behavior – they will, however, have different values and state during
their life cycle. Another important characteristic for classes in a domain model (and
their instances) are associations. Classes define associations to other classes which
are usually named according to the role they play in the context of the associated
classes and quantified according to their possible multiplicity.
Figure 2.1 illustrates a sample domain model in a UML class diagram.
In figure 2.1, three classes of objects are defining a domain model: customer, order
and product. The classes define attributes for their respective object instances (e.g.
firstname, lastname, etc.). Furthermore, the domain model defines a behavior called
calculate total price which is bound to order instances, meaning that its execution
will only make sense in the context of a concrete order object instance. The following
basic distinction between method types can be made when investigating domain
models: business methods and accessors. Business methods define behavior which
is carried out within the scope of an object or class and which realizes some sort
of business functionality or process. Business methods are explicitly disregarded for
the work in this thesis, and the author’s belief is that much of this functionality
can be refactored [Cra07] into standard behavior of entities in domain models (see
below). Accessors, however, are relevant to the work presented here. Accessors are
methods which are exposing read (i.e. getters) and write (i.e. setters) functionality
to attributes defined within an object instance. One should differentiate between
basic accessors which are exclusively performing the described behavior of read and
write and augmented accessors which may perform value-based transformations or
calculations before or after attribute access. Augmented accessors are also used to
realize virtual attributes. Virtual attributes are attributes which do not actually exist
on the respective instance. However, accessors are in place and perform a certain
behavior which emulates the existence of the respective attribute (e.g. while working
on other attributes internally). Ideally, calculate total price would be refactored to
a virtual attribute called total price that would be available through an accessor
which sums up the prices of all associated products and thus returns a total price.
In figure 2.1, the classes customer and order and order and product are associated,
which is expressed using an edge. Cardinalities are annotated and express that (a1)
one customer object can have associations to none or multiple order objects, (a2) an
order object has exactly one association to a customer object, (b1) an order object
can have associations to none or multiple product objects and (b2) a product object
can have associations to none or multiple orders. Note that the domain model shown
in figure 2.1 will serve as an example case for many concepts presented in chapter 3.
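The example domain model described above could be sketched in code roughly as follows. This is only an illustration under stated assumptions: the class and attribute names firstname, lastname and total price come from the text, whereas the product attributes name and price, and the way the customer/order association is kept in sync, are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str     # assumed attribute, not named in the text
    price: float  # assumed attribute, needed to compute a total


@dataclass
class Customer:
    firstname: str
    lastname: str
    # (a1) one customer can be associated with zero or more orders
    orders: List["Order"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Order:
    customer: Customer  # (a2) an order has exactly one customer
    # (b1) an order can be associated with zero or more products
    products: List[Product] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep both ends of the customer/order association consistent.
        self.customer.orders.append(self)

    @property
    def total_price(self) -> float:
        # Virtual attribute: nothing is stored, the value is derived
        # from the prices of all associated products on every read.
        return sum(product.price for product in self.products)
```

Reading order.total_price looks like plain attribute access to the caller, which is the point of refactoring the business method calculate total price into an augmented accessor.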
As mentioned before, one main use of domain models is their capability of providing an
interface to persistent storage. In many cases, entities in domain models can easily be
mapped to database structures, where the Active Record pattern (see 2.1.1 below) can
be used to deal with mapping in an automated manner. Fowler differentiates between
simple and rich domain models, where simple models are very similar to the
database design and mapping is straightforward. Simple models encompass classes
and their instances, attributes, behavior and associations. Rich domain models
add inheritance and strategies as well as a non-trivial mapping to
actual entities in the database design.
The work described in this thesis relies on simple domain models which exclude
inheritance and strategies and focus more on a straight-forward database mapping
mechanism, such as the Active Record pattern.
An object carries both data and behavior. Much of this data is persis-
tent and needs to be stored in a database. Active Record uses the most
obvious approach, putting data access logic in the domain object. This
way all people know how to read and write to and from the database.
[Fow02, p. 160]
The obvious mapping which is the essence for Active Record is that of database
concepts to object-oriented principles: a table maps to a class, a row maps to an
instance, a column maps to an instance attribute, a foreign key relationship maps
to an association. Figure 2.2 illustrates the concept.
In order to realize the mapping in a generic and automated way for a domain model,
Active Record basically adds the following behavior to classes and instances:
2. Creation of a new domain model object instance which will be able to carry
data to be mapped to a new record in persistent storage. Realized through
create behavior on class level.
3. Storage of data carried in the domain model object instance into the corre-
sponding cells of the database. Realized through save behavior on instance
level.
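A minimal sketch of the create and save behavior listed above, assuming an in-memory SQLite database and a hypothetical customers table, might look like this. The table layout, the column names and the choice of sqlite3 are assumptions made for illustration only; the sketch is not the implementation used in this thesis and covers only the two behaviors described in items 2 and 3.

```python
import sqlite3


class Customer:
    """Active Record sketch: table -> class, row -> instance, column -> attribute."""

    # Assumed storage: a throwaway in-memory database with one table.
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
    )

    def __init__(self, firstname, lastname, id=None):
        self.id = id  # None until the instance has been stored
        self.firstname = firstname
        self.lastname = lastname

    @classmethod
    def create(cls, firstname, lastname):
        # Class-level behavior: a new instance that carries data for a
        # record which does not exist in persistent storage yet.
        return cls(firstname, lastname)

    def save(self):
        # Instance-level behavior: write the attribute values into the
        # corresponding cells of the database.
        if self.id is None:
            cursor = self.connection.execute(
                "INSERT INTO customers (firstname, lastname) VALUES (?, ?)",
                (self.firstname, self.lastname),
            )
            self.id = cursor.lastrowid
        else:
            self.connection.execute(
                "UPDATE customers SET firstname = ?, lastname = ? WHERE id = ?",
                (self.firstname, self.lastname, self.id),
            )
        self.connection.commit()
        return self
```

A caller would write Customer.create("Jan", "Schulz-Hofen").save() and afterwards change attributes and call save() again to update the same row.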
HTTP is an application layer protocol which acts on top of the Transmission Control
Protocol (TCP, [Pos81]) according to the OSI Reference Model [ISO84]. HTTP
follows the traditional client-server pattern, where servers usually run on dedicated
server machines serving documents which describe the hosted resources. HTTP¹
usually uses TCP port 80 for unencrypted communication and port 443 for communication
encrypted with Transport Layer Security (TLS, [DR06]). Clients
send requests to servers using a unique address which specifies the concrete machine
and port and the concrete resource or resources. Clients can transmit messages with
a request and will receive a response from the server which may again contain a
message. Every HTTP communication is triggered by the client. Servers cannot
contact or notify clients.² Proxy servers may cache HTTP messages in order to
attain improved performance or to overcome outages.
1
When mentioning HTTP in this document, the author refers to HTTP/1.1 as defined in
[FGM+99].
In HTTP, scheme can have the values "http" and "https", which refer to HTTP and
HTTP over TLS respectively. authority is used to locate a specific machine on the
Web and can thus be an IP address or any resolvable name [Moc87a], [Moc87b],
optionally accompanied by authentication credentials and port information.
path-abempty can be empty or a hierarchical path expression to identify a specific
resource on the machine. query can be used to further identify a resource using
non-hierarchical data, and fragment is a means of indirect identification of a
secondary resource (e.g. a subset of the primary resource or a specific view on
representations of the primary resource).
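To make the URI components concrete, the following sketch splits a URI into the parts named above using Python's standard urllib.parse module. The host example.org, the credentials, the port, and the query and fragment values are made up for illustration; only the path /people/yeah is borrowed from the example used earlier.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into scheme, authority, path, query and fragment."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # userinfo, host and port together
        "host": parts.hostname,
        "port": parts.port,          # None if the URI names no port
        "userinfo": parts.username,
        "path": parts.path,          # corresponds to path-abempty
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    example = "https://user@example.org:8443/people/yeah?format=atom#name"
    for key, value in uri_components(example).items():
        print(f"{key:10} {value}")
```

Note that the fragment is resolved by the client and is not sent to the server in the request line in practice.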
2.2.2 Messages
HTTP messages are used for client to server (request) and server to client (response)
communication.
A request message contains at least a request line which specifies the request method,
identifies the resource (using the path-abempty [ "?" query ] [ "#" fragment ] part of the
URI, [BLFM05]) and the used HTTP version (i.e. "HTTP/1.1"). Furthermore, it can
contain an entity (i.e. the representation of the resource). Both message and entity
can contain a number of headers which describe either the request itself or the entity
body. Listing 2.2 shows a simplified example for a request message.
Listing 2.2: Request message
GET /people/yeah HTTP/1.1
Accept: text/html
2
“Asynchronous” behavior, however, can be emulated using threading and continuous polling on
the client (e.g. XMLHttpRequest [vK07], sometimes imprecisely referred to as “Ajax”).
3
Refer to [BLFM05, p. 16] for more detailed information on URI grammar.
4
Unless otherwise stated, URI schemes are defined in Augmented Backus-Naur Form (ABNF,
[CO97]) throughout this thesis.
Each request yields a server response which is expressed using a response message.
A response can be composed of a status line, an entity body and a number of headers
which describe either the response itself or the entity body. A status line is composed
of the HTTP version the server operates with, a status code and a reason phrase.
The reason phrase is a human-readable phrase which briefly explains the
status code. Listing 2.3 shows a simplified example for a response message.
POST is used to request a state change on the server. A POST operation can contain
a representation. The semantics of POST are that the represented resource will
be stored and appended as a subordinate of the resource which is referenced by
the URI that the POST operation is directed to. It is very likely that the resource
identified by the latter URI is in fact a collection of resources. Usually (but
not necessarily), the URI of the newly created resource and its representation
will be advertised back to the initiator of the operation. POST is neither safe
nor idempotent.
PUT operations request that either an existing resource referenced by the request
URI is updated or a non-existent one is created according to the enclosed
representation. Consequently, PUT is idempotent, but not safe.
DELETE requests that the resource referenced by the given URI is removed. The
requesting entity is not expected to send a representation along. As DELETE
yields the removal of the resource, it is not safe, but idempotent.
HEAD behaves identically to the GET method except that no
entity body (i.e. no resource representation) is returned by the server. It is
consequently safe and idempotent as well.
CONNECT has been reserved in the HTTP specification for use with proxies
which are able to switch to tunneling mode [Luo98] dynamically.
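The difference between safe, idempotent and other operations can be illustrated with a small in-memory sketch. The class ResourceStore and its method names are hypothetical and exist only for this example; no network communication takes place, and the numbering scheme for newly created resources is an arbitrary assumption.

```python
import itertools


class ResourceStore:
    """In-memory sketch of POST, PUT and DELETE semantics on URI paths."""

    def __init__(self):
        self.resources = {}              # path -> representation
        self._ids = itertools.count(1)   # assumed naming scheme for new members

    def get(self, path):
        # Safe: reads state without changing it.
        return self.resources.get(path)

    def post(self, collection_path, representation):
        # Neither safe nor idempotent: every call appends a new
        # subordinate resource to the addressed collection.
        path = f"{collection_path}/{next(self._ids)}"
        self.resources[path] = representation
        return path  # URI of the newly created resource

    def put(self, path, representation):
        # Idempotent: creates or replaces the resource; repeating the
        # call leaves the store in the same state.
        self.resources[path] = representation

    def delete(self, path):
        # Idempotent: after the first call the resource is gone, and
        # further calls do not change that.
        self.resources.pop(path, None)
```

Calling post twice with the same representation yields two resources, whereas calling put or delete twice leaves the store in the same state as calling it once. This is the property that allows clients and intermediaries to repeat idempotent requests after a failure.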
2.2.4 Headers
HTTP specifies a number of headers which can be used in either request or response
messages or both. An HTTP header is expressed as a single line containing a colon-
separated key value pair as shown in listing 2.4. The following paragraph will briefly
mention those headers which are relevant to the work presented in this thesis. Their
applicability to message types (i.e. request, response or both) is denoted in brackets.
Accept (request) can be used to specify a number of media types or ranges [FGM+ 99,
3.7] which are acceptable as responses and their respective options. HTTP
specifies a sophisticated tuning mechanism which allows clients to gradually
specify their preference. As a basis for the work presented in this thesis, it
is sufficient to understand that the Accept header can be used to specify the
desired media type.
Content-Type (request, response) is used to specify the media type of the entity
enclosed with the message.
Authorization (request) is used to authenticate the client towards the server. The
field value carries the user’s credentials, which the server checks for
validity. Credentials can be expressed using a number of different
authentication schemes, such as Basic and Digest [FHBH+ 99].
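As a rough sketch of how the headers above appear on the wire, the following code builds an Authorization value for the Basic scheme and serializes a set of headers as colon-separated lines. The function names, the user name, the password and the chosen media types are assumptions for illustration. Note that Basic credentials are merely Base64-encoded, not encrypted.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build an Authorization header value for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def format_headers(headers: dict) -> str:
    """Serialize headers as one 'Name: value' line each, ending in CRLF."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers.items())


if __name__ == "__main__":
    request_headers = {
        "Accept": "text/html",                    # desired response media type
        "Content-Type": "application/atom+xml",   # media type of the enclosed entity
        "Authorization": basic_authorization("alice", "secret"),
    }
    print(format_headers(request_headers), end="")
```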
304 Not Modified indicates that the resource has not been modified according to
the version or timestamp the client has used in its conditional request.
401 Unauthorized indicates that the request cannot be performed unless the client
authenticates with the resource.
403 Forbidden indicates that the request is not allowed. The reason can be either
that the request is generally forbidden or that it is forbidden for the currently
authenticated client.
404 Not Found indicates that no resource can be located using the given request
URI.
405 Method Not Allowed indicates that the resource is available but does not
allow the used request method.
406 Not Acceptable indicates that the resource cannot supply representations
in a content type which the client can accept (as stated in an Accept:
header).
409 Conflict indicates that the requested operation could not be performed
because the client expected the resource to be in a state different from its current
one. Refer to 3.6.1.
500 Internal Server Error indicates that the request did not succeed for a
reason which is not attributable to the client. The response should include an
explanation of the cause of the failure if possible.
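The status codes listed above can be looked up programmatically. The following sketch uses Python's standard http.HTTPStatus enumeration to obtain the reason phrase and adds a simple classification by code range; the function name describe and the wording of its output are assumptions made for this example.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the code, its reason phrase and the side an error is attributed to."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 400 <= code < 500:
        side = "client"
    elif 500 <= code < 600:
        side = "server"
    else:
        side = "none"
    return f"{status.value} {status.phrase} (error attributed to: {side})"


if __name__ == "__main__":
    for code in (304, 401, 403, 404, 405, 406, 409, 500):
        print(describe(code))
```

The classification by first digit reflects the general structure of HTTP status codes: 4xx codes report problems with the request, while 5xx codes report failures on the server's side.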
2.2.6 Caching
One of the main features of HTTP/1.1 is its precise specification of caching func-
tionality, control, and algorithms. It entails one of its main advantages over Web
service based approaches to system interoperability, because of the fact that (multi-
ple) caches can be established on the line between clients and servers.
The semantics defined by the uniform interface and the caching directives in HTTP
headers allow far more flexibility. As other approaches would have to specify caching
semantics themselves and negotiate them with clients, service providers are likely
to limit themselves to perform caching behind the invocation boundary (i.e. the
interface) of a service. Obviously, this results in more load on the provider’s side
and reduced dependability, because this architecture introduces a single
point of failure.
Figure 2.4 illustrates the basic architecture of HTTP servers and proxies and denotes
that (a) caches can be on the line between clients and servers, (b) clients can interact
with caches or servers directly while using an identical uniform interface, (c) caches
can interact with original servers or cascaded caches, and (d) clients can keep their
own cache of representations in order to allow performant local operations.
Caching is usually carried out by retrieving response messages from origin servers
and by storing them for future communication with clients. The goal of caching in
HTTP is to reduce the number of network roundtrips and bandwidth requirements
while maintaining a high level of semantic transparency for end users and client
applications.
Semantic transparency
The term semantic transparency refers to the ideal cache behavior in which end
users and applications do not perceive any semantic difference in interactions with
remote resources due to caching. However, caching entails a number of problems
regarding detection of validity of cached data and – in case of modifying operations
– concurrent access and conflicts.
In general, semantic transparency is inversely related to the performance gain that
caching yields. HTTP therefore allows for relaxed transparency and defines
mechanisms to establish an equilibrium between performance and semantic
transparency. Relaxation can be requested or denied by end users and origin servers,
and the protocol defines warnings to notify users and client applications about
relaxed transparency.
Expiration model
One mechanism for relaxed transparency is the expiration model which is defined
in HTTP. Conceptually, a cached response message is expired if the origin server
would return a different response if it received an identical request at that moment.
Ideally, caches would know when this moment occurs in order to acquire a fresh
response message to store. HTTP defines an expiration model (refer to [FGM+99,
13.2] for the full specification) which attempts to approximate this behavior by a
number of different mechanisms. Expiry detection can be based either on client-side
calculations and heuristics or on server-specified predictions that annotate
messages using the Expires: header (cf. 2.2.4).
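The server-specified part of the expiration model can be sketched in a few lines. The following Python sketch is an illustration only, not part of WebData: the function names are hypothetical, heuristic expiration is left out, and the age of a stored response is simplified to the time elapsed since it was stored (HTTP/1.1 defines a more detailed age calculation).

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[timedelta]:
    """Return the server-specified freshness lifetime, or None if absent."""
    # A max-age directive in Cache-Control takes precedence over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))
    # Otherwise use the difference between Expires and Date, if both exist.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return expires - date
    return None


def is_fresh(headers: Mapping[str, str], stored_at: datetime, now: datetime) -> bool:
    """Simplified check: the response is fresh while its age is below the lifetime."""
    lifetime = freshness_lifetime(headers)
    if lifetime is None:
        return False  # no explicit expiry information; heuristics are not modeled
    return now - stored_at < lifetime


if __name__ == "__main__":
    headers = {
        "Date": "Thu, 01 Dec 1994 16:00:00 GMT",
        "Expires": "Thu, 01 Dec 1994 17:00:00 GMT",
    }
    stored = datetime(1994, 12, 1, 16, 0, tzinfo=timezone.utc)
    print(is_fresh(headers, stored, stored + timedelta(minutes=30)))
```

A cache that finds a stored response no longer fresh would then either reload it or validate it with the origin server.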
Validation model
Technically, a client application or cache which stores representations according to
the expiration model would have to refresh its cached messages once they are
considered expired. However, the calculated or predicted expiration of a message
does not imply that the actual representation has indeed become invalid. Thus, a
reload of the entire representation could yield unnecessary network usage in some
cases.
It seems pertinent for a client or cache to check with the respective origin server
whether or not a fresh representation should be acquired before the actual transfer
is started. HTTP defines conditional methods and validators to combine those two
actions into one. Both are expressed using special headers mentioned in 2.2.4. The
2.3. Atom Publishing Protocol (Atompub) 19
Caching in HTTP is not possible or pertinent in all cases. In particular, it is not
possible for operations which have side effects. However, the clearly defined semantics
regarding side effects (cf. safeness and idempotence in 2.2.3) in HTTP allow for an
accurate and automated decision on whether caching is appropriate or not.
Moreover, in some cases it is required that even operations with side effects are
carried out on locally stored or cached data and that modifications are written back
to origin servers at a later point in time. This is particularly useful for disconnected
clients (e.g. on mobile devices) and for applications which offer users a large number
of modification options, where synchronous modifications would yield an
unacceptable decrease in performance. However, such deferred modifications
immediately lead to the Lost Update Problem described in [FL99]. While the
validation model is designed to help detect these kinds of problems, section 3.6.1
will discuss a similar approach to the problem which has been chosen for the work
presented in this thesis.
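How a validator can expose a lost update may be illustrated with a small in-memory sketch. It is not the approach of section 3.6.1; it only mimics a conditional PUT using an entity tag and the If-Match precondition from HTTP. The class names and the hash-based validator are assumptions made for this example.

```python
import hashlib
from typing import Tuple


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class Resource:
    """Hypothetical in-memory resource with an entity-tag validator."""

    def __init__(self, body: str) -> None:
        self.body = body

    @property
    def etag(self) -> str:
        # Assumed validator: a hash of the current representation.
        return hashlib.sha1(self.body.encode("utf-8")).hexdigest()

    def get(self) -> Tuple[str, str]:
        """Return the representation together with its validator."""
        return self.body, self.etag

    def put(self, new_body: str, if_match: str) -> str:
        """Apply the update only if the client's validator still matches."""
        if if_match != self.etag:
            raise PreconditionFailed("representation changed since it was read")
        self.body = new_body
        return self.etag


if __name__ == "__main__":
    order = Resource("gift_wrap=false")
    _, tag_a = order.get()  # client A reads
    _, tag_b = order.get()  # client B reads the same state
    order.put("gift_wrap=true", if_match=tag_a)  # A writes first
    try:
        order.put("gift_wrap=false;express=true", if_match=tag_b)
    except PreconditionFailed as error:
        print("update rejected:", error)  # B's stale write is detected
```

Without the precondition, the second write would silently overwrite the first one.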
feed format for representing and a protocol for editing Web resources
such as Weblogs, online journals, Wikis, and similar content. The feed
format enables syndication; that is, provision of a channel of information
by representing multiple resources in a single document. The editing
protocol enables agents to interact with resources by nominating a way
of using existing Web standards in a pattern. [HB07]
5
At the time of writing, the Atompub protocol specification is still in the RFC Editor Queue.
The work in this thesis is based on draft 17 which has been submitted on Jul 11, 2007. The parts of
this thesis which are based on Atompub may become inconsistent with future versions of Atompub.
However, the author believes that the concepts remain valid and changes in Atompub may be easily
transferred onto the work presented here.
Atompub messages are based on the Atom Syndication Format (Atom, [NS05]), which
Atompub employs as its standard data format for representations.
While this section introduces Atompub and those parts of it which are relevant to the
work presented in this thesis, section 3 describes how concepts of Atompub are
interpreted, extended, mapped to other serialization formats, and applied to more
generic data.
Collections
A collection resource can be associated with one or more member resources. Thus, a
collection document mentions a number of URIs referencing member resources and
may give a brief or full representation of each of them. A collection resource can be
interpreted as the set of member resources it is associated with.
Collection documents are serialized as XML [XML98] files following Atom. The
root node of this serialization is atom:feed (see footnote 6) and must contain at least
the nodes atom:id mentioning a unique identifier for the collection, atom:title
mentioning its title in a human-readable language, and atom:updated mentioning the
point in time when the resource had its last significant update. A feed can contain
one or more atom:entry nodes which represent member resources belonging to this
collection.
An atom:entry node in a collection document must, again, contain at least the fields
atom:id, atom:title, and atom:updated, containing information as outlined above but
describing the entry instead of the feed. Usually, entry representations
within a collection representation only list the non-optional properties and a URI
reference which points to the entry resource itself, such that a full representation
can be retrieved using a separate GET request.
Listing 2.5 shows a sample collection document which is minimal with respect to
required and optional nodes as specified by Atompub and Atom.
Listing 2.5: Collection document
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>A feed of complete nonsense</title>
  <id>urn:uuid:43gadc91-7624</id>
  <updated>2004-01-14T19:31:03Z</updated>
  <entry>
    <title>Amok-Powered Robots Run Atom</title>
    <link href="http://example.org/articles/atom04"/>
    <id>urn:uuid:9fa59bc1-75d4</id>
    <updated>2004-01-14T19:31:03Z</updated>
  </entry>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/articles/atom03"/>
    <id>urn:uuid:1225c695-cfb8</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>
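The required nodes of a collection document can be checked mechanically. The following Python sketch, using only the standard library, reports which of atom:id, atom:title, and atom:updated are missing from a feed and from each of its entries. The function names and the shape of the returned report are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"
REQUIRED = ("id", "title", "updated")


def missing_fields(node: ET.Element) -> List[str]:
    """List the required Atom child nodes that are absent from the given node."""
    return [name for name in REQUIRED if node.find(ATOM + name) is None]


def check_collection(xml_text: str) -> Dict[str, List[str]]:
    """Return a report of missing required nodes; an empty dict means none."""
    feed = ET.fromstring(xml_text)
    if feed.tag != ATOM + "feed":
        raise ValueError("root node is not atom:feed")
    problems: Dict[str, List[str]] = {}
    missing = missing_fields(feed)
    if missing:
        problems["feed"] = missing
    for index, entry in enumerate(feed.findall(ATOM + "entry")):
        missing = missing_fields(entry)
        if missing:
            problems["entry[%d]" % index] = missing
    return problems


if __name__ == "__main__":
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<title>A feed of complete nonsense</title>"
        "<id>urn:uuid:43gadc91-7624</id>"
        "<updated>2004-01-14T19:31:03Z</updated>"
        "<entry><id>urn:uuid:1225c695-cfb8</id></entry>"
        "</feed>"
    )
    print(check_collection(sample))
```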
6
XML elements and attributes which are defined for Atom in the http://www.w3.org/2005/Atom
namespace are denoted with the prefix atom:
Members
A member resource can belong to one or more collections and thus be referenced by
them. It stands for a single entity of information, such as a news article or a blog
entry.
Member documents are serialized as XML files following Atom. The root node is
atom:entry. It must contain at least the fields atom:id, atom:title, and atom:updated.
An entry can contain an atom:content node which should be used to transport the
actual content information that this entry represents. It can contain XHTML [IM07],
foreign markup in a different XML namespace [HTBL06], or just plain text.
Furthermore, Atom member representations can contain several links which have to be
denoted as atom:link elements. A link must have an atom:href attribute mentioning a
URI identifying the linked resource. Links can also have an atom:rel attribute
which defines the link relation type, i.e. how the respective resource is related to
the linked resource. Values for the rel attribute are defined in [NS05, 4.2.7.2] and
[GdH06, 11]. The value range can be extended. The values which are used throughout
this thesis are: related, meaning that the linked resource is related; self, meaning
that the linked resource is equivalent to the current resource; and edit, meaning
that the linked resource is an editable equivalent of the current one and that this
URI must be used for editing (i.e. using a PUT request).
Listing 2.6 shows a sample member document containing a plain text content node
and links as specified by Atompub and Atom.
Listing 2.6: Member document
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/articles/atom03" rel="self"/>
  <link href="http://example.org/articles/edit/atom03" rel="edit"/>
  <id>urn:uuid:1225c695-cfb8</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <content type="text/plain">
    Lorem ipsum quod eros cu pro.
    Vel accumsan invenire appellantur eu.
    At insolens efficiendi conclusionemque eos,
    ad amet splendide democritum per.
  </content>
</entry>
the response will contain a Location: header (as defined in HTTP, refer to 2.2.4) men-
tioning the URI which the server assigned to the new resource for future reference
and a status code of 201 Created (cf. 2.2.5).
In combination with member resources, Atompub defines semantics for GET, PUT, and
DELETE. A GET request results in a message which represents the member: an entry
document. A PUT request must contain an enclosed entry document and asks the
specified resource to change its state accordingly. As with the creation of resources,
a successful PUT request will return a response enclosing a representation of the
modified resource according to its current state. A DELETE request does not contain a
representation and requires the resource to be removed.
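The operations described above can be mimicked by a small in-memory store. This is a sketch under assumptions, not an Atompub implementation: the class name, the URI scheme for new members, and the use of 200 and 404 as status codes are chosen for illustration, while 201 Created and the Location: header follow the description given in this section.

```python
from typing import Dict, Optional, Tuple

# (status code, headers, enclosed entry document or None)
Response = Tuple[int, Dict[str, str], Optional[str]]


class MemberStore:
    """Hypothetical in-memory stand-in for a collection and its members."""

    def __init__(self, collection_uri: str) -> None:
        self.collection_uri = collection_uri.rstrip("/")
        self.members: Dict[str, str] = {}
        self.next_id = 1

    def post(self, entry: str) -> Response:
        """Create a member; the server assigns the URI and reports it."""
        uri = "%s/%d" % (self.collection_uri, self.next_id)
        self.next_id += 1
        self.members[uri] = entry
        return 201, {"Location": uri}, entry

    def get(self, uri: str) -> Response:
        if uri not in self.members:
            return 404, {}, None
        return 200, {}, self.members[uri]

    def put(self, uri: str, entry: str) -> Response:
        """Replace the member's state and return its current representation."""
        if uri not in self.members:
            return 404, {}, None
        self.members[uri] = entry
        return 200, {}, entry

    def delete(self, uri: str) -> Response:
        if uri not in self.members:
            return 404, {}, None
        del self.members[uri]
        return 200, {}, None


if __name__ == "__main__":
    store = MemberStore("http://example.org/articles/")
    status, headers, _ = store.post("<entry>...</entry>")
    print(status, headers["Location"])
```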
3. WebData
This chapter defines WebData, a middleware for exposing and accessing object-
oriented application data as Web resources. WebData is the central part of the
thesis. Its concept, design, and prototypical implementation are the author’s con-
tribution. The WebData middleware is split into two main types of components:
server connectors, which expose object-oriented application data as Web resources,
and client connectors, which access Web resources and provide an object-oriented
programming interface to interact with them. The following is a specification rather
than the documentation of a concrete software component or a product.
Both server-side and client-side connectors are defined as generic software
components, which means that they do not depend on a specific application domain with
respect to resource types. Furthermore, the server connector specification is
independent of the programming language and of the underlying framework realizing
the domain model. Concrete server connector implementations will require that
(a) the domain model is made available to the connector using an object-oriented
programming interface and (b) there are means for the connector to gather
information (e.g. using reflection) on the actual structure of application data within
the domain model in terms of classes, instances, methods, attributes, and associations.
Likewise, the client connector specification is independent of the programming
language and technology in which implementations are realized. Concrete
implementations, however, will only be useful if the embedding software components
are built upon object-oriented concepts. Client and server connectors which are
realized in different programming languages or are embedded in technologically
different software systems are compatible and can interoperate as long as they respect
the specification. Implementations of server and client connectors that contain
modifications or enhancements must still respect this specification such that they can
interoperate with their respective counterparts as specified, even if the latter are not
part of the implementation in question.
Figure 3.1 illustrates a sample architecture where client and server connectors are
employed on different tiers within multiple layers. In this example, the domain
model which could be made available using an object-relational mapper implement-
ing the Active Record pattern (cf. 2.1.1) is accessed by the server connector using
Sections in this chapter are structured as follows: For each aspect of WebData,
basic considerations and a description of the problem or requirement are given. Where
applicable, different approaches are discussed and a final solution is specified. Those
aspects are: URI references for entities in object-oriented domain models,
Representation types for Web resources exposing entities in a domain model, Exposure
of object-oriented domain models as Web resources, Object representation and mapping
for REST Resources, Request-based content negotiation for representation formats,
and Concurrency and transactional behavior.
Where pertinent, implementation alternatives and interoperability with existing
components in neighboring layers of the overall architecture are discussed.
Figure 3.2: Entities in an object-oriented domain model and their mapping to re-
source types
Figure 3.2 shows the following mappings between entities in a domain model and resource types:
1. A class maps to a collection resource referencing all its existing instances. This
type of resource is more specifically referred to as class collection resource.
2. An array or set of instances maps to a collection resource referencing all con-
tained instances. This type of resource is more specifically referred to as arbi-
trary collection resource.
3. An instance maps to a member resource.
4. An instance attribute which consists of an atomic value maps to a value re-
source.
5. An instance attribute which references an associated instance with respect to
the structure of the domain model maps to a member resource. This type of
resource is more specifically referred to as associated member resource.
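The five mappings listed above can be expressed as a small classification routine. The following Python sketch is only an illustration of the mapping, not part of the specification: the names are hypothetical, the set of types treated as atomic is an assumption, and collection-valued attributes are deliberately not handled because they are not among the five cases listed.

```python
from enum import Enum
from typing import Any


class ResourceType(Enum):
    CLASS_COLLECTION = "class collection resource"
    ARBITRARY_COLLECTION = "arbitrary collection resource"
    MEMBER = "member resource"
    VALUE = "value resource"
    ASSOCIATED_MEMBER = "associated member resource"


# Assumption: these Python types count as atomic values.
ATOMIC_TYPES = (str, int, float, bool, type(None))
COLLECTION_TYPES = (list, tuple, set, frozenset)


def resource_type_for(entity: Any) -> ResourceType:
    """Classify a class, a set of instances, or a single instance (cases 1-3)."""
    if isinstance(entity, type):
        return ResourceType.CLASS_COLLECTION
    if isinstance(entity, COLLECTION_TYPES):
        return ResourceType.ARBITRARY_COLLECTION
    return ResourceType.MEMBER


def resource_type_for_attribute(instance: Any, name: str) -> ResourceType:
    """Classify an instance attribute (cases 4 and 5)."""
    value = getattr(instance, name)
    if isinstance(value, ATOMIC_TYPES):
        return ResourceType.VALUE
    if isinstance(value, COLLECTION_TYPES):
        # Not one of the five mappings listed above; left out of this sketch.
        raise ValueError("collection-valued attributes are not covered here")
    return ResourceType.ASSOCIATED_MEMBER
```

A server connector would obtain the same information through reflection on the domain model rather than through hard-coded type checks.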
Base URI
Class URI
1
Listing 3.1 refers to elements from [BLFM05] unless specified in this document.
3.1. URI references for entities in object-oriented domain models 27
Instance URI
If instancename1 is given, it must be a pchar expression which is unique within the
given classname, prefix, and authority. It is recommended that instancename1 is the key
attribute value which is used within the domain model to identify instances (e.g. the
value of the primary key column in a database table row). A URI ending on instance1
is called instance URI. If an instance URI does not contain a prefetch expression, it
references a member resource exposing the identified instance; otherwise, it references
a collection resource as in 2 containing member resources exposing the identified
instance and its associated instances (see 3.3.1 and 3.4.2).
Attribute URI
If attributename is given, it must be a pchar expression which is unique within the given
instancename1, classname, prefix, and authority. It is recommended that attributename is
the name of a valid attribute or association as defined by the class which is referred
to by classname. A URI ending on attribute references a value, member, or associated
collection resource, according to 4, 5, 6 with respect to the structure of the domain
model. It is called attribute URI. The prefix "complete-" is used before an attributename
in order to receive a complete collection resource (see 3.3.1). The postfix expression
prefetch can be used to reference a collection containing associated instances (see
3.3.1), where prefetch_attributename must be a pchar expression as in attributename. An
attribute URI ending with a prefetch expression references a collection resource such
as described in 2.
Finder URI
If findername is given instead of instancename1 and following path components, it must
be a pchar expression which is unique within the given classname, prefix, and authority.
It is recommended that findername matches the name of a finder method (cf. 2.1.1)
within the class referenced by classname. A URI containing findername references a
collection resource containing the member resources which the finder method re-
turned, according to 2. In case the finder method expects parameters, they must be
passed as subsequent pairs of key and value within the query part (cf. 2.2.1) of the
URI. If the employed programming language supports named parameters, keys must
match the finder’s parameter names and values will be passed accordingly, otherwise
values will be passed in order of occurrence disregarding keys. Refer to 2.1.1 for more
information on finder methods. This type of URI is called finder URI.
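The two ways of passing finder parameters can be sketched as follows. The example is hypothetical: the finder name, the URI, and the `named` switch are assumptions, and all values arrive as strings because no type conversion is attempted.

```python
from typing import Any, Callable
from urllib.parse import parse_qsl, urlsplit


def call_finder(finder: Callable[..., Any], uri: str, named: bool = True) -> Any:
    """Invoke a finder method with the key/value pairs from the URI's query part."""
    pairs = parse_qsl(urlsplit(uri).query, keep_blank_values=True)
    if named:
        # Keys must match the finder's parameter names.
        return finder(**dict(pairs))
    # Without named parameters, values are passed in order of occurrence
    # and the keys are disregarded.
    return finder(*[value for _, value in pairs])


def find_by_city_and_status(city: str, status: str) -> str:
    """Hypothetical finder; a real one would query the domain model."""
    return "customers in %s with status %s" % (city, status)


if __name__ == "__main__":
    uri = ("http://example.org/webdata/customers/"
           "find_by_city_and_status?status=active&city=Potsdam")
    print(call_finder(find_by_city_and_status, uri, named=True))
    print(call_finder(find_by_city_and_status, uri, named=False))
```

With named parameters the order of pairs in the query does not matter, whereas positional passing binds the first value to the first parameter regardless of its key.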
Query URI
If path-abempty ends on classname and a query is given, it must be a free_query expression
according to the rules in listing 3.13 in section 3.3.2. Such a URI references a
collection resource as in 2 referencing the member resources which match the search
criteria expressed in free_query. Refer to 3.3.2 for more information on queries. This
type of URI is called query URI.
tation. The corresponding WebData Format root element for a collection
representation is atom:feed (see footnote 2). The actual entities of information which
must be included in a collection representation are as follows:
Identifier Every collection representation must mention its identifier which has to
be globally unique. It is required that this identifier is a URI (cf. 2.2.1).
Corresponding WebData Format element: atom:id.
Title A title must be given for each collection and should be a human-readable text
introducing the purpose of this collection.
Corresponding WebData Format element: atom:title.
Listing 3.2 shows a sample collection representation serialized to Atom (i.e. an Atom
feed containing Atom entries) which is extended by a WebData Format schema (see
below) for the Order class as in 2.1.
Listing 3.2: Collection representation in extended Atom
 1 <feed xmlns:wd="http://example.org/webdata/orders" xmlns="http://www.w3.org/2005/Atom">
 2   <id>urn:example.org:webdata:customers:42:orders</id>
 3   <title>Orders for customer #42</title>
 4   <updated>2007-01-16T04:12:26Z</updated>
 5   <entry>
 6     <id>urn:example.org:webdata:orders:0815</id>
 7     <title>Order #0815 (#2 of orders for customer #42)</title>
 8     <updated>2007-01-16T04:12:26Z</updated>
 9     <link href="http://example.org/webdata/customers/42/orders/2" rel="edit"/>
10     <link href="http://example.org/webdata/orders/0815" rel="self"/>
11   </entry>
12   <entry>
13     <id>urn:example.org:webdata:orders:4711</id>
14     <title>Order #4711 (#3 of orders for customer #42)</title>
15     <updated>2006-02-17T01:49:48Z</updated>
16     <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
17     <link href="http://example.org/webdata/orders/4711" rel="self"/>
18   </entry>
19 </feed>
2
XML elements and attributes which are defined in the http://www.w3.org/2005/Atom names-
pace are denoted with the prefix atom: while prefix wd: denotes elements and attributes from the
respective WebData namespace, refer to 3.2.4 Schema representations.
The described concepts are represented by the lines in listing 3.2 as follows: identifier
– line 2; title – line 3; update information – line 4; edit links for enclosed member
representations – lines 9,16; canonical links for enclosed member representations –
lines 10,17.
Listing 3.3 shows a collection representation serialized to JSON.
The described concepts are represented by the lines in listing 3.3 as follows: identifier
– line 2; title – line 3; update information – line 4; edit links for enclosed member rep-
resentations – lines 10,11,19,20; canonical links for enclosed member representations
– lines 12,13,21,22.
Identifier Every member representation must mention its identifier which has to
be globally unique. It is required that this identifier is a URI (cf. 2.2.1).
Corresponding WebData Format element: atom:id.
Title A title must be given for each member and should be a human-readable
text introducing the purpose of this member or (a combination of) its main
attribute(s).
Corresponding WebData Format element: atom:title.
3.2. Representation types for Web resources
exposing entities in a domain model 31
Edit link A member representation can contain an edit link (i.e. a URI). Edit
links will be used by clients to perform edit (i.e. PUT, DELETE, see 3.3.1) opera-
tions on a member resource. If no edit link is included in the representation,
the canonical link will be used for editing. WebData server-side connectors
should construct edit links following the definition of instance URIs or associ-
ated instance URIs (cf. 3.1), respectively. If the member resource is referenced
(either by a request to one of its associated instance URIs or by a collection
of associated instances) in the context of an associated instance, the member
representation must include an edit link being an associated instance URI. See
3.6.1 for more information on edit links in the context of optimistic concur-
rency. Clients must use the edit link for subsequent PUT and DELETE requests
against the represented resource.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI and atom:rel with value edit.
Canonical link A member representation must contain a canonical link which must
be constructed according to the definition of instance URIs in 3.1.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI and atom:rel with value self.
Associated resource links A member representation must include links to all as-
sociated resources according to the structure of associations of the currently
represented instance’s class in the domain model. If the represented instance
i has an association through attribute a to another instance or a collection
of instances, the associated resource link must be an attribute URI where
instancename1 references i and attributename references a. Furthermore, the infor-
mation entity representing the association has to reveal whether the referenced
resource is a collection or a member and the name of a as defined by the class
of i.
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI, atom:rel with value related, wd:type with value association,
atom:title mentioning the name of attribute a, and wd:cardinality with value *
for a referenced collection or 1 for a referenced member.
Value links A member representation must include links to all value resources
whose values are accessible through augmented accessor methods of the repre-
sented instance. If the represented instance i has augmented accessor methods
for the virtual attribute a the value link must be an attribute URI where
instancename1 references i and attributename references a. Furthermore, the in-
formation entity representing the value link has to reveal that the referenced
resource is a value and the name of a as defined by the class of i. (Refer to 2.1
to learn about virtual attributes and method types.)
Corresponding WebData Format element: atom:link with attributes atom:href
mentioning the URI, atom:rel with value related, wd:type with value accessor,
and atom:title mentioning the name of attribute a.
Values A member representation must include value representations for all its value
resources whose values are accessible through standard accessor methods. (Re-
fer to 2.1 to learn about method types.)
Corresponding WebData Format elements as defined in ’Value representations’
enclosed in element atom:content with attribute atom:type having application/xml
as value.
Listing 3.4 shows a member representation serialized to Atom (i.e. an Atom entry)
which is extended by a WebData schema for the order class (see below).
Listing 3.4: Member representation in extended Atom
 1 <entry xmlns:wd="http://example.org/webdata/orders" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/2005/Atom">
 2   <id>urn:example.org:webdata:orders:4711</id>
 3   <title>Order #4711 (#3 of orders for customer #42)</title>
 4   <updated>2007-01-16T04:12:26Z</updated>
 5   <link href="http://example.org/webdata/customers/42/orders/3" rel="edit"/>
 6   <link href="http://example.org/webdata/orders/4711" rel="self"/>
 7   <link href="http://example.org/webdata/orders/4711/products" rel="related" title="products" wd:type="association" wd:cardinality="*"/>
 8   <link href="http://example.org/webdata/orders/4711/customer" rel="related" title="customer" wd:type="association" wd:cardinality="1"/>
 9   <link href="http://example.org/webdata/orders/4711/total_price" rel="related" title="total-price" wd:type="accessor"/>
10   <content type="application/xml">
11     <wd:express-shipping wd:type="xsd:boolean">true</wd:express-shipping>
12     <wd:gift-wrap wd:type="xsd:boolean">false</wd:gift-wrap>
13   </content>
14 </entry>
The described concepts are represented by the lines in listing 3.4 as follows: identifier
– line 2; title – line 3; update information – line 4; edit link – line 5; canonical link
– line 6; associated resource link for a collection of associated instances – line 7;
associated resource link for a single associated instance – line 8; value link – line 9;
values – lines 11,12.
Listing 3.5 shows a sample member representation serialized to JSON.
Listing 3.5: Member representation in JSON
 1 {
 2   "id": "urn:example.org:webdata:orders:4711",
 3   "title": "Order #4711 (#3 of orders for customer #42)",
 4   "updated": "2007-01-16T04:12:26Z",
 5   "express_shipping": true,
 6   "gift_wrap": false,
 7   "links": [
 8     { "type": "edit",
 9       "uri": "http://example.org/webdata/customers/42/orders/3" },
10     { "type": "canonical",
11       "uri": "http://example.org/webdata/orders/4711" },
12     { "type": "association",
13       "uri": "http://example.org/webdata/orders/4711/products",
14       "cardinality": "many",
15       "name": "products" },
16     { "type": "association",
17       "uri": "http://example.org/webdata/orders/4711/customer",
18       "cardinality": "one",
19       "name": "customer" },
20     { "type": "accessor",
21       "href": "http://example.org/webdata/orders/4711/total_price",
22       "name": "total_price" } ]
23 }
The described concepts are represented by the lines in listing 3.5 as follows: identifier
– line 2; title – line 3; update information – line 4; edit link – lines 8,9; canonical
link – lines 10,11; associated resource link for a collection of associated instances –
lines 12-15; associated resource link for a single associated instance – lines 16-19;
value link – lines 20-22; values – lines 5,6.
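A client could take such a JSON member representation apart roughly as follows. This Python sketch is an assumption about client-side handling, not part of the WebData specification: the function name and the shape of the summary are hypothetical. It accepts a link target under either the "uri" or the "href" key, since both occur in the listing, and it falls back to the canonical link for editing when no edit link is present, as described for edit links above.

```python
import json
from typing import Any, Dict, Optional

RESERVED_KEYS = ("id", "title", "updated", "links")


def link_target(link: Dict[str, Any]) -> Optional[str]:
    """Return the link's URI, whichever of the two keys carries it."""
    return link.get("uri") or link.get("href")


def describe_member(text: str) -> Dict[str, Any]:
    """Split a JSON member representation into links and plain values."""
    doc = json.loads(text)
    summary: Dict[str, Any] = {
        "id": doc["id"],
        "edit": None,
        "canonical": None,
        "associations": {},
        "accessors": {},
        # Everything that is not metadata or a link is treated as a value.
        "values": {k: v for k, v in doc.items() if k not in RESERVED_KEYS},
    }
    for link in doc.get("links", []):
        kind = link["type"]
        if kind in ("edit", "canonical"):
            summary[kind] = link_target(link)
        elif kind == "association":
            summary["associations"][link["name"]] = (
                link_target(link), link["cardinality"])
        elif kind == "accessor":
            summary["accessors"][link["name"]] = link_target(link)
    if summary["edit"] is None:
        # No edit link given: the canonical link is used for editing.
        summary["edit"] = summary["canonical"]
    return summary
```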
Value representations describe value resources that stand for object attributes and
their respective values. Value representations can represent virtual and non-virtual
attributes (refer to 2.1). A value representation consists of information describing
the name of the attribute and its value. In order to support different type systems
in different programming languages, value representations can include information
about the value’s data type.
Corresponding WebData Format elements are defined as wd:a where a is the name of
the represented attribute with XML attributes wd:type mentioning the corresponding
data type as defined by [BM04].
Listing 3.6 shows a value representation serialized to XML using a WebData schema
for the order class (see below).
Figure 3.3 illustrates the three basic representation types which are described above.
39     | text
40     | anyNonWebData)*
41 } | attribute * - wd:* { text }
In listing 3.8, lines 6-28 and 36-40 represent the generic part of the schema
representation, where lines 10-22 interweave the definition with Atom documents and
lines 24-28 define extra attributes for the atom:link element which realize
associated resource links and value links. Lines 30-34, however, define the part of the
schema which is specific to the order class and resource. They constrain value
representations to attributes that exist on the respective class and to their data
types. WebData server-side connector implementations will need information on the
underlying data schemas (e.g. through reflection) in order to provide those
representations. WebData server-side connectors must provide at least WebData Format
schema representations for every resource that stands for a class. WebData Format
schema representations must contain the exact generic part, as mentioned above,
and a specific part with respect to the underlying domain model.
It is also possible for WebData server-side connector implementations to supply
generic and specific parts of schema representations separately. In this case, the
generic schema representation would have to be provided by the base resource as
described in 3.1 while specific schema representations would be provided by those
resources standing for a class as described above. The benefit of splitting up schema
representations is that all resource representations can be validated as WebData
representations using only one (generic) schema representation that acts as “least
common denominator” (see footnote 3) in a first pass, while specific collection,
member, and value representations can be validated against the specific schema
representation in a second pass. However, this increases complexity and might not be
pertinent in all cases.
Listings 3.9 and 3.10 show a split-up schema representation which represents the
same schema as listing 3.8 which was discussed above.
3
For instance, WebData domain models could be represented graphically using XSLT transfor-
mations.
26     | anyNonWebData)*
27 } | attribute * - wd:* { text }
28
29 atomContent = element atom:content {
30   anyElement*,
31   anyNonWebData*
32 }
33
34 atomLink = element atom:link {
35   attribute wd:cardinality { "*" | "1" }?,
36   attribute wd:type { "association" | "accessor" }?,
37   anyNonWebData*
38 }
Where lines 5-11 represent a customer resource and lines 12-19 and 20-27 represent
order resources, respectively. Note that the atom:feed element no longer mentions the
resource which stands for the class of represented instances. Instead, the atom:entry
elements mention those resources respectively.
The following paragraphs define the valid operations for each of the three resource
types and specify the behavior clients must expect when issuing requests.
Collection resources
As mentioned before, collection resources expose a set of object instances which
can be a whole class, an arbitrary set, or an associated set of instances. Collection
resources offer the four methods GET, POST, PUT, and DELETE with the following
semantics.
GET requests should return a representation describing all resources that are mem-
bers of the collection. Following the mapping of resource types, this means: all
object instances that belong to the identified set (e.g. class, search result, as-
sociation). In many cases, it will be appropriate to limit description of actual
collection members to a bare minimum including references to the member
resources (cf. minimal member representations in 3.2).
Programmatically, this involves discovery of object instances according to crite-
ria inferred from the request URI and serialization according to the requested
content-type (cf. 3.5). Generally, discovery breaks down into four different
types of behavior that must be carried out by a server connector on GET:
POST requests yield the creation of a new member resource in, or the addition of
an existing member resource to, the referenced collection. A POST request has
to include a member representation which either describes the intended
state of a new member resource or mentions the URI of an existing member
resource as its identifier (see 3.2). Programmatically, behavior for POST differs
according to the following conditions:
Generally, WebData server-side connectors can ignore parts of given
representations that are not applicable to the type of object to be created or
associated. After creation and/or association, the instance j must be serialized
and returned. The response must contain a Location: header mentioning
the instance URI (for POST requests on class URIs) or the associated instance URI
(for POST requests on attribute URIs) of j. Furthermore, the response must
contain a status code and headers as with a GET request on a member resource.
PUT requests ask resources to change their state according to the provided
representation. For a collection resource, this means the respective change of
state for all of its member resources as one atomic operation.
Programmatically, a WebData server-side connector must discover the identified
instances as it would do in order to perform a GET request and subsequently
perform the required updates on them. WebData server-side connectors must
expect a member representation (see 3.2) and must apply the attribute values it
contains to every identified instance, one update per instance. In order to
achieve atomicity, server-side connectors can rely on transaction mechanisms
of the underlying persistence framework or account for exclusive access and
potential rollbacks themselves. If all updates have been successful, server-side
connectors must return a status code, headers, and a collection representation
as with GET requests. If one of the updates was unsuccessful (and thus the whole
transaction has been rolled back), the connector can return a 400 Bad Request
status code if it identified the cause of the failure within the request.
DELETE requests yield the removal of resources. For collection resources, this
means the removal of all its members as one atomic operation rather than the
removal of the collection itself.
Programmatically and similar to the GET request for collections, a WebData
server-side connector must discover all object instances enclosed by the set that
is identified by the collection and then remove them subsequently. Removal
must be carried out according to the following conditions:
Generally, a 404 Not Found status code must be returned if the referenced class,
attribute, or finder does not exist.
Member resources
Member resources expose concrete object instances which belong to one class and
can be members in one or more sets. Member resources support the methods GET,
PUT, and DELETE with the following semantics. If another operation is requested
from a member resource, a 405 Method Not Allowed status code must be returned and
the response must include an Allow: header mentioning GET, PUT, DELETE as allowed
operations.
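The method check described above can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the constant name, the method name, and the shape of the return value are hypothetical and not part of WebData.

```ruby
# Methods a member resource supports, as listed in the text above.
MEMBER_METHODS = %w[GET PUT DELETE].freeze

# Returns nil if the method is supported, otherwise a [status, headers] pair.
# The method name and the return shape are assumptions made for this sketch.
def method_not_allowed_response(http_method)
  return nil if MEMBER_METHODS.include?(http_method.to_s.upcase)

  [405, { "Allow" => MEMBER_METHODS.join(", ") }]
end
```

A connector could call such a helper before dispatching to the GET, PUT, or DELETE handler and return the pair unchanged whenever it is not nil.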
GET requests must return a representation describing the identified resource.
Following the mapping for resource types, this means a concrete object instance.
If no prefetch postfix is used in the request URI, member resources must return
member representations on GET that are structured as defined in 3.2; otherwise,
they must return mixed collection representations.
Programmatically, this means discovery and serialization of the identified
instance from the domain model. If a prefetch postfix is used in the URI, all
resources that are associated through the attributes mentioned in prefetch must
be discovered as well (preferably using the same finder call). A GET request must
yield a 200 OK status code upon successful discovery of the object instances.
If the identified instance supports versioning (see 3.3.5) and the request
message included an If-None-Match: header mentioning the current version of the
instance, the request must instead yield a 304 Not Modified status code and no
representation. In any case, the response must include an ETag: header
mentioning the current version if versioning is supported for the current
instance.
If the date and time of the next potential change to this instance can be
determined (see 3.3.5), the response must include an Expires: header mentioning
this point in time formatted as defined in [Bra89].
PUT requests yield a state change on the identified resource and must include a
representation describing the intended state for the resource.
Programmatically, a WebData server-side connector must discover the identified
instance as it would do in order to perform a GET request and subsequently
perform the required update on it. WebData server-side connectors must expect
a member representation (see 3.2) and must update attributes of the
identified instance. If the member is referenced by an attribute URI referring
to attribute a of instance i and the representation contains an identifier
(see 3.2.2) that is different from the identifier of the resource which currently
exposes the model instance associated with i through a, then a has to be updated
such that the new instance becomes associated with i. If the update
has been successful, server-side connectors must return a status code, headers,
and a member representation, as with GET requests. If the update was
unsuccessful, the connector can return a 400 Bad Request status code if it identified
the cause of the failure within the request (e.g. the enclosed representation).
DELETE requests yield the removal of resources. For a member resource, this
means the removal or deletion of this member with respect to the collection
that it is a member in.
Programmatically and similar to the GET request for members, a WebData
server-side connector must discover the object instance that is identified by the
request and remove it subsequently. Removal must be carried out according
to the following conditions:
Generally, a 404 Not Found status code must be returned if the referenced class,
instance, attribute, or associated instance does not exist.
Value resources
Value resources stand for single values that instance attributes may have. Value
resources support the methods GET, PUT, and DELETE with the following semantics. If
another operation is requested from a value resource, a 405 Method Not Allowed status
code must be returned and the response must include an Allow: header mentioning
GET, PUT, DELETE as allowed operations.
GET requests must return a representation describing the identified resource.
Following the mapping for resource types, this means the value of an instance
attribute. Value resources must return value representations on GET which are
structured as defined in 3.2.
Programmatically, this means discovery of the instance identified by the
attribute URI within the domain model, readout of the attribute value identified
by the attribute URI, and its serialization. Note that readout may entail the
execution of an augmented getter method as described in 2.1. A GET request
must yield a 200 OK status code upon successful discovery of the object instance.
PUT requests yield a state change on the identified resource and have to include
a representation describing the intended state for the resource.
Programmatically, a WebData server-side connector must discover the identified
instance and read the identified attribute as it would do in order to perform
a GET request and subsequently perform the required update on the attribute.
WebData server-side connectors must expect a value representation (see 3.2)
and must update the identified attribute of the identified instance. This may
entail the execution of an augmented setter method as described in 2.1. If the
update has been successful, server-side connectors must return a 200 OK status
code and a value representation as with GET requests. If the update was
unsuccessful, the connector can return a 400 Bad Request status code if it identified
the cause of the failure within the request (e.g. the enclosed representation).
DELETE requests yield the removal of resources. For a value resource, this means
the reinitialization of the identified attribute value rather than the removal of
the attribute itself as it is defined by the data structure and not subject to
change.
Programmatically and similar to the PUT request for values, a WebData
server-side connector must discover the object instance and attribute which are
identified by the request and set the attribute value to its initial state (e.g. NULL
Generally, a 404 Not Found status code must be returned if the referenced class,
instance, or attribute does not exist.
Generally, all operations on all resource types must yield a 500 Internal Server Error
status code if an error occurred while performing the request and the failure cannot
be identified with certainty as caused by the client. According to HTTP, any response
with a 500 Internal Server Error status code should include a representation explaining
the error. This explanation may reveal original error codes and messages of underlying
components if this is not considered critical for security.
class of use-cases does not satisfy this assumption: in many situations, clients only
know aspects of objects or need to select a subset of objects from a larger set using
conditional expressions. In relational database management systems, for example,
the Structured Query Language (SQL, [BC74]) has been a prominent and widely
accepted means for expressing and executing queries for many years.
Query URIs
WebData proposes a URI based approach to expressing queries against collections of
resources which is conceptually similar to aspects of SQL. It enables clients to specify
conditions in a URI. Conceptually, such a query URI (cf. 3.1) names a particular
query result and thus makes the respective collection of matching resources itself
a resource which in turn can be accessed using the uniform interface defined for
collection resources (cf. 3.3.1).
The URI specification [BLFM05] provides for queries in URIs through its definition
of the query part which can be mentioned in any URI after its hierarchical part using
the ? as a delimiter. However, neither the URI nor the HTTP specification define
a more concrete syntax, formats, or semantics for those queries. In existing Web
applications, the query part is widely used to specify any kind of parameters which
should be appended to a URI in order to retrieve a modified response. A usual
pattern for the query parameter is shown in listing 3.12 below:
Listing 3.12: Usual form for URI query parameter
query = condition *( "&" condition )
condition = key "=" value
In listing 3.12, simple conditions based on key-value pairs can be mentioned and
combined using the logical AND operator expressed by &. The expressiveness of
queries using this pattern is quite limited, although it is sufficient for many
cases, in particular where keys and values carry additional, domain-specific
semantics. It appears insufficient, however, when a global query approach for all
kinds of applications and domain models is required.
Listing 3.13 shows the grammar which WebData defines for free queries which are
designed to be globally applicable to Web resources.
Listing 3.13: Free URI queries
free_query = ["!"] ( free_query logop free_query / "(" free_query ")" /
             expression compop expression )
expression = attribute / value
attribute  = string ["::" string]
value      = number / "'" *( string / "*" / "?" ) "'"
logop      = "," / "&"
compop     = "=" / "+=" / "-=" / "~=" / "!="
string     = 1*unreserved
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
number     = 1*DIGIT [ "." 1*DIGIT ]
A free query is a construct consisting of one or more conditions which can be
combined using logical AND (using &) and OR (using ,), nested (using ( and )), and
negated (using !). Every condition is composed of two operands and a comparison
operator. Available operators are equal (=), not equal (!=), less than (-=), greater
than (+=), and like (~=, to be used for pattern matching as in SQL [BC74]). Operator
precedence is as follows (higher levels first): =, +=, -=, !=, !, &, ~=, ,.
Every operand can either be a concrete value, a pattern (using the single-character
wildcard ? and the multi-character wildcard *), or an instance attribute as defined by
the object classes which the Web resources stand for. Note that attributes of
associated classes can be included in a query using :: as the dereference operator.
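To illustrate how a server-side connector might begin processing a free query, the following Ruby sketch splits a query string into tokens. It is not a full parser and is not taken from WebData. It makes several assumptions: the function and constant names are hypothetical, values are quoted with the ASCII apostrophe, the input has already been percent-decoded, and attribute names are restricted to letters, digits, and underscores so that an input such as total-=5 is not read as the name "total-" followed by "=" (the grammar itself would permit both readings).

```ruby
require "strscan"

# Token patterns, tried in order. Two-character operators are listed before
# "=" and "!" so that "!=" is not split into "!" followed by "=".
TOKEN_PATTERNS = [
  [:compop, /\+=|-=|~=|!=|=/],
  [:not,    /!/],
  [:logop,  /[,&]/],
  [:lparen, /\(/],
  [:rparen, /\)/],
  # A number, or a quoted string that may contain the wildcards * and ?.
  [:value,  /\d+(?:\.\d+)?|'[A-Za-z0-9\-._~*?]*'/],
  # Assumption: attribute names use only letters, digits and "_".
  [:attribute, /[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)?/]
].freeze

# Returns an array of [type, text] pairs or raises ArgumentError.
def tokenize_free_query(query)
  scanner = StringScanner.new(query)
  tokens = []
  until scanner.eos?
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at position #{scanner.pos}" unless type

    tokens << [type, scanner.matched]
  end
  tokens
end
```

For example, tokenize_free_query("total+=100&customer::name~='Sm*'") yields an attribute, a comparison operator, and a value for each of the two conditions, separated by a logical operator token. Applying the precedence rules and building an expression tree would be a separate step.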
Listing 3.14 shows some sample query URIs for illustration purposes.
Finder methods
The proposed search mechanism using query URIs provides a certain amount of
expressiveness. However, query URIs are not suited for more sophisticated queries
which include complex joins from different sets of objects or special projections.
Commonly used queries should be wrapped into static finder methods on class level.
Assuming that complex queries are built into object classes in this way, those finders
can be accessed using finder URIs on WebData server-side connectors. Server-side
connector implementations must provide finder URIs for all existing finder methods
and expose a collection interface as defined in 3.3.1. Note that in order to distinguish
finder methods from other methods, special configuration for the connector may be
needed. However, it may be more pertinent to use conventions, e.g. starting finder
method names with the imperative find if possible. Parameters which finder methods
may expect must be read from finder URIs as defined in 3.3.1. Again, a mapping of
parameter names to keys in the URI query part is most pertinent.
WebData server-side connectors must expose each finder with its own finder URI.
Connectors should use naming conventions based on a mapping between finder
method names and the findername part of the URI. The proposed mapping consists
of naming finder methods like find_name and deriving name as the findername
URI part. As for parameters, key query parts should be derived from parameter
names. For example, a finder method find_relevant_for_intl_vat_refund(year) defined
on a class order would be made available via the URI
http://example.org/webdata/orders/relevant_for_intl_vat_refund?year=x
where x would be passed as the value for the year parameter to the finder method.
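The naming convention can be expressed as a small Ruby helper. This is a minimal sketch, assuming the find_ prefix convention described above; the function name finder_uri and its signature are hypothetical.

```ruby
require "uri"

# Derives a finder URI from a class URI, a finder method name following the
# find_<name> convention, and a hash of parameter names and values.
def finder_uri(class_uri, method_name, params = {})
  name = method_name.to_s
  raise ArgumentError, "not a finder method: #{name}" unless name.start_with?("find_")

  findername = name.delete_prefix("find_")
  uri = "#{class_uri.chomp('/')}/#{findername}"
  # Parameter names become keys in the URI query part.
  uri += "?#{URI.encode_www_form(params)}" unless params.empty?
  uri
end
```

Called with the class URI http://example.org/webdata/orders, the method name find_relevant_for_intl_vat_refund, and a year parameter, the helper produces the URI form shown in the example above.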
sets of resources using class, attribute, finder or query URIs and perform selections on
the retrieved collections manually. From a performance point of view, this solution
will likely be quite expensive and should be avoided, but may be pertinent in cases
where collections contain only a limited number of members.
Authentication
HTTP, which all WebData interactions are based upon, provides for authentication
mechanisms using the Authorization: and WWW-Authenticate: headers and the
401 Unauthorized status code.
other frameworks (e.g. directory servers via LDAP) to check validity of credentials.
Also, implementations should propose a mechanism to delegate credential checking
to the actual application using the connector, i.e. using callback mechanisms.
Authorization
Conceptually, servers can use the 403 Forbidden status code to indicate that a request
is not allowed with respect to the requested resource, the request method and the
supplied credentials. Concrete server-side connector implementations must use the
403 Forbidden status code if access to a resource is denied. In order to determine
whether or not access should be allowed or denied, connectors must be configurable
by application developers using them in target applications. In order to support this,
connector implementations can use different techniques such as reading configuration
files or using annotations in the implementation of the domain model.
Regardless of the actual configuration technique and serialization format, the actual
language describing authorization must be composed of authorization rules which
contain the following types of information:
Model class An identifier which uniquely specifies a model class within the exposed
domain model or a wildcard which identifies any class.
Resource type An identifier which references one of the three resource types in
WebData (cf. 3.1), namely collection, member, and value or a wildcard.
Request method An identifier which references one of the four request methods
which WebData specifies semantics for (cf. 3.3.1), namely GET, POST, PUT, and
DELETE or a wildcard.
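As an illustration, rules carrying the three kinds of information listed above could be evaluated as in the following Ruby sketch. Several details are assumptions and not defined by the text: the "*" spelling of the wildcard, the allow flag on each rule, first-match-wins evaluation, denial when no rule matches, and all names used in the code.

```ruby
# One authorization rule. The three matching fields follow the list above;
# the :allow flag is an assumption made for this sketch.
AuthRule = Struct.new(:model_class, :resource_type, :request_method, :allow) do
  # True if every field is either the wildcard "*" or equal to the request's value.
  def matches?(model_class, resource_type, request_method)
    [[self.model_class, model_class],
     [self.resource_type, resource_type],
     [self.request_method, request_method]].all? do |pattern, actual|
      pattern == "*" || pattern == actual
    end
  end
end

# Returns :allowed, or the 403 status code if access is denied.
# Assumption: the first matching rule decides; no match means denial.
def status_for(rules, model_class, resource_type, request_method)
  rule = rules.find { |r| r.matches?(model_class, resource_type, request_method) }
  rule && rule.allow ? :allowed : 403
end
```

With a rule that denies DELETE on Order members followed by a rule that allows everything else on Order, a DELETE request on an order member yields 403 while a GET on the order collection is allowed.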
Expiration information
There are a number of ways in which expiration dates and times can be determined
which are subject to concrete server-side connector implementations.
In a number of cases, expiration dates could be determined using heuristics based on
an instance’s change history or the change history of similar instances (e.g. those
belonging to the same class or set of instances). This is particularly useful in situations
where instances have proven to have very similar change intervals over time.
Another workable approach is to infer an instance’s expiration date and time from the
current state of the resource. Depending on the domain model, concrete instances
may contain information which clearly defines that the instance will never change
state again (e.g. a closed case, a resigned employee) or will likely change according to
a defined schedule (e.g. deadlines, publication data). WebData server-side connectors
should provide a means for configuration (e.g. via callbacks) such that expiration
dates can be derived from concrete instances of objects within the domain model
where possible.
WebData server-side connectors may determine expiration information using the
above-mentioned approaches or others. However, implementers must be aware that
wrong expiration information (i.e. dates too far in the future) can lead to
inconsistencies and erroneous behavior on the respective clients.
If available, expiration information for an instance must be supplied with every
response including a member resource using the HTTP Expires: header. Expiration
must then be formatted as defined in [Bra89]. As specified in HTTP, the expiration
date for an instance should be approximately one year from the time the response
is sent if the instance will never change state again.
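The header value can be produced with Ruby's standard library, whose Time#httpdate method emits the date format used by HTTP. The sketch below is an assumption-laden illustration: the function name, the :never marker for instances that will not change again, and the use of 365 days for "approximately one year" are hypothetical choices.

```ruby
require "time"

ONE_YEAR = 365 * 24 * 60 * 60 # seconds; "approximately one year"

# expires_at is the Time of the next expected change, or :never if the
# instance will not change state again. Returns the Expires: header value,
# or nil when no expiration information is available.
def expires_header(expires_at, now = Time.now)
  case expires_at
  when :never then (now + ONE_YEAR).httpdate
  when Time   then expires_at.httpdate
  end
end
```

A connector would add the header only when the helper returns a value, since expiration information is optional.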
Cache validation
Furthermore, WebData server-side connectors should supply version information
with every representation in order to support cache validation using conditional
GET operations as described in 2.2 and 3.4.2. Version information in this case must
be a symbol which uniquely identifies the current version of the resource with respect
to its change history. This is most likely to be implemented using an integer value
which is incremented at every state change the resource undergoes.
Versioning requires the connector to store extra persistent information for every
resource, which may not be realizable in the underlying technology. However,
versioning should be implemented by server-side connectors whenever possible. Version
information must then be supplied in the HTTP ETag: header of every response to a
request issued against a member resource, in order to support subsequent
conditional GET requests.
Whenever a server-side connector receives a conditional GET request (i.e. in this case
a request directed against a member resource including an If-None-Match: header), the
current version of the identified instance has to be compared to the one mentioned in
the header. If both versions match, a 304 Not Modified status code must be returned
and the response must not include a message body; otherwise, behavior is as
described in 3.3.1. When used by clients according to 3.4.2, this behavior results in
less transferred data if no changes have been made to the requested resources (cf.
2.2).
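The comparison can be sketched in Ruby as follows. The helper name and the interface of the instance (a version method returning an integer and a to_representation method) are assumptions for illustration; the quoting of the version follows the HTTP convention that entity tags are quoted strings.

```ruby
# Hypothetical connector helper. `instance` is assumed to respond to
# #version (incremented on every state change) and #to_representation.
# Returns a [status, headers, body] triple.
def respond_to_member_get(instance, request_headers)
  etag = %("#{instance.version}")
  headers = { "ETag" => etag }
  if request_headers["If-None-Match"] == etag
    [304, headers, nil] # versions match: no message body is sent
  else
    [200, headers, instance.to_representation]
  end
end
```

The ETag: header is sent in both cases, so a client always learns the current version it can use for its next conditional request.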
client-side connectors interact with WebData resources as specified in 3.3 and expose
them as entities within a domain model to the embedding target application. Thus,
server and client connectors together provide a transparent access channel to domain
models which are located on remote and potentially distributed servers. Roughly
speaking, WebData is able to “lift” the persistency layer to upper layers within the
architecture to give application developers a native and direct access to the domain
model.
3.4.1 Discovering, creating, reading, updating, and deleting Web resources like objects
In order to supply an object-oriented programming interface to Web resources which
are available over HTTP and addressable via URIs (cf. 3.1), resource concepts must
be mapped back to classes, instances, methods, attributes and associations. While
the initial “entry point” into a remote domain model must be defined using a URI,
the overall goal for subsequent interactions is “URI-less navigation”. URI-less means
that URIs of associated values and instances are not exposed to the embedding
application but automatically retrieved and dereferenced by the client-side connector
on demand.
In order for resources to behave like entities in a domain model, the basic concepts
of such a model have to be reconstructed. Concrete implementations may differ in a
number of aspects depending on the features of the target programming language. Mainly,
a client-side connector should supply a class definition for WebData resources, such
that concrete instances can be created and stand for member resources which are
consumed from the Web. At least a default class for all objects standing for WebData
resources must be supplied. Programmatically, this class will most likely contain all
the standard behavior on class and instance level (e.g. finders, creators, save and
destroy methods, cf. 2.1.1).
It is recommended that client-side connector implementations provide a means for
application developers to obtain one class definition per class collection they intend
to use. Those class definitions can easily be inferred from the specific part of a
schema definition (cf. 3.2) and will most likely inherit from the default WebData
class. WebData client-side connector implementations are not required to offer
specific class definitions, but the resulting lack of type awareness may confuse
application developers and become a considerable source of errors. Listing
3.16 shows sample Ruby code which application developers could use to define a
member type specific client-side class. The class Order in the listing inherits from the
default client-side class which is provided by the connector and is configured using
(a) a class URI (cf. 3.1) which can be used for creation (see below) and schema
representation retrieval and (b) credentials for HTTP access (cf. 3.3.3).
Listing 3.16: Sample class collection specific class definition (Ruby)
class Order < WebDataResource::Base
  self.uri = "http://example.org/webdata/orders"
  self.credentials = {:name => "yeah", :password => "secret"}
end
The following sections describe the basic behavior of the client-side connector
default class with respect to the main aspects of resource access: discovery, creation,
reading, updating and destruction. It is assumed that specific client-side classes
inherit the default class’s basic behavior.
Discovery
In order to locate resources on the Web and have them represented as objects to the
local environment, the client-side default class must provide a mechanism to perform
HTTP GET requests and instantiate local instances which stand for the retrieved
resource representations. The class must expose a class method which should be
called find() and expect a combination of the following parameters which can be
used to construct a URI:
URI The URI parameter accepts a class, finder, query or instance URI (cf. 3.1)
which must be used directly to request the resource if no more parameters are
given. The URI parameter is optional if a URI is given in the respective class
definition; if both are given, it overrides the URI within the class definition.
Complete The complete parameter accepts a boolean value. If the complete
parameter is set to true, the given URI (or the URI which was derived from the
class definition, see above) must be modified such that the classname part is
preceded by "complete-" according to 3.1.
Credentials The credentials parameter can be used to supply credentials for
authentication with the respective resource. In common scenarios, this will be
a username and password pair. However, implementations can offer more
sophisticated mechanisms as mentioned in 3.3.3. If credentials are given in the
respective class definition, the credentials parameter overrides them. Refer to
3.3.3 and 3.4.3 for more information.
Upon a call to the find() method, a GET request to the respective URI must be
issued. Subsequently, a local instance of the respective client-side class must be
instantiated and returned for each member representation which is included in the
respective response. Loading must be carried out as described below (see Reading).
Note that the member representations included in the response may not include
value representations. Values will then be acquired when they are accessed for the
first time, following the lazy loading design pattern [GHJV00].
Discovery must return information about its success or failure using the common
error reporting mechanism of the target programming language or framework. The
type of the error should be indicated. Possible error types are defined in 3.3.1.
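The effect of the complete parameter on the request URI can be sketched in Ruby. This is only an illustration under the assumption that the given URI is a class URI whose last path segment is the classname part; the function name and keyword argument are hypothetical, and finder, query, and instance URIs would need additional handling.

```ruby
require "uri"

# Builds the URI that find() would request for a class URI.
# Assumption: the classname part is the last path segment of the URI.
def discovery_uri(class_uri, complete: false)
  return class_uri unless complete

  uri = URI.parse(class_uri)
  segments = uri.path.split("/")
  segments[-1] = "complete-#{segments[-1]}" # prefix the classname part
  uri.path = segments.join("/")
  uri.to_s
end
```

Without the complete flag the URI is used unchanged, which matches the rule that a given URI must be used directly if no further parameters are supplied.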
Creation
Creation of resources is achieved through creation of local model object instances
and must be supported through a class method on the client-side default class which
should be called create(). It must expect a class URI and optionally a number of
initial attribute-value pairs for the new resource as parameters. Note, that if the
client-side connector implementation proposes a mechanism to obtain specific class
definitions per member type, the class URI may be retrieved from the respective
class definition.
Upon method call, the client-side class must create a member representation (cf.
3.2) which reflects all given attribute-value pairs as value representations and request
resource creation using an HTTP POST on the class URI. Presumably, the request will
yield a member representation as defined in 3.3.1. To complete the creation of the
local instance, this representation must be loaded as described below (see Reading).
Subsequently, the new instance must be returned.
A create operation must return information about its success or failure using the
common error reporting mechanism of the target programming language or framework.
The type of the error should be indicated. Possible error types are defined
in 3.3.1.
Reading
Local model object instances which stand for member resources on the Web must
expose an interface to access representation data and provide for object naviga-
tion. In order for an application to interact with data carried by the instance, the
representation has to be loaded. Loading must be carried out as follows:
2. Expiration and version information must be read from the Expires: and ETag:
headers of collection or member representations respectively and stored locally
and privately to the new instance.
3. For each value representation which is included in the member representation,
an attribute must be initialized within the new instance with the respective
value which was retrieved.
4. If there is a schema representation (cf. 3.2.4) available through the class of this
instance, an instance attribute can be created for each attribute defined
in the specific part of the schema even if it is not included in the member
representation. However, such an attribute must be marked as non-initialized and
loaded if needed as described below.
The loading mechanism should be provided by a private instance method named load()
which expects a member representation as input. Furthermore, a public instance
method named reload() should be provided which retrieves a new representation using
a GET request to the instance’s canonical URI and then calls the loading mechanism.
Upon attribute read access on a local instance by the application, three different
situations are possible:
2. If the attribute is represented by a value link within the respective member
representation, a GET request to the value link must be performed and the retrieved
value must be returned.
The load mechanism must return information about its success or failure using the
common error reporting mechanism of the target programming language or framework.
The type of the error should be indicated. Possible error types are defined
in 3.3.1.
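The value link case can be illustrated with a small Ruby sketch of lazy loading. All names are hypothetical, the HTTP GET is replaced by an injected callable so that the example stays self-contained, and keeping the fetched value for later reads is an assumption rather than something the text prescribes.

```ruby
class LazyAttributes
  # Marks an attribute whose value is only available behind a value link.
  ValueLink = Struct.new(:uri)

  # values:  attribute name => plain value or ValueLink
  # fetcher: callable that performs the GET on a URI and returns the value
  def initialize(values, fetcher)
    @values = values
    @fetcher = fetcher
  end

  def read(name)
    value = @values.fetch(name)
    if value.is_a?(ValueLink)
      value = @fetcher.call(value.uri) # first access triggers the request
      @values[name] = value            # assumption: keep it for later reads
    end
    value
  end
end
```

Attributes that were delivered inline are returned without any request, while an attribute behind a value link costs one request on first access.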
Updating
Upon attribute write access on a local instance by the application, again three
different situations are possible:
While in situations 2-3 immediate HTTP requests are to be performed by the client-
side connector, operations like in situation 1 are only applied to the local instance
data and no request is issued. As mentioned before, the actual modifications to the
resource which the instance stands for must be requested explicitly. This storing
mechanism must be provided by a public instance method which should be named
save(). Upon call of this method, a member representation must be constructed
which contains all values which have changed with respect to the last retrieved
representation. Subsequently, this representation has to be included in a PUT request
against the edit URI that has been attributed to the current instance.
An update operation must return information about its success or failure using the
common error reporting mechanism of the target programming language or framework.
The type of the error should be indicated. Possible error types are defined
in 3.3.1.
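The storing mechanism can be sketched in Ruby as a local instance that records writes and sends only the changed values on save(). The class name, the hash-based stand-in for a member representation, and the injected transport callable are assumptions for illustration; serialization and error reporting are omitted.

```ruby
class LocalInstance
  # transport: callable taking (http_method, uri, body); stands in for HTTP.
  def initialize(edit_uri, attributes, transport)
    @edit_uri = edit_uri
    @loaded = attributes.dup  # values from the last retrieved representation
    @current = attributes.dup # values as modified by the application
    @transport = transport
  end

  def [](name)
    @current[name]
  end

  def []=(name, value)
    @current[name] = value # local only; no request is issued
  end

  # Attributes whose value differs from the last retrieved representation.
  def changes
    @current.reject { |name, value| @loaded[name] == value }
  end

  def save
    changed = changes
    return true if changed.empty?

    @transport.call("PUT", @edit_uri, changed)
    @loaded = @current.dup
    true
  end
end
```

Writing an attribute therefore changes only local state, and a single PUT against the edit URI carries all accumulated modifications once save() is called.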
Destruction
For local model object instances which stand for member resources on the Web,
there must be an explicit mechanism for destruction. While local objects are not
usually destructed explicitly in modern programming languages due to the existence
of garbage collectors, Web resources stand for objects in remote domain models
whose life cycle does foresee their destruction.
Local model object instances must provide a mechanism for destruction to be called
explicitly. The method performing this operation should be called destroy() and
perform the destruction using a DELETE request against the edit URI which is stored
with the local instance. If the target programming language for the client-side
connector implementation supports the concept of destructors, it might be pertinent
to combine this functionality with an instance’s destructor.
The destroy mechanism must return information about its success or failure using
the common error reporting mechanism of the target programming language or
framework. The type of the error should be indicated. Possible error types are
defined in 3.3.1.
3.4.2 Caching
Obviously, dealing with entities in remote domain models entails constraints
regarding application performance. It is therefore pertinent for WebData client-side
connector implementations to provide caching mechanisms in order to limit actual
remote interactions and perform a maximum of operations locally. As discussed
in 2.2.6, caching represents an essential element within HTTP. WebData client-side
connectors can make use of HTTP’s caching features to realize effective performance
improvements.
2. Before implicit loading (i.e. loading which is not triggered by the explicit
reload mechanism as described in 3.4.1) of an instance from a member
representation, the cache has to be queried using the URI which would be used for
the respective HTTP GET request. If the instance can be found in the cache,
the cached instance has to be inspected according to the cache replacement
strategy (see below). Depending on the outcome of that inspection, the cached
instance may be used and the load operation may be omitted.
The client-side cache has to evaluate the following conditions in order to decide
whether an instance is actually returned for use by the embedding application when-
ever it has been found in the cache:
While the replacement strategy will be appropriate in most cases, application devel-
opers must be able to express explicit reloads. Thus, WebData client-side connectors
must provide the reload mechanism (see Reading above) regardless of expiration and
version information.
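The interplay of cache lookup, replacement strategy, and explicit reload can be sketched as follows. The sketch assumes that freshness is decided by an expiry time alone; the class name InstanceCache, the ttl parameter, and the injectable clock are hypothetical, and a full implementation would also have to consider version information.

```ruby
# Minimal in-memory instance cache keyed by the URI that would be used for
# the HTTP GET request. Expiry-only freshness is an assumption of this sketch.
class InstanceCache
  Entry = Struct.new(:instance, :expires_at)

  # clock: callable returning the current time (injectable for testing).
  def initialize(clock: -> { Time.now })
    @entries = {}
    @clock = clock
  end

  # Returns the cached instance if it is still fresh. Otherwise the block
  # (the actual load operation) is run and its result is stored.
  # force: true models the explicit reload mechanism, which bypasses the
  # cache regardless of expiration.
  def fetch(uri, ttl:, force: false)
    entry = @entries[uri]
    return entry.instance if !force && entry && entry.expires_at > @clock.call

    instance = yield
    @entries[uri] = Entry.new(instance, @clock.call + ttl)
    instance
  end
end
```

The load operation is passed as a block so that it is only executed when the cache cannot answer the request.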
Discovery with prefetch In order to realize prefetch, the request used for discov-
ery in the client-side connector has to be modified in a way that it requests
more resources from the server. While WebData servers could theoretically
return all associated resources at once without an explicitly modified request
from the client, the overall length of the message is very likely to become
much larger and not all representations may be needed. WebData client-side
connectors must therefore provide a mechanism which enables end application
developers to specify which resources should be prefetched. More specifically,
developers must be able to name attributes of the instances which are cur-
rently requested. This should be incorporated in the find() method’s signature
as an optional parameter called prefetch accepting an array or hash-like data
structure mentioning symbols for associations to include in prefetch. WebData
server-side connectors will respond with representations for the requested re-
source and representations for associated resources as specified (cf. 3.3.4). As
described above (see Discovery and Caching local instances), the representa-
tions have to be loaded subsequently and instances have to be stored in the
local cache. The discovery method, however, must only return those instances
which were initially requested, excluding all prefetched instances. Note that it
may be pertinent for application developers to request representations which
contain complete member representations (cf. 3.2) using the complete parameter
(cf. Discovery above) for the find() method.
Listing 3.17 illustrates how the discovery functionality with prefetch could be
called from within a target application.
Listing 3.17: Sample discovery with prefetch call (Ruby)
Order.find(:all, :conditions => "total_price+=100000", :prefetch => [:customer], :complete => true)
Supposing that the Order class has been defined as in listing 3.16 and the
identified resource defines an attribute customer which is an association, the
find() method as in listing 3.17 would subsequently issue a GET request to the URI
http://example.org/webdata/complete-orders-with-customer?total_price+=100000
and retrieve a mixed collection representation (cf. 3.2.5) including complete
order and customer member representations. All representations would be loaded
and instances would be cached accordingly, while only order instances would
ultimately be returned by the find() call.
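How a connector might derive the request URI and separate requested from prefetched instances can be sketched as follows. The URI pattern is taken from the single example above; how several prefetched associations are joined ("-and-" here), the helper names, and the shape of the representation hashes are assumptions made for illustration only.

```ruby
# Builds the discovery URI for a collection URI such as
# "http://example.org/webdata/orders". The "complete-" prefix and the
# "-with-<association>" suffix follow the one example in the text; joining
# several associations with "-and-" is an assumption.
def discovery_uri(collection_uri, conditions: nil, prefetch: [], complete: false)
  base, _, name = collection_uri.rpartition("/")
  name = "complete-#{name}" if complete
  name += "-with-#{prefetch.join('-and-')}" unless prefetch.empty?
  uri = "#{base}/#{name}"
  uri += "?#{conditions}" if conditions
  uri
end

# Stores every representation of a mixed collection in the cache but returns
# only those of the initially requested type.
# representations: array of hashes with :uri and :type keys (hypothetical).
def load_with_prefetch(representations, requested_type, cache)
  representations.each { |rep| cache[rep[:uri]] = rep }
  representations.select { |rep| rep[:type] == requested_type }
end
```

Called with the collection URI of the Order class, the conditions string, the customer association, and complete set to true, discovery_uri yields the URI given in the example above.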
Look-ahead on idle The look-ahead on idle caching strategy takes advantage of
the fact that the target application may not be using the network connection at
certain points in time. During that time, client-side connectors may issue GET
requests in the background (i.e. in a different thread of execution). While the
basic selection mechanism for resources which should be requested is again the
association to already loaded resources, more sophisticated heuristics can be
implemented, e.g. access counters on instances can determine relevance of an
instance and thus request resources first which are associated to the resource
which this instance stands for. Subsequently, those usage statistics could be
used to define heuristics for similar instances (e.g. those belonging to the same
class).
The realization of look-ahead on idle is straightforward. In idle periods,
client-side connectors must collect URIs leading to associated resources from
already cached instances and discover them (see Discovery above) subsequently.
Note that for every URI, the cache must be checked first. The retrieved
representations must be loaded as described above (see Reading) and placed in the
cache. Connector implementers should pay attention to the fact that look-
ahead may produce a lot of network traffic and should consider providing lim-
ited bandwidth for look-ahead. Furthermore, look-ahead should abort imme-
diately when other resources are requested explicitly by the target application.
Look-ahead should be implemented recursively, meaning that after an associ-
ated resource has been loaded and cached, it can again be inspected for links
to its associated resources. However, look-ahead on already cached instances
should be completed before more levels of recursion are entered. Client-side
connectors should provide a mechanism to application developers to configure
whether look-ahead must be used, on which classes and attributes it should be
effective and up to which level of recursion look-ahead should operate.
It may be pertinent to integrate these configuration options in class definitions
of classes which are derived from the default WebData client class. Listing 3.18
illustrates how the order class could be configured.
Listing 3.18: Sample class collection specific class definition with look-ahead configuration (Ruby)
class Order < WebDataResource::Base
  self.uri = "http://example.org/webdata/orders"
  self.credentials = {:name => "yeah", :password => "secret"}
  self.lookahead = [:products, :customer]
end
Line 4 in listing 3.18 defines that look-ahead should be carried out on order
instances following the products and customer associations.
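The level-by-level traversal that look-ahead on idle requires could be sketched as follows. The sketch assumes a plain hash as cache, an injected fetch callable instead of real GET requests, and a callable that signals explicit application requests; the names LinkedInstance and look_ahead are hypothetical, and bandwidth limiting as well as per-class configuration are left out.

```ruby
# An instance together with the URIs of its associated resources.
LinkedInstance = Struct.new(:uri, :associated_uris)

# Breadth-first look-ahead over association links.
# cache:       hash mapping URI => LinkedInstance (stands in for the local cache)
# fetch:       callable taking a URI and returning a loaded LinkedInstance
# max_depth:   configured limit for the level of recursion
# interrupted: callable returning true once the application has requested
#              a resource explicitly, at which point look-ahead stops
def look_ahead(cache, fetch:, max_depth:, interrupted: -> { false })
  frontier = cache.values
  max_depth.times do
    next_frontier = []
    frontier.each do |instance|
      instance.associated_uris.each do |uri|
        return cache if interrupted.call
        next if cache.key?(uri) # the cache is always checked first

        loaded = fetch.call(uri)
        cache[uri] = loaded
        next_frontier << loaded
      end
    end
    break if next_frontier.empty?

    frontier = next_frontier # the next level is entered only after this one
  end
  cache
end
```

Processing one level completely before descending matches the requirement that look-ahead on already cached instances be completed before more levels of recursion are entered.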
result, the first request will change the resource’s state at first, but that state change
will be overridden by the one which follows the first request. The problem here is
that the second change is not based on the current state of the resource but rather
on an old representation of it and may thus differ from the actual state change which
the second client intended.
In order to overcome this issue, basically two different approaches are possible which
are widely referred to as pessimistic and optimistic. When using a pessimistic ap-
proach the client locks the resource (i.e. transfers it into a state where the client has
exclusive access) on fetching of a representation for editing purposes and does not
release the lock before the actual update operation has been performed. As a con-
sequence, resources may be blocked for quite a long time and the lock may span the
whole period of editing which will most likely include user interaction. Obviously,
this approach will not result in a very pleasant user experience for highly distributed
landscapes such as the Web, where the number of concurrent users is potentially
infinite. The optimistic approach however does not involve locking mechanisms.
Instead, it uses versioning which allows resources to detect a possible lost-update
problem before actually performing an update operation: With every representation
that a resource sends to describe its current state, it includes a symbol identifying
its current version. Update operations will only be performed by the resource if the
update request includes a symbol which identifies the exact current version of the re-
source. Subsequently, the resource updates its version information. Thus, if another
update operation has changed the resource in between, subsequent update operations
based on older representations will fail. Note that in this case, the client
application is only notified of the conflict and that the update operation has not
been performed. Hence, the client should get a new representation and re-request
the state change based on that representation. This will likely involve asking the
respective user again. Furthermore, it is not guaranteed that the next update
request will succeed.
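The optimistic approach, including the client's reaction to a conflict, can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the class and method names are hypothetical, the version symbol is modelled as an integer, and the block passed to update_with_retry stands in for re-deriving the intended change from a fresh representation.

```ruby
class VersionConflict < StandardError; end

# A resource that includes its version in every representation and performs
# an update only if the request names the exact current version.
class VersionedResource
  attr_reader :version

  def initialize(state)
    @state = state
    @version = 0
  end

  def representation
    { state: @state.dup, version: @version }
  end

  def update(new_state, based_on:)
    raise VersionConflict, "stale version #{based_on}" unless based_on == @version

    @state = new_state
    @version += 1
    representation
  end
end

# Client-side reaction to a conflict: fetch a fresh representation and
# re-apply the intended change. Since success is still not guaranteed, the
# number of attempts is bounded.
def update_with_retry(resource, attempts: 3)
  attempts.times do
    rep = resource.representation
    begin
      return resource.update(yield(rep[:state]), based_on: rep[:version])
    rescue VersionConflict
      next
    end
  end
  raise VersionConflict, "gave up after #{attempts} attempts"
end
```

If two clients fetch the same representation and both try to update, the second update raises VersionConflict instead of silently overriding the first.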
HTTP provides means to help servers detect the lost update problem using the
optimistic approach. According to HTTP, servers can use the ETag: and
Last-Modified: headers to propagate a resource’s current version or timestamp
respectively. Subsequently, clients can request updates which are bound to a
condition using If-Match: and If-Unmodified-Since: as headers. However, because
clients can decide in the aforementioned solutions whether or not they want to
perform a conditional request, WebData uses a slightly different realization of
the optimistic approach which has previously been described by [Goo07] as
optimistic concurrency control: a resource’s version information (see footnote 9)
is expressed as a single non-negative integer in every edit link (cf. ’versioned
instance URIs’ in 3.1 and ’edit links’ in 3.2.2). This version number is
incremented by 1 at every successful update operation on a resource and update
operations (i.e. PUT requests) can only be directed against those versioned
instance URIs.
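A server-side view of versioned edit URIs might look as follows. The URI layout used here (instance URI followed by "/<version>") is purely hypothetical, since the actual layout is defined in 3.1; the class VersionedStore and its methods are likewise assumptions for illustration.

```ruby
# Hypothetical layout: edit URI = "<instance URI>/<version>".
class VersionedStore
  StaleVersion = Class.new(StandardError)

  def initialize
    @records = {} # instance URI => [version, state]
  end

  # Registers a new resource with version 0 and returns its edit URI.
  def create(instance_uri, state)
    @records[instance_uri] = [0, state]
    edit_uri(instance_uri)
  end

  def edit_uri(instance_uri)
    version, = @records.fetch(instance_uri)
    "#{instance_uri}/#{version}"
  end

  # Handles a PUT against a versioned edit URI. The update is applied only if
  # the version in the URI is the current one; the version number is then
  # incremented by 1 and the new edit URI is returned.
  def put(versioned_uri, state)
    instance_uri, _, requested = versioned_uri.rpartition("/")
    current, = @records.fetch(instance_uri)
    raise StaleVersion, versioned_uri unless requested == current.to_s

    @records[instance_uri] = [current + 1, state]
    edit_uri(instance_uri)
  end
end
```

Because the version is part of the only URI that accepts updates, a client cannot opt out of the check, which is the difference to conditional requests using If-Match:.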
9 Identical version numbers can be used for cache validation (cf. 3.3.5) and
optimistic concurrency.
Concrete WebData server-side connector implementations should implement
optimistic concurrency for their resources. However, it is recognized that this
requires the connector to store extra persistent information for every resource
which may not be realizable in the underlying technology. Furthermore, connector
implementations can leave the choice on whether or not to use optimistic
concurrency to the respective developers who use the connector. Optimistic
concurrency may then be turned on and off on a per-model-class basis. If
optimistic concurrency is implemented by a connector and used for a model class a,
it must be used consistently, which has a number of implications:
3. All representations for resources standing for model objects of any other class
b must contain version1 (cf. 3.1) in their edit link if b has an association to a
and the representation is referenced to in the context of a as described in 3.2.
4. In representations and requests the latest known version numbers must be used
for version1 and version2 URI parts respectively. Thus, clients must use the last
version number they received for that particular resource and servers must use
the last version number they assigned to that particular resource during an
update operation.
Figure 3.5 illustrates optimistic concurrency as defined for WebData using a basic
example with two clients and a resource.
10 The author does not dispute that a workable and suitable approach to atomic
transactions in the context of WebData (e.g. 3-Phase-Commit, [SS83]) can be
established but explicitly regards this as out of scope for his thesis work.
4. WebData for the SAP® Cross Application Timesheet architecture
This chapter describes how WebData can be employed in a concrete scenario which
has been provided by SAP®, the world market leader in enterprise software.
First, the general Cross Application Timesheet (CATS, [SAP01b]) component is
briefly introduced from an end-user’s point of view and the use of widgets as an
alternative for a richer and more focused user experience is motivated. One of
SAP®’s interaction mechanisms for ABAP™-based mySAP™ Business Suite components,
namely the Business Application Programming Interface (BAPI®, [SAP01a]), is
described in general and the specific operations for interactions with CATS are
outlined. Then, the system architecture for a CATS widget scenario is described
and a possible realization for a server-side WebData-enabled wrapper and a
client-side WebData-enabled widget is presented.
4.1 SAP® Cross Application Timesheet and the Business Application Programming Interface
This chapter introduces CATS and BAPI®. CATS is SAP™’s solution for personnel
time management which integrates with a number of worktime-related components.
BAPI® is SAP™’s business application programming interface which developers can
use to interact with SAP™ components.
To enter time information using CATSXT, several steps have to be performed for
each activity type after the user has logged on to mySAP™ using the SAP® GUI:
first, working dates have to be selected from the calendar on the left hand side,
then start and end times must be entered using number keys for each day
separately and short text descriptions can be given on the right hand side. Next,
the entries have to be checked and copied to the time entry clipboard using the
clock symbol above the time entry lines. Then, the transaction can be saved and
closed using the save button in the toolbar at the top of the screen.
While the presented transaction allows for detailed entry and promises to allow
users to record data for every possible situation including business trips, missing
days, split recordings for different cost centers, etc. it is obviously desirable to have
an at-hand tool whose functionality is limited and focused to the respective day-to-
day use-case. Widgets have emerged as a light-weight approach to handling very
focused similar user tasks on a regular basis. Widgets can be realized as always-on
mini applications which reside on web pages or an end user’s desktop. On demand,
they can be brought up and offer instant interaction with local data or remote back
end systems. A time-entry widget would reside on the desktop, offer real-time time
entry and thus replace pen and paper recording while saving synchronization time.
As widgets are light-weight user space applications which are often realized in
client-side scripting languages such as JavaScript, interaction with the mySAP™
backend should be realized in a suitable way, i.e. WebData. However, the mySAP™
Business Suite currently does not offer ways of interaction which are suited for
those settings. Instead, the standard way of interacting with CATS is its business
API (BAPI®). The concept of BAPI®s in general is described in the next section.
4.1.2 mySAP™ integration through the Business Application Programming Interface
The Business Application Programming Interfaces (BAPI®s) are the designated
standard interfaces to mySAP™ components and modules which are written in ABAP,
SAP®’s programming language for business applications. They are used for
interaction between a number of SAP® components and are supposed to serve as a
single point of entry for third party solutions and applications. Interaction with
BAPI®s is designed to be network-enabled which means that the TCP/IP protocol
suite is used to realize communication, such that BAPI®s can be used in local
area networks and the global Internet. BAPI®s are meant to allow for integration
at the business rather than at a technical level with respect to granularity of
tailoring of the BAPI® functions.
Entities within the mySAP™ business suite are designed around the principles of
object-orientation, meaning that autonomous entities in terms of functionality and
data are bundled together to reduce complexity. SAP®’s Business Object Types
define the different types for those objects. However, in order to support both
object-oriented and non-object-oriented environments on a BAPI®’s client-side,
BAPI®s are designed as methods on Business Object Types and can be used without
the notion of actual classes or instances. Translated to terminology of
object-oriented programming languages this means that BAPI®s are defined as class
methods on object classes rather than on instances. When operating on concrete
instances, primary keys or other identifiers have to be used to perform
instance-related functionality which would have been defined as instance methods
in classical object-orientation. Furthermore, associations between objects of
different types are not defined explicitly but have to be reconstructed by BAPI®
clients by acquiring primary keys and using the respective BAPI® for the object
type of the associated objects.
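The difference between a key-based, class-method style interface and an object-oriented model on top of it can be sketched as follows. The module FakeTimesheetApi only imitates the style described above; its method names, parameters, and data are invented for illustration and do not correspond to actual BAPI® signatures.

```ruby
# Stand-in for a key-based interface: class-level methods only, records are
# identified by primary keys. All names and data are hypothetical.
module FakeTimesheetApi
  EMPLOYEES = { 1 => { name: "A. Example" } }.freeze
  ENTRIES = {
    10 => { employee_id: 1, hours: 8 },
    11 => { employee_id: 1, hours: 4 }
  }.freeze

  def self.employee_get_detail(employee_id)
    EMPLOYEES.fetch(employee_id)
  end

  # Returns the primary keys of all entries recorded for an employee.
  def self.entry_get_list(employee_id)
    ENTRIES.select { |_id, entry| entry[:employee_id] == employee_id }.keys
  end

  def self.entry_get_detail(entry_id)
    ENTRIES.fetch(entry_id)
  end
end

# Object-oriented wrapper: instances carry the primary key and expose
# instance-related functionality as instance methods.
class TimeEntry
  attr_reader :id, :hours

  def initialize(id)
    @id = id
    @hours = FakeTimesheetApi.entry_get_detail(id)[:hours]
  end
end

class Employee
  attr_reader :id, :name

  def initialize(id)
    @id = id
    @name = FakeTimesheetApi.employee_get_detail(id)[:name]
  end

  # The association is reconstructed from primary keys on demand.
  def time_entries
    FakeTimesheetApi.entry_get_list(id).map { |entry_id| TimeEntry.new(entry_id) }
  end
end
```

A wrapper of this kind is what allows a WebData server-side connector to see classes, instances, and associations where the underlying interface only offers keys.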
In order to execute BAPI® methods, SAP®’s remote function call (RFC) mechanism
has to be used. RFCs are basically remote ABAP procedure calls which can have
multiple parameters. Following the principles of ABAP, a parameter is either an
import, export, changing or table parameter. Import and export parameters
correspond to what is referred to as input parameters and return values in many
other programming languages. A changing parameter is a parameter which serves
both for importing and exporting data to a function and a table parameter accepts
two-dimensionally structured records (i.e. tables) of data both for input and
output.
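The four parameter kinds can be mirrored in Ruby with plain data structures. The function record_hours and its fields are made up for this sketch and do not represent an existing remote function; the sketch only shows which data flows in, which flows out, and which flows both ways.

```ruby
# Result of a call: export values, updated changing values, updated tables.
RfcResult = Struct.new(:export, :changing, :tables)

# import:   input only (here: the date and hours to record)
# changing: input and output (here: a counter that is passed in and updated)
# tables:   rows of records, input and output (here: the list of entries)
def record_hours(import:, changing:, tables:)
  new_row = { date: import.fetch(:date), hours: import.fetch(:hours) }
  entries = tables.fetch(:entries) + [new_row]
  total = entries.sum { |row| row[:hours] }

  RfcResult.new(
    { total_hours: total },                                # export: output only
    changing.merge(counter: changing.fetch(:counter) + 1), # changing: in and out
    { entries: entries }                                   # tables: in and out
  )
end
```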
Here, end users can use their respective widget instances to access the CATS
widget server which connects to the mySAP™ servers running the CATS application.
The CATS BAPI® wrapper is in charge of providing an object-oriented domain model
while accessing the BAPI® and restructuring data. The WebData server-side
connector interacts with the domain model and exposes its entities as resources on
the Web. Client-side connectors, in turn, access resources from within the widget
instances on the end users’ desktops and reconstruct the domain model for
application code which realizes the widget logic that builds up the actual widget.
Figure 4.4: Domain model for time entry using CATS in mySAP™
A time sheet entry as depicted in figure 4.4 has to be recorded with times and
dates when work started and finished and the activity which has been performed.
Furthermore, a time sheet entry must be associated to the employee who performed
the work and the cost center which the activity has to be billed to. An employee
record as exposed to the WebData connector contains the employee’s real name
and associations to the time entries for that employee. A cost center model object
exposes its description as acquired from the BAPI® and its associations to
employees and reported time entries. The behavior carried out by the find(),
create(), save(), and destroy() methods is subject to custom implementation which
accesses the respective BAPI®s accordingly.
4.2.2 Authorization
In order to establish secure interaction with CATS and to prevent misuse, the fol-
lowing access rules must be used to configure the respective WebData server-side
connector with respect to the definitions in 3.3.3:
4. GET is allowed on the employee entry collection and its members if they are
representing the authenticated user.
5. GET is allowed on the cost center entry collection and its members if they are
associated with the authenticated user.
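The two GET rules stated above could be expressed as a predicate over the request and the authenticated user. This sketch covers only these two rules; rules not shown here are left out, so the sketch simply denies every other case. The function names and the shape of the resource hashes are hypothetical.

```ruby
# Decides whether a request is allowed under the two GET rules above.
# kind:     :employee or :cost_center
# resource: hash describing the member (hypothetical shape)
# user_id:  identifier of the authenticated user
def allowed?(verb, kind, resource, user_id)
  return false unless verb == :get

  case kind
  when :employee    then resource[:id] == user_id
  when :cost_center then resource[:employee_ids].include?(user_id)
  else false
  end
end

# A GET on a collection returns only the members the user may see.
def visible_members(kind, members, user_id)
  members.select { |member| allowed?(:get, kind, member, user_id) }
end
```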
Figure 4.5 shows a time entry widget which has been developed throughout this case
study.
5. Conclusion and related work
This chapter presents related work and draws conclusions from the thesis.
procedure and generation of a login token plus session state kept on the (Google)
servers. As outlined in 3.3.3, WebData’s authentication concept is closer to the
authentication mechanisms provided by HTTP and, for the sake of scalability,
adheres to the stateless communication requirement [Fie00, 5.1.3] of
representational state transfer (which GData violates for authentication).
In addition to the abovementioned aspects, WebData addresses a number of aspects
which have not been considered for GData. While GData only defines a syndication
format and a publishing protocol, no mapping to domain models is given – first class
citizens in GData are pure XML messages.
GData is not open with respect to the fact that serving GData is completely governed
by Google. However, a number of client-side libraries are supplied which allow
applications to consume GData.
5.1.2 Queso
Queso is a semantic Web/Web 2.0 server which is being developed as a research test
bed by Elias Torres, et al. at IBM. Queso implements Atompub and is coupled to
an RDF [W3C99b] server for persistent storage and using the AtomOwl Vocabulary
Specification [AS06]. In addition to Atompub as means of communication, Queso
offers a SPARQL [SP06] endpoint for querying the data triples stored on the server.
While Queso offers an Atompub implementation which is supposed to adhere strictly
to the standard, it does not consider object-oriented domain models as a layer under-
neath, but RDF. It offers a Web based generic user interface which can be employed
to query and browse data which is stored on the server. Client applications can
access the server through Atompub and SPARQL which enables interaction with
client-side technologies such as JavaScript through a SPARQL Javascript Library
which was conceived by Lee Feigenbaum, et al.
5.1.4 Others
Web Application Description Language (WADL)
acts as resource
acts as resource is a plugin for the Ruby on Rails Web application framework that
enables automated URI matching for “nested resources” which are similar to asso-
ciated URIs as described in 3.1. It does not offer interactions with resources using
representations.
5.2 Conclusion
The work presented in this thesis defines a middleware for RESTful application
integration which is composed of two main types of system components.
WebData server-side connectors are language and domain independent components
which are meant to be plugged into any application which is built around some sort of
domain model that is persistently stored. The most prominent and straightforward
example for domain model implementations is Active Record (as described in 2.1.1),
but domain models can be realized using more exotic technologies and concepts.
All of these can be accessed by WebData connectors as long as they provide a
consistent and coherent object-oriented interface to their environment. WebData
server-side connectors are subsequently able to access domain models using
object-oriented concepts and – by inspecting their structure (i.e. classes,
instances, attributes, and associations) – provide an HTTP-based access point for
clients
on the World Wide Web which is based on the concepts of representational state
transfer (REST). By providing URI references for domain classes and objects (cf.
3.1) and by implementing the defined operation semantics (cf. 3.3.1) connectors
expose these entities as Web resources which will receive and send representations
of current and intended state (cf. 3.2) respectively. Connectors have to adhere to
access restrictions as required by the application and its authorization model which
depends on the domain and types and roles of users (cf. 3.3.3).
[BKEI02] Oren Ben-Kiki, Clark Evans, and Brian Ingerson. YAML ain’t markup
language (YAML) (tm) 1.0. Working draft, YAML.org, July 2002.
[BN84] Andrew D. Birrell and Bruce Jay Nelson. Implementing remote pro-
cedure calls. ACM Transactions on Computer Systems, 2(1):39–59,
February 1984.
[CHET03] Kishore Channabasavaiah, Kerrie Holley, and Edward Tuggle, Jr.
Migrating to a service-oriented architecture. Technical report, IBM Inc.,
December 2003.
[Cro06] Douglas Crockford. The application/json media type for javascript ob-
ject notation (JSON). Internet informational RFC 4627, July 2006.
[DR06] T. Dierks and E. Rescorla. The transport layer security (TLS) protocol
version 1.1. RFC 4346, Internet Engineering Task Force, April 2006.
[FL99] Henrik Frystyk Nielsen and Daniel LaLiberte. Editing the web: Detect-
ing the lost update problem using unreserved checkout. World Wide
Web Consortium Note, May 1999.
[GdH06] Joe Gregorio and Bill de Hóra. The atom publishing protocol. Internet
Draft draft-ietf-atompub-protocol-12, December 2006.
[H+ 04a] David Heinemeier Hansson et al. Active record — object-relation map-
ping put on rails. http://ar.rubyonrails.com/, 2004.
[H+ 04b] David Heinemeier Hansson et al. Ruby on rails - web development that
doesn’t hurt. http://www.rubyonrails.org/, 2004.
[HB07] Paul Hoffman and Tim Bray. Atom publishing format and protocol
working group charter. http://www.ietf.org/html.charters/atompub-
charter.html, 2007.
[HTBL06] Dave Hollander, Richard Tobin, Tim Bray, and Andrew Layman.
Namespaces in XML 1.0 (second edition). W3C recommendation,
W3C, August 2006. http://www.w3.org/TR/2006/REC-xml-names-
20060816.
[Luo98] Ari Luotonen. Tunneling TCP based protocols through web proxy
servers. Internet Draft, August 1998.
[MLM+ 06] C. Matthew MacKenzie, Ken Laskey, Francis McCabe, Peter F Brown,
and Rebekah Metz. Reference model for service oriented architecture
1.0. Technical report, OASIS, 2006.
[NS05] Mark Nottingham and Robert Sayre. The atom syndication format.
Internet proposed standard RFC 4287, December 2005.
[Pos94] J. Postel. Media type registration procedure. RFC 1590, Internet En-
gineering Task Force, March 1994.
[SAPc] SAP AG. Business Object CostCenter Documentation. SAP AG. Avail-
able on mySAP systems through transaction BAPI.
[SAPd] SAP AG. Business Object Employee Documentation. SAP AG. Avail-
able on mySAP systems through transaction BAPI.
[SS83] Dale Skeen and Michael Stonebraker. A formal model of crash recovery
in a distributed system. IEEE Transactions on Software Engineering,
9(3):219–228, May 1983.
[XML98] Extensible Markup Language (XML™), February 1998. XML 1.0, W3C
Recommendation, http://www.w3.org/XML/.