
Distributed Transaction Management in SOA-based System Integration

M.Sc. Thesis September 3, 2007 Advisor: Arne John Glenstrup

Xin Yao xin[at]itu.dk

Mikkel Byrs Dan-Rognlie decorus[at]itu.dk

Preface
The end of all our exploring
Will be to arrive where we started
And know the place for the first time.
-- T.S. Eliot (Four Quartets)

Finally the end of our journey is in sight. It has been a project with turbo speed and a steep learning curve. The ink was barely dry on the project initiation document when we started to crunch tomes and wade through the minutiae of web service standards. And just as our appetites for theoretical exploration were being whetted, we had to shift gears and convert theories to code. The process was filled with fun and stress, thrill and despair. Ambition and self-confidence rose and fell like a roller-coaster at Disneyland.

While enthusiasts proclaim SOA to be the universal hammer to nail down all system integration problems, skeptics still see web services as over-hyped empty promises. Is SOA-based system integration a real business enabler? Or is it just hot air? The jury is still out. The work in this project was driven by a passionate desire to demystify transactional SOA. It is our belief that a theoretically and architecturally well-founded solution to web service transaction management will bring SOA-based system integration closer to its critical mass.

Through this project, we are truly happy to have had the opportunity to explore so many emerging technologies and interdisciplinary theories, in a subject area that is laden with so much uncertainty, and yet so much potential. Despite the razzle-dazzle between service-orientation and object-orientation, despite the numerous mid-summer nights spent coaxing the distributed state machines to work together, and despite the endless debugging and cursing at dead-locked transaction managers that enjoyed testing our patience, we moved on. It paid off. Strict deadlines constantly pushed us out of our comfort zones and toward the next milestone. Today, we are able to turn in all the deliverables promised in the project agreement. We can push the button to commit the project. We did it!

Needless to say, imperfections abound in our project. The process has left us with as many questions as answers. This report documents our journey of discovery as well as the end-product. Along the way, we will be candid about the things we haven't done as well as we would have liked, and about areas for further improvement. Bon voyage!

Acknowledgement
We wish to thank our advisor, Arne John Glenstrup, for his professional navigation and insightful feedback throughout the project. Arne, thank you for continuously challenging us to step back from the multitude of activities and ask "why".

Our thanks also go to Michael Grosse, Niels Henrik Sodemann and Morten Kvist of Ementor Danmark A/S. Thank you for your confidence in us and for giving us the opportunity to be inspired by your CAP project. A special thank-you goes to Michael Grosse, for spending so much of his time sharing knowledge with us, reviewing our drafts, and helping us clarify our thoughts.

We thank Joseph Messing and Hulda Armine, who generously gave their time to proofread our report. Joe, thank you for reviewing our texts with meticulous attention to detail and thoroughness, and for adding parsimony and elegance to our writing. And Hulda, your passion for helping us make the ideas flow with greater accuracy and variation has been most inspiring.

Many people have interacted with us, shared their expert knowledge with us or helped us shape and scope the project. Our thanks go to Carsten Butz (ITU), John Gøtze, Henning Niss (ITU), Jens Christian Godskesen (ITU), Thomas Hildebrandt (ITU), Jakob Burkard (Capgemini), Jacob Strange (Capgemini), Steen Brahe (Danske Bank), Jakob Bendsen (Lund & Bendsen), Rasmus Lund (Lund & Bendsen) and Hans Lassen (Lund & Bendsen).

And finally, we thank our families and friends for their understanding through our long summer of manic work, and for putting up with our frequent absence.

Abstract
In SOA-based system integration, long-running business transactions often involve incompatible trust domains, asynchrony and periods of inactivity, presenting challenges to traditional ACID-style transaction processing. In this thesis, we take an explorative approach to probe the theoretical and implementational feasibility of managing transactions in the web service world. Following the theoretical thread, we propose a mental reference model to adapt existing transaction theories, including the classical ACID model and the extended transaction model with ACID relaxations, to meet the diversified web service transaction requirements, which we summarize as eight design criteria. Following the implementational thread, a service-oriented middleware prototype is developed as a proof-of-concept of the adapted, theoretical model. To attain interoperability, the prototype implements the WS-Transaction standards (WS-Coordination, WS-AtomicTransaction, and WS-BusinessActivity), providing modules for processing flat atomic transactions, long-running business transactions, as well as composable transaction hierarchies with heterogeneous subtransaction types. State transitions of distributed Participants and Coordinators are synchronized in a strict lock-step fashion through the use of distributed, communicating state machines. Although the implemented prototype requires more testing and extension in order to be industrially deployable, it is architecturally capable of playing the role of a loosely-coupled, pluggable middleware overlaying heterogeneous legacy systems. In particular, propagation and coordination of state/context information is architecturally decoupled from the underlying application semantics, effectively turning stateless services into stateful Participants.

Contents

1. INTRODUCTION  5
1.1. PROBLEM DEFINITION  6
1.2. METHODOLOGY  9
1.2.1. COMBINING THEORETICAL AND DOMAIN ANALYSIS  10
1.2.2. COMBINING INDUCTIVE AND DEDUCTIVE APPROACHES  11
1.2.3. COMBINING ITERATIVE AND WATERFALL PROCESSES  12
1.2.4. METHODS FOR PROTOTYPE IMPLEMENTATION  13
1.2.5. EVALUATION OF METHODOLOGY  14
1.3. THE CAP CASE  15
1.3.1. BACKGROUND  15
1.3.2. KNOWLEDGE SHARING WITH EMENTOR DANMARK  16
1.3.3. THE ADAPTED CASE  16
1.3.4. SERVICE LANDSCAPE IN THE CAP CASE  16
1.3.5. COMPLICATED WORKFLOW IN A COMPOSITE SERVICE  18
1.3.6. MAKING ORCHESTRATED SERVICES TRANSACTIONAL  19
1.4. RELATED WORK  20
1.4.1. SPECIFICATIONS  21
1.4.2. ACADEMIC RESEARCH  22
1.4.3. COMMERCIAL AND OPEN-SOURCE IMPLEMENTATIONS  24
1.5. SCOPE  25
1.6. ASSUMPTIONS ABOUT THE READER  26
1.7. REPORT STRUCTURE  26
2. SOA-BASED SYSTEM INTEGRATION  28
2.1. CONCEPTS  28
2.2. SYSTEM INTEGRATION APPROACHES  29
2.2.1. DATA-LEVEL POINT-TO-POINT INTEGRATION  29
2.2.2. APPLICATION-LEVEL POINT-TO-POINT INTEGRATION  30
2.2.3. PROCESS-LEVEL INTEGRATION  30
2.3. COMMUNICATION PARADIGMS IN SOA  34
2.4. SUMMARY  36
3. CLASSICAL TRANSACTION PROCESSING MODELS  37
3.1. FAILURE MODELS  37
3.1.1. LOGICAL FAILURES  37
3.1.2. OMISSION FAILURES  37
3.1.3. BYZANTINE AND TIMING FAILURES  38
3.2. CLASSICAL TRANSACTION MODEL  39
3.2.1. THE ACID PROPERTIES  39
3.2.2. SINGLE-MACHINE TRANSACTIONS  39
3.2.3. DISTRIBUTED TRANSACTIONS  40
3.2.4. NESTED DISTRIBUTED TRANSACTIONS  42
3.3. EXTENDED TRANSACTION MODEL  42
3.3.1. RELAXING THE ACID PROPERTIES  42
3.3.2. EVALUATION OF THE EXTENDED TRANSACTION MODEL  45
3.4. SUMMARY  46
4. WEB SERVICE TRANSACTION PROCESSING MODEL  47
4.1. IMPACT OF WEB SERVICES ON TRANSACTION MANAGEMENT  47
4.1.1. INTEROPERABILITY  48
4.1.2. STATEFUL TRANSACTIONS VS. STATELESS SERVICES  49
4.1.3. HETEROGENEOUS TRANSACTIONAL REQUIREMENTS IN SOA  51
4.1.4. TIGHTLY-COUPLED AND SHORT-LIVED ATOMIC TRANSACTION  51
4.1.5. LOOSELY-COUPLED AND LONG-RUNNING TRANSACTION  53
4.1.6. COMPOSABLE TRANSACTION MODEL  57
4.1.7. LEVERAGE TRANSACTION SUPPORT IN LEGACY SYSTEMS  59
4.2. WS-TRANSACTION STANDARD  61
4.2.1. WS-COORDINATION  62
4.2.2. WS-ATOMICTRANSACTION  65
4.2.3. WS-BUSINESSACTIVITY  66
4.2.4. COMMON ASPECTS OF THE STANDARDS  67
4.3. A REFERENCE MODEL FOR WEB SERVICE TRANSACTION MANAGEMENT  68
4.3.1. READING INSTRUCTIONS FOR THE REFERENCE MODEL  68
4.3.2. MEETING THE DESIGN CRITERIA  71
4.4. SUMMARY  73
5. SOA TRANSACTION MIDDLEWARE PROTOTYPE: ARCHITECTURE  74
5.1. SYSTEM REQUIREMENTS  74
5.2. MIDDLEWARE MODELING  75
5.2.1. LAYERED SYSTEM ARCHITECTURE  76
5.2.2. THE PARADOX BETWEEN GENERALITY AND SPECIALIZATION  77
5.3. COMMUNICATING STATE MACHINES  81
5.4. GENERIC COORDINATION FRAMEWORK  84
5.5. SERVICE CONTAINER FRAMEWORK  87
5.5.1. THE SERVICE CONTAINER CONCEPT  87
5.5.2. THE SERVICE CONTAINER FRAMEWORK  89
5.5.3. INTERVENING BACK-END RESOURCE MANAGEMENT IN WS-AT  92
5.6. SUMMARY  94
6. SOA TRANSACTION MIDDLEWARE PROTOTYPE: IMPLEMENTATION  96
6.1. DEVELOPMENT ENVIRONMENT  96
6.2. MODELING STRUCTURE  96
6.2.1. OVERVIEW OF VS SOLUTION  97
6.2.2. OVERVIEW OF MIDDLEWARE COMPONENTS AND CLASSES  97
6.2.3. ESSENTIAL DATA STRUCTURES, LOOKUP AND CORRELATION  101
6.3. MODELING BEHAVIOR  102
6.3.1. MODEL STATE MACHINES WITH STATE, SINGLETON AND FLYWEIGHT PATTERNS  103
6.3.2. MODEL APPLICATION-MIDDLEWARE INTERACTION WITH COMMAND PATTERN  109
6.4. OTHER IMPLEMENTATIONAL ASPECTS  115
6.4.1. LOGGING  115
6.4.2. FAULT HANDLING  116
6.4.3. INSTANCE MANAGEMENT  116
6.4.4. CONCURRENCY MANAGEMENT  118
6.5. DEPLOYMENT INSTRUCTIONS  118
6.5.1. SET UP A DEVELOPMENT ENVIRONMENT  119
6.5.2. DEPLOY THE COORDINATION FRAMEWORK AS A STAND-ALONE COMPONENT  120
6.6. SUMMARY  120
7. SOA TRANSACTION MIDDLEWARE PROTOTYPE: TEST AND EVALUATION  121
7.1. EVALUATION STRATEGY  121
7.1.1. DESIGN FOR TESTABILITY  122
7.1.2. PRIORITIZE QUALITY FACTORS  123
7.2. TEST STRATEGY  127
7.2.1. STRUCTURAL TEST  127
7.2.2. UNIT TEST  128
7.2.3. FUNCTIONAL TEST  128
7.2.4. OBJECT-ORIENTED STATE-MACHINE SIMULATION  129
7.2.5. EQUIVALENCE PARTITIONING  130
7.2.6. CONTINUOUS INTEGRATION  131
7.3. GENERAL APPROACH TO TEST SETUP  132
7.4. ATOMIC TRANSACTION TEST  134
7.4.1. GENERAL MESSAGE FLOW  134
7.4.2. TEST CASE EXAMPLES  135
7.5. BUSINESS ACTIVITY TEST  137
7.5.1. GENERAL MESSAGE FLOW  137
7.5.2. TEST CASE EXAMPLES  138
7.6. INTERPOSED COORDINATOR TEST  139
7.6.1. GENERAL MESSAGE FLOW  140
7.6.2. TEST CASE EXAMPLES  140
7.7. STRESS TESTING  142
7.8. FEEDBACK FROM EMENTOR DANMARK  144
7.9. SUMMARY: OVERALL QUALITY EVALUATION  144
8. CONCLUSION  147
9. REFLECTIONS  150
10. FUTURE WORK  151
11. REFERENCES  152
12. APPENDICES  157
A. LIST OF ACRONYMS  157
B. STATE MACHINE DIAGRAMS: WS-AT  159
C. STATE MACHINE DIAGRAMS: WS-BA  160
D. EXAMPLE OF PEER REVIEW: A WALK-THROUGH OF WS-AT ABORT SCENARIO  161
E. SOURCE CODE  162

Figures

Figure 1: Overall research strategy: multi-method triangulation  9
Figure 2: Web service landscape in the CAP case  17
Figure 3: A zoom-in on the eligibility evaluation service workflow  18
Figure 4: ESB architecture - a simplified view  33
Figure 5: Integrating transaction domains in SOA  48
Figure 6: Atomic web service transaction (flat)  52
Figure 7: Atomic web service transaction (nested)  52
Figure 8: Compensation-based business transaction in SOA  55
Figure 9: A coordination hierarchy with embedded transaction models  57
Figure 10: WS-Transaction services and protocols  61
Figure 11: WS-Coordination communication scenario  63
Figure 12: A reference model for SOA transaction management  69
Figure 13: Middleware layer on the network communication protocol stack  76
Figure 14: Interceptor: bridging the generic coordination service and specific protocols  78
Figure 15: The use of the interceptor principle for context management  79
Figure 16: Communicating state machines for WS-AT Completion Protocol  83
Figure 17: Architecture - Coordination Framework  85
Figure 18: Java EE Container & Service Container  88
Figure 19: Architecture - Service Container Framework  90
Figure 20: Intervening back-end resource management in WS-AT  92
Figure 21: Visual Studio solution and projects  97
Figure 22: Overview of the Service Container Framework and Coordination Framework  98
Figure 23: Essential data structures for lookup and correlation  102
Figure 24: Simplified class diagram of the State pattern implementation  103
Figure 25: Mapping from state table to state object  106
Figure 26: A sample scenario of the communicating state machines  108
Figure 27: Modeling the interaction between application and middleware layer  110
Figure 28: Top-down decomposition of quality factors  125
Figure 29: Source Monitor metrics: a selected subset  126
Figure 30: Equivalence partitioning of protocol test input space  130
Figure 31: Atomic transaction: generic test setup  134
Figure 32: Business activity: generic test setup  137
Figure 33: Interposed Coordinator: generic setup  140

Tables

Table 1: Countermeasures as alternative reliability guarantees to ACID  57
Table 2: Motivation for using a composable transaction model  58
Table 3: Summary: design criteria for transaction management in SOA  72
Table 4: The Correctness criteria  75
Table 5: Prioritized software quality factors  123
Table 6: Test case AT-1: Commit  135
Table 7: Test case AT-2: Phase2Abort  136
Table 8: Test case BA-1: Close  138
Table 9: Test case BA-2: Compensate  139
Table 10: Test Case IC-1: Interposed Coordinator Close  141
Table 11: Test Case IC-2: Interposed Coordinator Compensate  142
Table 12: Summarized evaluation of quality factors  145

1. Introduction
Whether you are a SOA buff or SOA skeptic, service-oriented architecture (SOA) has swept into our collective consciousness. The past decade has seen rising demand for supply chain integration both inside and across national boundaries, stressing the need for resolving the tower of Babel between different enterprise system infrastructures. In recent years, web services have generated much enthusiasm because of their ability to bridge disparate enterprise systems. As such, SOA stands out as a strong competitor to traditional integration solutions like EAI (Enterprise Application Integration) and ERP (Enterprise Resource Planning), and is proclaimed by many to be the panacea to companies' system integration headaches.

While the clean-slate ERP solution requires scrapping all legacy systems and replacing them with a new and comprehensive ERP system, SOA adopts a softer and incremental approach to system integration. Instead of being deconstructed, legacy functionalities are wrapped into web services with an outbound interface speaking an XML-based Esperanto. While the hub-and-spoke EAI solutions are largely proprietary, SOA adopts a standards-based integration approach. SOA injects interoperability between disparate software components by standardizing data representation (XML), business logic definition (WSDL), and message exchange (SOAP), as well as the management of cross-cutting concerns such as addressing, security, and transport, to name just a few.

We have no interest in participating in the protracted arm twisting between SOA, ERP, and EAI, or any other system integration alternative. Every technology has its strengths and limitations, all depending on the problem it is used to solve. SOA-enabled interoperability is an unrivalled choice when we do not have the luxury of starting from scratch and forcing everyone to abandon their previous investments in legacy system infrastructure, training, etc., and when an open, standards-based solution is preferred to a proprietary one.

When SOA is used to encapsulate and integrate cross-enterprise workflows, it is conceivable that multiple web services may take part in transactions spanning system boundaries. Creating robust integration solutions requires more than simply exchanging messages. Attaining global agreement is additionally dependent on the ability of web services to provide interoperable transaction mechanisms despite partial failures experienced in individual services. The transaction problem per se is a classical distributed system problem, since any distributed transaction needs a mechanism to reach global agreement in the face of partial failures.

So, is the answer to web service transactions what we have been using for distributed transaction processing over the past two decades, but this time applying XML and SOAP rather than traditional platform-dependent message passing? If so, what makes the design of transaction management at the web service level such a thorny issue that standard solutions to web service transaction management are conspicuously absent from packaged SOA and ESB (Enterprise Service Bus) solutions?

This thesis is motivated by our curiosity for exploring the feasibility of porting traditional transaction processing theories and models into the web service world. In particular, how is SOA different from other types of distributed systems, and how do these differences influence transaction management in SOA-based system integration solutions? SOA provides a loosely-coupled, stateless integration fabric, while transaction management inevitably requires dissemination of service execution state. How do we resolve the tension between stateless services and stateful transactions? Many transactions in SOA are long-running and of a nested nature, sometimes involving periods of inactivity. How do web services reach outcome agreement and attain the goal of failure atomicity in settings of asynchrony and disconnection?

1.1. Problem definition


With these questions in mind, we arrive at the following problem definition:

How can classical distributed transaction processing models be adapted to meet transaction management requirements in web-service-enabled system integration, and what are the key challenges in translating the adapted, theoretical model into a service-oriented middleware solution whose architecture is aligned with best-practice design principles?

As it stands, the problem definition has a theoretical focus as well as an architectural focus. This duality reflects our perception of web service transaction management as an interdisciplinary field.


The theoretical focus

Service-oriented systems do not contribute new computing capabilities: web services run on existing computers, execute the same set of instructions, and access the same data. In this sense, a service-oriented system is just one type of distributed system, and is subject to basically the same challenges as any other distributed system. In our view, managing transactions in the web service world should not be treated as a problem in a web service vacuum. In order not to make the search for a web service transaction management strategy a disjointed and duplicative endeavor, we can and should borrow theoretical insight from tried-and-true transaction processing models in traditional, pre-SOA execution environments.

That being said, web service transactions are in many respects more demanding than transactions in traditional homogeneous environments. Many web service transactions involve long-running workflows spanning heterogeneous system domains, asynchrony, and periods of inactivity as a consequence of user interaction or mobility issues. Classical transaction models must therefore be adapted and extended, and the strict ACID rules need to be bent in certain cases in order to tackle SOA-specific transaction challenges. The other side of the coin is that web service transaction management can also be less demanding, because the loosely coupled nature of individual subsystems in SOA usually isolates one subsystem's resource space from another's. When the assumption of the absence of cross-boundary resource usage holds, it in turn implies the absence of distributed resource locking, as well as of the need for distributed deadlock detection, in a service-oriented architecture.

How to bridge the gap between classical transaction models and web service transactional needs is the theoretical focal point of this thesis. Through theoretical analysis and comparison we aim at spelling out what characterizes web service transactions and makes them different from traditional ACID-style transactions. The product we have in mind for the theoretical analysis is an adapted model for web service transaction management that harnesses the power of the classical models but sheds the weaknesses and limitations of these models by incorporating SOA-compatible solution alternatives.
The architectural focus

Going from theoretical principles to practice may require substantial effort. The past has shown plenty of examples of viable theoretical models implemented in poorly designed architectures, resulting in incomprehensible, hard-to-maintain, and unreusable software solutions. In addition, transforming a theoretical model into a SOA middleware requires the middleware solution to be an integral part of a service-oriented architecture. In particular, the middleware should comply with the fundamental service-oriented design principles, such as statelessness and loose coupling.

Web service transactions need to be coordinated, which, in turn, necessitates the propagation of web service execution states. Hence, there is a paradox between web services being stateless and the global transactions being intrinsically stateful, potentially leading to tight coupling. So, how do we design a web service transaction middleware in an architecturally elegant way so that stateful behavior can be added to stateless services in a non-intrusive manner? How do we design a flexible transaction management framework with context dependencies upon the underlying enterprise transaction processing systems without introducing unnecessary tight coupling? How do we fit the middleware architecture into an existing SOA system without manifesting an overly complex system? These concerns are the architectural focal points of our thesis.

The product we have in mind for the architectural analysis is a web service transaction management prototype that translates the adapted theoretical model into a comprehensible, maintainable, and reusable middleware, whose architecture is aligned with best-practice service-oriented design principles. Ideally, the prototype should play the role of a pluggable, loosely-coupled communication middleware overlaying legacy systems with heterogeneous transaction domains. In particular, propagation and coordination of state/context information should be decoupled from the underlying application semantics.

The prototype implementation in this thesis is intended to serve a dual proof-of-concept purpose. First of all, it can help us evaluate the applicability of the adapted, theoretical model for web service transaction management. In addition, it is an explorative approach to test the feasibility of aligning theory and practice for SOA transaction management in an architecturally coherent solution. We don't expect to develop a one-size-fits-all solution, but we do strive to infuse as much generality into the software as possible.
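To make this non-intrusiveness concrete, the sketch below shows one way stateful context could be piggybacked on SOAP messages without touching the service implementation, using the message inspector extension points of WCF (the platform introduced in Section 1.2.4). It is a minimal illustration under our own assumptions: the class name, header name and namespace are hypothetical and are not taken from the prototype presented later in this report.

using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;

// Hypothetical interceptor: propagates a transaction context as a SOAP
// header, so that the service implementation itself can stay stateless.
public class TxContextInspector : IClientMessageInspector, IDispatchMessageInspector
{
    const string HeaderName = "CoordinationContext";
    const string HeaderNs = "http://example.org/wstx-context"; // placeholder

    // Client side: stamp every outgoing request with the current context.
    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        request.Headers.Add(
            MessageHeader.CreateHeader(HeaderName, HeaderNs, "urn:tx:42"));
        return null; // no correlation state needed
    }

    public void AfterReceiveReply(ref Message reply, object correlationState) { }

    // Service side: extract the context before the service operation runs.
    public object AfterReceiveRequest(ref Message request, IClientChannel channel,
                                      InstanceContext instanceContext)
    {
        int index = request.Headers.FindHeader(HeaderName, HeaderNs);
        string txId = index >= 0 ? request.Headers.GetHeader<string>(index) : null;
        // A coordinator/participant registry would be consulted with txId here.
        return txId;
    }

    public void BeforeSendReply(ref Message reply, object correlationState) { }
}

An inspector of this kind is attached declaratively, through an endpoint behavior, which is exactly what makes the approach non-intrusive: the application code never sees the context plumbing.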


1.2. Methodology
Having presented the problem definition, we will in this section outline the project methodology. Figure 1 summarizes our overall research strategy: multi-method triangulation.

Triangulation as a method of measurement dates as far back as the European Renaissance period. Sir Walter Raleigh (1552-1618), the famous English writer and explorer, who was sent by Queen Elizabeth to survey and map the first English colony in the new world, used the method of triangulation, with the help of a plain theodolite, to produce highly accurate maps [Moran, 1990]. In addition to cartography, triangulation is also applied in the navigation and military domains, where multiple reference points are used to determine an object's exact position.

As a scientific research method, triangulation was first introduced by Denzin [1978], as "the combination of methodologies in the study of the same phenomenon". Jick [1979] argues that "given basic principles of geometry, multiple viewpoints allow for greater accuracy". Method triangulation is often used by social scientists to improve the accuracy of their judgments by collecting both qualitative and quantitative data. In recent years, it has also found its way into interdisciplinary research fields [Andersen, 2003].

Figure 1: Overall research strategy: multi-method triangulation


As stated in the problem definition, the subject matter of our thesis is multidimensional. As our prior knowledge of and experience in these dimensions (SOA, system integration, middleware design, and middleware architecture) is limited, the project objective is explorative in nature rather than being a definitive solution. A triangulation of multiple research methods can systematically guide us to analyze the problem from different angles and to consciously navigate the linkages between these angles, and consequently can enhance the validity of our conclusions.

The methods and their linkages are depicted in Figure 1. The first triangulation takes place between the theoretical analysis and the domain analysis (Section 1.2.1), while a second triangulation involves the use of the inductive and deductive methods (Section 1.2.2). Yet another step in our triangulation is the combination of the waterfall and iterative methods (Section 1.2.3), although it is not explicitly captured by Figure 1. All three triangulation links lead to a synthesis underpinning the final conclusion.

1.2.1. Combining theoretical and domain analysis

The first phase of triangulation takes place between the theoretical analysis and the domain analysis. The theoretical focus of our problem definition requires that we conduct an interdisciplinary literature screening and comparative analysis, looking at the strengths and limitations of the classical transaction models, and match these against the special transaction processing requirements intrinsic to SOA. As we judge a purely theoretical discussion of the transaction concept as being too abstract and less participatory in the overall fulfillment of the thesis objective, we have decided to use a real case, i.e. a particular application domain of web service transaction management, as a running example. The example is intended to render the abstract theoretical analysis concrete and thereby more comprehensible. Choosing an application domain sets a tangible stage that helps us think and communicate in a more focused and concrete fashion.

As shown in Figure 1, we start with the problem definition, and attempt to solve the problem from a theoretical angle as well as through case-oriented prototype implementation. The arrow marked theoretical analysis is bidirectional, reflecting the process in which theories provide us with a better understanding of the problem, which, in turn, motivates a deeper and more focused theoretical exploration. Likewise, the arrow marked domain analysis is bidirectional, reflecting the dialectical relationship between theoretical perception of the problem area and first-hand coding experience. Our coding experience is guided by our abstract reasoning about how the problem should be solved. Meanwhile, fleshing out the theoretical and architectural designs has also contributed to a better understanding of the problem area.

The case is taken from a public sector domain. As service-oriented integration of IT systems in the public sector has gained relatively wide adoption in Denmark, there is pent-up demand for governmental organizations' IT systems to handle cross-system transactions. We have chosen to collaborate with a Danish IT consulting and solution provider, Ementor Danmark, a key industrial provider of SOA-based solutions for integrating public sector systems. Ementor Danmark has made a series of successful deliveries of SOA projects in recent years. The selected case is derived from one of Ementor Danmark's ongoing SOA-based system integration projects: Common Agricultural Policy (CAP). In order not to disrupt the main theme of this section, we defer a further description of the CAP project to Section 1.3.

1.2.2. Combining inductive and deductive approaches

To untangle the knot of the selected case being specific while the thesis target is a generalized solution to web service transaction management, we aim at a convergence between the deductive method and the inductive method [Holmberg, 1987].

The deductive method uses the general to reason about the more specific. We have employed the deductive method to set up the transactional scenarios covered by the case. We have not sought to address Ementor Danmark's entire CAP project. Instead, we have analyzed a set of use cases against the general theoretical transaction scenarios in the web service world and settled on a representative subset of the many scenarios and use cases in the CAP project. By constraining the scenarios in the case to a limited and representative few, we hope to achieve a broad coverage of different transaction types and transaction management scenarios in SOA. Downsizing the case into a manageable scope also helps prevent the domain-specific details from cluttering the more general discussion. Ementor Danmark has confirmed that the set-up of the case is capable of covering some of the major transactional challenges in the CAP project.

The inductive method seeks to use empirical data or experience to prove the validity of general principles or theoretical models. We use the inductive method to extrapolate the solution from the CAP domain to other application domains that exhibit analogous transaction management requirements. To derive a general proof-of-concept from an application-domain-specific case, we have adopted the following two approaches. First, the case is set up to cover the general transaction scenarios, in accordance with the deductive method. Second, we strive to solve the case with a general, component-based prototype design so that the degree of pluggability and adaptability of the prototype implementation can be increased. In this way, we hope to infuse both semantic and implementational generality into a specific case, and as such arrive at a general conclusion from a specific case-oriented endeavor.

1.2.3. Combining iterative and waterfall processes

The explorative nature of our thesis makes it inappropriate to use a waterfall approach, since our knowledge of both the application domain and the theory domain is relatively limited at the inception of the project. On the other hand, the project's short time span does not justify many full-cycle iterations either. As a compromise, we have settled for using a variant of the rational unified process, because we regard it as a reasonable combination of the waterfall elements (predictive planning) and the iterative elements (adaptive planning) [Fowler, 2003, p. 23].
Rational unified process

A Rational Unified Process (RUP) usually consists of four phases. Although RUP is essentially an iterative process, the first two phases (inception and elaboration) have in our project basically progressed in a waterfall-like manner, the construction phase has been highly iterative and adaptive, and the final transition phase has been linear.

The inception phase is an initial evaluation of the project. This initial phase lasted roughly two months in our project, during which we did literature screening, theoretical analysis, domain analysis and selection of representative scenarios for the case. The major product of this phase is a reference model for web service transaction management, which is the theme of Chapter 4.

In the elaboration phase, the overarching architecture for the transaction management middleware is determined. This phase lasted roughly one month in our project and partly overlapped the theoretical and domain analysis in the inception phase. The overall architectural design sets a rough but predictable scope for the prototype implementation. In principle, the architectural design remained unmodified in the following phases. By narrowing down the use cases in the inception phase and settling on the overall architecture in the elaboration phase, the project's predictable scope results in a reduced requirement churn [Fowler, 2004, p.23]. The product of this phase is a documented architectural design of the middleware prototype, which is the theme of Chapter 5.

In the construction phase, the software design of the different components in the architecture is worked out and functionalities are implemented. This phase lasted roughly two months, in which we primarily followed the overall architecture determined in the elaboration phase. However, the logical and physical design of each subsystem has undergone small adaptive iterations, and component designs have been reworked, refactored, and improved with the help of better design patterns. The product of this phase is the implementation and test of the prototype, which is the theme of Chapter 6.

The transition phase usually contains deployment and user training, which is largely irrelevant for us. In the final phase of our project, we conducted a systematic evaluation of the prototype implementation, which is the theme of Chapter 7. Presentations have been made for Ementor Danmark and their feedback has been taken into account in the overall evaluation. The product of this phase is the thesis report.

1.2.4. Methods for prototype implementation

We have used the object-oriented paradigm to guide our prototype design and implementation. Object-oriented design provides not only material coherence to the system structure but also mental coherence for us as designers and developers. We will not elaborate upon other advantages of using the object-oriented methods but refer to [Mathiassen, 2000] and [Fowler, 2003]. Throughout the project, we have used UML as a documentation guideline to communicate major aspects of the component and system design. We have not pursued the notational stringency of UML, but employed UML as a sketching and communication tool.

We have chosen Microsoft Windows Communication Foundation, henceforth abbreviated as WCF, as the web service implementation platform. WCF is designed to unify a broad array of .NET technologies (web services, .NET Remoting, Message Queue, etc.) into a single and simpler service-oriented programming model. As we have no previous exposure to large-scale service-oriented middleware development, we have leveraged the simple and unified programming model in WCF to shorten our learning curve and achieve rapid application development.
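To give a flavor of this programming model, the following is a minimal, self-hosted WCF service; the contract, implementation and address are invented for illustration and are not part of the prototype.

using System;
using System.ServiceModel;

[ServiceContract]
public interface IEchoService
{
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEchoService
{
    public string Echo(string text)
    {
        return "Echo: " + text;
    }
}

class Program
{
    static void Main()
    {
        // Self-hosting; the same contract could be exposed over TCP or
        // MSMQ simply by swapping the binding below.
        ServiceHost host = new ServiceHost(typeof(EchoService),
                                           new Uri("http://localhost:8000/echo"));
        host.AddServiceEndpoint(typeof(IEchoService), new BasicHttpBinding(), "");
        host.Open();
        Console.WriteLine("Service running; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}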


1.2.5. Evaluation of methodology

In this section, we briefly reflect upon how the selection of the project methodology, and consequently the de-selection of other methods, affects the quality of the solution we arrive at.

Every scientific project needs some form of empirical data to support and supplement the theoretical analysis. Our thesis is no exception. Empirical data comes in all sorts and flavors. We have chosen a prototype implementation as a proof-of-concept anchoring of the theoretical analysis. Thus, our hands-on experience in the implementation process and the knowledge sharing with the case company become the empirical data in our project, also called primary data [Andersen, 2003]. An alternative data acquisition method is resorting to secondary data, for instance by collecting data from a number of market-leading system integrators and comparing their empirical experiences in web service transaction management. In this case, data collection methods could range from clarifying interviews to the examination of code and software architecture. Statistical and analytical observations could be made of such data. By excluding the collection of secondary data and focusing on an implementation-oriented approach and knowledge sharing with one industrial player, Ementor Danmark, we run the risk of our conclusions being opinionated and less representative. Inevitably, the conclusion will be colored by our own implementational experience and the specific application domain of the selected case.

Another weakness of our project methodology stems from our treatment of the architectural design of the transaction middleware as a waterfall element in the elaboration phase of RUP. We take a snapshot of the architectural design and freeze this design in the iterative construction phase. Even though we regard this as a good approach to fast prototyping in a time-boxed setting, it still constrains the architecture of the final product to our knowledge level and insight at the time when the architectural design is determined. Therefore, a sober evaluation of the architectural design against the predefined criteria has been a serious component of the final phase (see Chapter 7).

One could argue that a single-case study would not be as convincing as a multi-case study. However, as our goal is to derive stereotypical transaction scenarios from the CAP case rather than solve the case as is, we consider a single-case design as sufficient, because a multiple-case alternative would likely provide equally strong, but not stronger, support for the conclusion. As Yin [2002] points out, a rationale for selecting a single-case rather than a multiple-case design is when the single case represents a typical project among many different projects. Resources used in a multi-case study, in which similar efforts are repeated for every case, could instead be committed to an in-depth treatment of one single case with larger net gain, since the lessons learned from the representative single case are assumed to be "informative about the experiences of the average person or situation" [Yin, 2002, p.41].

1.3. The CAP case


In this section, we will briefly introduce the CAP case that we use as a running example in this project.

1.3.1. Background

As mentioned in Section 1.2.1, the case is taken from a public-sector system integration domain. We shorthand it as the CAP case since it is a categorical subset of Ementor Danmark's ongoing project CAP. CAP (Common Agricultural Policy) is a system used in connection with the European Union's agricultural subsidies and programs. The purpose of CAP is to guarantee a minimum price to farmers by compensating for reduced production with EU subsidies. The scheduled spending is €55 billion in 2007 1.

The Danish Directorate for Food, Fisheries and Agricultural Business (DFFE) has been delegated the responsibility of handling subsidy applications from Danish farmers in accordance with CAP policies. In 2006, Ementor Danmark won the bid for implementing a grant administration system for DFFE with a budget of 50 million DKK. Comprehensive subsidy administration usually involves orchestration of business logic from multiple existing legacy systems, such as the ESDH 2 (Electronic Case and Document Handling) system, the accounting system, etc. In Ementor Danmark's CAP project for DFFE, a service-oriented architecture is adopted to tackle this system integration challenge, with the integration fabric implemented on a Microsoft BizTalk Server.

1 Source: http://europa.eu/pol/agr/index_en.htm (the policy portal of the European Union).
2 ESDH is used as a de facto shorthand in Denmark for Electronic Case and Document Handling systems. The Danish term for case is "sag", which explains why the acronym is ESDH rather than ECDH.


1.3.2. Knowledge sharing with Ementor Danmark

The purpose of our project is not to deliver a solution module to Ementor Danmark. Rather, our collaboration with Ementor Danmark has progressed as a mutually beneficial knowledge sharing process. At the inception of our project, Ementor Danmark was wrestling with the need to introduce distributed long-running transactions into their SOA project, but did not have the resources to perform a thorough theoretical and technological feasibility study. Ementor Danmark expressed interest in cooperating with us so that they could hear a second opinion and be presented with academic reasoning regarding web service transaction management.

From our perspective, knowledge sharing with Ementor Danmark has compensated for our lack of experience in middleware implementation and our limited exposure to real-life SOA projects. In our bimonthly meetings with Ementor Danmark, we have had the opportunity to present work in progress and receive feedback as well as knowledge pointers to improve our work. In many ways we have exploited these knowledge-sharing meetings as a proof-of-concept check on the intermediate results of our project, both theoretical and implementational. Supplementing the theoretical analysis and architectural design with a concrete case also makes it more manageable for us to design, implement and test the middleware prototype. Another advantage of using a real-life case is that it gives the end product more industrial relevance and makes it easier for the product to be adapted and applied to similar domains.

1.3.3. The adapted case

As described in Section 1.2.2, we have generalized Ementor Danmark's many use cases by combining the inductive method (from specific cases to general theory) and the deductive method (from general theory to specific cases). The result is a condensed and stereotypical case that captures the major theoretical categories of web service transaction management while still sticking to the CAP domain. The adapted case is illustrated in Figure 2.

1.3.4. Service landscape in the CAP case

Figure 2 contains four orchestrated services: subsidy application, case registration, eligibility evaluation, and account payable. These services are orchestrated in the sense that they each invoke one or more distributed component services. At the top level, the subsidy application service performs the whole process of handling a new subsidy application, delegating a part of the business logic to three component services: case registration, eligibility evaluation and account payable. As a first step, a new case must be established in the ESDH system (represented by the case registration service), and the eligibility of the applicant must be evaluated in another system (represented by the eligibility evaluation service). The case registration and eligibility evaluation services can be executed in parallel. Successful execution of them both constitutes the precondition for invoking the third component service, account payable. Account payable transfers the granted subsidy to the applicant's bank account.
[Figure 2: Web service landscape in the CAP case. The client service (0) invokes the subsidy application service (1), which orchestrates the case registration service (2), the eligibility evaluation service (3) and the account payable service (4). The leaf services are the applicant registry service (5), the document handling service (6), the fund reservation service (7, shown with two replicas), the sampling & control service (8), the fund transfer service (9) and the general ledger service (10). In the figure, case registration is marked as a nested atomic transaction, account payable as a flat atomic transaction, and subsidy application and eligibility evaluation as business activities; account payable is guarded by the precondition that the case is registered and evaluated.]

In a similar fashion, the three middle-tier component services (case registration, eligibility evaluation and account payable) are each orchestrated from two component services. For instance, the case registration service invokes the document handling service to create a new document and the applicant registry service to make a new entry in the applicant registry database.


Services 5 through 10 are leaf services, as they do not entail invocation of other services. Leaf services are usually developed and deployed on stand-alone application servers. Services 1, 2, 3, and 4 are composite or orchestrated services, as they invoke other composite services or leaf services as part of their business logic. As will be explained in Chapter 2, composability is one of the most important architectural principles in SOA, much in line with the composite design pattern mentioned in [GoF, 1995, p.163]. We assume that the component services to be integrated run on distributed application servers, employing divergent technologies.

1.3.5. Complicated workflow in a composite service

Especially worth noting is the eligibility evaluation service, which encapsulates complicated business logic involving both looping and human interactions. In Figure 3, we have fleshed out the somewhat convoluted workflow. Making such a workflow transactional is not a trivial task. The eligibility evaluation service entails iterative invocations of the two component services, fund reservation and sampling and control.

Figure 3: A zoom-in on the eligibility evaluation service workflow

In each iteration, the fund reservation service is called to reserve a calculated sum in a fund pool. Meanwhile, the sampling and control service, whose basic function is to conduct manual control of selected samples, is invoked in parallel.


All application cases are targets for sampling, but only a small subset will satisfy the sample selection criteria, which can be purely statistical, based on the applicant's postal code, track record, etc. If a case is selected as a sample for manual control, it is placed in the task tray of a controlling officer. Should the controller find sufficient evidence to support reasonable doubt, e.g. when a farmer and his wife both apply for subsidies for the same acres of land, the controller will attempt to clarify the case by communicating with the applicant by telephone, e-mail or via the traditional mail system. The communication process can in principle take days, weeks or even months. If the case is rejected, the eligibility evaluation service is terminated immediately and a written notice of rejection is sent to the applicant. However, if a case is approved or not sampled, it is included as a candidate in the next iteration, which is triggered by a recalculation timer in the fund pool recalculation application. Since each grant or rejection has bearing upon the remaining funds available for other applicants, and since the total set of applicants in the system is a dynamic entity, recalculation of reservable funds is done monthly or bi-monthly, triggered by the recalculation timer. This cycle of recalculation, fund reservation, sampling and control repeats itself in each iteration until the application handling deadline has been reached, or until the sampling and control service has rejected the application.

1.3.6. Making orchestrated services transactional

It is desirable that each of the four orchestrated services be executed as a transaction, albeit with different transactional requirements. The account payable service should be performed as a flat distributed atomic transaction, in which the invocation of the fund transfer service is followed by that of the general ledger service, which updates the general ledger database with the actual fund transfer amount. The three services, account payable, fund transfer, and general ledger, reside in relatively closely integrated trust domains. It is an essential requirement that both subservices succeed. If one fails, the effects of the other should be undone. The case registration service should also be performed as an atomic transaction. However, the execution order of the two component services is inconsequential and parallel execution is perfectly acceptable. Furthermore, if either of the two subservices (document handling and applicant registry) fails, the parent case registration service can attempt a retry or simply commit the overall transaction, ignoring the partial failure, in accordance with pre-defined business rules.


The eligibility evaluation service is a long-running business activity, involving human interactions leading to long response latencies, and periods of inactivity. Placing such an activity in an atomic transaction scope would have devastating performance impacts and tie up resources for unacceptably long durations. Furthermore, the effects of intermediate fund reservations must be made immediately visible for the calculation service to perform the recalculation, which is another argument against the use of an atomic transaction, in which intermediate results only become visible after the global commit. Nevertheless, it is still desirable that the global net outcome of the eligibility evaluation service is consistent and atomic. Ideally, an established fund reservation should be compensatable by performing an undo/compensating operation. The top-level subsidy application service should also be made transactional. The challenge here is that each of the three component services has a different transaction type. In order to make the subsidy application service transactional with atomic and consistent outcomes, we need a coordination mechanism capable of accommodating disparate subtransaction types. Devising an orchestrated service transaction also necessitates propagating the transaction scope across all component services, which could be running on disparate platforms. In SOA, many composite services have one or more of the characteristics illustrated above: long-running, involving user interaction and likely periods of inactivity, having long message latencies, and spanning multiple systems/organizations. Transaction-enabling long-running business activities such as the eligibility evaluation service can be a daunting task and is one of the major challenges in web service transaction management. Furthermore, when decisions to commit, rollback or ignore the effects of transaction branches hinge on business rules, we need to interweave the transaction protocol logic with the business logic in an elegant way, i.e. without compromising the loosely coupled nature of web services. Later chapters provide systematic analysis of all these challenges.
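To give a feel for the compensation-based style alluded to above, the following is a minimal Java sketch, entirely our own illustration and not taken from the CAP implementation: a hypothetical fund reservation participant pairs a forward operation with a compensating undo operation, so that a business-activity coordinator can semantically roll back a reservation that has already been committed locally. All class and method names are invented for this example.

    // Illustrative compensatable operation pair (our own sketch): the forward
    // action commits locally and becomes immediately visible; the compensating
    // action semantically undoes it if the enclosing business activity fails.
    public class FundReservationParticipant {

        // Forward operation: reserve the amount and commit locally right away,
        // so that the fund pool recalculation can see the interim result.
        public String reserve(String applicationId, long amount) {
            String reservationId = applicationId + "-" + System.currentTimeMillis();
            // ... persist the reservation and decrease the available funds ...
            return reservationId;
        }

        // Compensating operation: invoked by the coordinator if the overall
        // business activity is cancelled; releases the reserved funds again.
        public void compensate(String reservationId) {
            // ... look up the reservation and add the amount back to the pool ...
        }
    }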

1.4. Related work


Web service orchestration and composition, business process modeling and web service transactions have gained much interest in recent years. Several specifications dealing with these topics are under development in the software industry as well as in university research.


A number of vendors already have implementations of a subset of the specifications. In the following we will provide a short survey of related work in relation to three aspects: web service specifications, academic research and commercial/open-source implementations. As the purpose of this thesis is not an in-depth survey, we will only mention a limited number of the initiatives.

1.4.1. Specifications

The number of web service specifications has exploded over the last 5-6 years. These specifications range from web service fundamentals like WSDL and SOAP to the more advanced collection of WS-* specifications, which address more intricate cross-cutting concerns, such as security, federation, addressing, etc., in a web service environment. In the following we will focus on the transaction-related specifications. Web Services Transaction (WS-Transaction), a.k.a. WS-TX, is a family of specifications including WS-Coordination (WS-COOR), which defines a generic coordination framework supporting a number of coordination types including WS-AtomicTransaction (WS-AT) and WS-BusinessActivity (WS-BA). In the remainder of the report, we will sometimes use the WS-TX abbreviation to refer to all three WS-Transaction specifications: WS-COOR, WS-AT and WS-BA. WS-AT defines a protocol (two-phase commit) for executing short-lived ACID-style transactions in a web service environment. WS-BusinessActivity defines a protocol for long-running business activities using a compensational approach instead of resource locking. The WS-TX family of specifications was developed in a joint effort between IBM, BEA, and Microsoft, through the standardization organization OASIS¹. The specifications were first released in August 2002 and approved as an official OASIS standard on April 16, 2007. In Section 4.2 we will provide a more thorough introduction to WS-TX. WS-Composite Application Framework, a.k.a. WS-CAF, is a family of specifications comprising WS-Context (WS-CTX), which defines a generic context management framework, WS-CoordinationFramework (WS-CF), which defines a pluggable coordination framework supporting WS-AT and WS-BA, and WS-TransactionManagement (WS-TXM), which defines three protocols.

¹ OASIS (Organization for the Advancement of Structured Information Standards) is a not-for-profit consortium that drives the development, convergence and adoption of open standards for the global information society. (Source: http://www.oasis-open.org)


The three WS-TXM protocols are: a two-phase commit protocol for web service interoperability with an ACID approach, a compensation-based protocol for long-running transactions, and a business process management protocol. The WS-CAF family of specifications was developed by IONA, Arjuna, Oracle, Fujitsu, and Sun. WS-TX and WS-CAF have significant overlap, and proponents of them have suggested convergence on a single standard. Both specifications cover the concept of generic coordination and a pluggable entity that can drive different protocols such as two-phase commit, compensation, and business process transactions. Participating web services in a transaction register with the Coordinator and specify the protocol type, so that the Coordinator can drive the selected protocol for the web services that have registered for it. Another competing specification is the OASIS Business Transaction Protocol, a.k.a. BTP. BTP uses an extended transaction model for long-running tasks called cohesions. Little [2003-1] gives a comparative analysis of web service transaction protocols. The analysis identifies the similarities, differences, strengths and weaknesses of BTP and WS-TX.

1.4.2. Academic research

In academia, research has been conducted into web service composition and related technologies, using different approaches. These approaches range from optimization of existing transaction models and specifications to proof-of-concept implementations. Some of the latest research addresses the challenge that traditional ACID-style transactions are too strict and inappropriate for long-running business activities. Nor are ACID transactions applicable if participating services do not share a common trust domain. Proposals have been made on how to relax the ACID properties to better suit scenarios like these [Bernstein 1997, Frank 2006, Kaye 2003, Little 2003-1]. In Section 3.3 we outline a number of possible ACID relaxations, including the use of compensating actions as an alternative to resource locking as in the two-phase commit protocol. Zhao [2005] criticizes the use of compensating transactions and suggests the use of a reservation-based transaction protocol to coordinate business activities, which involve long-running transactions in a loosely-coupled distributed environment. In the reservation-based protocol, an application has full control over the reservation activity, as well as over how long the resource should be reserved. This is unlike traditional resource locking, where the duration is typically internal to the resource, e.g. a database system.


The resource is locked by the database system; the application has no control over how long the resource should be locked and thus has to wait for a timeout. The reservation-based protocol comprises two steps. The first step involves an exclusive blocking reservation of the resource. The second step involves a confirmation or cancellation of the reservation. Associated with each reservation is a fee, which is proportional to the duration of the reservation. Zhao [2005] describes how the reservation-based protocol can be implemented as a coordination mechanism on top of WS-AT or WS-BA. Roberts [2001] suggests a Tentative Hold Protocol (THP). THP is an open, message-oriented and loosely coupled protocol. This protocol is designed for businesses to exchange information prior to their participation in the actual transaction. THP allows multiple clients to tentatively hold the same resource. If one client submits a request for a resource, e.g. booking a ticket, a Resource Coordinator (RC) will tentatively hold that resource and send notification to the client. If one of the other clients consumes the resource by executing the actual business transaction, the RC will notify the remaining clients that their tentative holds on the resource are no longer valid. An advantage of THP is that it avoids resource blocking as in the two-phase commit protocol. It can also be adapted to run in conjunction with the two-phase commit protocol or a compensation-based protocol, with the benefit of minimizing the number of rollbacks and compensating actions. In addition, THP aims to optimize transaction throughput and resource consumption, albeit at the expense of increased message exchanges and extra processing time in executing the tentative hold algorithm. The performance aspect of web service transactions is addressed by Younas [2006]. This paper maintains that the majority of current approaches to web service transactions do not give sufficient attention to performance, especially when web service composition is involved. A Tentative Commit Protocol (TCP) is designed with a view to attaining improved performance by reducing network message delay and transaction processing time in composite web services. TCP is based on the concept of tentative commit, which allows transactions to perform a tentative commit on shared data, thereby avoiding resource blocking and improving performance. Younas [2006] documents experiments conducted to evaluate the TCP protocol in comparison with THP-based protocols. The results from these experiments show that TCP outperforms existing THP-based protocols.
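To make the tentative-hold mechanism described above concrete, here is a toy Java sketch of our own; the ResourceCoordinator class and its method names are invented for illustration and are not part of the THP specification.

    // Toy illustration of the Tentative Hold Protocol idea: several clients may
    // hold the same resource; when one client actually consumes it, the
    // coordinator notifies the others that their holds are no longer valid.
    import java.util.*;

    public class ResourceCoordinator {
        private final Map<String, Set<String>> holds = new HashMap<String, Set<String>>();

        public synchronized void placeHold(String resource, String client) {
            Set<String> holders = holds.get(resource);
            if (holders == null) {
                holders = new HashSet<String>();
                holds.put(resource, holders);
            }
            holders.add(client);
            notifyClient(client, "tentative hold granted on " + resource);
        }

        public synchronized void consume(String resource, String client) {
            Set<String> holders = holds.remove(resource);
            if (holders == null) {
                return; // nothing was held
            }
            for (String other : holders) {
                if (!other.equals(client)) {
                    notifyClient(other, "hold on " + resource + " is no longer valid");
                }
            }
        }

        private void notifyClient(String client, String message) {
            System.out.println(client + ": " + message); // stand-in for an asynchronous notification
        }
    }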


Erven [2007] maintains that the interface between a transaction initiator and the Coordinator is left undefined in the WS-BusinessActivity specification. This gives vendors free choice in how to integrate WS-BA Coordinators into their business process engines. By doing so, however, WS-BA allows for the undesirable use of proprietary protocols between the initiator and Coordinator. Erven [2007] proposes an extension protocol to the WS-BA specification. The extension is named the WebServices-BusinessActivityInitiator (WS-BAI) protocol, which explicitly defines the interface between the initiator and Coordinator. This allows Coordinators and initiators to interoperate transparently and opens up the possibility of using a commonly trusted, third-party coordination service to guarantee correct coordination. In our prototype we have made a similar endeavor to explicitly define the interface between the initiator and the Coordinator (see Chapter 6).

1.4.3. Commercial and open-source implementations

Several vendors and open-source projects have implemented, or are in the process of implementing, the WS-TX specifications. We only mention the WS-TX specifications here, and not WS-CAF, BTP etc., because the WS-TX specifications are essential to our thesis and prototype implementation. Apache Kandula is an open-source implementation of WS-COOR, WS-AT and WS-BA on top of the Apache Axis Java web service stack [Kandula], [Axis]. It is unclear to us to what extent the WS-BA specification is implemented, as the Apache Kandula website states that only WS-COOR and WS-AT are implemented, whereas the user guide contains instructions on how to use WS-BA. Erven [2007] also suggests the inclusion of the extension protocol WS-BAI in the Kandula project. Two branches are available in the Apache Kandula implementation of WS-TX, differing in the version of the underlying Java EE platform. Java Enterprise Edition, a.k.a. Java EE, is a widely used platform for server programming in the Java programming language. We will use the abbreviation Java EE in the remainder of the thesis report. The JBoss Transaction Service is another implementation of the WS-TX specifications [JBoss, 2006]. It is highly integrated with the JBoss Application Server and its implementations of the Java Transaction API (JTA) and Java Transaction Service (JTS). One detail we have noticed is that the JBoss WS-TX project does not offer an implementation of the Participant, i.e. the entity that acts on behalf of a transaction-aware web service to communicate with the Coordinator. It is up to the programmer to implement the Participant.


The Windows Communication Foundation (WCF) also contains an implementation of WS-AtomicTransaction. Sun's Project Tango develops and evolves the code base for Web Services Interoperability Technologies (WSIT), which purports to enable interoperability between the Java platform and WCF [Project Tango]. Project Tango's WSIT technology is bundled inside the Sun GlassFish v2 Java application server (GlassFish is an open-source, community-based implementation of Java EE 5) and currently implements (in beta) WS-Coordination and WS-AtomicTransaction. Other vendors like IBM also have versions of WS-TX implemented as part of their enterprise application middleware suites or application servers. We have not explored the details regarding these implementations.

1.5. Scope
Distributed transaction management in SOA-based system integration is an interdisciplinary subject area whose comprehensiveness lies far beyond the scope of a thesis project like ours. To achieve a controllable scope, we have decided to focus on the technical aspects and tone down the business aspects of SOA, well aware that taking SOA and system integration out of their business context is, in the real world, a non-starter. So, instead of saying that our project disregards the business aspects of the subject area, we would rather put it this way: we base the technology-centric work in this project on the assumption that the underlying business considerations are strategically and economically sound. With this assumption we consider the following fields out of scope: the business perspective and managerial aspects of SOA, SOA governance (i.e. evaluation of a technological solution's alignment with the business targets), SLAs (service level agreements), process analysis, etc. Within the technical arena of web service transaction management, there are many related fields such as web service security, reliable messaging, federation, policy, as well as business process modeling and execution. All these areas can be very important supplements to transactional web services, but they are in this project also considered out of scope.



Within the transaction-related technical arena, this project does not deal with platform-specific resource management and locking, but assumes that we can leverage legacy transaction support on different platforms via a uniform web service interface. (More on this in Sections 4.1.7 and 5.5.3.)

1.6. Assumptions about the reader


We assume that you, the reader, are well-acquainted with the basic motivation for using service-oriented architecture to integrate enterprise systems. In addition, a working knowledge of web services, including XML, SOAP and WSDL, as well as their respective roles in SOA, is assumed. We also presuppose that you are well-versed in distributed system concepts, especially concepts in relation to distributed transaction management, message passing paradigms and system architecture. We will take advantage of your existing knowledge of object-oriented and component-based design and development and port that knowledge to the discussion of service-oriented design and development.

1.7. Report structure


This thesis report is organized into 10 chapters. We start off with the introductory Chapter 1, dedicated to nailing down the problem definition and the methodology, as well as earlier contributions to the chosen subject area. Chapters 2 through 4 tackle the theoretical focus of the problem definition. Chapter 2 kicks off the theoretical analysis by giving a turbo introduction to the set of SOA concepts essential to understanding the rest of the report, setting the groundwork for discussing web service transactions in the subsequent chapters. Chapter 3 provides a retrospective and analytical survey of transaction processing theories prior to the advent of web services, covering both the classical and extended transaction models. Chapter 4 then concentrates on the challenge of adapting these models to meet transaction processing needs in SOA. We conclude Chapter 4 by setting up a mental reference model for web service transaction management. Chapters 5 through 7 then shift gears to explore how theories and models from previous chapters can be leveraged to build a middleware prototype for transaction management in SOA. To provide mental congruency when presenting the prototype, we structure Chapters 5, 6 and 7 with a top-down approach.


Chapter 5 decomposes the mental reference model from Chapter 4 into high-level framework architectures. Chapter 6 then takes an even deeper dive into the implementational design of finer-grained components in the individual frameworks. Chapter 7 wraps up with a systematic quality evaluation of the implemented prototype. Finally, Chapter 8 synthesizes the theoretical, architectural and implementational analyses into a general conclusion. Chapter 9 contains our reflections on the problem-solving process, and Chapter 10 provides our suggestions for future work. A common thread across all parts of the report is the consistent use of an illustrative case based on Ementor Danmark's CAP project. This report contains many acronyms. We will provide a short description when an acronym is mentioned for the first time. An overview of all acronyms can be found in Appendix A.


2. SOA-based system integration


This chapter covers a number of essential concepts with respect to SOA-based system integration, providing a basis and reference point for many related topics discussed in subsequent chapters. A number of concept clarifications are provided in Section 2.1, followed by a differentiated discussion of SOA in relation to three system integration levels in Section 2.2. Section 2.3 further pursues SOA's loose-coupling design principle by relating it to different communication paradigms. The concepts and theories covered in this chapter are based on [Erl 2004, Erl 2005, Chappell 2004, Kaye 2003, Hasan 2006, Lublinsky 2003, Hagel 2002 and Tanenbaum 2006].

2.1. Concepts
In this thesis, we view SOA as the service-oriented integration architecture with a standard-driven approach to integrating heterogeneous legacy applications. Service-oriented integration is suitable under the assumption that leveraging legacy logic is preferable to replacing it. In SOA, web services are the integration fabric and the basic building blocks. In the system integration setting, we view web services as internet-enabled and service-oriented integration components. A web service can be used to expose and abstract application logic that is otherwise locked in existing legacy applications. A web service can also compose other web services to form a service-oriented process flow, as will be described in Section 2.2.3. Still another service type can be used solely for the coordination of other web services, such as the coordination service we will mention in later chapters. With regard to legacy applications, being legacy does not necessarily mean that an application employs older technology, e.g. a mainframe. In our project, an application deployed on a modern Java EE platform can also be deemed legacy when there is the need for it to communicate with business functionality deployed on a different platform, e.g. .NET. Being legacy or not hinges on whether an application needs to be integrated with other applications, and whether this need is hindered by these applications' incompatible technological platforms, such as operating systems, databases, application servers or even proprietary integration technologies such as EAI (Enterprise Application Integration).


Architecturally, SOA usually extends an application's existing multi-tier architecture by introducing a logical service integration layer that, through the use of the standard programmatic interfaces provided by web services, establishes a common point of integration [Erl, 2004]. The service integration layer is logical in the sense that its concrete integration components will vary in different system integration approaches. These variations are the topic of Section 2.2. Service-oriented design is a continuation and extension of object- and component-based design. The most common service-oriented design principles include loose coupling, autonomy, discoverability, reuse, contract-based design, abstraction, statelessness and composability.

2.2. System integration approaches


The primary motivation behind system integration is for two or more applications to collaborate on a certain task. This collaboration can be as simple as one application retrieving a value stored in another's database. Or, it can be as complicated as merging the existing applications' data, resources, and business logic in order to support the workflow in a new application. This section describes three system integration approaches with increasing complexity levels. The following discussion assumes that all legacy applications possess a data tier and an application-logic tier, among other tiers. With respect to each integration approach, we will touch upon how a service integration layer can be designed to attain interoperability, loose coupling, etc.

2.2.1. Data-level point-to-point integration

Data-level integration, in which data from application A is accessed by application B without involving A's application logic, can be classified as being among the early system integration attempts. The assumption of incompatible database platforms precludes the use of traditional data access technologies such as remote JDBC or ADO connections. Interoperability, in this setting, can be attained by placing a wrapper service in an extra service integration layer above the existing data layer. Retrieved data is formatted as an XML document and transported within an XML message. In addition to interoperability, using a wrapper service as a central data access controller has the advantage of masking proprietary database details from the service client. If designed with a coarse-grained service interface, the use of the wrapper service can result in improved performance by obviating fine-grained data traffic.
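As an illustration, the following Java sketch (our own; the thesis case does not prescribe a particular stack, and the service and method names are invented) shows how such a data-level wrapper service could be exposed with standard JAX-WS annotations, returning one coarse-grained XML document rather than opening the database to remote clients.

    // Hypothetical data-level wrapper service: one coarse-grained operation
    // returns a whole record as an XML document, hiding the proprietary
    // database behind a standard web service contract.
    import javax.jws.WebMethod;
    import javax.jws.WebService;

    @WebService
    public class ApplicantDataService {

        @WebMethod
        public String getApplicantRecord(String applicantId) {
            // ... proprietary data access happens here, invisible to the client ...
            return "<applicant id=\"" + applicantId + "\">"
                 + "<name>...</name><address>...</address>"
                 + "</applicant>";
        }
    }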


2.2.2. Application-level point-to-point integration

In application-level integration, direct data access is not an option, and applications A and B communicate solely via their application-logic tiers. The assumption of incompatible application platforms prevents the use of traditional remote invocation technologies such as Java RMI, RPC or .NET Remoting. To establish interoperability, a service integration layer can be placed above the application layers, with either a proxy service or a wrapper service as the integration component. A proxy service is the easiest to develop and can usually be auto-generated. The proxy service interface granularity mirrors the granularity of the legacy functionality from which the proxy is derived, e.g. a fine-grained Java method. Although XML-ized, proxy services usually use the RPC-centric message exchange pattern¹ because of its direct mapping to method calls. A wrapper service, on the other hand, is often custom-developed and designed to expose coarser-grained legacy logic. A wrapper service can use either an RPC-centric or a document-centric exchange pattern. Because of the granularity difference, wrapper services usually outperform proxy services by reducing network traffic. A word of clarification: although we have given proxy service a quite narrow-scoped definition here, the word proxy is often used in a broader sense in both industry and academia, encompassing any component that acts on behalf of another, possibly remote, component. In later chapters, we will use the term proxy in its broader sense, disregarding the subtle differences in granularity between proxy services and wrapper services.

2.2.3. Process-level integration

Process-level integration targets new automated business processes by integrating existing applications, or sub-processes. A business process integrating disparate and distributed legacy functionalities is also known as a distributed workflow, such as the one in the eligibility evaluation service in the CAP case. Process-level integration is a much in demand and challenging system integration approach, given the rapid development of supply-chain integration in a global setting and the vast number of mergers and acquisitions today. To attain process-level integration, it is not enough to place wrapper or proxy services in a service integration layer on top of the legacy application layers, as in the point-to-point application-level integration model.
¹ SOAP is designed to support both the more tightly coupled RPC-centric and the more loosely coupled document-centric message exchange patterns.


We also need a means to store and execute the business rules governing the workflow. Traditionally, process-level integration is achieved through either proprietary point-to-point integration models or a proprietary hub-and-spoke EAI architecture.
Proprietary and centralized EAI architecture

An EAI hub usually consists of two major component types: broker components and orchestration engines [Erl, 2004, p.363]. A broker component is used to enable pairwise communication between disparate applications, usually with non-standardized, vendor-specific implementations of data transformations. An orchestration engine encapsulates and executes process logic by (1) integrating with other applications to retrieve/relay data or logic using point-to-point adapters, (2) invoking the broker components for the manipulation of data and (3) coordinating the execution of business rules, exception handling, transaction management features, etc. Placing brokering and orchestration logic on a central hub allows all data to flow through a central location, thus promoting a clean structure, the reuse of point-to-point adapters and centralized maintenance of process logic. However, the hub-and-spoke integration model is also fraught with risks. The hub is a potential performance bottleneck and a single point of failure. Although scalability and performance can be improved by measures like replication and clustering, the cost of putting together a fully scalable hub-and-spoke environment with full fail-over support may be prohibitive [Erl, 2004]. What's more, with EAI packages being proprietary, expensive to purchase, expensive to implement and expensive to replace, the choice of any vendor-specific EAI product as the enterprise integration backbone leads to undesirable vendor lock-in.
Non-proprietary and distributed SOA alternative

In terms of process-level integration, SOA offers a non-proprietary and distributed alternative to EAI by leveraging the composability design principle [Lublinsky, 2003], [Hagel, 2002]. Being composable means a service can engage other services to collaborate on the execution of a business process. A composed service is also known as a service composition, service orchestration, controller service, composite service, or simply a process service, in contrast with the simpler proxy or wrapper service used for point-to-point application integration. The service composition structure can be hierarchical and recursive, comprising both leaf-level wrapper and proxy services and other composite services, which encapsulate sub-processes.


An example of a hierarchical service composition is the subsidy application service in the CAP case, which engages three other sub-process services to execute the long-running business process. Each of the sub-process services is, in turn, a process service composing other proxy/wrapper services. (See Figure 2 for the web service landscape in the CAP case.) Nested compositions are possible in SOA, as all services treat each other as abstract contracts rather than implementations. In other words, the hosting environment of any service exposes the service as an abstract service endpoint, with the service's interface and address described in a WSDL document. By publishing and promising to honor their respective contracts, the case registration and eligibility evaluation services qualify as candidate composition members, to be engaged in service orchestrations like subsidy application. Data representation and messaging standards such as XML and SOAP ensure interoperability, eliminating EAI's proprietary stigma. Standardized message exchange also renders the use of point-to-point EAI-style broker components superfluous. In contrast with the centralized orchestration engine in EAI, the controller service containing the workflow logic can be deployed anywhere, again because it only exposes itself as an abstract endpoint, and not as a specialized deployment on any central server. What's more, whereas integrating two vendors' EAI products using a higher-level EAI is a practical, if not a theoretical, impossibility, existing EAI functionality can be treated like any other type of legacy functionality and converted to coarse-grained service composition candidates in service-oriented integration. In this sense, SOA not only helps leverage existing EAI investments, but also acts as an extension to the EAI architecture. The missing piece in the application-level service composition alternative is the management of system functionalities such as transaction management, which is included in most EAI solutions. Since web services are by design stateless, and transactions are by definition stateful, we need to find a service-oriented approach to resolve this paradox without breaching the loosely coupled SOA design principle. This design challenge will be dealt with in Chapter 4.
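The composition principle can be sketched in a few lines of Java (our own illustration, not taken from the CAP implementation; the Component interface stands in for WSDL-generated client stubs): the composite subsidy application service invokes case registration and eligibility evaluation in parallel and calls account payable only when both succeed.

    // Illustrative composite service: two component services are invoked in
    // parallel; the third is invoked only if both succeed (cf. Figure 2).
    import java.util.concurrent.*;

    public class SubsidyApplicationService {

        // Stand-in for WSDL-generated client stubs of the component services.
        interface Component { boolean invoke(String applicationId) throws Exception; }

        private final Component caseRegistration;
        private final Component eligibilityEvaluation;
        private final Component accountPayable;
        private final ExecutorService pool = Executors.newFixedThreadPool(2);

        public SubsidyApplicationService(Component reg, Component eval, Component pay) {
            this.caseRegistration = reg;
            this.eligibilityEvaluation = eval;
            this.accountPayable = pay;
        }

        public boolean handleApplication(final String applicationId) throws Exception {
            Future<Boolean> registered = pool.submit(new Callable<Boolean>() {
                public Boolean call() throws Exception { return caseRegistration.invoke(applicationId); }
            });
            Future<Boolean> evaluated = pool.submit(new Callable<Boolean>() {
                public Boolean call() throws Exception { return eligibilityEvaluation.invoke(applicationId); }
            });
            // Precondition: case registered and evaluated.
            if (registered.get() && evaluated.get()) {
                return accountPayable.invoke(applicationId);
            }
            return false;
        }
    }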
Enterprise service bus (ESB)

In recent years, a lot of attention has been drawn to an integration architecture in the making, the Enterprise Service Bus (ESB). David Chappell defines an ESB as an open-ended, distributed, service-oriented logical architecture that is capable of linking together old and new server technologies (from mainframes to modern application servers), integration technologies (EAI, web services, etc.) and communication paradigms (MOM, RPC, web services, etc.) in a unified manner.


Figure 4 shows an example of what an ESB can look like. After reading through many competing definitions of what an ESB is and is not, it is our opinion that ESB is but a more imposing way of stating the composability principle in SOA. It describes a logical architectural grouping rather than an implementational blueprint.
[Figure 4: ESB architecture - a simplified view (source: [Chappell, 2004], adapted from [Mller, 2004]). The diagram shows an Enterprise Service Bus linking applications, services and a corporate EAI deployment through Service Containers, each exposing an ESB endpoint; a Service Container bundles a client interface (C/C++, HTTP, WS/SOAP), an invocation and management framework, and transformation and routing facilities on top of the underlying methods.]

As Figure 4 illustrates, every composable piece of functionality on the bus is encapsulated as a service, and services are grouped and organized in logical constructs called Service Containers, a concept we will further explore in Section 5.5. For now, let it suffice to state that a Service Container is a more standard-driven way of organizing and grouping legacy functionality (legacy adapter interfaces, methods) and system functionality (invocation and management framework, routing service, etc.), analogous to the Java EE Container concept. A Service Container exposes the encapsulated functionalities via an abstract endpoint, shown as the ESB endpoint in Figure 4. Abstract endpoints can be regarded as contract-based, logical abstractions of services that are plugged into the bus. The dashed lines between the ESB and the Service Containers are logical connections, indicating that Service Containers can be deployed in distributed physical locations.



Figure 4 also depicts the ESB as an open-ended architecture, in which closely related service components can be grouped together on separate mini-buses that are linked together with the rest of the ESB. Because integration capabilities, such as transformation, coordination, orchestration, routing, and even legacy EAI deployments, are themselves implemented and represented as services and exposed as abstract endpoints, they can be deployed anywhere in the network and scaled independently as required [Chappell, 2004, p.110]. The ESB provides, and if necessary replicates, directory services. This means that all services can potentially be consumed as reusable composition members after having registered themselves with a directory service. This service-oriented organization of both business services and integration components is, again, a manifestation of the composability principle. Following this principle, the workflow rules in process-level integration can be abstracted as a separate service on the bus, further decoupling the stateless composition member services from their intended, stateful behavior.

2.3. Communication Paradigms in SOA


Loose coupling is one of the most important design principles in SOA. This principle is reflected in the use of self-contained service contracts independent of other services, as has been mentioned in the preceding sections. In addition, the loosely coupled characteristic also manifests itself in SOA's ability to accommodate a flexible range of communication paradigms. What follows is a brief overview of these communication paradigms. Along the way, we point out which degree of loose coupling the services in our prototype are designed to achieve. Tanenbaum [2006] divides coupling into two dimensions, referential coupling and temporal coupling. If two communicating entities are referentially coupled, they know each other explicitly, either by name, process identifier or service instance identifier. If two communicating entities are temporally coupled, they both need to be up and running in order for the communication to take place. SOA is capable of accommodating different combinations of these two dimensions, allowing for different degrees of coupling.


1) Temporally coupled, referentially coupled: The RPC/RMI message exchange pattern in proxy or wrapper services is an example of temporally and referentially coupled communication, which uses the synchronous request-reply paradigm. Some of the leaf-level legacy services in the CAP case, such as the general ledger and applicant registry, belong to this category, as these services faithfully mimic the underlying client-server model in the legacy applications.

2) Temporally decoupled, referentially coupled: Classical examples of this communication paradigm include the mailbox and message-oriented middleware (MOM). The communication pattern between the eligibility evaluation parent service and its sampling and control child service is of such a type, as the execution of the sampling and control service involves human interaction and therefore periods of inactivity. In our implementation, the temporal decoupling between these two services is simulated with wait-and-notify between threads (see the sketch at the end of this section). In a real-world SOA implementation, this could be done with the asynchronous callback communication paradigm, e.g. by defining a call-back contract for duplex services in WCF [Lwy, 2007, p.590].

3) Temporally coupled, referentially decoupled: A classical example of this communication pattern is meeting-oriented coordination, in which all parties must be present but they do not have to know each other by name or reference. As we will see in later chapters, the activation and registration services in the coordination framework belong to this category. Conceptually, the invoked service instances are temporally coupled with the invoking client, as they send immediate responses back to the client in a synchronous RPC-centric SOAP message. Referentially, these services and their invokers do not know which process, or service instance, has issued the request or handled the response. To aid correlation, each service supplies an application-specific identifier (i.e. a Participant ID) to the receiving party. Correlation is then performed at the back-end application layer, not at the communication layer. Technically, this is not 100% referential decoupling, as it does not provide total transparency between the invoker and invokee services.

4) Temporally decoupled, referentially decoupled: A textbook example of this most loosely coupled communication form is the tuple store, also called generative communication by Tanenbaum [2005, p.591].


Producers and consumers of data placed in a tuple store communicate via the following mechanism: the producer attaches descriptive tags to data (the data and its tag forming a tuple), and the consumers retrieve data via associative search. In our prototype, all middleware protocol services, e.g. the ATParticipantService and BAParticipantService, are loosely coupled in this way. The temporal decoupling manifests itself in the asynchronous, fire-and-forget service-invocation pattern. It is also reflected in the persistence of all protocol messages, so that they can be correlated with asynchronous responses arriving with long latencies. This is similar to the MOM model. The referential decoupling is reflected, as in case 3) above, in the fact that service instances do not reference each other directly but through a service façade, as will be explained in Section 5.5. In particular, the service instance that issues the request message does not have to be the same instance that receives the correlated response message. Again, we could argue that this form of referential decoupling is not 100% transparent.
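The wait-and-notify simulation mentioned under paradigm 2) can be boiled down to the following Java sketch (our own simplification, with invented names): the parent service thread blocks until the simulated manual control step signals completion.

    // Simulating temporal decoupling with wait-and-notify: the parent service
    // thread blocks until the (simulated) manual control step signals completion.
    public class SamplingResult {
        private boolean done = false;
        private boolean approved;

        public synchronized void complete(boolean approved) {
            this.approved = approved;
            this.done = true;
            notifyAll(); // wake up the waiting parent service thread
        }

        public synchronized boolean awaitOutcome() throws InterruptedException {
            while (!done) {
                wait(); // parent service is inactive until the controller decides
            }
            return approved;
        }
    }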

2.4. Summary
In this chapter we have explored a subset of SOA concepts in the setting of system integration. The design of service integration layers in relation to three different system integration approaches has been discussed. The discussion aims to shed light on how web services can give platform-dependent integration methods an overhaul and, as a result, make existing legacy functionality available for new interoperability opportunities. We have highlighted the usefulness of SOA's composability principle in attaining process-level integration. We have also pointed out how SOA transcends the traditional hub-and-spoke, proprietary EAI architecture and extends/replaces EAI with a distributed, open-ended and standard-based integration approach. In addition, we have discussed varying levels of loose coupling in SOA in light of different communication paradigms. Pointers have been given in relation to the degrees of temporal and referential coupling exhibited by the services in our prototype implementation. SOA and system integration are both wide-ranging subject areas. We have barely scratched the surface with this short and compact introduction to SOA-based system integration. We refer the reader to the literature list given at the beginning of this chapter for further details.


3. Classical transaction processing models


This chapter outlines the nuts and bolts of distributed transaction processing models prior to the advent of web services. Understanding the strengths and limitations of these models sheds valuable light on how we can leverage the tried-and-true transaction processing methods and adapt them to the web service world, which in turn is the topic of the next chapter. The method we use to conduct the analysis in this chapter is a step-by-step theoretical screening. In Section 3.1, we state the failure models assumed when discussing transaction processing models in this thesis. Section 3.2 summarizes the essence of the classical transaction model building upon the ACID principle. Section 3.3 discusses the inadequacy of the classical ACID-style transaction and introduces an extended transaction model with relaxations of the ACID properties. Section 3.3 also contains an evaluation of the extended transaction model.

3.1. Failure models


Transactions are a fundamental concept in building reliable distributed applications. A transaction is the grouping of a set of operations so that they constitute an indivisible, logical unit of work. In order to set an unambiguous context for later discussions of transaction processing models, we will briefly state the failure model assumed in this thesis.

3.1.1. Logical failures

The idea of placing multiple operations within a single transactional scope is to safeguard the system against various degrees of logical failures, such as lost update, inconsistent retrieval, dirty read, premature write, etc. [Coulouris, 2005, p.518 ff.].

3.1.2. Omission failures

Intrinsically, a transaction processing system should be capable of dealing with omission failures in the form of process crashes, disk failures, or communication failures. In traditional transaction processing systems, these omission failures are masked by assuming a stable storage and a stable processor [Coulouris, 2005, p.517].


Disk crash omission failures are masked by assuming a stable storage, for instance by replicating each block on multiple disk blocks or using a variant of striping in RAID (Redundant Array of Inexpensive/Independent Disks). Each write operation is applied atomically to all replicas. After a disk crash, it is the job of a recovery manager to persist all committed updates from a recovery file to disk, as well as restore the old values for aborted transactions. A recovery file is a logical concept. Alternative physical organizations of a recovery file include logging, shadow versions, etc. [Coulouris, 2005, p.589-598]. We assume such a recovery component is in place in all transaction processing models. Process crash omission failures are masked by the assumption of a stable processor, which basically means replacing a crashed process with a new process that is reinstated from stable storage and other processes. Communication omission failures are masked by assuming a reliable point-to-point communication channel. In the analysis that follows, we assume the existence of such a reliable communication channel. The means to attain reliable communication, however, are out of the scope of this project. In real-world SOA, reliable communication is often handled by another protocol on the SOA stack, namely WS-ReliableMessaging.

3.1.3. Byzantine and timing failures

If a transaction processing system is supplemented by an extensive replication mechanism, arbitrary failures can also be guarded against to a certain extent, although masking arbitrary failures is not traditionally considered the transactional system's responsibility. Timing failures apply only to synchronous distributed systems, in which we have well-defined upper and lower bounds on (1) process execution time, (2) message transmission latency and (3) clock drift rate [Coulouris, 2005, p.50]. A web service transaction typically executes in an asynchronous system where no timing bounds are assumed. In such cases, timing failures are irrelevant. In other cases, it is reasonable to impose latency constraints upon a subtransaction for which a synchronous atomic commit protocol is used. Under these circumstances, failures to meet the latency constraints are usually detected by using a timeout fallback. Even though timeout is an unreliable failure detector, it is regarded as an acceptable implementation for transactional systems in this thesis.



3.2. Classical transaction model


In classical transaction theory, interleaved transactions should exhibit serial equivalence (any concurrent execution must have the same effect as a serial execution) and failure atomicity (the effects are atomic in the event of a server crash). In this thesis these requirements are collectively referred to as reliability guarantees.

3.2.1. The ACID properties

Often, we mnemonically refer to the collection of reliability guarantees for transactions as the ACID properties:

1. Atomicity: either the effects of all operations are reflected in the transaction, or none are.
2. Consistency: a transaction must bring the system from one consistent state to another.
3. Isolation: the effects of the operations are not visible outside the transaction until it completes successfully. Each transaction appears as if it executes in isolation.
4. Durability: once a transaction successfully completes, the changes it has made will survive system failures.

The discussions in the following sections pivot around the ACID guarantees with respect to (1) single-machine transactions, (2) distributed transactions with the synchronous RPC/RMI communication paradigm and (3) nested transactions.

3.2.2. Single-machine transactions

The most prevalent single-machine transaction model is the database transaction model, where the database Recovery Manager is responsible for ensuring the atomicity and durability properties of transactions. Resource and transaction outcome is managed by a database transaction monitor, a.k.a. the Transaction Manager. Consistency is often application-specific and is dealt with at the application level. Most modern DBMSs offer database-level facilities to enforce simple consistency constraints such as referential integrity. Isolation is usually guaranteed in varying degrees by a concurrency control mechanism, such as locking, optimistic concurrency control, or timestamp-based concurrency control [Silberschatz, 2002]. Most DBMSs allow for choosing between different isolation levels (read uncommitted, read committed, repeatable read, serializable) to match different data access patterns.


As object-oriented programming became popular in the 1980s and 1990s, database transaction processing technology found its way into the design of application-level, object-oriented transaction management components. Object-oriented languages often provide programming facilities allowing the client application to interface with a database transaction monitor through an object-oriented API. The System.Transactions namespace in the .NET 2.0 Framework is a case in point [Lwy, 2005].
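On the Java side, the JTA UserTransaction interface offers the same kind of client-level transaction demarcation; a minimal usage sketch (the JNDI name is the standard one, while the business calls are placeholders):

    // Client-side demarcation of a transaction through the object-oriented
    // JTA API; the actual resource updates are placeholders.
    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    public class TransferClient {

        public void transfer() throws Exception {
            UserTransaction tx = (UserTransaction)
                    new InitialContext().lookup("java:comp/UserTransaction");
            tx.begin();
            try {
                // ... update one or more transactional resources here ...
                tx.commit();
            } catch (Exception e) {
                tx.rollback(); // undo all work done within the transaction
                throw e;
            }
        }
    }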
The one-phase atomic commit protocol

The single-machine transaction model assumes that all resources are under the control of a single Transaction Manager. To guarantee outcome atomicity, the Transaction Manager operates a one-phase atomic commit protocol, ensuring that a commit or abort operation is carried out as an atomic step for all resources (objects, data, etc.) participating in the transaction [Kaye, 2003, p.150].

3.2.3. Distributed transactions

A distributed transaction accesses objects managed by multiple, distributed servers. Most DBMSs have packaged solutions to target distributed database transactions. As applications become more and more decoupled from the databases, application-level transaction support increases in importance as a requirement. Many enterprise application servers provide built-in transaction support. An example is JTA (Java Transaction API) in the Java EE platform. Concurrency control is singled out as part of the system functionality, in order for a transaction to be controlled from client code or, alternatively, be completely managed by the server [Little, 2004]. To agree on a consistent global outcome, the classical model for distributed transactions - both database-level and application-level - utilizes a single Transaction Coordinator and multiple Resource Managers, a.k.a. Participants. While the Transaction Coordinator initiates and coordinates an atomic commit protocol, each individual Resource Manager (RM) manages its local resource and responds to the Coordinator.
The two-phase atomic commit protocol

Although the one-phase protocol is a good fit in a single-machine environment, it is insufficient for distributed transactions. The reason is that a one-phase protocol cannot handle a subset of the Participants aborting after a commit decision has been issued by the Coordinator.


To meet the atomicity requirement, distributed transaction processing systems use a two-phase commit protocol for all parties to reach an agreement on the global outcome of a transaction. In the first, preparation phase, the Participants are asked to vote for the transaction to be committed or aborted. Those who vote to commit log their updates to a reliable backup medium. An actual commit is not carried out until the second, completion phase, when a joint commit decision has been reached.
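The decision logic of the protocol can be summarized in a short Java sketch (our own simplification; stable-storage logging, timeouts and failure handling are deliberately left out, and the Participant interface is invented for this illustration):

    // Minimal, single-threaded sketch of the two-phase commit decision logic.
    interface Participant {
        boolean prepare();   // phase 1: vote to commit (true) or abort (false)
        void commit();       // phase 2: make prepared updates permanent
        void rollback();     // phase 2: undo/discard prepared updates
    }

    final class Coordinator {
        boolean runTwoPhaseCommit(java.util.List<Participant> participants) {
            // Phase 1: collect votes; any "no" vote forces a global abort.
            boolean allPrepared = true;
            for (Participant p : participants) {
                if (!p.prepare()) { allPrepared = false; break; }
            }
            // Phase 2: propagate the joint decision to every participant.
            for (Participant p : participants) {
                if (allPrepared) p.commit(); else p.rollback();
            }
            return allPrepared;
        }
    }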
The three-phase commit protocol

Although superior to a one-phase commit protocol, the two-phase commit protocol cannot deal with all sorts of failures. In particular, the protocol can leave Participants in a blocking, uncertain state. This can happen when Participants have prepared to commit but wait in vain for a decision to commit or abort, because the Coordinator has crashed in the meantime. In such a situation, a Participant cannot unilaterally decide what to do next. Nor can the Participants cooperatively decide to commit or abort, since they will need the Coordinator's vote in order to reach a consensus. As a consequence, Participants cannot release the local objects and must remain blocked until the Coordinator recovers. Tanenbaum [2005] describes a more sophisticated, but less comprehensible, three-phase commit protocol. In this protocol, surviving processes can always reach a collaborative decision. If they vote to commit, the protocol guarantees that any crashed processes have also voted to commit, and thereby backed up their updates, before they crashed. If the surviving Participants vote to abort, the protocol will likewise ensure that no crashed process has already committed its local resource. The three-phase commit protocol is non-blocking in the sense that surviving processes can always reach a collective decision without introducing a globally inconsistent state. This is done by placing extra synchronization points on the communicating state machines at the Coordinator's and Participants' processes, respectively. Although a theoretically proven superior algorithm, the three-phase commit protocol is seldom used in practice. This lack of widespread adoption is partly due to the fact that the conditions that block the two-phase commit protocol occur very rarely, and partly because the three-phase commit protocol is complicated to implement and often coupled with inferior performance. In the remainder of this thesis, we will only deal with the two-phase commit protocol, due to its easy comprehensibility and practical relevance.

41

3. Classical transaction processing models

3.2.4. Nested distributed transactions

Distributed transactions can be flat or nested. In a flat distributed transaction, each operation executes sequentially; when servers use locking, a transaction can only be waiting for one resource at a time. A nested distributed transaction has a hierarchical, tree-like structure: a parent transaction can open subtransactions that execute concurrently. Apart from the advantage of extra concurrency, nested distributed transactions are also more flexible in the sense that the root Coordinator can choose to commit the whole transaction even when some of the subtransactions have failed or aborted. Locking of resources in nested distributed transactions is subject to more subtle rules, which are described in [Coulouris, 2005, p.537]. We observe that nested transactions do not have to be distributed; the single-machine nested model can, however, be regarded as a special case of the general model.

3.3. Extended transaction model


The distributed two-phase commit protocol is a well-formed ACID protocol. While strict ACID behavior is desirable in a database environment or a short-lived distributed transaction, it can be too expensive to secure in a long-running distributed transaction environment. In the past decade, considerable attention has been drawn to the possibility of defining an extended transaction model.

3.3.1. Relaxing the ACID properties

Theoreticians and practitioners alike have proposed ways to relax the ACID properties. This section outlines a number of the possible ACID relaxations [Bernstein 1997, Frank 2006, Kaye 2003, Little 2003-1].
Relaxing atomicity in the two-phase commit protocol

In larger enterprise systems, the outcome of distributed transactions often entails complicated business logic which is hard to capture with the all-or-nothing property. In an extended transaction model, the atomicity property can be relaxed in the following ways.

1) The overall outcome may include only a partial set of Participants.

2) A Participant or subtransaction may have the discretion to perform a local commit without waiting for the transaction's global outcome.

The first form of relaxation allows resources to be selectively included in the commit protocol. This is akin to what is already present in nested transactions, but the selection rules are conceptually different. In nested transactions, the selection rule only allows provisionally committed subtransactions, with no aborting ancestors, to participate in the commit protocol. When relaxing the atomicity property, however, the rule governing which Participants to include in the final outcome is driven by business logic. For instance, a travel agency application could start three subtransactions asking three different bidders to provisionally book an itinerary. Even if all three subtransactions provisionally commit, the top-level transaction could still choose to commit only the subtransaction containing the best bid. The other two subtransactions, albeit provisionally committed, are excluded from the final two-phase commit protocol by the business rule.

The second form of atomicity relaxation, where a Participant/subtransaction locally commits without waiting for the global outcome, at the same time leads to a violation of the isolation property. In later sections, we'll discuss remedying the loss of atomicity and isolation by providing alternative guarantees.
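The travel agency example can be sketched as follows. The Bid and Subtransaction types and the selection rule are hypothetical; the point is merely that a business rule, not the commit protocol itself, decides which provisionally committed branches enter the final two-phase commit.

    import java.util.Comparator;
    import java.util.List;

    interface Subtransaction {
        void commit();     // include this branch in the final commit protocol
        void rollback();   // exclude it, undoing the provisional booking
    }

    record Bid(String bidder, double price, Subtransaction tx) {}

    class BestBidRule {
        /** Business rule: of all provisionally committed bids, commit only the cheapest. */
        static void decide(List<Bid> provisionallyCommitted) {
            Bid best = provisionallyCommitted.stream()
                    .min(Comparator.comparingDouble(Bid::price))
                    .orElseThrow();
            for (Bid b : provisionallyCommitted) {
                if (b == best) b.tx().commit();
                else b.tx().rollback();
            }
        }
    }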
Relaxing Isolation

The database implementation of varying isolation levels, mentioned in Section 3.2.2, is the simplest form of relaxation of the isolation property. For instance, the read-uncommitted isolation level allows for dirty reads, which can be desirable for reporting applications. Likewise, read-committed and repeatable-read both relax the isolation property by avoiding strict two-phase locking. We will not discuss database-level relaxations further, as they are only peripherally related to the focus of this thesis.

Another way to relax the isolation property is to allow the Participants or subtransactions to release their resources as soon as the local operation is completed. In the case of long-running transactions, the exclusive locking of resources over extended periods of time is often undesirable. Applications that span cross-enterprise business domains might not allow outside control to hold long-duration locks on their resources. Not holding on to locks until the point of the global commit implies that interim results are immediately visible to concurrent users. Under such circumstances, the global consistency could be
violated, and having a transaction scope would be of no avail. For this reason, relaxing the isolation property often goes hand in hand with a relaxed rollback.
Relaxing rollback

When relaxing the isolation property, forgoing long-duration locks exposes partial results of a transaction to the outside world. The important question is: should the global transaction abort, how do we roll back a part of a transaction whose result has already been seen from outside the transaction? In the classical model, a rollback operation must restore all values to what they were prior to the start of the transaction. When locks are released early, reinstating this before-picture is impossible, since a later transaction could have read or modified the values and committed, leaving us with an unrecoverable dirty read or premature write.

To address this, we can let the rollback mechanism perform a compensating action. In the CAP case, the case registration service is allowed to locally commit before the top-level transaction, the subsidy application service, has reached a result. If the subsidy application service aborts, we cannot technically undo the case registration, because the status of this registration could have been observed from outside the transaction. Nevertheless, we can perform a semantic undo by adding another entry in the system stating that the case has now been deregistered. In this case, we allow both the registration and deregistration operations to take place, but their effects cancel each other out. The result is a semantic, relaxed rollback operation. It has the extra benefit of keeping track of the history.

It is worth noting that replacing rollback with compensating operations should be applied with care. If a high degree of dependency is anticipated between the interim results and other concurrent transactions, this form of relaxation could lead to cascading compensation. In the classical example of transferring money from one account to another, if the interim debiting result is observed and used by concurrent transactions to calculate interest etc., then it is not enough to compensate only for the original money transfer transaction. All transactions that potentially have read the intermediate results must be compensated for as well. This could turn out to be a very expensive task.

In conclusion, relaxation of the isolation property by introducing a compensating rollback mechanism is most feasible when dependencies between interim results and other concurrent transactions are rare, or when they do not harm the system consistency. Both conditions are true, for
instance, in the subsidy application service in the CAP case, which is why we argue for a compensation-based transaction model.

3.3.2. Evaluation of the extended transaction model

It has long been realized that ACID transactions are not adequate for modeling all real-life transaction scenarios. As Doug Kaye points out, "as valuable as the ACID model has proven to be for tightly coupled distributed systems, it falls short for long-lived, loosely coupled asynchronous transactions" [Kaye, 2003, p.155]. The increasing interest in how to relax the ACID requirements confirms the need for modeling transactions in a more flexible way. The extended transaction model forms a mental framework that bridges the gap between an ideal, theoretical transaction world and real-life business transactions, which tend to be long-running and driven by complicated business logic.

The extended transaction model is, however, in our view an immature model with at least two limitations: (1) reliance upon synchronous communication, and (2) the lack of an alternative model of reliability guarantee. We discuss these limitations in the subsections that follow.
Reliance upon synchronous communication

The first limitation is that the extended transaction model still bears a strong resemblance to its classical cousin, in the sense that both models assume a tightly coupled, synchronous RPC/RMI-style communication paradigm. Many proponents of the extended transaction model have been heavily involved in designing transaction support in enterprise application server platforms such as Java EE and .NET. In this setting, traditional transaction processing technologies, originally designed to work on a single machine, were often adapted for use within a tightly coupled distributed environment or programming platform, such as Java RMI. An example is the EJB transaction model in the Java EE specification [Little, 2004]. In distributed enterprise solutions, the mainstream communication pattern between remote objects is synchronous. To our present knowledge, no existing enterprise solution today allows for a transaction scope that spans asynchronous message queues. The reason for this is simple: transaction support in these enterprise systems uses the two-phase commit protocol, whose blocking nature prevents it from being
used in a temporally decoupled setting enabled by a message queue or tuple space. Imagine the consequence of placing five hours of delay between the preparation phase and the completion phase, the time it might take for the coordination messages to traverse the message queue. Resources could be locked at each Participant for those five hours, and little concurrency could exist in such a system.
Lacking an alternative model of reliability guarantee

Another limitation of the extended transaction model is that it fails to construct an alternative reliability model that can stand in place of the strict ACID guarantee. In other words, if we deconstruct the ACID guarantees, which are the very reason for using transactions in the first place, what good is derived from the use of transactions? In our view, deconstruction of the ACID requirements requires the reconstruction of an alternative model of reliability guarantee. These alternative reliability guarantees should reconcile the effects of relaxing the ACID requirements without reintroducing the evils of the ACID model that we would like to avoid, such as long-duration locks, a blocking commit, and the reliance on synchronous communication patterns. Frank [2006] mentions some countermeasures for reducing the anomalies resulting from relaxing the ACID model. However, these countermeasures are limited to database transactions, where the systems are assumed to be tightly coupled and to exchange synchronous RPC-style messages.

3.4. Summary
This chapter analyzes the classical transaction processing model designed to meet the strict ACID requirements, as well as the extended transaction model with relaxed ACID properties. In our view, the extended transaction model is the first step toward adapting the classical ACID model to the web service world, albeit with the limitation of only considering the synchronous communication paradigm. The extended transaction model also fails to provide alternative reliability guarantees to replace the classical ACID guarantees.


4. Web service transaction processing model


In the preceding chapter, we discussed the classical and the extended transaction processing models. This chapter concentrates on adapting these models to meet transaction processing needs in a service-oriented architecture. Section 4.1 points out how the use of SOA impacts the understanding of transactions, and why the ACID-style transaction models are inadequate for SOA. Major challenges in web service transaction management are emphasized, and the analyses of how to meet these challenges are operationalized into a set of eight design criteria. This is followed, in Section 4.2, by a brief introduction to the WS-TX protocol family, the web service standards governing the transaction space. Based on the design criteria and the web service standards, a reference model for web service transaction management is constructed and presented in Section 4.3. In the remainder of this thesis, the reference model serves as a conceptual foundation for the architectural design (Chapter 5) and implementation (Chapter 6) of the SOA transaction management prototype. This chapter is heavily inspired by [Newcomer 2004, Little 2003-1, Cabrera 2005, Erl 2004, Erl 2005, Webber 2003 and Weerawarana 2005].

4.1. Impact of web services on transaction management


Solving transaction problems at the web service level introduces an extra degree of complexity, which makes it a less efficient solution compared to transaction management at the application or database level. When used in system integration, web services usually wrap around coarse-grained functionality in legacy systems which otherwise would not be able to communicate. These legacy systems typically run in different execution environments with platform-specific transaction support, usually an implementation of the strict ACID-style two-phase or three-phase commit. If all services in a transaction are managed entirely within a single execution environment, web service transaction management will have no role to play. Transaction management at the web service level should only be considered when using web services to integrate disparate systems, as shown in Figure 5.


4.1.1. Interoperability

Figure 5 illustrates how interoperability between otherwise incompatible systems is provided by wrapping legacy systems behind web service interfaces. Examples of such systems are JMS, WebSphere MQ, CORBA, EJB, .NET, SAP and Siebel. A web service transaction middleware consists of, among other things, transaction Coordinators. The figure also illustrates the possibility for the transaction coordination middleware to work in conjunction with business process workflows expressed in, e.g., BPEL (Business Process Execution Language). The purpose is to coordinate the systems to reach a common decision on whether to commit, roll back, or compensate the changes made by the business workflow. Note that the legend "transaction Coordinators" in this figure is merely an abstraction symbolizing the transaction middleware layer in general. In this report, we will use the term Coordinator in another sense, explained later in this chapter.

Figure 5: Integrating transaction domains in SOA

Inspired by [Newcomer, 2004], Chapter 10.

Assuming web service transaction management is rightfully employed to render disparate transaction domains interoperable, we identify the following challenges pertaining to transaction management in the context of web services.


4.1.2. Stateful transactions vs. stateless services

The composition principle in SOA makes it possible to aggregate several business services into a composite service, a.k.a. a service orchestration or business activity in SOA. In this thesis, the three terms composite service, service orchestration and business activity will be used as synonyms. The subsidy application service in the CAP case is an example of a composite service, representing the entry point to a long-running activity and a semi-automated business process.

Every service orchestration introduces a level of context into an application runtime environment. The more complex a business activity, the more context information it tends to bring with it. Managing transactions in service orchestrations inevitably requires managing and propagating these contexts, or transactional states. As statelessness is a key service-oriented principle, requiring individual services to retain state for the sake of transaction management seems to be a step back from the service-oriented ideal. The paradox between stateful transactions and stateless services is exacerbated by the necessity of treating legacy business services as black boxes in system integration. It is often a daunting task to extend or manipulate the legacy services' existing interfaces to make them state-aware. If we cannot make the individual services manage states themselves, we will have to take state management to the services in a non-intrusive manner. This boils down to the first design criterion:

Design criterion 1: Use a generic context coordination framework to turn stateless legacy services into stateful Participants.

The idea is to standardize the management and interchange of transactional states by using a generic coordination and context management SOA layer, depicted as "Transaction Coordinators" in Figure 5. Isolating web service context management in a separate software component is a service-layer extension of the well-established middleware concept. Just as platform-dependent middleware aims at providing distribution transparency, the SOA coordination middleware should hide context management from the business logic. Typically, a coordination middleware is designed that collaborates with various other actors in a distributed computing environment (e.g. Service Containers, Participants, and Resource Managers) to manage contexts and states on behalf of the individual services. As the dependency is strictly one-way (the coordination middleware knows the individual business services, but the business services are happily unaware of their states being managed and disseminated), we have loosely coupled statefulness.


Seemingly an oxymoron, this term captures the essence of using a SOA-layer coordination middleware to maintain a persistent activity context in a stateless SOA setting. An in-depth coverage of the various components in the coordination middleware is deferred to the next chapter. For now, it suffices to point out that the design of this middleware requires a standards-based approach targeting service-level interoperability.

A generic context coordination framework in SOA transcends the realm of transaction management. Any description of the meaning and other characteristics of a runtime service activity can be classified as its context information. In this sense, a transactional context is just one type of context among a whole range of contexts in SOA. Likewise, transaction management is just one of the system-level functionalities, or cross-cutting concerns, in SOA. Examples of other contextual information include the service security context, authentication context, service policy context, etc. Management of all such contexts requires cross-service coordination. It is therefore desirable that the design of the coordination framework not be tightly bound to any particular context type. This leads to the second design criterion:

Design criterion 2: Decouple the context coordination framework from the context types.

Although our thesis only concerns the coordination of transaction contexts, we have strived to follow this general design principle of decoupling the context coordination framework from the specific coordination protocols. Even within the transaction management sphere, this principle makes it possible to utilize the context coordination framework with any transaction protocol, be it a traditional ACID-style protocol or the compensation-based protocol presented later in Section 4.1.5. The ideal scenario is that one coordination type can be unplugged and another plugged in through managed configuration and minimal code rewriting. Necessary dependencies should be made explicit.

Decoupling the coordination framework from the context types requires propagating the coordination context in every message exchange. This is necessary because the web service invocation pattern is predominantly asynchronous. Unlike traditional distributed transaction processing solutions, in which one can rely on a persistent server session to share the transaction context, transactional contexts in SOA have to be passed in every message
exchange. We will take a closer look at context propagation in the next two chapters.

4.1.3. Heterogeneous transactional requirements in SOA

Orchestrated services have varying transaction requirements. As pointed out by Newcomer [2004], the relationship of a transaction to a legacy web service might be as simple as delegating the transaction to an existing transaction execution environment for the legacy service. It may also be as complex as coordinating a single transaction across multiple Participants in a long-running business process, across arbitrary execution environments. A variety of transaction protocols should be available for use with different transactional requirements, ranging from tightly coupled, short-lived, strict ACID transactions to loosely coupled and long-running automated business process executions. This leads to the third design criterion:

Design criterion 3: Accommodate multiple transaction management protocols.

Choosing the right protocol for the application helps ensure that complex web service applications can achieve consistent, predictable, and reliable results.

4.1.4. Tightly-coupled and short-lived atomic transaction

Some service orchestrations entail more tightly coupled and short-lived web services, such as the account payable composite service in the CAP case. In this service, we need to wrap the general ledger and fund transfer subservices into a single action, so that either both are carried out in one atomic step, or neither is carried out. The transaction must adhere to the strict all-or-nothing property. Lasting at worst a few minutes, such service orchestrations are good candidates for the use of locking-based concurrency control and the blocking two-phase commit protocol. However, this protocol must be applied with care, as permitting a service from one system to hold exclusive locks in another system requires a high level of trust between the hosting environments of these services. Additional awareness is necessary, as atomic transactions are usually incompatible with the predominant asynchronous communication pattern in SOA. We have previously argued that two-phase commit is a blocking protocol in the face of a crashing Coordinator. This means that atomic ACID transactions are only suitable for web services running for a short duration in closely integrated trust domains. In general, atomic transactions at the web
service level should only be used to ensure interoperability across existing atomic transaction protocols in the underlying platforms. By this argumentation, we arrive at the fourth design criterion:

Design criterion 4: Atomic transactions should support flat transactions with strict ACID requirements, as well as nested transactions with the relaxed atomicity principle.

Management of atomic transactions in SOA should support the flat atomic commit protocol, giving the strict all-or-nothing property, as illustrated by the account payable service in Figure 6. Flat atomic transactions implement the familiar commit and rollback features to enable strict cross-service ACID behavior.

Figure 6: Atomic web service transaction (flat)

The atomic transaction protocol should also support nested atomic transactions as illustrated in the case registration service in Figure 7.
[Figure 7 shows the Case Registration Service telling the Application Registry Service and the Document Handling Service: "I'm asking you two to do some work. If either of you mess up, I may consider giving it one more go."]

Figure 7: Atomic web service transaction (nested)


In Figure 7, the application registry service and the document handling service each run in the scope of a subtransaction. The top-level transaction Coordinator unilaterally decides the global outcome if one of the subtransactions fails or aborts. This set-up reflects the necessity of having a negotiated or conditional outcome in the web service world. The case registration root service can be configured with business rules for determining the global outcome of the transaction, depending on the gravity of the partial failure. In some cases, the failure is benign: the global transaction is committed, while the failing subtransaction is given a second chance. In other cases, the failure of a subtransaction has serious impact, so all subtransactions are rolled back immediately. Figure 7 illustrates just one possible outcome.

The nested atomic transaction in SOA illustrates the relaxation of the atomicity property described in Section 3.3.1. In the classical nested transaction model, the fate of a subtransaction is determined only by whether the subtransaction itself and all its ancestors can do an error-free job. In a nested transaction in SOA, a root service can ask a provisionally committed subtransaction to roll back, even when this subtransaction has no aborted parents. This is a direct consequence of relaxing the atomicity principle and replacing it with more flexible business rules. You may wonder why we have excluded the classical nested atomic transaction model (with no atomicity relaxation) from the design criterion. The reason for this will be covered shortly in Section 4.1.6, when we talk about composable transaction models in general.

4.1.5. Loosely-coupled and long-running transaction

Many service orchestrations are long-running. The reasons for this include an asynchronous communication pattern, temporal decoupling, referential decoupling, or complex business rules involving manual interaction or periods of inactivity, as in the eligibility evaluation service in the CAP case. Given the long-running nature of this semi-automated business process, the related transaction could last hours, days or even weeks. If an atomic transaction were used with the two-phase commit protocol, there could potentially be a considerable time lapse between the preparation and completion phases. As argued in Section 3.3.1, using ACID-style transactions for long-running activities implies that resources could be locked and unavailable for extended periods of time, preventing anyone else from accessing the same or related data (such as indexes) until the transaction is concluded. Long-duration locking of resources
to ensure isolation is unacceptable in an environment with highly concurrent resource access patterns. The timeout fallback, which protects local Resource Managers (Participants) from waiting forever for a coordination message in the two-phase commit protocol, is of little use in long-running transactions, as these transactions by definition execute over extended periods. Thus, the atomicity property has to be relaxed and compensated for by means other than a blocking atomic commit protocol. This leads to our fifth design criterion:

Design criterion 5: Long-running business transactions can have relaxed ACID properties, but should provide countermeasures as alternative reliability guarantees.

As pointed out earlier, one of the limitations of the extended transaction model is that it fails to provide alternative reliability guarantees. Having argued for the necessity of relaxing the ACID properties for long-running web service transactions, it is important that we devise countermeasures that provide alternative reliability guarantees in lieu of the ACID guarantee.
Countermeasures as an alternative to the ACID guarantee

A subset of these countermeasures is proposed in the subsections below.

Compensating operation

Relaxing the isolation property is done by eliminating strict two-phase locking or a similar concurrency control scheme, allowing locks to be released immediately after a local operation has completed. As a consequence, partial results are exposed, violating the isolation property. Isolation can sometimes be provided by a compensating operation, as mentioned in Section 3.3.1. The compensation-based protocol is illustrated in Figure 8, where carrying out Plan A immediately triggers the release of locks on local resources, making the interim results visible. Later on, should the global transaction fail, a compensating Plan B is conducted to reestablish the isolation property.

In SOA, the compensating operation is usually another service (e.g. a debiting service) that cancels the effect of an already completed service (e.g. a crediting service). It is worth noting that the compensation logic cannot be generated automatically and therefore relies upon application-specific business logic. Sometimes compensation might not be necessary at all, if it is acceptable to
ignore the failing part of the activity, as in the case of the read-only sampling and control service. In other cases, an operation cannot easily be compensated for, such as manufacturing an item. In such a case, the compensating operation can at best charge the customer a cancellation fee and offer to sell the item to other parties. The business transaction coordination framework should provide well-defined plug-in points for the application-specific compensating operations.
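As a minimal sketch of such a plug-in point, the coordination framework could define a callback interface that application developers implement per service. The interface, the CaseRegistry type and the method names below are our own illustration, not part of any standard.

    interface CaseRegistry {
        void addDeregistrationEntry(String caseId);  // hypothetical application operation
    }

    /** Hypothetical plug-in point offered by the coordination framework. */
    interface CompensationHandler {
        /** Invoked if the enclosing business transaction aborts after local commit. */
        void compensate(String caseId);
    }

    class CaseRegistrationCompensation implements CompensationHandler {
        private final CaseRegistry registry;

        CaseRegistrationCompensation(CaseRegistry registry) { this.registry = registry; }

        @Override
        public void compensate(String caseId) {
            // The original registration may already have been observed, so instead of
            // erasing it we append a deregistration entry that cancels its effect.
            registry.addDeregistrationEntry(caseId);
        }
    }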

Figure 8: Compensation-based business transaction in SOA

Cascading compensation

Another important issue to note is the risk of cascading compensation. Cascading compensation becomes necessary if we need to roll back all concurrent transactions that have performed a dirty read on an interim result of a transaction that is later aborted and compensated for. Cascading compensation is not a problem when web service instances do not simultaneously operate on shared data resources. For instance, in the fund-reservation subservice in Figure 8, fund reservation operations are always done on mutually exclusive accounts, forestalling cascading compensation. Another countermeasure against cascading compensation is the commutative update, where each update adds a positive or negative increment with no dependency upon the preceding read (Thomas' Write Rule in [Silberschatz, 2002], [Frank, 2006]).

Semantic lock

Yet another countermeasure for securing isolation without locking is to let all service requests be submitted through a single queue or object store. The local Resource Manager can then process the requests in a
(quasi)serial manner. In the case of a heavy stream of incoming requests, multithreading can be implemented using a semantic lock mechanism. A semantic lock marks data elements that have been touched by a pending transaction whose intermediate result has been exposed. A semantic lock conflicts with both read and write locks. When taking requests out of the queue/object store, a selection algorithm ensures that only requests that do not require locks on data items marked by a semantic lock are taken out and handled. Should a conflicting lock be discovered, the local operation is rolled back and the request placed back into the queue/object store. The semantic locks are removed, and the data items unmarked, at the time of a global commit. (A sketch of this selection algorithm is given after Table 1 below.)

Pessimistic view

The pessimistic view is an alternative countermeasure that can be used to ensure isolation and system consistency [Frank, 2006]. The idea is to design the transaction in such a way that users are allowed to have a wrong view of the system due to interim result exposure, but only if this wrong view is a more pessimistic view of the situation and cannot be misused. An example is a theater booking service that exposes interim results of tentative bookings. It does not impact the system's overall consistency if a user sees a temporarily occupied seat that is later released.

Mutually exclusive locking space

In a long-running transaction spanning a business activity such as the eligibility evaluation service, it is also necessary to relax the atomicity property, accommodating flexible outcome determination. Usually, the relaxed atomicity property does not need to be compensated for, since it is often stipulated by valid business rules. Another factor that makes it easier to reinstate atomicity is the absence of distributed deadlocks. Because the legacy services are self-contained resource domains, services have mutually exclusive locking spaces. This means that a business service in one legacy system usually does not lock resources in another legacy system. Thus, there is no need to detect distributed deadlocks by constructing a global wait-for graph, e.g. through the use of edge-chasing [Coulouris, 2005].

The vignette below summarizes some of the countermeasures proposed in the preceding text to provide alternative reliability guarantees in a long-running business transaction with relaxed ACID properties:


1. Compensating operations can be used in lieu of resource locking to secure isolation. Isolation guarantees can also be provided by the semantic lock and the pessimistic view countermeasures.
2. Cascading compensation can be guarded against by disallowing web service instances to access shared resources at the same time, or by designing updates to be commutative.
3. The atomicity guarantee is usually automatically satisfied by the valid business rules that allow the relaxation of atomicity.
4. Mutually exclusive locking spaces can be exploited as a countermeasure against distributed deadlocks in distributed web service transactions.
Table 1: Countermeasures as alternative reliability guarantees to ACID
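The selection algorithm of the semantic lock countermeasure, referenced above, can be sketched as follows. The data structures and method names are our own invention for illustration; a real Resource Manager would persist the marks and integrate them with its lock manager.

    import java.util.ArrayDeque;
    import java.util.HashSet;
    import java.util.Queue;
    import java.util.Set;

    final class SemanticLockScheduler {
        private final Set<String> semanticallyLocked = new HashSet<>(); // marked data items
        private final Queue<Request> queue = new ArrayDeque<>();

        record Request(String txId, Set<String> dataItems, Runnable work) {}

        void submit(Request r) { queue.add(r); }

        /** Take out and handle the next request that touches no semantically locked items. */
        void processNext() {
            for (int i = 0, n = queue.size(); i < n; i++) {
                Request r = queue.poll();
                if (conflictFree(r)) {
                    r.work().run();
                    semanticallyLocked.addAll(r.dataItems()); // mark exposed interim results
                    return;
                }
                queue.add(r); // conflicting lock discovered: put the request back
            }
        }

        private boolean conflictFree(Request r) {
            for (String item : r.dataItems())
                if (semanticallyLocked.contains(item)) return false;
            return true;
        }

        /** At global commit, remove the semantic locks and unmark the data items. */
        void onGlobalCommit(Set<String> dataItems) { semanticallyLocked.removeAll(dataItems); }
    }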

4.1.6. Composable transaction model

In some complex web service workflows, certain stages of a service orchestration require a strict atomic outcome, while other stages can have more relaxed ACID requirements. This is the case in our subsidy application service example, where the top-level service is a long-running business transaction that entails three subtransactions of different types: a flat atomic transaction (the account payable service), a nested atomic transaction (the case registration service) and a long-running business transaction (the eligibility evaluation service). These heterogeneous nested transaction types give rise to our sixth design criterion:

Design criterion 6: Support composable transaction models with heterogeneous types.

Figure 9: A coordination hierarchy with embedded transaction models (inspired by [Webber 2005])


Short-lived atomic transactions can be part of a long-running business transaction. Embedded atomic transactions might be committed and made visible before the parent transaction commits. Later on, we may need to compensate for the effects of such atomic transactions, in which case the compensating actions should be performed in an atomic manner. Support for an embedded transaction hierarchy with heterogeneous types requires the coordination framework to be flexible and extensible, as illustrated in Figure 9. This solution uses interposition to create a hierarchy of Coordinators, each of which looks like a simple Participant to Coordinators higher up the coordination tree, while acting as a normal Coordinator for Participants lower down the tree. The top-level Coordinator is unaware of this arrangement, since it sees the interposed Coordinator as a Participant, while the local Participants are coordinated by their own local Coordinator. Webber [2005] points out a number of advantages of a composed transaction model, outlined in the following vignette.
1) Protected trust domains: By using its own Coordinator, the domain a service resides in exposes only the Coordinator to the superior, and not the individual Participants. This capability may be useful in restricting the amount of information that can flow out of the domain and hence be available to potentially insecure or untrusted services.

2) Increased performance: Fewer messages need to be disseminated over the Internet to the top-level Coordinator, while the more numerous coordination protocol messages remain on the low-latency, high-bandwidth network within a legacy system. In Figure 9, the eligibility evaluation service uses the same coordination protocol as the top-level Coordinator for the subsidy application service. However, since only the outcome of the local coordination needs to be sent over the Internet to the top-level Coordinator, and not the more abundant coordination protocol messages, this approach is performance-optimized compared to registering all child Participants for the eligibility evaluation service directly with the top-level Coordinator.

3) Flexible coordination: Since the coordination within an enterprise is not visible to outside parties, the interposed Coordinator can use whatever coordination protocol is most suitable for the type of application being executed within the local system. This may or may not be the same coordination protocol as that used at the top level, so interposed Coordinators can be used as a kind of "bridge" between coordination domains. In Figure 9, the case registration service and the account payable service are coordinated with different protocols (nested and flat atomic transaction, respectively), not with the coordination protocol used at the top level - the business activity transaction. Since the local Coordinators for these two services are effectively "bilingual" in the coordination protocols they understand (knowing both the Participant aspects of the top-level coordination protocol and the Coordinator aspects of their own internal coordination protocols), different coordination domains can be bridged without adding complexity to the overall architecture.

Table 2: Motivation for using a composable transaction model
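To make the interposition idea concrete, here is a minimal sketch in which an interposed Coordinator implements the Participant role toward its superior while coordinating its own local Participants. The interfaces are simplified for illustration and are not excerpts from the WS-TX standards.

    import java.util.ArrayList;
    import java.util.List;

    interface Participant {
        boolean prepare();
        void commit();
        void rollback();
    }

    /** An interposed Coordinator: a Participant upward, a Coordinator downward. */
    final class InterposedCoordinator implements Participant {
        private final List<Participant> localParticipants = new ArrayList<>();

        void register(Participant p) { localParticipants.add(p); }

        @Override
        public boolean prepare() {
            // Phase 1 runs locally; only the aggregated vote crosses the domain boundary.
            for (Participant p : localParticipants)
                if (!p.prepare()) return false;
            return true;
        }

        @Override
        public void commit()   { localParticipants.forEach(Participant::commit); }

        @Override
        public void rollback() { localParticipants.forEach(Participant::rollback); }
    }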


Although we are proponents of using the composable, or nested, transaction model in SOA, we consider the classical nested atomic transaction model inappropriate for SOA. In contrast with the nested transaction model with the relaxed atomicity property, as covered by design criterion 4, the classical model requires subtransactions at all levels to keep exclusive locks on local resources until a top-level transaction outcome has been generated. This defies the above-mentioned motivation 1 for using the nested model: protecting trust domains. Webber [2005] argues that while a high level of trust can be assumed in a relatively flat setting, e.g. in enterprise intranets, it is usually absent in multi-level service orchestrations, whose very purpose is to shield one enterprise's trust domain from another's. Most enterprises are reluctant to allow an external Coordinator to dictate when local locks can be released per a distributed, nested two-phase commit protocol. This reluctance is understandable, because how can you trust a remote Coordinator enough to believe that it will NEVER crash, or that if it does, it will be kind enough to release the locks on your local resources? For these reasons, we have decided to leave the classical nested transaction model out of the web service design criteria.

4.1.7. Leverage transaction support in legacy systems

Most of the enterprise systems hosting the legacy services have built-in transaction support. When considering transaction management at the web service level, a reasonable question to ask is whether web service transaction management is meant to replace the existing platform-specific technologies. Is it necessary to implement resource locking, deadlock detection and all the other nuts and bolts of a transaction processing paradigm at the web service level? It is our conviction that reinventing the wheel at the web service level to implement resource management is both unnecessary and undesirable. Web service transactions are expressly designed for interoperability across various execution environments and are not intended to provide an implementation of the transaction processing paradigm by themselves. Furthermore, managing resources at the SOA level would incur many fine-grained invocations into the underlying systems and could by no means perform well. As Mark Little puts it in [Little, 2003-1]: "Much has been made of the fact that ACID transactions aren't suitable for loosely-coupled environments like the Web. However, very little attention has been paid to the fact that these loosely-coupled environments tend to have large strongly-coupled corporate infrastructures
behind them. Any Web services transactions specification should not ask 'what can replace ACID transactions?', but rather 'how can we leverage what already exists?'"

The question of how to leverage existing transaction support in legacy systems is the focal theme of Section 5.5.3. For now, we will just put forth the principle as the seventh design criterion:

Design criterion 7: Leverage existing transaction support infrastructure in legacy systems.

To make matters more concrete, let us first consider the two transaction types mentioned above: the atomic web service transaction and the long-running business transaction. The short-lived atomic transaction protocol in SOA does not need to concern itself with resource management at the web service level. Instead, it should be modeled as an extension to the existing platform-specific two-phase commit transaction support infrastructure, in particular to the Resource Managers that already exist within Java EE, .NET and CORBA environments, to name a few examples. In the next chapter, we will provide an in-depth discussion of how a web service atomic transaction Coordinator interfaces with the platform-level Resource Manager. For now, it suffices to point out that web service transaction management components should not require changes to fundamental transaction processing system artifacts, but that they do require the underlying system to expose a minimal Resource Manager interface, in order to operate the two-phase commit protocol. Therefore, a key point when implementing atomic ACID transactions in SOA is emphasized by our eighth design criterion:

Design criterion 8: Atomic transactions collaborate with the legacy systems' Resource Managers to operate the distributed two-phase commit protocol.

The Resource Manager interface allows an external Transaction Manager, i.e. the atomic transaction Coordinator, to intercept the legacy system's two-phase commit protocol (see Sections 4.3 and 5.5.3). The web service transaction layer does not implement low-level resource management primitives, such as locking and deadlock handling, but delegates these to the underlying legacy systems.
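Such a minimal Resource Manager interface might look like the sketch below. It is deliberately reduced for illustration; real platform interfaces, such as JTA's javax.transaction.xa.XAResource in Java EE, add recovery, richer error reporting and dedicated transaction identifier types.

    /**
     * Minimal Resource Manager interface that a legacy system could expose so that
     * an external (web service level) Coordinator can drive its two-phase commit.
     * Illustrative sketch only.
     */
    interface ResourceManager {
        /** Phase 1: durably log the updates and vote. Returns true to vote commit. */
        boolean prepare(String transactionId);

        /** Phase 2: make the prepared updates permanent. */
        void commit(String transactionId);

        /** Phase 2 alternative: discard the prepared updates. */
        void rollback(String transactionId);
    }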


As far as the long-running business transaction is concerned, the nature of the web service transaction model - a service-oriented, loosely coupled, and potentially asynchronous means of disseminating information between parties - is incompatible with the underlying traditional ACID-style transaction support. The fact that transactions in back-end systems are constructed with ACID properties can potentially lead to problems when composing business activities from these services/resources, since it gives those parties the opportunity to lock resources and prevent transactions from making progress. Therefore, long-running business transactions should bypass the platform-level transaction support and replace it with the compensation- and countermeasure-based model explained in Section 4.1.5.

4.2. WS-Transaction standard


As mentioned in Section 1.4.1, several competing web service transaction specifications have been proposed. We have chosen to base our prototype design upon the WS-Transaction specifications, i.e. the WS-TX protocol family. The WS-TX protocol family comprises three specifications: WS-COOR, WS-AT and WS-BA. These specifications are, from our vantage point, a good match for what we believe should be captured by a good web service transaction processing model.

Figure 10: WS-Transaction services and protocols

(Source: [Weerawarana, 2005])

In addition, the theoretical underpinnings of these specifications reflect the eight design criteria mentioned in the previous sections. Since their birth in 2002, the WS-TX specifications have gained relatively wide adoption in the
industry. When we embarked on the thesis project, the ink was not yet dry on these specifications, but they were approved as official OASIS standards on April 16, 2007. This change of identity from specifications to standards is beneficial to our project, as it gives more openness and general relevance to our prototype implementation. In the following sections, we will introduce the three standards in the WS-TX family: WS-Coordination (WS-COOR), WS-AtomicTransaction (WS-AT) and WS-BusinessActivity (WS-BA). Figure 10 summarizes the core elements of WS-TX and will be used as a reference in the following text.

4.2.1. WS-Coordination

The WS-COOR specification defines an extensible framework for coordinating activities across web services, using a Coordinator and a set of coordination protocols. An activity is defined as a computation carried out as tasks on one or more web services. An activity has a lifecycle: it is created (activated), runs, and completes. The framework enables Participants to reach consistent agreement on the outcome of distributed activities. These properties correspond to our design criterion 1 in Section 4.1.2, i.e. to use a generic context coordination framework to turn stateless legacy services into stateful Participants. WS-COOR is a generic framework in the sense that it supports different coordination types and protocols, and decouples the framework from the coordination types. This decoupling is compatible with our design criteria 2 and 3.

WS-COOR defines three component services, as depicted in Figure 10:

- An activation service, used for the creation of a new coordination activity, as well as for the specification of the desired coordination type for the activity, e.g. WS-AT.
- A registration service, used for registering or enlisting Participants in the activity, as well as for selecting a coordination protocol for the activity, e.g. the Durable2PC protocol in WS-AT.
- A set of coordination protocol services for each supported coordination type. These protocols are defined in the specification that defines the coordination type. For instance, the WS-AT coordination type entails two coordination protocols: Durable2PC and Volatile2PC.


Figure 11: WS-Coordination communication scenario

The WS-COOR services are defined in WSDL, and the WS-COOR operations are designed to use the request-response message exchange pattern (request-response is synonymous with what is usually called the request-reply message-passing paradigm in distributed systems theory). WS-COOR standardizes the process by which an application starts a new activity using the activation service, and the Participants enlist themselves in this activity using the registration service. The activation and registration services are illustrated in Figure 11, which also shows the WS-COOR services in action, collaborating with two other roles: a transaction initiator and a Participant. The two coordination protocols (WS-AT and WS-BA) are described in Sections 4.2.2 and 4.2.3, respectively.
Activation service

The activation service defines an operation named CreateCoordinationContext, henceforth also referred to as the activation request. CreateCoordinationContext is invoked by the initiator application (step 1 in Figure 11) to create a new activity. The input parameters to this operation include:

- CoordinationType - a unique identifier for the coordination type desired for this activity, e.g. a URI for the WS-AT coordination type.
- Expires - an optional element that represents a time-to-live property for the CoordinationContext.
- CoordinationContext - an optional element indicating an existing coordination context flowed from the parent scope. If this element is absent, a new independent activity will be created. If this element is
present, a new activity will likewise be created, but it will also be nested as a Participant in the parent scope.
- A number of optional extensibility elements and attributes to supply additional information that could be useful to the activation service.

When the CoordinationContext has been created, it is returned to the initiator application wrapped in a CreateCoordinationContextResponse output message. The CoordinationContext is an XML structure encapsulating the following elements:

- Identifier - uniquely identifies this activity/context
- Expires - an optional element that is only present if an Expires parameter was supplied to the CreateCoordinationContext operation
- CoordinationType - the coordination type of this activity, e.g. WS-BA
- RegistrationService - a WS-Addressing endpoint reference to the registration service
- A number of optional extensibility elements and attributes

Registration service

The registration service defines a Register operation, which allows a web service or application to register its interest in participating in an existing activity, and to select the specific protocol it wishes to participate in. The RegistrationService endpoint reference in the CoordinationContext is used to contact the registration service. For instance, the initiator application uses the registration service to register as a Participant for the completion protocol (step 2 in Figure 11). This registration entitles the initiator application, at a later point in time, to ask the Coordinator to commit or roll back the transaction. The input parameters to the Register operation are:

- ProtocolIdentifier - a URI identifying the protocol (e.g. the URI for the CoordinatorCompletion protocol in WS-BA)
- ParticipantProtocolService - the endpoint reference where the Participant would like to receive protocol messages from the Coordinator
- A number of optional extensibility elements and attributes

When the Coordinator for this activity has registered a Participant for the completion protocol, it will send a RegisterResponse output message in return. RegisterResponse is an XML structure containing the following elements:

64

4. Web service transaction processing model

- CoordinatorProtocolService - the endpoint reference where the Coordinator service would like to receive protocol messages from the Participant
- A number of optional extensibility elements and attributes
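The handshake described so far can be condensed into the following plain-Java sketch. The types and method signatures are our own simplification of the WS-COOR operations; in a real deployment these calls are SOAP request-response exchanges against the WSDL-defined endpoints, and the URIs shown are placeholders.

    /** Simplified view of the WS-COOR activation and registration handshake. */
    record CoordinationContext(String identifier, String coordinationType,
                               String registrationServiceEndpoint) {}

    interface ActivationService {
        // Step 1: the initiator creates a new activity of the desired coordination type.
        CoordinationContext createCoordinationContext(String coordinationType);
    }

    interface RegistrationService {
        // Steps 2 and 4: initiator and callee register for a protocol, supplying
        // the endpoint where they want protocol messages delivered; the returned
        // value stands in for the CoordinatorProtocolService endpoint reference.
        String register(String activityId, String protocolIdentifier,
                        String participantProtocolEndpoint);
    }

    final class Initiator {
        void startActivity(ActivationService activation, RegistrationService registration) {
            CoordinationContext ctx =
                    activation.createCoordinationContext("http://example.org/ws-at"); // placeholder URI
            registration.register(ctx.identifier(), "Completion", "http://initiator/endpoint");
            // Step 3: invoke a business service, flowing ctx in the application message
            // (e.g. as a SOAP header) so that the callee can register itself (step 4).
        }
    }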

After this initial handshake through activation and registration, the Coordinator side and the Participant side have obtained knowledge of each other in the form of their endpoint references. They are now ready to exchange protocol messages. The initiator application, which has now become a Participant in the completion protocol, can begin to execute the business logic of the coordinated activity, e.g. calling another web service to perform a distributed piece of work (step 3 in Figure 11). To tell the callee service which coordination context its work should be performed in, the initiator application includes the CoordinationContext as part of the application message, e.g. as a SOAP header, when invoking the callee service. In WS-TX parlance, this inclusion of the CoordinationContext in application messages is called flowing the coordination context. The callee service then uses the registration service to enlist itself as a Participant in the activity, providing the activity identifier obtained from the flowed coordination context (step 4 in Figure 11). At this stage, the Coordinator and this Participant have finished handshaking and can start to exchange protocol messages.

When all business logic for a coordinated activity has been executed with success, the initiator application can issue a completion protocol message to the Coordinator (step 5 in Figure 11). The Coordinator will then drive the activity to its end, e.g. by running the WS-AT Durable2PC protocol with the Participants that registered for it. When the activity has completed, the outcome is communicated to the initiator.

It is worth noting that step 2 (the initiator registering for a completion protocol) is only specified in WS-AT, not in WS-BA, as shown in Figure 10. We will see in later chapters that the lack of a standardized WS-BA completion protocol has posed a major challenge in our architectural design. There are many proponents of adding such a protocol to WS-BA, including those mentioned in [Erven, 2007] (cf. the Related Work Section 1.4.2).

4.2.2. WS-AtomicTransaction

WS-AtomicTransaction is a pluggable WS-COOR coordination type representing activities that exhibit the ACID properties. Two coordination protocols are available for WS-AT, as depicted in Figure 10:


- The completion protocol is used to initiate the commit or rollback of a transaction. Usually, the initiator application uses this protocol to ask the Coordinator to commit or roll back a transaction.
- The two-phase commit (2PC) protocol ensures that all Participants reach agreement on the outcome of an atomic transaction. The protocol consists of two phases: the preparation/voting phase and the completion/commit phase. A Participant registers for the 2PC protocol in order to take part in it. WS-AT defines two subtypes of the two-phase commit protocol: Volatile2PC (used by Participants managing volatile resources, such as a cache) and Durable2PC (used by Participants managing durable resources, such as a database).
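The interplay between these two protocol subtypes, described in the next paragraph, can be sketched as follows: the Coordinator prepares the volatile Participants before the durable ones and then distributes the joint decision. The interfaces are our own simplification; a real implementation exchanges Prepare/Prepared/Aborted notifications asynchronously rather than making blocking method calls.

    import java.util.ArrayList;
    import java.util.List;

    interface AtParticipant {
        boolean prepare();   // simplified: true = Prepared vote, false = Aborted
        void commit();
        void rollback();
    }

    final class WsAtCoordinator {
        /** Invoked when the completion-protocol Participant requests Commit. */
        void onCommitRequested(List<AtParticipant> volatile2pc, List<AtParticipant> durable2pc) {
            // Volatile Participants (e.g. caches) are prepared first; only if they
            // all vote commit are the durable Participants (e.g. databases) prepared.
            boolean ok = volatile2pc.stream().allMatch(AtParticipant::prepare)
                    && durable2pc.stream().allMatch(AtParticipant::prepare);

            // Phase 2: the joint decision is sent to every Participant.
            for (AtParticipant p : concat(volatile2pc, durable2pc)) {
                if (ok) p.commit(); else p.rollback();
            }
        }

        private static List<AtParticipant> concat(List<AtParticipant> a, List<AtParticipant> b) {
            List<AtParticipant> all = new ArrayList<>(a);
            all.addAll(b);
            return all;
        }
    }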

The use of Volatile2PC and Durable2PC is as follows. On receiving a Commit message as part of the completion protocol, the Coordinator first issues a Prepare message to all Volatile2PC Participants. These have to respond with a vote before the Coordinator can start the prepare phase for the Durable2PC Participants. If all Volatile2PC Participants vote to commit, the Coordinator issues a Prepare to all Durable2PC Participants. If the volatile and durable Participants all vote to commit, the Coordinator starts phase 2 by sending a Commit message to all Participants. If either a volatile or a durable Participant votes to abort, the Coordinator sends a Rollback message to all Participants.

4.2.3. WS-BusinessActivity

WS-BusinessActivity is a pluggable WS-COOR coordination type that supports long-running transactions by allowing subparts of the business activity to commit immediately, thereby avoiding expensive resource locking over uncertain periods of time. Should the overall business activity fail later, WS-BA uses compensating actions to reestablish a consistent state. WS-BA supports two coordination types:

- AtomicOutcome: This coordination type must direct all Participants to the same outcome, i.e. all close or all compensate.


- MixedOutcome: This coordination type may direct some Participants to close, some to compensate, and, in addition, allow others to exit from the activity midway, if the business rules allow for this.

Both of these coordination types define the following sub-protocols for the completion phase:

- BusinessAgreementWithParticipantCompletion: When registering for this protocol, a Participant assumes the responsibility of notifying its Coordinator when all of its work is done.
- BusinessAgreementWithCoordinatorCompletion: When registering for this protocol, a Participant relies on its Coordinator to signal the completion of the activity. As far as the individual Participant is concerned, this signal means that no further requests will be issued to it, so it can safely complete its work, e.g. by persisting to a database.
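Seen from code, a WS-BA Participant amounts to a set of callbacks such as the sketch below. The method names follow the WS-BA notification messages, but the signatures are our own rendering, not the normative WSDL.

    /** Simplified WS-BA Participant callbacks (sketch, not the normative contract). */
    interface BusinessActivityParticipant {
        /** Coordinator: the activity succeeded; discard compensation data. */
        void close();

        /** Coordinator: the activity failed after this Participant completed;
            run the compensating action. */
        void compensate();

        /** Coordinator: the activity is cancelled before completion. */
        void cancel();
    }

    /** With ParticipantCompletion, the Participant itself tells the Coordinator it is done. */
    interface ParticipantCompletionCoordinator {
        void completed(String participantId);  // "my work is done, I await Close/Compensate"
        void exit(String participantId);       // "count me out of this activity"
    }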

4.2.4. Common aspects of the standards

The following sections briefly describe aspects common to the WS-TX standards.
State machines

The WS-AT and WS-BA standards define communicating state machines for different coordination protocols. We will give a more in-depth coverage of the concept of communicating state machines in Section 5.3. All protocol messages (notifications) to drive the protocols are defined. The standards also include state diagrams abstractly showing the events and state transitions, as well as complete state tables for a detailed description of the protocols. These state tables have been an important reference when implementing the prototype in this thesis. We include the WS-AT and WS-BA state diagrams in Appendix B and C. Refer to [WS-AT] and [WS-BA] for details of the state tables.
Fault model

A fault model is defined to handle error situations in the framework. When error situations occur, a SOAP fault message is returned to the requester of the service. Each fault is a SOAP message with its Action property set to a specific URI. The message includes a fault code, a sub-code, a human-readable explanation of the fault, and a detailed description. An example is the Invalid State fault defined in WS-COOR. Such a fault will, for instance, be generated if a Participant receives a Commit message when it has not transitioned to the Prepared state.
Security model

The standards also define a security model. An example is to enforce that only authorized principals can register to participate in an activity. Security-related issues are out of the scope of our project, but it is worth pointing out that the WS-TX security model makes use of a number of other WS-* specifications such as WS-Security, WS-Policy and WS-Trust.
Composable architecture

The standards define a composable architecture by using the XML, SOAP, and WSDL extensibility models. Not everything is contained in each standard. Instead, mutual references are made between standards to form a complete WS-TX framework. Examples are the use of WS-Addressing as a standardized way to reference service endpoints, or the use of WS-Security as a standardized way to specify security requirements. The extensible nature of the composable architecture also allows application-specific protocols or features to be added to the framework.

4.3. A reference model for web service transaction management


In line with the eight design criteria and the protocols specified by the WS-Transaction standards, we have constructed a reference model for web service transaction management (Figure 12). This model serves a dual purpose of (1) summarizing our theoretical analyses hitherto, and (2) serving as a transition to the second part of the report, which focuses on building a service-oriented middleware solution upon the theoretical groundwork.

4.3.1. Reading instructions for the reference model

This section contains reading instructions for Figure 12.


Three actors in distributed web service transaction

The model presents three distributed and stereotyped actors in any web service transaction. Here is a high-level explanation of their respective roles:
Figure 12: A reference model for SOA transaction management (with inspiration from [Newcomer, 2004])

The Service Container Framework hosts the orchestrated services (e.g. subsidy application) as well as the invocation proxies for leaf-level legacy services (e.g. document handling). The Service Container Framework plays the Participant's role on behalf of the services it hosts. On behalf of an orchestrated service, it acts as a transaction-initiating Participant, or in WS-Transaction parlance, the Participant registering for the completion protocol. For a hosted web service, it acts as a WS-AT or WS-BA Participant. Interfaces for the hosted services and the Participant API are both exposed as Service Endpoints to be used by external invokers. A more detailed review of the Service Container architecture is provided in the next chapter (Section 5.5).

The Coordination Framework coordinates an extensible set of distributed web service protocols. In our project, this set merely consists of transaction processing protocols, represented as circles marked with Pluggable Tx Protocols in Figure 12. The Coordination Framework plays the role of Coordinator, or what traditional transaction processing theories call a Distributed Transaction Manager. In our report we will use the term Coordinator for the sake of consistency. A Coordinator can be a top-level transaction Coordinator, or an interposed Subcoordinator in the composable transaction model as explained in Section 4.1.6. Both Coordinator types provide functionalities exposed by the following three service endpoints: a) the Activation service, via which we start a new transaction context, henceforth referred to as CoordinationContext in accordance with the WS-Transaction terminology; b) the Registration service, via which a Participant joins an existing transaction scope, e.g. WS-AT or WS-BA; and c) the Completion service, via which we drive a transaction to its termination, e.g. commit, rollback, or compensate.

The Legacy Server is where the legacy services are hosted. Local resources are accessed through a local Resource Manager API. The Resource Manager also exposes another API that allows a remote Participant to intercept its local resource management at the end of a transaction. See further detail of this interception in the next chapter (Section 5.5.3).
Three phases in the lifecycle of a transactional service

The reference model also captures the three major phases in the lifecycle of a transactional service, or service orchestration. In Figure 12, the three phases are drawn inside the Service Container, alluding to the fact that all transactional services are hosted by Service Containers.

In the Enlistment phase, two types of messages are exchanged between a Participant and a Coordinator.

o Activation: The Participant initiating the transaction issues a request to the Activation service for creating a new CoordinationContext. Messages carrying the activation requests are represented by arrows marked (1) in Figure 12. Note that the model includes the optional use of an interposed Subcoordinator. Messages addressed to a Subcoordinator are therefore marked with dashed arrows to indicate optionality.

o Registration: Participants wishing to join an ambient transaction scope send requests to the Registration service. Messages carrying registration requests are represented by arrows marked (2) in Figure 12. A registration request must indicate which specific protocol it desires to join and in which CoordinationContext. Some Participants send both Activation and Registration messages. For instance, a CompletionInitiator Participant would activate a coordination context and then register for the completion protocol. Other Participants only send Registration messages to join an existing transaction context. A third type of Participant first registers for a protocol in the parent scope, and then activates a new transaction context operated by an interposed Subcoordinator. Examples of this third type are the case registration and eligibility evaluation Participants in the CAP case.

In the Execution phase, actual application logic is executed, plausibly involving the invocation of one or more child services running on legacy servers. Before invoking a legacy service, the Service Container Framework creates a Participant for this service and appends the CoordinationContext as a SOAP header to the application payload. Flowing the CoordinationContext together with an application payload is captured by the arrow marked (3) in Figure 12.

In the Termination phase, the specific transaction protocols are driven to their termination. Completion messages are sent from the Coordination Framework to each Participant in each Container Framework. These messages are marked with (4) in Figure 12. The last type of message, marked (5), carries the resource management decision from a Participant to a legacy server's back-end Resource Manager; it could, for example, be an instruction to commit, rollback or recover.

Although we have presented the three phases in the order Enlistment, Execution, Termination, this ordering is only logical with respect to the intuitive progression of a transaction. In practice, each time the execution phase invokes a new legacy service, a new Participant is created. Such a Participant will subsequently join the ambient transaction by sending a registration request to the Coordination Framework. The Enlistment phase and the Execution phase can therefore be interleaved.

4.3.2. Meeting the design criteria

In this section we will briefly outline how the reference model maps to the eight design criteria we have set up. These design criteria are collected in the following vignette:


1. Use a generic context coordination framework to turn stateless legacy services into stateful Participants.
2. Decouple the context coordination framework from the context types.
3. Accommodate multiple transaction management protocols.
4. Atomic transactions should support flat transactions with strict ACID requirements as well as nested transactions that follow the relaxed atomicity principle.
5. Long-running business transactions can have relaxed ACID properties but should provide countermeasures as alternative reliability guarantees.
6. Support for composable transaction models with heterogeneous types.
7. Leverage existing transaction support infrastructure in legacy systems.
8. Atomic transactions collaborate with the legacy systems' Resource Managers to operate the distributed two-phase commit protocol.
Table 3: Summary: design criteria for transaction management in SOA

The Service Container Framework provides stateful representatives of the stateless legacy services, and these stateful counterparts then participate on behalf of the legacy services in transactional scopes coordinated by a Coordination Framework. Placing the transaction management functionality in the hands of the Service Container Framework and the Coordination Framework effectively decouples the business-level logic from the transaction protocol logic. This meets design criterion 1.

The Coordination Framework is generic, allowing multiple coordination protocols to plug in. This satisfies design criteria 2 and 3. Design criteria 4 and 5 cover specifics of two of the pluggable transaction protocols; these specifics are not captured by the high-level reference model. Design criterion 6, support for a composable transaction model, is illustrated by the use of interposed Subcoordinators playing the bilingual role mentioned in Section 4.1.6.

Design criteria 7 and 8 advocate leveraging existing transaction support in the legacy system infrastructure. This is reflected in the reference model by letting transactional resources expose a Resource Manager interface. A Resource Manager manages the local transactional resource and allows a remote Participant to dictate its actions through service calls. What the Resource Manager exposes is a very limited API, which we will elaborate upon in Chapter 5.


4.4. Summary
In this chapter, we analyze what characterizes transaction management in SOA and differentiates it from transaction processing in traditional programming environments. Services are loosely coupled and often span multiple organizations' independent systems connected by a wide-area network. In the web service world, where asynchrony and long-running transactional scenarios are the rule rather than the exception, blocking transaction processing models such as the two-phase commit protocol are seriously challenged. New compensation-based transaction models are becoming the predominant web service transaction management protocols in lieu of ACID-style transactions. Appropriate countermeasures must be used to provide alternative reliability guarantees for compensation-based transaction protocols.

The various possible combinations of web services within a transaction often require the use of multiple protocols and an external Coordinator capable of bridging disparate execution environments. In general, web service transaction management must allow transaction protocols to be modeled as plug-ins to a generic transaction Coordinator. The analysis in this chapter operationalizes eight design criteria which form the basis for a reference model applicable to the design of web service transaction management middleware. This reference model is used in the rest of this thesis as an overarching context and unifying framework. In the next chapter, we take a closer look at the component architecture incorporated in this reference model.


5. SOA transaction middleware prototype: architecture

The previous chapters 2 through 4 provide the concepts and theories that can now be applied to a prototype implementation. As a first step on the path to implementation, this chapter provides an architectural decomposition of the mental reference model for web service transaction management presented at the end of the last chapter (Figure 12). While the reference model takes the internal workings of the Coordination Framework and the Service Container Framework as given and focuses on the external message-passing among them, this chapter adopts a white-box perspective and explores the frameworks' internal architectures along with the theoretical underpinnings of the architectural design. We start by stating the general system requirements for our software production in Section 5.1. We then discuss the theory behind the middleware modeling in Section 5.2, placing special emphasis on how our target middleware is both similar to, and different from, traditional distributed system middleware. After introducing the communicating state machines common to both frameworks (Section 5.3), we conclude this chapter by presenting the logical architecture of the generic Coordination Framework (Section 5.4) and the Service Container Framework (Section 5.5). Although the architectures presented in this chapter have a higher degree of detail than the reference framework, they still operate on a relatively high level of abstraction, as they focus on architectural modeling rather than the nuts and bolts of individual components. In Chapter 6, we dive more deeply into the implementation details of the software components.

5.1. System requirements


As the primary target, the implementation should produce a transaction management middleware prototype. As a secondary target, the middleware should be applied to solve the transaction challenges in the CAP case introduced in Chapter 1. By case-testing the middleware in a tangible, manageable setting, the overall resulting system should serve as a proof-of-concept implementation. In other words, the production should contain both a general middleware solution and a specific transaction-enabled SOA application for the CAP case. The inevitable paradox apparent when targeting the general and the specific levels simultaneously is analyzed in Section 5.2.2. A more detailed correctness specification is given in Table 4 below. Note that we will only cover the system requirements in very broad terms in this architecture chapter. In the evaluation chapter (Chapter 7) we will elaborate further on the fulfillment of these criteria after implementation and testing.
1. The implementation should comprise a generic middleware application as well as a specific SOA test application, i.e. the CAP case.
2. Functionally, the prototype should be capable of handling distributed flat atomic transactions and business activity transactions, as well as the interpositioning of Coordinators in distributed web service transactions.
3. Coordination of the distributed transactions should be implemented in accordance with the generic WS-Coordination standard.
4. In terms of atomic transactions, the prototype should comply with the WS-AtomicTransaction standard, but only implement the Completion and the Durable2PC protocols, including, in particular, the communicating state machines defined for these protocols. The implementation should work correctly for the account payable service in the CAP case.
5. In terms of business activity transactions, the prototype should comply with the WS-BusinessActivity standard but only implement the CoordinatorCompletion protocol, including, in particular, the communicating state machines for this protocol. The implementation should work correctly for the eligibility evaluation service in the CAP case.
6. In terms of Coordinator interposition, the prototype should comply with the WS-Coordination standard. The implementation should work correctly for the subsidy application service in the CAP case.
7. In terms of nested atomic transactions that follow the relaxed atomicity property, the prototype should implement them with nested WS-BusinessActivity protocols using Coordinator interposition. The implementation should work correctly for the case registration service in the CAP case.
Table 4: The Correctness criteria

5.2. Middleware modeling


This section discusses the theoretical underpinning for the architectural designs. In Section 5.2.1, we focus on how our target middleware is architecturally and functionally analogous to traditional distributed system middleware. In Section 5.2.2, we shift the focus and look at the factors that render the development of our target middleware architecturally more challenging than traditional distributed system middleware.


5.2.1. Layered system architecture

In distributed systems, layered system architectures are often used to achieve separation of concerns. Figure 13 outlines a classical setup of the layered network protocol communication stack, where the middleware layer is sandwiched between the application layer and the transport layer. Agreements between layers are accomplished by having open systems use standard rules (a.k.a. protocols). By letting the middleware layer handle distribution-related issues such as replication, multicast, remote method invocation, etc., the application programmers can focus on the domain-specific business logic and blissfully ignore the software components being spatially or temporally distributed. Such an advantage is called distribution transparency, alluding to a distributed system's ability to present itself to users and application programmers as if it were a single computer system. Distribution transparency is traditionally regarded as the major motivation for having a middleware layer. Motivations for adopting the layered system architecture also include better comprehensibility, maintainability (since each layer can be changed without affecting the other layers, except possibly the immediate neighbors) and loose coupling (since each layer usually only interfaces with the layers immediately above or below it). In addition to distribution transparency, a middleware layer also provides a convenient programming model for application programmers [Coulouris, 2005, p. 32].

Figure 13: Middleware layer on the network communication protocol stack

(Source: [Tanenbaum, 2006, p. 123])

Many application servers today have built-in middleware offerings to handle common system functionalities such as remote method invocation, persistence, concurrency management, etc. However, application server middleware is platform-dependent. For instance, although Java EE and .NET application servers both offer built-in support for remote method invocation, Java's RMI is not interoperable with .NET Remoting. Nor is Java EE's built-in support for transaction management (container-managed and bean-managed transaction demarcation, CMT and BMT) compatible with, for instance, .NET's System.Transactions namespace. When the application layer is service-enabled, the platform-dependent middleware layer is no longer sufficient. What we need is an interoperable middleware layer, and the obvious approach to making the middleware layer interoperable is to service-enable it by converting all protocol-level method calls to XML-based web service invocations. Fortunately, the web service transaction protocols are now standardized as the WS-TX family, and are ready to be incorporated into the middleware implementation (Section 4.2).

Thus, the web service transaction handling middleware resembles the traditional middleware solution in that it masks the heterogeneity of underlying platform differences. SOA's standards-based approach is a logical extension of traditional distributed systems' protocol-based open system concept. The functionality of a SOAP header is analogous to that of an IP header, TCP header, etc. Also analogous to traditional middleware, a service-level middleware layer provides separation of concerns by decoupling the transaction processing logic from the application domain logic. Of course, the application programmer still needs to articulate the transactional needs explicitly, either programmatically against an API, through annotation/reflection, or declaratively in a configuration file.

5.2.2. The paradox between generality and specialization

The idea of making the middleware layer application-independent to achieve distribution transparency is theoretically appealing. In practice, however, it can be very difficult to achieve a 100% clean separation between application logic (i.e. the specific) and protocol logic (i.e. the general and application-independent). In the process of implementing our middleware prototype, tension has arisen multiple times between the ambition of making the transaction management middleware generally applicable, and the need to adapt and optimize this middleware to the case at hand. We illustrate this paradox with two examples in the following subsections.


A generic coordination framework vs. specific protocols

We aim to implement a generic Coordination Framework capable of accommodating not only an array of transaction protocols, but also other distributed system protocols such as multicast, replication, etc. Although these other protocols are not the concern of the current project, the Coordination Framework should nonetheless be open and extensible, allowing more protocols to plug in. This adaptability is also the design guideline behind the WS-Coordination standard. When designing the Coordination Framework component in our middleware, we had the option of making several specialized versions of the Coordination service, e.g. a TransactionCoordination service, a MulticastCoordination service, etc. The advantage of this option is that each type of Coordination service can be specialized and optimized for the performance of its protocol, which is especially useful as the coordination policies behind these protocols are rather different. The disadvantage of such specialization is the sacrifice of generality and flexibility, as an individual Coordination service is necessary for each type of protocol.

Figure 14: Interceptor: bridging the generic coordination service and specific protocols

In this project we have chosen to prioritize generality by adding an extra indirection from a general Coordination Service to the Transaction Protocol component. This extra indirection takes the form of an interceptor, illustrated in Figure 14. Each incoming message to the Coordination Framework is received by the Coordination Service and collected by the interceptor. The interceptor is basically a routing component, which parses the header of the message and consults a Protocol Table in order to route the message to the correct protocol component. This form of intercepting is a special instance of content-based routing [Chappell, 2004, p. 129], which covers a wide array of routing scenarios. In principle, between the service receiver (i.e. the Coordination Service) and the framework component performing the actual protocol logic (e.g. the Transaction Protocol component), there could be a pipeline of interceptors taking care of security, logging, monitoring, etc. Interception is an important building block in SOA and an ESB construct which is, for instance, widely used in Microsoft BizTalk Server. The use of such a software construct to break the usual flow of control also makes it possible for the protocol handling logic to be decoupled from the generic Coordinator and even deployed elsewhere. The strength of the interceptor approach is flexibility and generality. The price to pay is complexity, extra indirection, and a potentially bloated middleware or compromised performance. Another place we have applied the interceptor concept is in the area of context management, illustrated in Figure 15.

Figure 15: The use of the interceptor principle for context management

(Source: [JBoss, 2006])

Since coordination contexts (including transaction contexts) need to be propagated together with the application messages, we intercept the outgoing application message and compose it together with the context. Similarly, at the receiving end, the incoming message is preprocessed by the interceptor. After decomposing the incoming message into the actual application payload and the transaction context, the interceptor dispatches the application payload to the application layer, and the transaction context to the transaction processing component. Again, this practice is similar to the lower layers' analogous composition/decomposition of IP and TCP headers.
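As an illustration, the following sketch shows how such interception could be realized with WCF message inspectors. The inspector interfaces (IClientMessageInspector, IDispatchMessageInspector) are standard WCF extension points, but the header name, namespace URI and the ambient context holder used here are simplified placeholders, not the exact WS-Coordination infoset or our production code.

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;

// Hypothetical ambient holder for the current CoordinationContext identifier.
public static class AmbientContext
{
    public static string Identifier;
}

// Outgoing side: compose the coordination context with the application payload.
public class ContextClientInspector : IClientMessageInspector
{
    private const string Name = "CoordinationContext";
    private const string Ns = "http://example.org/ws-coor"; // placeholder namespace

    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        if (AmbientContext.Identifier != null)
            request.Headers.Add(MessageHeader.CreateHeader(Name, Ns, AmbientContext.Identifier));
        return null; // no correlation state needed
    }

    public void AfterReceiveReply(ref Message reply, object correlationState) { }
}

// Receiving side: decompose the incoming message into payload and context.
public class ContextDispatchInspector : IDispatchMessageInspector
{
    public object AfterReceiveRequest(ref Message request, IClientChannel channel,
                                      InstanceContext instanceContext)
    {
        int index = request.Headers.FindHeader("CoordinationContext", "http://example.org/ws-coor");
        if (index >= 0)
        {
            // Hand the context to the transaction processing component; the
            // application payload continues to the service implementation untouched.
            AmbientContext.Identifier = request.Headers.GetHeader<string>(index);
        }
        return null;
    }

    public void BeforeSendReply(ref Message reply, object correlationState) { }
}
```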
Protocol outcome highly dependent on application logic

In the case of flat atomic transactions, the task of separating protocol code from application code is relatively easy, since the decision to commit or abort in the completion phase is not determined by business domain logic. In the case of nested atomic transactions and potentially long-running business transactions, however, the protocol layer cannot unilaterally determine the transaction outcome. For instance, the business logic may dictate a global commit of the case registration service in the CAP case, even when either the applicant registry or document handling has failed. This allowance is due to the relaxation of the atomicity principle in nested atomic transactions. With a higher level of complexity, the eligibility evaluation service interweaves elaborate workflow and looping logic with the transaction's completion phase.

Common to the WS-BA and the nested WS-AT protocols is that they can only be implemented if application requirements are taken into account. While the two-phase commit in the classical flat atomic transaction can take place virtually entirely in the middleware space without the interference of the application logic, the completion phase governed by WS-BusinessActivity can only occur through synchronization and collaboration between the application layer and the middleware layer. Defining a one-size-fits-all communication interface between the application layer and the middleware layer is a daunting task, as the heterogeneous roles application logic plays in determining the transaction's outcome are difficult to generalize into a fixed number of categories. This partially explains why the WS-BusinessActivity standard has not attempted to standardize the communication between the business application and the transaction protocol, leaving it as an implementation-specific issue.

In our implementation of the collateral determination of transaction outcomes, we have chosen to prioritize specialization and keep the transaction middleware closer to the application level instead of the transport level. In other words, we have not attempted to define a standard, one-size-fits-all interface for any business application to collaborate with any middleware protocol. It is the responsibility of the programmer of the orchestrated service to define and implement this communication. This decision to go the specialization route is also inspired by the end-to-end argument mentioned in [Coulouris, 2005, p. 33-34] and [Tanenbaum, 2006, p. 54 ff.]. Both acknowledge the beauty of modeling middleware to handle extra functionalities independently of applications. But both also draw attention to the erroneous belief that all communication activities can be abstracted away from the programming of applications by the introduction of appropriate middleware layers [Coulouris, 2005, p. 34], and call for sober judgment in each specific case. There are cases where the protocol layer needs to extend its tendrils into the application's address space and vice versa. Business transaction management in a modular vacuum does not work.

Nevertheless, as implementation progressed, our ad-hoc approach to specializing the middleware layer has shown weaknesses. The absence of a clear-cut communication API between the application service layer and the protocol layer introduces tighter coupling between the orchestrated services and the transaction framework. This absence places more programming burden on the service programmer, compromises the transparency principle and invites the pollution of the application space with protocol code. Defining a more generalized collaboration API, through which the application layer supplies business logic to the protocol layer for the purpose of flexible transaction completion, is a target for future work. In order to mitigate the negative effects of specialization, we have made our best effort to make the dependencies between the middleware and the application layers explicit by using design patterns (Section 5.5).

5.3. Communicating state machines


Before we delve into the architecture of each framework, we will bring forth an important aspect of back-end logic that is common to both frameworks: the communicating state machines. To get a visual preview of the roles of state machines in the individual frameworks, refer to Figure 17 and Figure 19 below. The formal analysis of communicating state machines is inspired by Sipser [1996], Casavant [1990] and Godskesen [2007].

Communicating state machines are a central element in the WS-AtomicTransaction and WS-BusinessActivity protocols. Both protocols have elaborate state tables defined for state transitions, pre- and post-transition events, as well as error-handling logic related to illegal state transitions or unexpected events (see [WS-AT] and [WS-BA] for protocol details). A distributed transaction is coordinated via the communication of a state machine at the Participant's site (i.e. the Container Framework) and a corresponding state machine at the Coordinator's site (i.e. the Coordination Framework). When talking about a Coordinator's state machine, we mean the Coordinator's view of a state machine belonging to one of the Participants in the transaction under the Coordinator's coordination. A more conceptual way to put it is that the Coordinator's state machine reflects the Coordinator's knowledge of which phase of the transaction lifecycle a Participant is currently in. With that knowledge, a Coordinator can answer questions such as "Have all Participants completed the preparation phase?"


For a WS-AT transaction, a Coordinator operates one single state machine for the completion initiator Participant, defined by the WS-AT completion protocol. If two Participants join this WS-AT transaction later on, the Coordinator will operate two more 2PC (two-phase commit) state machines in addition to the state machine for the completion initiator; different state machines are defined for the completion protocol and the two-phase commit protocol in WS-AT. For a WS-BA transaction, the WS-BA standard defines no completion protocol, which is why only one type of state machine is defined for all WS-BA Participants.

A state transition is triggered by either a pre-transition internal event or a service invocation event in the form of one-way service calls between the Participant and the Coordinator. Each state transition can in turn trigger a post-transition internal event or service invocation event. A Coordinator's view of a Participant's state machine and the Participant's actual state machine can thus be temporarily out of sync due to service invocation latencies, but should eventually converge on the same global state. This temporary out-of-sync behavior is inevitable in a distributed transaction system.

In the following discussion, we will use the relatively simple state machine for the WS-AT Completion protocol as an illustrative example, as shown in Figure 16. Other state machines communicate under the same principle. See Appendix B and C for a complete collection of state machine diagrams as defined in the WS specifications. The diagrams in the appendices are equivalent to what we show in Figure 16, although the WS specifications use a less formal definition for input and output events.

In formal terms, Casavant [1990] defines communicating state machines (a.k.a. communicating finite automata) as a modeling technique for distributed computation based on a combination of directed graphs. In communicating state machines, a transition is either an output, an input or an internal transition. The following definition of communicating state machines adopts, but slightly adapts, the definition of finite automata by Sipser [1996], p. 35:
$M = (Q, \Sigma, \rightarrow, q_0)$   (1)

where $Q$ is a finite set of states, $\Sigma$ is a finite set of input events, $\rightarrow$ is the transition relation, and $q_0$ is the initial state. (Sipser [1996] uses the term input alphabet, which we have generalized here as input events.)

Figure 16: Communicating state machines for WS-AT Completion Protocol

Defining $q \xrightarrow{a!} q'$ as an output transition, $q \xrightarrow{a?} q'$ as an input transition, and $q \rightarrow q'$ as an internal transition, the transition relation can thus be further specified as

${\rightarrow} \subseteq (Q \times \Sigma \times \{!, ?\} \times Q) \cup (Q \times Q)$   (2)


where the first part of the union captures the input and output transitions, and the second part describes the internal transitions.

Let $M_c = (Q_c, \Sigma, \rightarrow, q_{c0})$ denote the Coordinator's state machine and $M_p = (Q_p, \Sigma, \rightarrow, q_{p0})$ denote the state machine of the Participant (i.e., the completion initiator). Furthermore, let $M = (Q_c \times Q_p, \Sigma, \rightarrow, (q_{c0}, q_{p0}))$ denote the composition of $M_c$ and $M_p$. Applying this formal definition to the state machines shown in Figure 16, we observe the following:

$M_c$ and $M_p$ have the same set of states $Q$; that is, $Q_c = Q_p$.

$M_c$ and $M_p$ run in parallel, and handshake/synchronize on dual input/output events.

In this particular communicating composition $M$, internal state transitions are absent; all state transitions are triggered by output/input events.

The composition $M$ is a closed system, since its components ($M_c$ and $M_p$) only communicate internally: an output action $a!$ in $M_c$ can be executed only if a corresponding input action $a?$ in $M_p$ is enabled, and vice versa [Godskesen, 2007].

Given the above definitions and observations, the target for our implementation becomes clear. It is the network of all state machine compositions $M$, which, in turn, consist of pairs of communicating state machines $M_c$ and $M_p$. When introducing components in the two frameworks in the ensuing text, we will omit common aspects already covered in this section regarding communicating state machines. Implementation aspects will be discussed in Chapter 6, where we talk about the use of the State and Flyweight design patterns, among others, to model state representations, transitions and events, as well as the interaction of each $M$ with its environment (i.e., its invokers and invokees).
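To make the formalism concrete, the following fragment sketches how one such machine, the initiator's side of the WS-AT completion protocol, might be encoded. The enum values abbreviate the states and notifications of the protocol; the transition table here is illustrative and deliberately incomplete compared to the full state tables in [WS-AT].

```csharp
// A minimal sketch of one side (M_p, the completion initiator) of a
// communicating pair; the names are abbreviations of ours, not spec infosets.
public enum CompletionState { Active, Completing, Ended }
public enum CompletionEvent { CommitSent, CommittedReceived, AbortedReceived }

public class CompletionInitiatorStateMachine
{
    private CompletionState state = CompletionState.Active; // q0

    // Both input transitions (a?) and the local effect of output transitions (a!)
    // surface here as events that drive the machine forward.
    public void Signal(CompletionEvent e)
    {
        switch (state)
        {
            case CompletionState.Active:
                if (e == CompletionEvent.CommitSent) { state = CompletionState.Completing; return; }
                break;
            case CompletionState.Completing:
                if (e == CompletionEvent.CommittedReceived ||
                    e == CompletionEvent.AbortedReceived) { state = CompletionState.Ended; return; }
                break;
        }
        // Anything else is an illegal transition; WS-AT prescribes an Invalid State fault.
        throw new System.InvalidOperationException(
            "Illegal event " + e + " in state " + state);
    }

    public CompletionState Current { get { return state; } }
}
```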

5.4. Generic coordination framework


Having explored the theoretical background for modeling middleware, we devote the remainder of this chapter to the presentation of the architectures of the Coordination Framework and the Service Container Framework.


Figure 17 outlines the generic Coordination Framework, whose architectural design follows the layered approach, with cross-cutting concerns such as logging, monitoring and error-handling, etc. spanning all layers. Below is a brief outline of each layer and how they collaborate. The Coordination Framework is deployable as a standalone component and does not need to consult application logic to conduct the coordination.

Figure 17: Architecture - Coordination Framework

Service façade layer

The service façade layer consists of instances of the ActivationService, the RegistrationService, and a range of typed TxCoordinatorServices. These services implement the contracts defined by the WS-COOR, WS-AT and WS-BA protocols. This layer is implemented with the façade design pattern [GoF, 1995, p. 185], and provides a high-level, unified, and simplified interface for accessing a complicated subsystem.


Protocol preprocessing layer

The protocol preprocessing layer contains the Interceptor pipeline as described in Section 5.2.2. In our prototype, the Interceptor role is performed by a Singleton InvocationHandler object that fetches the protocol-type information from the SOAP header and looks it up in a ProtocolTable. The InvocationHandler then routes the message to the relevant protocol processing component, in our case the transaction protocol layer. Note that the TxCoordinatorServices bypass the protocol preprocessing layer and talk directly to the transaction protocol layer. This is due to the fact that the TxCoordinatorServices implement the transaction-related protocols such as WS-AT and WS-BA, so the Interceptor's help is not needed to route their messages to the correct protocol processing component.
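A minimal sketch of this routing step follows. The names (IProtocolProcessor, InvocationHandler.Route) are our illustrative inventions; in the prototype, the protocol identifier has already been fetched from the SOAP header before the lookup.

```csharp
using System.Collections.Generic;

// Illustrative shape of a pluggable protocol component.
public interface IProtocolProcessor
{
    void Process(System.ServiceModel.Channels.Message message);
}

public sealed class InvocationHandler
{
    private static readonly InvocationHandler instance = new InvocationHandler();
    public static InvocationHandler Instance { get { return instance; } }

    // The ProtocolTable: maps a protocol identifier (e.g. the coordination
    // type URI found in the SOAP header) to the component implementing it.
    private readonly Dictionary<string, IProtocolProcessor> protocolTable =
        new Dictionary<string, IProtocolProcessor>();

    private InvocationHandler() { }

    public void RegisterProtocol(string protocolUri, IProtocolProcessor processor)
    {
        protocolTable[protocolUri] = processor;
    }

    // Content-based routing: hand the message to the registered protocol component.
    public void Route(string protocolUri, System.ServiceModel.Channels.Message message)
    {
        IProtocolProcessor processor;
        if (!protocolTable.TryGetValue(protocolUri, out processor))
            throw new KeyNotFoundException("No processor registered for " + protocolUri);
        processor.Process(message);
    }
}
```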
Transaction protocol layer

The transaction protocol layer contains the actual transaction processing logic. A Singleton TransactionManager keeps a map of Coordinators, each coordinating a single transaction. Every Coordinator, in turn, keeps a map of ParticipantCoorView objects, which represent the Coordinator's view of a certain Participant. Each ParticipantCoorView object operates a finite state machine on behalf of the corresponding Participant. How the Coordinators' state machines communicate with the Participants' state machines in the Container Framework has been described in Section 5.3. Activation and Registration service calls are directly handled by the TransactionManager, resulting in the creation of a new Coordinator (in the case of an Activation service call) or the adding of a new ParticipantCoorView to those managed by an existing Coordinator (in the case of a Registration service call). The TransactionManager contains an extensive array of factory methods for thread-safe identity management, lookup, correlation, etc. The implementation of the finite state machine uses the State pattern as mentioned in [GoF, 1995, p. 305]. We defer the argumentation for the use of this pattern and the implementation details to Chapter 6. For now let it suffice to point out that every State object is a Singleton Flyweight [GoF, 1995, p. 195] that is shared among all transactions, and that all State objects delegate all outbound service invocations to background WorkerThread objects in a thread pool.
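The following fragment sketches the bookkeeping just described: a Singleton manager mapping activity identifiers to Coordinators, each of which maps Participants to their ParticipantCoorView objects. The method bodies and the use of a GUID as activity identifier are our own simplifications, not the actual prototype code.

```csharp
using System;
using System.Collections.Generic;

public sealed class TransactionManager
{
    private static readonly TransactionManager instance = new TransactionManager();
    public static TransactionManager Instance { get { return instance; } }

    // One Coordinator per activity (keyed by the CoordinationContext identifier).
    private readonly Dictionary<string, Coordinator> coordinators =
        new Dictionary<string, Coordinator>();
    private readonly object gate = new object();

    private TransactionManager() { }

    // Activation service call: create a Coordinator for a fresh activity.
    public string Activate()
    {
        string activityId = Guid.NewGuid().ToString();
        lock (gate) { coordinators[activityId] = new Coordinator(activityId); }
        return activityId;
    }

    // Registration service call: add the Coordinator's view of a new Participant.
    public void Register(string activityId, string participantEndpoint)
    {
        lock (gate) { coordinators[activityId].AddParticipantView(participantEndpoint); }
    }
}

public class Coordinator
{
    public readonly string ActivityId;
    private readonly Dictionary<string, ParticipantCoorView> views =
        new Dictionary<string, ParticipantCoorView>();

    public Coordinator(string activityId) { ActivityId = activityId; }

    public void AddParticipantView(string endpoint)
    {
        // Each view operates a state machine on behalf of that Participant (Section 5.3).
        views[endpoint] = new ParticipantCoorView(endpoint);
    }
}

public class ParticipantCoorView
{
    public readonly string Endpoint;
    public ParticipantCoorView(string endpoint) { Endpoint = endpoint; }
}
```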


Data access layer

The data access layer provides a unified interface for managing persistence. In terms of the transaction management protocol, persistence of data structures, states, etc. is handled by a PersistenceManager. In our prototype, the PersistenceManager has only a minimal dummy implementation. All outbound service invocations go through a Service Agent. Since service invocation code tends to be heavily peppered with fault-handling code, collecting service invocation methods in one Service Agent component prevents service invocation code from being spread around the protocol layer.
Data layer

The data layer consists of a data store, as well as an endpoint directory for outbound service invocations. As far as transaction management is concerned, the Endpoint Directory keeps records of the Participants service endpoints.

5.5. Service Container framework


The Coordination Framework is only one side of the equation in our prototype. This section covers the other side, the Service Container Framework. While the Coordination Framework only concerns protocol logic, the Container Framework also needs to deal with the interaction between application-level logic and protocol-level logic, since it is the very place where services are linked together to form transactional scopes. The Service Container Framework is thus an integration component that can be deployed on a service bus or as a standalone installation. Prior to introducing the Service Container Framework, we digress to explain the concept of a Service Container.

5.5.1. The Service Container concept

Figure 18 uses the standard, well-known Java EE Container concept as a frame for explaining the ideas behind the Service Container concept. The term Container is a mnemonic for the architectural style of an application component being contained in a server platform called the Container. Just like an Enterprise Java Bean (EJB) component is contained in a Java EE Container, a Service Component is contained in a Service Container. Functionally, the Container is used to host the Component and enrich it with general system functionalities (e.g., remote method invocation, thread management, transaction management, etc.) in a way that is transparent to the application programmer. The primary motivation for using the Component/Container architecture is to separate the domain-specific logic from the general. The more system functionalities the Container is responsible for, the easier it is to develop the Component itself. Of course, there must be a standard contract between the Component and the Container, as illustrated by the circles marked A and B in Figure 18. Interface A is the API supplied by the Container and used by the Component to access system functionalities (e.g., life-cycle management), while Interface B is the API supplied by the Component and used by the Container (e.g., transaction configuration parameters).

Figure 18: Java EE Container & Service Container

Internally, the application component programmer gets system functionalities for free from the Container. Externally, the Container pretends to be the Component by supplying the same external interface. This external interface is called a component interface in Java EE, and an abstract endpoint in a Service Container. External interfaces are represented by the circles marked C in Figure 18. Every incoming call to the component is intercepted by the Container, which fulfills system functionalities such as checking security, starting or stopping transactions, logging, etc., before dispatching the call to the application component. Outgoing messages are similarly intercepted by the Container. Sometimes we say that the Container acts as a proxy for the actual component. Seen from the outside, the Container is the Component, because they offer the same interface. In this manner, system functionalities are performed transparently to the external invoker and, in most cases, also to the internal component. The endpoint exposed by the Service Container is abstract in the sense that it is a location-independent service contract. Whether the actual implementation of this contract is a proxy hosted by a Service Container or the actual service instance running on a legacy server is purposefully abstracted away from the service consumer.
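As a hedged illustration of the contracts marked A and B, the two interfaces below show the kind of members they might contain. All member names are hypothetical inventions of ours; as discussed next, no industry standard currently fixes this API.

```csharp
// Interface A (assumed shape): system functionality the Container supplies
// to the hosted Component.
public interface IContainerServices
{
    string CurrentCoordinationContextId { get; }               // ambient transaction context
    void InvokeChildService(string endpoint, object payload);  // proxied, intercepted call
    void Log(string message);
}

// Interface B (assumed shape): what the Component must supply so the
// Container can manage it.
public interface IServiceComponent
{
    void Initialize(IContainerServices container);  // life-cycle hook
    bool RequiresTransaction { get; }                // e.g. transaction configuration
    object Execute(object request);                  // the business operation itself
}
```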


Just like an EJB can call out to other EJBs hosted in other distributed Java EE Containers, a Container service can also invoke other services hosted in other distributed Service Containers. Hence, remote invocations between distributed Containers should also be governed by a standard API. Having focused on the similarities between the Java EE Container and Service Container architectures, it is worth noting that the technological standardization processes supporting these two architectures are in different maturity phases. While the Java EE specification is an industry standard adopted by all major vendors of Java enterprise servers (IBM, BEA, JBoss, etc.), the Service Container remains a relatively new field for experimentation, as there are currently no standard APIs and industry-wide specifications to make the potentially many solution offerings of Service Containers interoperable. This observation seems contradictory to the intuition that the basic motivation for using SOA is interoperability. Indeed, by packaging legacy business logic as web services, basic business-level interoperability is achievable. However, web services, just like their platform-dependent cousins such as Java EE components, necessitate much more than the exchange of bare-bone business logic. Standardization of protocol exchange mechanisms between different Service Container implementations is still a work in progress, which has gained much momentum in recent years. Extending bare-bone business services with standard-driven system functionalities is also referred to as extending business services with a SOA stack, or second-generation SOA. The WS-TX protocol family falls under this category. In the absence of a standard specification for achieving the transparent exchange of protocol-level messages between Service Containers, we have to expose both application-level and protocol-level endpoints from the Service Container to the outside world, the latter being represented by the circle marked D in Figure 18. Likewise, the lack of a standard API between the Service Component and the Container is the reason why we have had to devise such an API ourselves in the prototype implementation.

5.5.2. The Service Container framework

Figure 19 outlines the Service Container Framework. Notice that although the current framework only supports transaction processing protocols, it is designed as an extensible framework, allowing for the inclusion of other protocols.


Service façade layer

The service façade layer consists of instances of Container Services (i.e., implementations of the abstract endpoint) and Participant Services, which implement the Participant's side of the contracts defined by the WS-COOR, WS-AT and WS-BA protocols. Ideally, when the Service Container standardization process has borne fruit, the Participant Services will retreat into the background and cease to be part of the service façade layer, just as the Java EE system-function RMI invocations between Containers occur transparently. Similar to the Coordination Framework, the service façade layer provides a high-level, unified, and simplified interface for accessing a complicated subsystem.

Figure 19: Architecture - Service Container Framework

Interaction layer

The interaction layer plays a vital role in the Service Container Framework and models how the application works in conjunction with the transaction middleware to orchestrate the transaction outcome. This layer is our non-standardized implementation of the interaction API between the application logic and the middleware protocol logic, serving the function depicted by the two green circles marked A and B in the Service Container diagram in Figure 18.


When modeling the interaction layer, we use the Command design pattern [GoF, 1995, p. 233] to accomplish loose coupling and separation of concerns. We defer the discussion of this pattern to the next chapter. For now let it suffice to say that a Container Service from the service façade layer delegates all application and protocol work to its own Command object. The major role of a Command object is to interweave the application workflow (encapsulated in WorkflowRuleSet objects) and the transaction protocol state transitions in a loosely coupled manner; a sketch is given below. See Chapter 6 for a detailed explanation of the Command pattern. In collaboration with the ChildServiceInvocationThreads, the Command objects drive both the workflow and the protocol state machine to termination.
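A skeletal version of this arrangement might look as follows. WorkflowRuleSet is the component named above, while ICommand and IProtocolHook are simplified stand-ins of our own, not the actual prototype classes.

```csharp
// Command pattern sketch for the interaction layer: the command interweaves
// application workflow execution with protocol state machine progress.
public interface ICommand
{
    void Execute();
}

// Hypothetical hook through which a command drives the participant state machine.
public interface IProtocolHook
{
    void SignalCompleted();
    void SignalFailed();
}

public class WorkflowRuleSet
{
    public bool Run()
    {
        // Evaluate business rules and invoke child services; dummy result here.
        return true;
    }
}

public class CaseRegistrationCommand : ICommand
{
    private readonly WorkflowRuleSet workflow;
    private readonly IProtocolHook protocol;

    public CaseRegistrationCommand(WorkflowRuleSet workflow, IProtocolHook protocol)
    {
        this.workflow = workflow;
        this.protocol = protocol;
    }

    public void Execute()
    {
        // Interweave application workflow and protocol progress:
        bool allStepsSucceeded = workflow.Run();
        if (allStepsSucceeded)
            protocol.SignalCompleted();  // drive the state machine onward
        else
            protocol.SignalFailed();     // e.g. trigger compensation in WS-BA
    }
}
```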
Transaction protocol layer

The transaction protocol layer contains the actual Participant-side transaction processing logic. A Singleton TransactionManager keeps a map of typed Participants and a palette of factory methods for thread-safe identity management, lookup, correlation, etc. Every Participant object operates a state machine, synchronized with the state machine operated by its ParticipantCoorView counterpart in the Coordination Framework as described in Section 5.3. During the Enlistment phase, Participants invoke the Activation and Registration services in the Coordination Framework. Also similar to the Coordination Framework, every State object is a Singleton Flyweight shared among all transactions, and all State objects delegate all outbound service invocations to background WorkerThread objects selected from a thread pool.
Data access layer

The data access layer provides a unified interface for managing persistence. In terms of protocol-related persistence, a PersistenceManager is responsible for persisting protocol data structures, states, etc. in collaboration with a RecoveryManager. In our prototype, the PersistenceManager has only a minimal dummy implementation. Business-logic-specific persistence is the sole responsibility of a Data Access Component. Notice that there is no Data Access Component in the Coordination Framework; this absence is due to its lack of application-specific logic. All outbound service invocations are done through a Service Agent. Note that the ChildServiceInvocationThread components sometimes bypass the transaction protocol layer and talk directly to the Service Agent, for instance when calling out to a remote legacy business service.
Data layer

The data layer consists of a data store, as well as an endpoint directory for outbound service invocations. In our target prototype, the implementation of a data store is out of scope, and persistence logic is simulated with dummy objects.

5.5.3. Intervening back-end resource management in WS-AT

Figure 19 fails to capture in sufficient detail the component architecture governing communication between the Participants, the transaction-aware legacy services and their back-end resource management. This communication plays a pivotal role in allowing the persistence and recovery of the back-end work to be done at the end of a transaction. For instance, how does a remote Participant engage the back-end Resource Manager in the two-phase commit of WS-AT transactions? We have zoomed in on this architectural element in Figure 20, where we use the case registration service as an illustrative example. For the sake of brevity we have only included one child service, document handling, and left out the applicant registry service. We have also omitted the Command and ChildServiceInvocationThread objects in order to focus on essentials.

Figure 20: Intervening back-end resource management in WS-AT


On the Service Container side, the case registration service runs a distributed atomic transaction with the document handling service, which executes on a remote legacy server and utilizes a remote data store. The invocation of this legacy service is done through a document handling proxy that is also responsible for propagating the transaction context to the legacy service. The document handling Participant is the entity performing the work pertaining to transaction management on behalf of the legacy document handling service. In order to do this work, the document handling Participant needs to communicate with the back-end Resource Manager and intervene in the back-end prepare, commit or rollback. To this end, our prototype requires that each legacy service provider supply a Resource Manager Endpoint implementing an API that externalizes a minimal set of legacy resource management primitives (prepare, rollback, commit, etc.). This service API can then be used by the remote Participant to intercept the local two-phase commit in a service-oriented manner. We have drawn the Resource Manager Endpoint and the document handling service as two logical components in the legacy server in Figure 20; practically speaking, we can use a single physical service class that implements two service contracts.

We call the exposed legacy service endpoint document handling a transaction-aware service, because it has the responsibility of creating a Resource Manager with the Transaction ID flowed in the transaction context. Afterwards, the Resource Manager's endpoint is passed on to the Service Container, thereby establishing the connection between the Participant and the local Resource Manager and making it possible for the Participant to remotely control the Resource Manager's work on the local resource. Preferably, this API requirement for the legacy service provider should comply with an existing industry standard, rather than a proprietary interface of our own devising. A good candidate for these resource management primitives is a subset of those defined in the industry-standard XA interface from the X/Open CAE (Common Applications Environment) Specification. The XA interface has mappings on both the Java platform (the XAResource interface in JTA, the Java Transaction API 1) and the .NET platform (the IEnlistmentNotification interface 2). Service-enabling the X/Open specification, in particular the XA interface, is our point of departure when designing the Resource Manager interface.
1. http://www.datadirect.com/developer/jdbc/topics/jta/index.ssp
2. http://msdn2.microsoft.com/en-us/library/system.transactions.ienlistmentnotification.aspx


By using a simple Resource Manager API to utilize the back-end resource management logic, we also satisfy design criterion 7 mentioned in Chapter 4: leveraging existing transaction support infrastructure in legacy systems. There is currently only a dummy implementation of the Resource Manager in our prototype, since we have excluded persistence from the project scope. Nor have we service-enabled the Resource Manager endpoint contract, though doing so would be a straightforward task. A sketch of what such a service-enabled contract might look like is given below.
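The following is our own sketch of a service-enabled Resource Manager contract; the operation set mirrors a subset of the XA primitives, but the contract itself is illustrative, not a published standard or the actual prototype code.

```csharp
using System.ServiceModel;

// Hypothetical WCF contract modeled on the XA resource management primitives.
[ServiceContract]
public interface IResourceManager
{
    [OperationContract]
    bool Prepare(string transactionId);   // phase 1: vote Prepared (true) or Aborted (false)

    [OperationContract(IsOneWay = true)]
    void Commit(string transactionId);    // phase 2: make the work durable

    [OperationContract(IsOneWay = true)]
    void Rollback(string transactionId);  // phase 2: undo the work

    [OperationContract(IsOneWay = true)]
    void Forget(string transactionId);    // discard recovery information
}
```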

5.6. Summary
The focus of this chapter is the architectural design of our target implementation. We begin by specifying the system requirements as the production of a software package consisting of two subcomponents: a middleware framework, and a small SOA application based on the CAP case. We then review the role of middleware in the layered system architecture and explain how the SOA transaction processing middleware resembles traditional distributed systems middleware as a means to mask heterogeneity, achieve transparency and facilitate loose coupling. We also dwell upon the paradox between modeling generic middleware and specialized middleware, a challenging aspect of the design of our middleware prototype. We give two examples of how the Interceptor design construct is used in our implementation in order to adapt and customize the middleware by bridging between the general and the specific. We also discuss how we handle the challenge of interweaving middleware layer logic and application layer logic, particularly in the WS-BA and nested atomic transaction context.

The concept of communicating state machines plays a pivotal role in the collaboration between the Service Containers and the Coordination Framework. This topic is given an emphatic treatment, particularly with respect to how a global state transition is triggered by first having a state transition in a Coordinator's or Participant's state machine, which in turn triggers the generation of an event (e.g., a protocol message passing) that leads to state transitions in other state machines.

Next, we explore both sides of the equation for a web service transaction processing framework: the Coordination Framework and the Service Container Framework. In relation to the Service Container Framework, we briefly introduce the Service Container concept, drawing parallels to the Java EE Container. An emphatic exploration is provided as to how our framework remotely intercepts a legacy server's back-end resource management.

Adhering to the top-down approach, this chapter serves as a logical continuation and architectural realization of the mental reference framework introduced in Chapter 4. At the same time, it serves as a transition to a finer-grained discussion of implementation specifics in the next chapter.

6. SOA transaction middleware prototype: implementation


Having covered the theory, the reference model, and the architecture, we now need to roll up our sleeves and go about implementing the ideas. The previous chapter deals with the design at the architectural level. This chapter deals with design at the software development level, including component and class structures, collaborations, etc. We will start by introducing the development environment in Section 6.1. We then outline the component and data structures in Section 6.2, and explore the behavioral aspects in Section 6.3. A number of other implementational aspects such as logging, error handling and concurrency management are described briefly in Section 6.4. Throughout the chapter we will refer to various classes and components. The collection of the source code can be found in Appendix E, which is separately attached both in print and as a CD. The entire Visual Studio solution, named athena, can also be downloaded at: http://www.itu.dk/people/decorus/athena.zip.

6.1. Development environment


The prototype is implemented in C# 2.0 using Visual Studio 2005 on Windows XP. Windows Communication Foundation (WCF) is used as the web service implementation platform. In order to develop and execute WCF applications, the Windows Vista SDK and the .NET 3.0 runtime components are installed. All web services are hosted in Internet Information Services (IIS) 5.1. Other tools in our programming environment include Subversion for version control, NUnit as the testing framework, and NAnt for build and test automation.

6.2. Modeling Structure


In this section, we outline the overall structure of the prototype, including how the Visual Studio solution is organized into projects. We provide an overview of the software components, including class responsibilities and collaborations. The use of contract-driven design, which means always coding against a predefined set of interfaces, is exemplified, followed by a description of the essential data structures.

6.2.1. Overview of VS solution

All code pertaining to the prototype is contained in a single Visual Studio solution. Each project contains one or more components of the architecture. The VS solution overview is shown in Figure 21. The five projects highlighted with rectangles are the main components of the transaction middleware. The projects named *.tests contain the unit tests of the corresponding components. The BusinessServices project contains the service and data contracts for all legacy services from the CAP case, as well as simple implementations of the six leaf services: applicant registry, document handling, fund reservation, sampling and control, fund transfer and general ledger. Composite services such as account payable are included in the ContainerService project, as these services depend on Container interception in order to participate in transactions.

Figure 21: Visual Studio solution and projects

6.2.2. Overview of middleware components and classes

Figure 22 zeroes in on the major classes of each of the five highlighted components in Figure 21. The Coordination and Container frameworks each consist of two sub-components: the back-end core, illustrated by the two boxes at the right and left sides of the figure, and the outward-facing service layers, illustrated by the two boxes in the middle. The outward-facing service layers are divided by a network, which is not necessarily a physical network. To give an idea of class collaboration, the WS-COOR and WS-AT messages exchanged between ContainerService and CoordinationService are also shown in Figure 22. Implementation and message exchange with respect to WS-BA is omitted, since it follows a similar pattern. In the following, the essential classes in each of the highlighted components in Figure 22 are described briefly in the setting of WS-AT.

Figure 22: Overview of the Service Container Framework and Coordination Framework

Common

The project Common is, as the name suggests, a common library containing service contracts, interfaces, abstract classes, default implementations, utility classes, etc., to be used and referenced by other projects. Internally, the Common library groups related definitions into namespaces. For instance, the Common.DataContracts.COOR namespace contains C# classes representing the WSDL data types and interfaces defined in WS-COOR. We use the Common project to promote contract-based design, reuse and loose coupling. The use of Common effectively converts potential spaghetti dependencies between pairs of components into a unidirectional dependency on Common.
ContainerFramework

This component implements the OO back-end in the Service Container architecture, mapping to the transaction protocol layer and data access layer in Figure 17, p. 85.
- TxManagerSvcContainer provides the CRUD (Create, Read, Update and Delete) facilities for managing critical data structures such as the ParticipantMap. The service front-end relays all correlation and lookup to this Singleton class.
- ResourceManager2PC is an OO simulation of a legacy-server Resource Manager, implementing the commit, abort and rollback interfaces, etc.
- ATCompInitiatorParticipant and ATParticipant play the Participant role on behalf of a web service. They operate the Container side of the communicating state machines by delegating all state-related behavior to their current State object.
- ATCompInitiatorStates and AT2PCStates objectify all states defined in the WS-AT completion protocol and 2PC protocol, respectively. State objects are Singleton Flyweights shared between all participants (see Section 6.3.1).
- ATWorkerThread objects are used by State objects to execute the post-transition service-invocation events in background threads. ATWorkerThreads delegate to ATProtocolSvcInvocationUtil to perform the actual service invocation.
- ATProtocolSvcInvocationUtil is a utility class containing fine-grained service invocation code, including service directory lookup, fault handling and fault propagation.
ContainerService
- CS<ServiceName> implements an abstract service endpoint. CS stands for Container Service, as it is hosted inside a Service Container. A Container Service is responsible for (1) acting as a transactional proxy for a non-transactional legacy service, and (2) performing Container interception (such as marshalling and un-marshalling the SOAP header with Coordination Contexts, etc.), transparently to the service invoker.
- <ServiceName>Command encapsulates the interaction API between the application layer and the transaction middleware layer. A Container Service delegates to its Command object to interweave the business workflow with the underlying transactional flow. (See Section 6.3.2 for Command pattern details.)
- ATParticipantService implements the service contracts for WS-AT Participants. These service contracts are defined in the Common project in accordance with the WSDL specifications for WS-AT.
CoordinationFramework

This component implements the OO back-end in the Coordination Framework architecture, mapping to the transaction protocol layer and data access layer in Figure 19, p.90.
- TransactionManagerCoor provides the CRUD facilities for managing critical data structures such as the ActivityMap. The service front-end relays all correlation and lookup to this Singleton class.
- PersistenceManager is an OO simulation of persistence logic, mapping to the data access layer in the Coordination architecture (Figure 19). It is responsible for persisting transaction states at different checkpoints, e.g. when the preparation phase of 2PC is completed.
- ATCoordinator implements the role of a transaction coordinator. For each transaction or sub-transaction, an ATCoordinator operates communicating state machines with each distributed Participant. State-specific behavior is delegated to the responsible State object.
- ATComplInitiatorParticipantCoorView and AT2PCParticipantCoorView represent an ATCoordinator's view of a remote Participant, particularly the Participant's current execution state.
- ATCoorComplStates and ATCoor2PCStates are counterparts to the State objects at the Container side.

CoordinationService

This component implements the service front-end in the Coordination Framework architecture, mapping to the service façade layer and interaction layer in Figure 19.
- ActivationService implements the ActivationPortType in WS-COOR, including the CreateCoordinationContext() service operation.
- RegistrationService implements the RegistrationPortType in WS-COOR, including the Register() service operation.
- InvocationHandler intercepts all incoming messages and routes them to the relevant protocol layer. In our prototype, the InvocationHandler only deals with the transaction protocol. However, it is designed to be extensible to support different protocols and to contain advanced routing logic.
- ATCoordinatorService implements the service contracts for WS-AT Coordinators. These service contracts are defined in the Common project in accordance with the WSDL specifications for WS-AT.

6.2.3. Essential data structures, lookup and correlation

Our prototype only uses in-memory data structures, as persistence implementation is considered out of scope. Each Coordinator and Participant has a C# Guid as its globally unique identifier. The multitude of Coordinator and Participant objects are collected in associative maps, i.e., C# Dictionaries, keyed by their Guids. Figure 23 illustrates the responsibility distribution for managing data structures in the Coordination Framework. Similar correlation logic is implemented at the Container side. The TransactionManager contains an ActivityMap comprising all active1 Coordinators, keyed by their Guids. Each active Coordinator, in turn, uses a ParticipantMap to keep record of all Participants, keyed by the Participants' Guids. GetCoordinatorByGuid() and GetParticipantByGuid() are thread-safe methods for looking up a Coordinator or a Participant, respectively.

1 A Coordinator is considered active if one or more of its Participants have not propagated to the None (AT) or Ended (BA) state. Inactive Coordinators are eligible for garbage collection, which is not implemented in our prototype.

Figure 23: Essential data structures for lookup and correlation

The WS-TX standards define all protocol messages (e.g. Prepare, Prepared, Commit, Committed, etc.) as Notification data types, which translate into the Notification classes in our implementation (Figure 23). As WS-TX prescribes the asynchronous, one-way, fire-and-forget service invocation pattern, we need a way to correlate a one-way request with its one-way response. This is done by including two identifiers in all Notification messages: a TransactionIdentifier and a ParticipantIdentifier. These identifiers are then passed from the service façade layer to the back-end Transaction Manager, which uses the TransactionIdentifier to look up the responsible Coordinator in the ActivityMap. The Coordinator, in turn, uses the ParticipantIdentifier to look up the corresponding Participant in the ParticipantMap.
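A minimal sketch of this lookup-and-correlation scheme is given below. The class shapes mirror Figure 23, but the member names and bodies are simplifications of ours rather than the prototype's actual code.

    using System;
    using System.Collections.Generic;

    // Every WS-TX notification carries the two correlation identifiers.
    public class Notification
    {
        public Guid TransactionIdentifier;
        public Guid ParticipantIdentifier;
    }

    public class Participant
    {
        public void Handle(Notification n)
        {
            // Delegate to the current State object (see Section 6.3.1).
        }
    }

    public class Coordinator
    {
        private readonly Dictionary<Guid, Participant> participantMap =
            new Dictionary<Guid, Participant>();
        private readonly object mapLock = new object();

        public Participant GetParticipantByGuid(Guid id)
        {
            lock (mapLock) { return participantMap[id]; }
        }
    }

    public class TransactionManager
    {
        private readonly Dictionary<Guid, Coordinator> activityMap =
            new Dictionary<Guid, Coordinator>();
        private readonly object mapLock = new object();

        public Coordinator GetCoordinatorByGuid(Guid id)
        {
            lock (mapLock) { return activityMap[id]; }
        }

        // Correlate a one-way message with its back-end objects.
        public void Dispatch(Notification n)
        {
            Coordinator c = GetCoordinatorByGuid(n.TransactionIdentifier);
            Participant p = c.GetParticipantByGuid(n.ParticipantIdentifier);
            p.Handle(n);
        }
    }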

6.3. Modeling Behavior


We use some of the design patterns described in Design Patterns: Elements of Reusable Object-Oriented Software [GoF, 1995] to model responsibility distribution in our software. In the design process, thinking through the run-time control flow and micro-level collaborations between all objects has been intellectually challenging, and having to shift gears constantly between object and service modes did not make it easier. However, well-engineered design patterns have many times come to our aid by raising the level of abstraction. Instead of thinking in fine-grained objects, we can think in coarser-grained design pattern roles. Using design patterns has also helped us leverage the expertise of other skilled software architects to clarify our own ideas and to make the software more understandable. In this section, we describe the use of the State, Singleton, Flyweight and Command patterns. We have already mentioned the use of the Façade and Proxy design patterns in previous chapters.

6.3.1. Model state machines with State, Singleton and Flyweight patterns

Communicating state machines play a central role in our middleware implementation. Section 5.3 formally introduced the concept of communicating state machines. In this section, we describe how the State, Singleton and Flyweight design patterns are used to model state representations, transitions and events, as well as a state machine's interactions with its context.

Figure 24: Simplified class diagram of the State pattern implementation

Roles

Figure 24 shows a simplified class diagram of the State pattern as implemented in our prototype. In order to emphasize the major roles, a number of interface and class hierarchies have been left out. Only one subclass of the IParticipant interface and two subclasses of the IState interface are drawn in the figure, to illustrate the modeling of a WS-AT state machine governing the two-phase commit. In the actual implementation, the class hierarchy for both IParticipant and IState is more complicated, with specializations to handle the different coordination types (WS-AT, WS-BA) and protocols (e.g. the WS-AT Completion and 2PC protocols).

The IParticipant interface represents the entity having multiple states and exhibiting the state transition behavior. In [GoF, 1995], such an entity is called a Context. For instance, AT2PCParticipant is a Context that can migrate from one state to another during an atomic transaction. Each Context object keeps an instance of a State subclass that defines the context's current state. For instance, when an AT2PCParticipant is created, its current state is configured to AT2PCStateActive, also shown in Figure 24. This pre-configuration of new Participants is the responsibility of a Transaction Manager, which was shown earlier in Figure 19 and is therefore omitted from Figure 24.

Each State subclass implements IState, which defines an interface for encapsulating the state-specific behavior. This behavior is abstracted in the diagram as the Handle<Event> method. In our implementation, Handle<Event> is responsible for handling input, output and internal events, as depicted in the state machine definitions introduced in formulas (1) and (2), p.83. For example, an incoming Prepare service invocation is handled by a HandlePrepare method in a State object.
Collaborations

There is a clear division of work between a Context object (e.g. AT2PCParticipant) and a State object (e.g. AT2PCStateActive), as depicted by the three steps in Figure 24. Upon receiving any state-specific request, the Context object delegates the request to its current State instance by calling State.Handle<Event>(this). This delegation passes the Context object itself as an argument to the current State object, as illustrated by Comment 1 in Figure 24.

The current State object is responsible for handling the specific event. This event handling can involve one or more of the four generic sub-steps shown in Comment 2. First, the State object may need to do some pre-transition work, which typically involves resolving the next state of the transition. Then, optionally, there may be a pre-transition guard (i.e. a condition) the State object needs to ascertain. An example of a guard could be affirming the availability of an exclusive lock on some shared data structure, in order to avoid race conditions caused by competing state transitions. A state transition is effectuated with the ChangeState() method defined on the Context object, i.e. AT2PCParticipant in Figure 24. When invoked, ChangeState() points the current-state reference in the Context object to its successor state (Comment 3 in Figure 24). Then some post-transition work ensues, such as generating a response message (e.g. Prepared) to the counterpart state machine that originally generated the input event (e.g. Prepare). Sometimes the execution order of the state transition and the post-transition work is inconsequential, as they do not form preconditions for each other. In our implementation, outbound events whose effect is orthogonal to a state transition are handled by a background worker thread.
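The sketch below condenses the three steps into code. The class and state names follow Figure 24, but the method bodies are illustrative simplifications of ours; the prototype's actual classes are considerably richer.

    public interface IState
    {
        // One Handle<Event> method per protocol event; only Prepare shown.
        void HandlePrepare(AT2PCParticipant context);
    }

    public class AT2PCParticipant
    {
        private IState currentState = AT2PCStateActive.Instance;

        // Step 1: delegate the event to the current State object,
        // passing the Context itself as an argument.
        public void Prepare()
        {
            currentState.HandlePrepare(this);
        }

        // Step 3: point the current-state reference to the successor.
        public void ChangeState(IState next)
        {
            currentState = next;
        }
    }

    public class AT2PCStateActive : IState
    {
        // Singleton Flyweight: one shared instance for all Participants.
        public static readonly AT2PCStateActive Instance = new AT2PCStateActive();

        public void HandlePrepare(AT2PCParticipant context)
        {
            // Step 2: pre-transition work and guards would go here.
            context.ChangeState(AT2PCStatePreparing.Instance);
            // Post-transition work (e.g. generating Prepared) would be
            // handed off to a background worker thread.
        }
    }

    public class AT2PCStatePreparing : IState
    {
        public static readonly AT2PCStatePreparing Instance = new AT2PCStatePreparing();

        public void HandlePrepare(AT2PCParticipant context)
        {
            // A duplicate Prepare in this state is simply ignored.
        }
    }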
Motivation

The point of departure for modeling the state machines is the state tables defined in the WS-AT and WS-BA standards. Figure 25 gives a glimpse of such a table, in which each column corresponds to a single state, and each row corresponds to an input/inbound or internal event. Each cell in the state tables prescribes the next state to transition to and, if necessary, the output/outbound event to be generated. By using the State pattern, we achieve a direct mapping of each column in the state table to a State class, and of each cell in the table to a method in a State class, as illustrated in Figure 25. This gives cognitive congruency, rendering it a straightforward task to code up the State classes according to the specifications in the state tables.

Some may argue that using a new object to represent each state in a complex system increases the number of classes and objects, rendering the overall structure less compact than if a nested switch had been used in the Context class. As explained in Fowler [2004], a nested switch is a direct and centralized way to implement a state table by turning each column (i.e. state) into a top-level switch, and each row (i.e. event) into a second-level switch nested inside the top-level switch. Such an approach can quickly become long-winded and incomprehensible, with hard-to-debug conditional logic, when it is used to model a WS-AT two-phase commit state table with 7 columns and 10 rows. Therefore, although the use of a nested switch gives more compact and centralized code, we consider it an inappropriate approach to dealing with the elaborate state tables defined in WS-AT and WS-BA. Using the State design pattern, we partition a Context object's behavior for different states and distribute the state-specific behavior to the individual State objects, rendering the Context class itself simpler and easier to comprehend.

Figure 25: Mapping from state table to state object

Decentralizing the state-specific behavior also makes it easier to extend or modify the behavior by adding new State classes.
Implementational optimization

In order to minimize the number of State objects in the system, we have, in addition to the State pattern, applied the Singleton and Flyweight design patterns. The goal is to make each State object sharable across all Contexts. For instance, all AT2PCParticipant objects share the same Singleton AT2PCStateActive, etc.

Making a State object Singleton and Flyweight

Applying the Singleton pattern ensures that each State class has only one instance with a global point of access to it. In order to be sharable, the State objects must have no intrinsic state that hinges upon their Contexts. This is the very reason why a State class does not keep a reference to its Context object as an instance variable. While State objects do need to reference their Context objects, e.g. AT2PCParticipants, to do context-dependent work, this reference is externalized and supplied as a parameter in the Handle<Event> methods. By removing context-dependent intrinsic state from the State classes and converting it to extrinsic state passed in method parameters, we effectively turn the State objects into Flyweights [GoF, 1995, p.195], making them sharable among all Context objects.

Creating and destroying State objects

There are two approaches to instance management of State objects: (1) creating State objects only when they are needed and destroying them immediately afterwards, or (2) creating State objects ahead of time and never destroying them. We have chosen the second approach, for several reasons. First, since we have turned State objects into Singletons and Flyweights, the number of State objects in the system is fixed rather than proportional to the number of Context objects. Minimizing the number of State objects makes it possible to keep all State objects alive at all times without incurring an exorbitant storage penalty. The second reason for eagerly creating the State objects is that, by paying instantiation costs once up-front, we avoid incurring these costs at the actual state transitions, making state transitions more efficient. This performance gain is of great value in our middleware, since state changes occur rapidly. Rapid state changes make object creation-and-destruction on the fly a costly alternative, especially when a destroyed State, e.g. AT2PCStateActive, can immediately be needed again by another Participant in the system.

Formalizing state table representations

Another option for optimizing the State pattern implementation is to formalize the WS-AT and WS-BA state tables as either in-memory data structures or database tables. Then, instead of manually coding the event-handling and state-transition behavior in each State class, we can build either an interpreter that uses the digital table representation at run time, or a code generator that generates classes based on the table representation [Fowler, 2004, p. 112]. A state table representation has the advantage of being modifiable without requiring recompilation of code. Combining the State pattern with a formal table look-up can, in the ideal case, totally automate the generation of new State classes. What's more, it centralizes the state transition logic without introducing messy conditional structures as in the nested switch approach, allowing us to reap the benefits of both worlds: a decentralized design pattern and a centralized state-logic look-up. Due to time constraints, we have not pursued this optimization in our implementation.
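As an illustration, a formalized state table could be as simple as the following sketch. This structure is a hypothetical one of ours and is not implemented in the prototype; an interpreter or code generator would consume its look-ups.

    using System.Collections.Generic;

    // One cell of a WS-AT/WS-BA state table: the successor state and,
    // optionally, the outbound event to generate.
    public struct Transition
    {
        public string NextState;
        public string OutputEvent; // null when no message is generated
    }

    public class StateTable
    {
        private readonly Dictionary<string, Transition> cells =
            new Dictionary<string, Transition>();

        public void Define(string state, string inputEvent,
                           string nextState, string outputEvent)
        {
            Transition t;
            t.NextState = nextState;
            t.OutputEvent = outputEvent;
            cells[state + "/" + inputEvent] = t;
        }

        public Transition Lookup(string state, string inputEvent)
        {
            return cells[state + "/" + inputEvent];
        }
    }

    // Example cell from the WS-AT 2PC table:
    //   table.Define("Active", "Prepare", "Preparing", null);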
A sample scenario of the communicating state machines

In the following, we pictorially illustrate a synchronization flow between two atomic transaction state machines in a 2PC scenario with a successful commit. In the upper half of Figure 26, we show the Coordinator's view of a 2PCParticipant's state machine, which itself is shown in the lower half of the figure. The numbering of events indicates a chronological ordering. In the walkthrough of the 10 steps in the following text, you will see how state transitions in the two state machines can temporarily get out of sync, but gradually regain sync with the help of inbound/outbound events. Some state transitions are triggered by internal events, a subset of which is also shown in Figure 26.

Figure 26: A sample scenario of the communicating state machines

1. The Coordinator starts the two-phase commit by issuing an outbound event, i.e. a Prepare() service invocation, to the Participant. The Coordinator propagates to the Preparing state. The two state machines are now out of sync.

2. The Participant receives the Prepare() input event and propagates to the Preparing state. The two state machines are now in sync, i.e., they are both in the Preparing state.

3. The Participant asks the Resource Manager (RM) to prepare its transaction branch by calling RM.Prepare(). The Participant then propagates to the Prepared state. The API between a Participant and a back-end Resource Manager is covered in Section 5.5.3. Although the RM.Prepare() call could be issued to a remotely residing Resource Manager, it is shown as an internal event in Figure 26: outbound/inbound events refer to the relative relationship between the Participant and the Coordinator, and seen from the Coordinator's point of view, RM.Prepare() is a black-box event internal to the Participant.

4. The Resource Manager has successfully prepared its transaction branch by persisting data. A subsequent outbound Prepared() is sent to the Coordinator. The Participant then propagates to the PreparedSuccess state.

5. The Coordinator receives the Prepared() inbound event and propagates to the Prepared state, still one state behind the Participant.

6. The Coordinator persists the Participant's vote by calling RecordOutcome() on the Persistence Manager (PM). This internal event propagates the Coordinator to the PreparedSuccess state, again in sync with the Participant.

7. After ascertaining the guard that all Participants have returned a Prepared vote, the Coordinator issues an outbound Commit() event and migrates to the Committing state.

8. The Participant receives the Commit() decision as an input event and generates an internal event by forwarding the Commit() decision to the Resource Manager (RM). Notice this arrow is marked as both an input and an internal event. The Participant propagates to the Committing state.

9. The Participant generates an outbound event, i.e. a Committed() notification, to the Coordinator, migrating to the final state, None.

10. The Coordinator receives the inbound event Committed() and transitions to the None state. The two state machines are in the final synchronization step, completing the two-phase commit.

6.3.2. Model application-middleware interaction with Command pattern

In Section 5.2.2, we mentioned that the WS-BA completion phase can only make progress via synchronization between the application layer and the middleware layer. We also mentioned the lack of WS-BA standard support for the definition of such a synchronization interface. In the following, we describe the efforts we, as framework designers, have made to lessen the burden placed on the service programmer by using the Command pattern. Instead of giving an abstract walkthrough of the Command pattern, we will use the implementation of the eligibility evaluation service as a concrete example. To recap, eligibility evaluation is a WS-BA service orchestration in which elaborate workflow and looping logic interleaves with the completion phase of the transaction.

Figure 27: Modeling the interaction between application and middleware layer

Motivation

Figure 27 gives a high-level view of the major components of the Command pattern. To sustain clarity, we leave out some details such as interface and class hierarchies.
EligibilityEvaluationContainerService belongs to the service façade layer as shown in Figure 19, p.90. It stores a reference to an EligibilityEvaluationCommand object of type ICommand. ICommand defines the interface for all concrete Command objects, such as the Execute() method. Seen from a service client's perspective, a service operation is implemented by the service it invokes. Behind the scenes, however, the service operation is handled by a Command object. The Container Service is merely a simple façade, delegating all business and protocol logic to its Command object by calling CommandObject.Execute(). As GoF [1995] maintains, using the Command pattern decouples the entity that invokes the operation (in our case the service client) from the entity having the knowledge to perform it (in our case the Command object).

The first advantage of this decoupling is that the Service Container classes can be kept uniform and simple, making it possible to automatically generate the service façade code, thereby relieving the service programmer of this duty. We have not implemented code generation for the Container Services, which is why the Container Service class is still marked as the service programmer's responsibility in Figure 27. Implementing code generation for the Container Services should, however, be fairly straightforward.

A second advantage of this decoupling is that the Command object can have a lifetime independent of the Container Service instance. This provides the opportunity to replace the object-oriented synchronous method invocation paradigm with a temporally more loosely coupled paradigm, such as a message queue. The corollary is that the Container Service instance can be garbage collected immediately after having received the client's invocation and forwarded it to the Command object's address space, i.e. a message queue or object store.
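A minimal sketch of this façade-to-Command delegation is shown below. The interface shape follows the description above, while the constructor injection and method bodies are simplifications of ours.

    public interface ICommand
    {
        // Zero-parameter by design; inputs arrive via wrapper objects
        // (see the ServiceParamWrapper discussion below).
        void Execute();
    }

    public class EligibilityEvaluationContainerService
    {
        private readonly ICommand command;

        // The Command is dependency-injected through the ICommand
        // interface, decoupling the façade from concrete Commands.
        public EligibilityEvaluationContainerService(ICommand command)
        {
            this.command = command;
        }

        // The service operation is a thin façade: all business and
        // protocol logic lives in the Command object.
        public void EvaluateEligibility()
        {
            command.Execute();
        }
    }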
Roles and collaboration

In the following, we walk through the components and their collaborations in the context of the eligibility evaluation service. Upon receiving an incoming message, the service façade places the invocation parameters and transaction configuration parameters in ServiceParamWrapper and TxConfigWrapper objects, respectively. These wrappers encapsulate, respectively, the domain-specific and the protocol-related input transferred from the Container Service to the Command object. For instance, if a parent transaction context exists, this information will be placed in the TxConfigWrapper object as a hint for the Command object to create a Participant to join the parent scope. The zero-parameter Execute() method defined by ICommand necessitates the use of a ServiceParamWrapper to forward service invocation parameters.

The EligibilityEvaluationCommand is a very knowledgeable object. It is acquainted with the workflow defined for the eligibility evaluation service. This workflow knowledge is preconfigured as a WorkflowRuleset object, which is an object-oriented representation of the workflow logic written in any formal process language, e.g. BPEL. In our prototype, there is only a skeleton implementation of the WorkflowRuleset class. Apart from the workflow knowledge, the EligibilityEvaluationCommand also possesses the protocol-specific knowledge, namely that the eligibility evaluation service should not only join the parent WS-BA transaction, but also start a new nested transaction managed by an interposed Coordinator. Being configured with all this knowledge, the major task of the EligibilityEvaluationCommand is to interweave the workflow and the protocol flow, in order to drive both to termination. The EligibilityEvaluationCommand object has the following responsibilities:

- Create a Participant that joins the parent transaction by sending a message to the Registration service in the Coordination Framework.
- Start a new BA transaction for the current service orchestration. This entails sending an activation message to the Activation service. The new Coordinator created for this transaction scope becomes the interposed Coordinator.
- Start an execution thread for each of the two child services, sampling and control and fund reservation. These execution threads run concurrently.
- Synthesize the partial results of the child service executions and drive the workflow to gradual progression, based on predefined branching rules in the WorkflowRuleset object.
- After the child services' execution threads have terminated, communicate the status of the nested transaction to the upper-level service orchestrator and participate in the completion protocol of the parent transaction, i.e. subsidy application.

Each child service is executed in a ChildServiceInvocationThread object, which is responsible for invoking the proxy (i.e., the Container Service) for the corresponding legacy child service. The proxy will create a Participant for the child service, enlist this Participant in the current coordination context, and then invoke the actual legacy service residing on the remote server. Proxy invocation logic is intentionally factored out of the Command object and placed in the invocation thread object: while a Command object has global knowledge of every child service invocation thread, a thread object only has local knowledge related to a single child service and its corresponding Participant object.

In order for the workflow to make progress, the Command object needs to know the execution status of each child service invocation thread. Because a business activity can potentially be long-running with periods of inactivity, it is inappropriate to let the Command object constantly poll the status of each invocation thread, since this would waste a lot of processing resources. Instead, each invocation thread is responsible for synchronizing with the Command object after having arrived at a state where it cannot unilaterally decide what to do next.
Synchronization points in BA completion

For the BA Coordinator completion protocol, we implement two synchronization points. The first synchronization occurs when a service has performed its business operations and its Participant is still Active. The second synchronization occurs when the Participant is Completed.

In the Active state, the child service has succeeded in executing all business logic and is ready to complete. But the complete-or-exit decision may depend on the behavior of other Participants. For instance, the sampling and control Participant should only be allowed to complete if the fund reservation Participant has not already failed, in which case the Participant for sampling and control should just exit and propagate to the Ended state.

In the Completed state, the child service's Participant has received notice that no more orders will be issued to it, for which reason it has persisted local resources and propagated to the Completed state. But the close-or-compensate decision depends on whether the global transaction will eventually succeed. In the eligibility evaluation service, the workflow rule stipulates that the fund reservation Participant has to wait in the Completed state if sampling and control has not yet returned a result of approval or rejection. In other words, the fate of the global transaction is still uncertain. If this is the case, the execution thread for fund reservation will release the monitor lock on its Participant by calling Monitor.Wait(). This wait is harmless, since the Participant holds no locks on legacy resources in the Completed state in BA. It just cannot be allowed to enter the Ended state, because the transaction branch is incompensatable in that state. If the sampling and control service later rejects the fund reservation, the fund reservation service has to compensate.

Technically, when the sampling and control service has finished its business logic, it will synchronize this outcome with the Command object in the Active state. As soon as this synchronization occurs, the Command object will check if the fund reservation invocation thread is waiting inside the Monitor of its Participant. If this is the case, the Command object calls Monitor.Pulse(), thereby notifying the fund reservation thread of the availability of a compensate-or-close decision.

We have handled the implementation of the above-mentioned synchronization by asking each BA-related Command object to override two callback methods, BASyncAfterBusinessOperations() and BASyncAfterProtocolStateCompleted(). A child service invocation thread will use the first callback method to report to the Command object that its Participant is in the Active state and needs a decision to complete or to exit. The second callback method is used to indicate that a Participant is in the Completed state and needs a decision to close or to compensate. In other words, the invocation threads call back on the Command object to solicit a decision based on the global state at the synchronization point. Since each service orchestration has its own decision-making logic, determined by the WorkflowRuleset, the service programmer should make sure that these two callback methods are overridden to reflect this. See the EligibilityEvaluationCommand class for an implementation example.
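The wait/pulse mechanics can be sketched as follows. The helper class and its members are our own illustration; the prototype synchronizes on the Participant object itself rather than on a dedicated gate.

    using System.Threading;

    public class CompletionGate
    {
        private readonly object monitor = new object();
        private bool decisionAvailable;
        private bool shouldCompensate;

        // Called by the fund reservation invocation thread while its
        // Participant waits in the Completed state.
        public bool AwaitCloseOrCompensate()
        {
            lock (monitor)
            {
                while (!decisionAvailable)
                {
                    // Harmless wait: no legacy locks are held in Completed.
                    Monitor.Wait(monitor);
                }
                return shouldCompensate;
            }
        }

        // Called by the Command object once sampling and control has
        // synchronized its outcome.
        public void PublishDecision(bool compensate)
        {
            lock (monitor)
            {
                shouldCompensate = compensate;
                decisionAvailable = true;
                Monitor.Pulse(monitor); // wake the waiting thread
            }
        }
    }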
Consequences

The synchronization mechanism can become rather intricate if the workflow rules are involved. Our idea in formalizing the synchronization decision between the Command objects and the ChildServiceInvocationThreads is to make the latter as generic as possible by not giving it any global knowledge of the workflow status. Apart from invoking a service proxy, all that an invocation thread has to do is synchronize state with, and carry out the decisions made by, the Command object. With such a design, the ChildServiceInvocationThread class has a general implementation, and thus ceases to be the responsibility of the service programmer. Similarly, very little effort has to be devoted to making a new subclass of Container Service, since it delegates all processing to the corresponding Command. And because the dependency injection of a Command object into a Container Service object is through the ICommand interface, changing the workflow or implementation of a preconfigured Command class does not lead to recompilation and redeployment of the Container Service. A possible way to improve the programmability of the logic in a Command object's Execute() method is to use the Template Method design pattern [GoF, 1995, p.325], which enforces an invariant for all subclasses of Command by providing abstract operations/hooks.

Despite our efforts to factor generic plumbing functionality out of the Command classes, our own experience of writing a new Command class for each new Container Service has shown that the task of programming such an intelligent Command class is quite challenging. Minor oversights can lead to a protracted debugging ordeal. Programming a Command subclass requires the programmer to be conversant with both application and protocol logic, including the transaction framework API, the workflow API, etc. In our view, this programming challenge is partly explained by the tight coupling between a Command class and its collaborator classes, which again can be attributed to the necessary interweaving of protocol and application logic.

6.4. Other implementational aspects


This section sketches a number of implementational aspects that are not covered by the preceding sections.

WCF has built-in support for WS-AT and offers a high-level, annotation-based API for transactional service programming. We have chosen not to use this built-in plumbing, but instead to implement the Coordination and Participant Services as light-weight, stand-alone components. This means that clients do not need to be deployed on the WCF platform to use the Coordination and Container Services. Furthermore, to interpose a WCF-managed coordinator in a nested parent scope, we would need to interface with the WCF coordinator, i.e. a DTC (Distributed Transaction Coordinator) component. Such platform-specific interfacing can turn out to be very time-consuming, distracting us from the project's main focus. This is another reason why we have bypassed WCF's WS-AT implementation.

The WS-BA standard does not define a service contract between a WS-BA completion initiator and the transaction coordinator. We have added this service contract as an extension in our implementation. Clearly, this proprietary extension will not be interoperable with other people's extensions of WS-BA. We attribute this shortcoming to the lack of standard support.

6.4.1. Logging

OO-style logging, such as the write-to-console technique, does not carry over to logging service interactions: the output of Console.WriteLine() statements magically disappears when web services cross the wire, posing a challenge to debugging service interactions. In order to monitor exactly what happens at the back-end core that is hidden behind a web service façade hosted in IIS, a new way of logging is required. In our prototype, we have resorted to the Windows Event Log, a central logging mechanism used by the Windows OS, device drivers, Windows applications, etc. An API exists for publishing and accessing events in an event log, and the Windows Event Viewer is a graphical tool that allows for item-by-item event inspection. To debug and monitor service interactions for our middleware application, we have programmatically created an event log that is capable of source-specific, fine-grained logging. Different event sources are used for different components, such as the Coordination Framework, Container Framework, etc. Overloaded convenience methods are provided for logging to different event sources and for indicating the severity of the logged event (e.g. Warning, Error, Information, etc.).
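In outline, the event log setup relies on the standard System.Diagnostics API, roughly as sketched below. The helper class is a simplification of ours; only the API calls themselves are standard.

    using System.Diagnostics;

    public static class AthenaLog
    {
        // Create the athena event log with a component-specific source.
        // (Creating a source requires administrative privileges.)
        public static void Initialize()
        {
            if (!EventLog.SourceExists("CoordinationFramework"))
            {
                EventLog.CreateEventSource("CoordinationFramework", "athena");
            }
        }

        // One of several overloaded convenience methods.
        public static void Warn(string message)
        {
            EventLog.WriteEntry("CoordinationFramework", message,
                                EventLogEntryType.Warning);
        }
    }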

6.4.2. Fault Handling

Exception and fault handling mechanisms in the prototype are designed to be centralized and generic. On the OO level, customized exceptions are created for handling back-end errors that do not need to be propagated to the service invoker. On the service-oriented level, the WS-TX standards define a series of faults (e.g. InvalidProtocolFault) that should be propagated as SOAP faults over the network. In our implementation, these pre-defined SOAP faults are objectified as Singleton Flyweight objects, preconfigured with all needed information such as FaultCode and FaultReason. When a fault is thrown and needs to be propagated, a Factory Method [GoF, 1995] is used to generate a WCF FaultException<ExceptionDetail> object, wrapping our Singleton Flyweight object as the exception detail. WCF then propagates the fault to the invoking client, without faulting the communication channel.

6.4.3. Instance Management

We have already touched upon object-oriented instance management, e.g. the use of Singleton Transaction Managers, Singleton and Flyweight State objects, as well as the use of one Coordinator object per transaction. At the service level, instance management affects scalability, performance and throughput, and must therefore be designed with care. WCF services can be configured with three different instance modes: per-call services (a new service instance is created and destroyed for each service invocation), sessionful services (a service instance is used per client connection/session) and Singleton services (a service instance is shared by all clients). The instance mode in WCF is annotated on each service implementation class by use of the ServiceBehavior attribute. We do not elaborate upon the WCF instance management basics, which merit a chapter of their own. Instead, we briefly argue for our choice of per-call services.
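For illustration, the per-call mode is selected with the standard ServiceBehavior attribute, roughly as follows; the class shown is one of our service façades, with its operations omitted.

    using System.ServiceModel;

    // A new instance is created and destroyed per invocation; all
    // stateful computation is delegated to back-end objects.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
    public class ATCoordinatorService
    {
        // Service operations omitted; the class is a stateless façade.
    }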
Scalability

According to Löwy [2007], scalability is the major reason for using the per-call mode, as the system only needs to create and maintain as many objects as there are concurrent clients, not as many as there are outstanding clients, as in the per-session mode. In our middleware, the outstanding clients comprise all Participants and Coordinators of all ongoing transactions. These clients spend most of their time performing back-end computation; they only need the service instances for very short durations, to send one-way notifications such as "I am prepared". Choosing the per-session mode and earmarking one service instance for each of those clients would not scale well. In the WS-BA setting, allocating a service instance per participant session could be disastrous, since a long-running participant can potentially hold on to a service instance for days or weeks.
Performance

Since a per-call service is created and destroyed repeatedly, it must save its state for every invocation. This can incur a performance penalty due to the necessity of reconstructing the instance state from persistent storage upon every invocation [Löwy, 2007]. To avoid this performance penalty, we have designed all our service classes to be stateless. In other words, our middleware services merely function as a façade, delegating all stateful computation to the back-end objects. How the service façade correlates a received notification message with a back-end entity was described earlier in Section 6.2.3.
Throughput

Because our services are stateless and use the per-call mode, it is possible to use a load-balancing mechanism to increase the throughput at the service façade layer.

6.4.4. Concurrency Management

Concurrency is managed at two levels in our prototype:

1. Configuring the WCF services' concurrency mode
2. Providing thread-safe access to critical back-end data structures

Concurrency at the service façade layer is harmless, since all services are stateless. Therefore, we use the Multiple concurrency mode in WCF, using no locks on service instances. WCF offers two other alternatives for concurrency management: the Single concurrency mode disallows concurrent calls to a service instance by associating that instance with a synchronization lock, and the Reentrant mode enables reentrant locking, meaning that if the service calls out to another service or a callback, and that call chain finds its way back to the service instance, then that call is allowed to reenter. The Single and Reentrant modes use WCF-level locking, which is overkill in our prototype.

Concurrency management for the back-end data structures is a much more complicated matter. Data structures such as the ActivityMap and ParticipantMap are subject to a high degree of concurrent access. Participant objects can be accessed from both Transaction Manager and State objects, and must be protected with the right level of locking, so that illegal state transitions do not occur. When designing the back-end data structures and the operations to access and modify them, we have systematically analyzed the potential critical regions (e.g. state transition methods, etc.). We have also analyzed the access patterns to these critical regions, focusing on conflicting read-write or write-write operations, as well as the performance consequences of enforcing mutual exclusion. We use mutex (mutual exclusion) locks to protect the critical regions, while at the same time trying to lock as few lines of code as possible, allowing for reasonable concurrency.
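The two levels can be sketched as follows. The ServiceBehavior usage is standard WCF, while the back-end class is a simplified stand-in of ours for the prototype's locking style.

    using System.ServiceModel;

    // Service level: stateless façades tolerate unrestricted concurrency.
    [ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
    public class RegistrationService
    {
        // Stateless; no instance-level locks are needed.
    }

    // Back-end level: protect critical regions with narrow mutex locks.
    public class GuardedParticipant
    {
        private readonly object stateLock = new object();
        private string state = "Active";

        public void TransitionTo(string next)
        {
            lock (stateLock) // lock as few lines as possible
            {
                state = next;
            }
        }
    }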

6.5. Deployment instructions


This section describes two deployment scenarios for the prototype:

- Setting up a development environment that mirrors the one we have used to develop and test the prototype
- Deploying the Coordination Framework as a pluggable, stand-alone component

6.5.1. Set up a development environment

If you wish to view or run the full prototype as a Visual Studio solution in a development environment, your environment should meet the following requirements.
Requirements

- Windows XP with Internet Information Services (IIS) activated and running
- Visual Studio 2005 or 2008
- .NET Framework 3.0 Redistributable Package1
- Visual Studio 2005 extensions for .NET Framework 3.0 (WCF & WPF)2
- NUnit 2.4.0 or newer for running the tests3
Procedure

Use the following steps to open the solution, inspect the code, and run the object-oriented state machine simulation or the service-oriented NUnit tests.

1. Extract athena.zip to the local hard drive, e.g. c:\athena.
2. Create an environment variable in Windows named ATHENA_DIR and set it to point to the path of the athena folder.
3. Open athena.sln, located in the root of the athena folder, in Visual Studio to inspect the code.
4. Create virtual folders in IIS with the following names and paths:
   a. BusinessServices - athena/BusinessServices
   b. ContainerService - athena/ContainerService
   c. CoordinationService - athena/CoordinationService
5. Install Web Host Script Mappings in IIS for WCF by executing athena/servicemodelreg.cmd. This will run the ServiceModelReg.exe tool, distributed as part of the .NET 3.0 Redistributable Package, to register the new virtual folders with WCF.
6. Right-click on the EventLogInitializer project in the Solution Explorer and choose Debug -> Start new instance. This will create the athena event log and the predefined event sources.
7. To run the object-oriented simulation, select the Debug solution configuration and rebuild the solution. Right-click on the StateMachineSimulation project and choose Debug -> Start new instance.
1 http://www.microsoft.com/downloads/details.aspx?familyid=10cc340b-f857-4a14-83f5-25634c3bf043
2 http://www.microsoft.com/downloads/details.aspx?familyid=f54f5537-cc86-4bf5-ae44-f5a1e805680d
3 http://www.nunit.com

You will be prompted to enter the desired simulation type. For instance, entering AT will execute a WS-AT 2PC successful commit scenario and print the execution flow to the screen.

8. To run the NUnit service tests, select the Release solution configuration and rebuild the solution. Open the NUnit GUI and point it to athena/ContainerService.tests/bin/ContainerService.tests.dll. Run the desired tests, e.g. T31_CSAccountPayableSuccessTest for a flat atomic transaction scenario or T91_CSSubsidyApplicationSuccessTest for a full-blown WS-BA scenario with interposed coordinators.

6.5.2. Deploy the Coordination Framework as a stand-alone component

Although all components are currently included in one single Visual Studio solution in our development environment, they are designed to be individually deployable components. In order to deploy, for example, the whole Coordination Framework, providing an activation/registration service and the protocol services for WS-AT and WS-BA, you need to build the Common, CoordinationFramework and CoordinationService projects. This will create three class libraries (DLLs) with the same names as the projects. In the IIS Administration console, create a new virtual folder named CoordinationService and make it point to the folder with these three DLLs. The web.config and the *.svc files from the athena/CoordinationService project also need to be in the same virtual folder. You are now ready to request, for example, the activation service to create a new CoordinationContext at the endpoint http://[machine-name-or-ip]/CoordinationService/ActivationService.svc. We skip the instructions for deploying a stand-alone Container Framework, since it follows the same principle.

6.6. Summary
This chapter describes how the architectural designs presented in the last chapter are translated into software-level designs. The development environment is introduced, and the structural and behavioral modeling of the software components is explored. Other implementational aspects, such as logging, error handling, instance management and concurrency management, are briefly covered. In the next chapter, we will provide a systematic evaluation of the design and implementation of the middleware prototype.

7. SOA transaction middleware prototype: test and evaluation


In order for any prototype to qualify as a proof-of-concept implementation, a thorough and systematic evaluation of it must be conducted and documented. In this project, we define evaluation as the systematic determination of the prototype's quality. Evaluation includes two aspects: (1) setting up the quality factors of evaluation, and (2) using appropriate evaluation approaches to arrive at convincing conclusions. Evaluation is the focal theme of this chapter. The chapter also emphasizes testing, it being one of the most important evaluation approaches we have used to determine quality factors such as correctness, reliability and robustness. We begin by introducing the evaluation and test strategy in Sections 7.1 and 7.2, making explicit what should be evaluated (i.e., quality factors) and how we go about evaluating it (i.e., evaluation approaches). Sections 7.3 through 7.6 give examples of test cases in three major test domains and the related findings. Finally, in Sections 7.7 and 7.8, we provide reflections on testing as well as a synthesized evaluation of the overall quality of our software production.

7.1. Evaluation strategy


Implementing integration middleware like ours is characterized by a high degree of complexity. This complexity stems partly from the need to accommodate the heterogeneity of the systems that the integration middleware purports to integrate. It is also an extremely mission-critical task, since a malfunction of the middleware has ripple effects on all the software applications involved in distributed transactions. Being complex and mission-critical, the transaction management middleware must be tested and evaluated with a high degree of thoroughness. However, given a time frame of six months in which hands-on implementation constitutes only 50% of the project goal, we cannot expect to test the prototype as thoroughly as its complexity and mission-criticality would have justified in a commercial setting. Given the complexity barrier and resource constraints, we must devise a way to maximize the effectiveness of our evaluation efforts at minimum cost (in terms of time use). Our approach to achieving this goal is an evaluation strategy based on two principles:

1. Focus on testability, not the exhaustion of test cases
2. Focus on quality factors with the highest priority

7.1.1. Design for testability

The first principle has motivated us to work on a testable design. In Software Fault Tolerance: A Tutorial, a NASA technical research paper, testability is subdivided into the ease of test development (i.e., controllability) and of effect observation (i.e., observability) [Torres-Pomales, 2000]. Given that testing is potentially endless, Pan [1999] argues that the ideal time to stop testing is when the benefit from further testing no longer justifies the incremental testing cost. However, we were aware from the very beginning that our testing activities would have to stop long before this optimal time was reached, due to resource constraints. Yet by designing the software with testability in mind, we hope to achieve better software quality, even when only a subset of the test cases could be carried through. Below are a few examples of how we have designed for testability:

1) By separating the service façade layer from the back-end protocol layer, we have made it easier to test back-end logic independently of the service front-end. Section 7.2.4 documents how back-end state machines are designed to work in conjunction with a simulation application that can turbo-test protocol implementations in object mode, without letting them go into the slow-motion service mode.

2) Within each layer, by designing components to be as loosely coupled and modular as possible, we aim to minimize the system's internal dependencies and make it easier to test each component independently.

3) Our tests have covered the mainline success and failure scenarios according to the requirement specifications mentioned in Section 5.1. For domains of less importance to our project scope, we have strived to develop a generic test setup so that future tests can easily be designed and performed in a similar fashion (Section 7.3).

With a testable design, the probability increases that potential new bugs will be found in a systematic manner, as a logical continuation of the present test setup.

7.1.2. Prioritize quality factors

To evaluate a piece of software is to say that we want to know something about its quality. There is a plethora of quality factors applicable to software production evaluation. Clearly, it is impossible for us to try to evaluate every imaginable aspect. Thus, we wish to identify the quality factors that are most important and that can be assessed efficiently with the given resources. Hetzel [88] suggests a mental grouping of software quality factors along three dimensions1: functionality, engineering and adaptability. This grouping has motivated the following presentation of our prioritized quality factors in Table 5, where the priority level is basically proportional to the amount of resources that have been committed to establishing a certain property.

1 By dimension, we mean the three columns in Table 5, and not the vertical and horizontal dimensions in this table.
Functionality (exterior quality) Correctness Interoperability Robustness Engineering (interior quality) Structure Modularity Comprehensibility Adaptability (future quality) Extensibility Reusability Maintainability

Priority 1 Priority 2 Priority 3

Table 5: Prioritized software quality factors

(With inspiration from [Hetzel, 88])

Functionality criteria: correct, interoperable and robust

In the functionality dimension, we would like to establish three properties. Correctness is the minimum requirement, and has been defined in more precise terms in Table 4, p. 75.

Interoperability is the primary motivation for using service-oriented transaction processing. We have tried to establish this property by adopting a standards-based approach to implementation. Although all applications are currently implemented and tested with Microsoft Windows Communication Foundation (WCF), the legacy services can, in principle, reside on any platform, since their invocation is purely via no-frills XML messages. The only requirement we impose on the legacy services is that they expose a minimal Resource Manager API (Section 5.5.3) whose definition is also standards-based. Since the Resource Manager API is service-oriented and standards-based, interoperability is guaranteed by design. Ideally, a systematic interoperability test would be performed by deploying the legacy services on different platforms, e.g. some in .NET and some in Java, in order to run distributed transaction protocols across platforms. We have not had the time to perform such a test, but we did work on its testability by decoupling the legacy services from the middleware framework.

The robustness criterion has had the lowest priority in the functionality dimension. IEEE [1990] defines robustness as the degree to which a software component can function correctly in the presence of exceptional inputs or stressful environmental conditions. Pan [1999] points out that robustness testing differs from correctness testing in the sense that the functional correctness of the software is not of concern, and that a robustness test only watches for problems such as machine crashes, process hangs or abnormal termination. We have evaluated robustness by running software-level stress tests, or load tests (Section 7.7). We have not committed time to hardware-level load tests or performance tests, nor have we used quantitative metrics to examine throughput, etc. We share Pan's [1999] viewpoint that robustness and stress testing tools are likely to be made generic and commercially available, whereas correctness testing tools are often specialized with limited generality, which is why we have primarily focused on the latter aspect.
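To give an impression of the kind of platform-neutral contract this requires, here is a hypothetical sketch of a minimal Resource Manager contract in WCF. The prototype's actual API is the one defined in Section 5.5.3; the namespace and member names below are invented for the example.

```csharp
using System.ServiceModel;

// Hypothetical minimal Resource Manager contract. Exposed as plain
// WSDL/XML, so a Java legacy service could implement the same port type.
[ServiceContract(Namespace = "http://example.org/resource-manager")]
public interface IResourceManager
{
    // Phase one of 2PC: tentatively persist work and return the vote.
    [OperationContract]
    bool Prepare(string transactionId);

    // Phase two of 2PC: make the tentative work durable.
    [OperationContract]
    void Commit(string transactionId);

    // Undo the tentative work.
    [OperationContract]
    void Rollback(string transactionId);
}
```

An interoperability test across platforms would then amount to generating, say, a Java service from the same WSDL and running the distributed protocols against it.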
Engineering criteria: structured, modular and comprehensible

The three quality factors (structured, modular, comprehensible) all stem from the testability principle. It is our view that, however many tests we run, these software engineering qualities cannot simply be tested into the software. Our approach to achieving these properties is a top-down decomposition of the quality factors, first at the operational level and then at the quantitative level. This process is illustrated in Figure 28, which is heavily inspired by the Goal-Question-Metric (GQM) methodology used by NASA to manage their mission-critical software projects. GQM is essentially a top-down method which operationalizes and measures qualitative goals (see [Solingen, 1999] for details).

To achieve the quality factor structure, we have used one single interface library to contain the contracts for all components and their interactions. We also infuse more structure into a complex middleware system by using a layered architecture and well-documented design patterns. In achieving the quality factor modularity, the layered architecture plays a central role, while we also focus on loose coupling, e.g. by doing dependency injection through configuration files or interface binding rather than tight association of classes; a small sketch is given below. Concrete examples of our use of the layered architecture, design patterns, dependency injection, etc. can be found in the architecture chapter (Chapter 5) and the implementation chapter (Chapter 6).
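A minimal sketch of such configuration-driven, interface-based dependency injection; the interface, factory class and configuration key below are illustrative, not the prototype's actual names.

```csharp
using System;
using System.Configuration;

public interface ITransactionLogger { void Log(string message); }

public static class ComponentFactory
{
    // Resolve the concrete implementation from app.config, e.g.
    //   <appSettings>
    //     <add key="ITransactionLogger" value="MyLib.FileLogger, MyLib"/>
    //   </appSettings>
    public static ITransactionLogger CreateLogger()
    {
        string typeName = ConfigurationManager.AppSettings["ITransactionLogger"];
        return (ITransactionLogger)Activator.CreateInstance(Type.GetType(typeName));
    }
}
```

Because consumers depend only on the interface, a test double can be substituted in configuration without recompiling the consuming component, which is what makes this style of coupling test-friendly.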

Figure 28: Top-down decomposition of quality factors

To achieve the quality factor comprehensibility, we have striven to write self-descriptive code whenever possible, e.g. by replacing protracted if-then-else structures with switch constructs. We also use best-practice design patterns to increase the communicativeness of the components' internal relationships, and adhere to exposing coarse-grained service contracts in order to make the service orchestration explainable in business process terms. Furthermore, we document complicated back-end logic in a systematic manner.

Ideally, the operational level should be further decomposed into quantitative measures. To this end, we experiment with an automatic tool, Source Monitor, for which we include a screen capture in Figure 29. As an example, we use the max complexity metric as a rough measure of self-descriptiveness, which in turn supports the comprehensibility quality factor. In Source Monitor, max complexity is the complexity value of the most complex method in a code unit. The complexity metric is defined by Steve McConnell, and measures the number of execution paths through a specific function or method. Each function or method has a complexity of one, plus one for each branch statement such as if, else, for, or while. Arithmetic if statements (Boolean ? ValueIfTrue : ValueIfFalse) each add one count to the complexity total. A complexity count is added for each '&&' and '||' in the logic within if, for, while or similar logic statements. Switch statements add complexity counts for each exit from a case (due to a break, goto, return, throw, continue, or similar statement), and one count is added for a default case even if one is not present (see [McConnell, 2004] for details). A small worked example follows below.

Figure 29 shows that the max complexity metric increased from a low level of 8 when we started using the Source Monitor tool, to a high of 25 in mid-July, and was then brought down to a final, moderate level of 14 through conscious refactoring. It is worth noting that we did not commit resources to systematically mapping the metric repertoire offered by this tool to our operational criteria. The lack of such a mapping is a weakness in this project, but the existence of tool support increases our confidence in the feasibility of such mappings in similar projects with fewer resource constraints.
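To make the counting rules concrete, here is a small worked example of our own (not taken from the prototype's code base), annotated with the counts described above:

```csharp
public static string Classify(int x, bool flag)
{
    // base complexity of the method:            1
    if (x > 0 && flag)       // 'if' +1, '&&' +1 -> 3
        return "positive-flagged";
    while (x > 10)           // 'while' +1       -> 4
        x -= 10;
    return x == 0 ? "zero" : "nonzero"; // '?:' +1 -> 5
}
// Under the quoted rules this method has complexity 5; if it were the most
// complex method in its file, Source Monitor's "max complexity" would be 5.
```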

Figure 29: Source Monitor metrics: a selected subset

Adaptability criteria: extensible, reusable and maintainable

These criteria are considered future quality factors, indicative of how well our software implementation is able to evolve and adapt over time. Our general approach to achieving these properties is again to weave them proactively into the design, rather than reactively testing for their possible absence. As mentioned in earlier sections, we have placed emphasis on designing an extensible Coordination Framework. At the same time, the transaction protocol modules are implemented as pluggable software components that can be integrated with the Coordination Framework with minimal configuration. With respect to the reusability property, we focus especially on making the interaction API between the protocol layer and the service orchestration layer modular. The layered architecture and the Command design pattern both purport to render interaction components (e.g. ChildServiceInvocationThread) and utility components (e.g. the ServiceAgent classes) reusable and extensible.

To enhance maintainability, we focus strongly on comprehensibility, structure and documentation, since one cannot possibly maintain a piece of software without a good understanding of its structure. Automatic round-trip UML tools in Visual Studio also come in handy in offering an overview of the classes' mutual relationships.

It may be noted that criteria such as security, scalability and efficiency are conspicuous in their absence from our list of prioritized quality factors. All three are very important quality measures for any middleware implementation, but we have given them a lower priority than the nine factors listed in Table 5, since they relate only peripherally to our main project goal.

7.2. Test Strategy


In the previous section, we introduced our evaluation strategy and how we tackle the individual areas of evaluation. Indisputably, testing is our predominant evaluation methodology. In this section we review the chosen test strategies in relation to various testing methods and techniques, leaving out test-related aspects that were touched upon in the previous section.

7.2.1. Structural test

Structural tests are also known as white-box tests, focusing on statement coverage, branch coverage and logical condition coverage [Parrington, 1989]. Due to the time restriction, we chose not to perform fine-grained, stepwise language structure coverage tests as described in [Sestoft, 1998]. Code inspection, walkthroughs and peer reviews are used as alternatives to systematic structural tests. As suggested by [Hamlet, 1994] and [Myers, 1979], there is no definite proof that code coverage tests correlate with better software quality. These authors also argue that qualitative human testing techniques, used wisely, can sometimes contribute more to improving software quality than traditional structural tests, and in a more cost-effective manner. In Appendix D, we have attached a picture of a peer review session that took place on July 4th, 2007. During this peer review, we did a whiteboard walkthrough of the WS-AT abort scenario, which involves the collaboration of eight objects/roles. The goal of this peer review was to validate that our design would lead to correct state transitions.

7.2.2. Unit test

From this project's initiation we have been aware of the beneficial results available through automated unit testing. However, we soon recognized that a true test-driven development methodology is an unrealistic goal for this project, given the multitude of modules that had to be rapidly prototyped. Therefore, we only perform unit testing for a subset of the most mission-critical modules, in particular the error-prone back-end transaction protocol layer in both frameworks. Our unit testing does not cover constructors, property methods (a.k.a. accessor and mutator methods) and what we have subjectively deemed trivial methods. Instead, we concentrate resources on unit testing the methods that are pivotal to the protocol logic and service orchestration logic. For instance, there are both success and failure tests for the CreateCoordinationContext() method in the TransactionManager class in the Coordination Framework. The failure test asserts that the method handles invalid protocols properly by propagating a well-encapsulated SOAP fault, InvalidProtocolFault, to the invoker, in line with the WS-Coordination standard; a sketch of such a test is shown below. It would not be stringent to call the collection of such tests a unit testing framework, for even though testing is done on a method-by-method basis, not every single method is tested. Nonetheless, this subset of unit tests has revealed many bugs at a very early stage. Bug fixing in the development phase repeatedly forced us to sharpen our thoughts and rethink our initial design. In many cases, these unit tests provided the impetus for effective code refactoring. We have used the open source NUnit framework to conduct unit testing.
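The failure-path test described above could look roughly as follows in NUnit. The constructor and method signatures of TransactionManager, and the shape of the InvalidProtocolFault detail type, are simplified assumptions here, not the prototype's exact API.

```csharp
using NUnit.Framework;
using System.ServiceModel;

[TestFixture]
public class TransactionManagerTests
{
    [Test]
    public void CreateCoordinationContext_UnknownProtocol_RaisesInvalidProtocolFault()
    {
        // Assumed parameterless construction for the sake of the sketch.
        TransactionManager tm = new TransactionManager();
        try
        {
            // A coordination type URI that no registered protocol module handles.
            tm.CreateCoordinationContext("urn:no-such-protocol");
            Assert.Fail("Expected an InvalidProtocolFault to be raised.");
        }
        catch (FaultException<InvalidProtocolFault>)
        {
            // Expected: the fault is propagated to the invoker,
            // in line with the WS-Coordination standard.
        }
    }
}
```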

7.2.3. Functional test

Functional tests, also known as black-box tests, are used extensively in our project. All functional tests derive from the requirement specifications. Our functionality tests fall into two subcategories depending on the test scope: component tests and system-level tests. As a rule of thumb, we choose to match each architectural layer with a component (i.e. a .NET class library) as well as a corresponding component test library in NUnit. For instance, for the ContainerService component in the service-façade layer, we have a corresponding ContainerService.tests library.

In terms of cross-component system tests, we give primary focus to testing the run-time execution of communicating state machines in both flat transactions and transaction hierarchies. One major challenge here has been the design and prioritization of test scenarios. Attempting to exhaustively test the input space on a single machine is impractical, let alone when random factors like timing, ordering, invalid inputs and communication failures come into play as transaction protocols are interleaved and nested in a distributed setting. As state machines form the backbone of the transaction protocol layer, functionality tests in this regard are instrumental in proving or disproving the software's correctness and reliability. Hence, we must again think of a way to maximize the testing profit while minimizing cost. The selected test strategy can be described as a combination of:
1. Object-oriented state-machine simulation, and
2. Equivalence partitioning

7.2.4. Object-oriented state-machine simulation

Before service-testing the protocol layer, where state machine communication plays a central role, we test mainline scenarios in an object-oriented fashion. In this way, automated test suites become faster to process, since we avoid the web service invocation latencies after each recompilation. By separating the test of back-end logic from front-end service interaction, we also constrain the error space to the back-end implementation alone, making error detection potentially easier and quicker.

A small executable simulation application, StateMachineSimulation.exe, was developed to allow the back-ends to bypass their respective service-façade layers and communicate with each other directly. Components in the Container Framework and Coordination Framework are placed in a series of wrapper threads in order to simulate concurrent execution. The passing of XML messages is simulated by method calls. To this end, it is necessary for the invoker and invokee objects in the two frameworks to know each other; in other words, object-oriented dependency injection is needed. To avoid introducing gratuitous pairwise coupling, dependency injection is done by giving all invoker and invokee objects a reference to a common Singleton IContext object. IContext is inherited by two subinterfaces, I_ATContext and I_BAContext, defining minimal context APIs for the simulation of WS-AT and WS-BA state machines; a sketch follows below. A successful state machine simulation test usually precedes the corresponding service-oriented test, which then focuses on uncovering bugs in the service-façade layer, standard compliance issues or interoperability problems. When service-enabling the components, the IContext references are disabled, since back-ends should not have direct references to each other.
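A sketch of this simulation context follows. Only the interface hierarchy and the Singleton access pattern mirror the prototype; the member names are illustrative assumptions.

```csharp
public interface IContext
{
    // "Message passing" in the simulation: a direct method call on the target.
    void Send(object source, object target, string message);
}

public interface I_ATContext : IContext
{
    object LookupCoordinator(string activityId);   // hypothetical lookup member
}

public interface I_BAContext : IContext
{
    object LookupParticipant(string enlistmentId); // hypothetical lookup member
}

public sealed class SimulationContext : I_ATContext, I_BAContext
{
    // The common Singleton injected into all invoker and invokee objects.
    public static readonly SimulationContext Instance = new SimulationContext();
    private SimulationContext() { }

    public void Send(object source, object target, string message)
    {
        // In StateMachineSimulation.exe this dispatches directly to the
        // target back-end, bypassing the service-façade layer entirely.
    }

    public object LookupCoordinator(string activityId)   { return null; }
    public object LookupParticipant(string enlistmentId) { return null; }
}
```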

7.2.5. Equivalence partitioning

Having chosen to apply a set of object-oriented simulation tests matched by corresponding service-oriented tests, we are now ready to answer the important question: what test scenarios should we cover? The interleaving of all legal paths in all state machines, given all types of nesting possibilities, gives rise to a combinatorial explosion. In other words, it is impossible to exhaust the input space. Pan [1999] and Beizer [1995] suggest equivalence partitioning as a technique for systematically covering an input space that cannot be exhaustively tested. The idea is to partition the input domain into regions and consider the input values in each region as an equivalence class.

Figure 30: Equivalence partitioning of protocol test input space

In our case, the input space comprises the whole set of legal state transition paths for each pair of communicating state machines. Both input and expected output variables are discrete. We have chosen to partition the input space into equivalence classes using a divide-and-conquer approach, as captured by Figure 30. Note that this figure is for illustration purposes only and does not completely capture all equivalence classes.

Instead of looking at possible paths through the state machines as purely mathematical permutations, our partitioning treats the test case input space as a tree of semantic domains. In Figure 30, each un-striped box represents a decomposable test domain, and each striped box contains a specific test case, or equivalence class. A set of child test domains and test cases is sometimes collected in a dotted-outlined box. Among leaf-level test cases, those closer to the root have higher priority.

Right beneath the root, we have the three main test domains: atomic transaction, business activity and Coordinator interposition. The three branches are then further divided into test domains and leaf-level test cases. For the atomic transaction test domain, equivalence partitioning covers the commit scenarios, the abort scenarios not (directly) caused by timeout or process crash, and other scenarios (e.g. timeout, process crashes, etc.). Commit and abort test cases are given higher priority than the timeout and process crash test cases, as determined by their distances to the root. Likewise, the business activity test domain is subdivided into the close domain, the compensate domain and the other domain (covering other termination possibilities and recovery scenarios in case of process crashes). In the Coordinator interposition domain, the first test domain covers the scenario where the interposed Coordinator has the same type as the parent Coordinator. In our implementation this same type can only be a business-activity transaction, since nested atomic transactions are excluded from the prototype scope. The second scenario is when the parent Coordinator is of the business activity type and one or more of the interposed Coordinators are atomic transactions, as in the subsidy application service orchestration.

We will not explain the specific semantics of every equivalence class here. Sections 7.3 through 7.6 detail the test cases with concrete examples, zooming in on the test setup, findings, etc.

7.2.6. Continuous integration

As we are a two-person team, continuous integration has been used from the onset of our programming process. Continuous integration is a wide-ranging term [Fowler, 2006]. In our project it entails the following set of enforced disciplines:

- Use the version control system Subversion as the source-code repository.
- Use only a mainline development stream for source code (rapid prototyping).
- Use a single trunk version for all source code and no branches.
- Automate the build of source code by writing a NAnt build script executable with one click.
- Include all unit tests in the automatic build process.
- Allow the flexibility to initiate a build with back-end tests only, i.e. without running the slower web service tests. The build script gives the programmer an optional switch between build all, build webservice, and build no webservice.
- Before a commit, a team member must check out all source code from Subversion, merge and resolve possible conflicts with his/her local version, initiate the build all test script, and get rid of all bugs (if any).

We experienced continuous integration as one of the most helpful software engineering techniques in our coding process. Pre-commit regression tests are valuable in revealing bugs that break existing code. Since the programmer has changed only a small part of the system, he/she does not have to look far to find the cause. Bug-chasing is also easier while the coding steps are still fresh in one's memory.

In this project, we relied on each team member's discipline to do the integration-time bug removal; nothing prevents a committer from committing buggy code. In a larger project, it would be beneficial to use a continuous integration server that automatically notifies the committer of an integration error and postpones the actual commit until corrections have been made and integration tests passed.

7.3. General approach to test setup


The following three sections cover the three major test domains resulting from the equivalence partitioning shown in Figure 30. Semantically, each test domain corresponds to a protocol type, or protocol nesting type. For each domain, we introduce a generic test setup for testing all equivalence classes, and exemplify message flows and state transitions for one or two representative equivalence classes. We then give a status overview of completed and non-completed test scenarios as well as test findings.

All test scenarios are presented in a service-oriented setting, although every test has a simulated object-oriented counterpart. Methodologically, the setup of test cases in our project is inspired by two white papers published by OASIS, [AT-Interop, 2004] and [BA-Interop, 2006], although our use of test applications differs a great deal from what is described in these documents.
Test applications

All tests are set up with the following three test applications:

Test application A: Service Container hosting the parent service, e.g. the subsidy application service.
Test application B: Coordination service, including the collection of ActivationService, RegistrationService and typed CoordinatorService(s). This application is deployable as a stand-alone protocol component or integrated into an existing service bus.
Test application C: Service Container hosting a child service, e.g. the eligibility evaluation service.

Roles in test applications

All equivalence test classes use the following roles, as illustrated in Figure 31, Figure 32 and Figure 33 below. The abbreviated names will be used henceforth.

1. PA (Parent Service Application)
2. CA (Child Service Application)
3. CS (Coordination Service)
4. PS1 (ParticipantProtocolService for the parent service)
5. PS2 (ParticipantProtocolService for the child service)

It is impractical and tedious to list every test scenario in a short chapter like this one. We therefore mention only the most representative test cases, e.g. mainline success and failure scenarios. In the following walkthroughs, common details like those mentioned above are omitted for the sake of brevity.

7.4. Atomic transaction test


Figure 31 captures the general flow of all atomic transaction test cases. This illustration clarifies the communication between the three applications described in the preceding text. Time flows from top to bottom in Figure 31.

7.4.1. General message flow

Figure 31: Atomic transaction: generic test setup

Phase 1 (common to all WS-AT test cases): On behalf of PA, PS1 in test application A initiates a coordinated WS-AT activity by sending an activation request to CS in test application B. PS1 then registers for the Completion protocol in WS-AT. PA sends an application message to CA in test application C, propagating also the CoordinationContext. On behalf of CA, PS2 joins the coordinated WS-AT activity by sending a register request to CS. The coordination protocol that PS2 registers for is scenario-specific (e.g. Durable2PC or Volatile2PC). Having completed all business logic, CA sends an application response message to PA, indicating that the first phase is completed.

Phase 2 (scenario-specific): PS1, PS2 and CS execute the scenario-specific WS-AT protocol. Protocol message flows are exemplified below. These message flows utilize the endpoint references (as defined in WS-Addressing) exchanged during the enlistment steps to direct the messages to the correct Coordinator or Participant instance.

7.4.2. Test case examples

In this section, we describe a success-scenario test case and a failure-scenario test case. In the discussion below, we include the stepwise state transitions as part of the success criteria. Strictly speaking, these are white-box criteria seen from the end user's point of view. However, since these state transitions are an important part of our requirement specification, we have chosen to include them in the system-level black-box test.
Test case AT-1: Commit

Description: PS2 registers for Durable2PC. PS1 initiates Commit. A secure communication channel is assumed, meaning all messages will be received and delivered.

Success criteria: The transaction is committed successfully per the 2PC protocol. CS, PS1 and PS2 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 6.

Message exchange, state transition and test result (abbreviations: Comp = Completion protocol; 2PC = Durable2PC):

Message/Event                          State transition                                    Result
1. PA initiates Comp::Commit to PS1    Comp: Active → Completing                           OK
2. PS1 sends 2PC::Prepare to PS2       2PC: Active → Preparing                             OK
3. PS2 sends 2PC::Prepared to PS1      2PC: Preparing → Prepared → PreparedSuccess (*)     OK
4. PS1 sends 2PC::Commit to PS2        2PC: PreparedSuccess → Committing                   OK
5. PS2 sends 2PC::Committed to PS1     2PC: Committing → None                              OK
6. PS1 sends Comp::Committed to PA     Comp: Completing → None                             OK

Table 6: Test case AT-1: Commit

(*) The transition from the Prepared to the PreparedSuccess state indicates the success of back-end persistence, which is omitted from our walkthrough but included in the actual tests.
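The 2PC column of Table 6 can be read as a small deterministic transition function. The sketch below is distilled from the table for illustration only; the prototype's actual state machine classes (Chapter 6) are richer.

```csharp
using System;

public enum TwoPCState { Active, Preparing, Prepared, PreparedSuccess, Committing, None }

public static class TwoPCTransitions
{
    public static TwoPCState Next(TwoPCState current, string message)
    {
        switch (current)
        {
            case TwoPCState.Active:
                if (message == "Prepare") return TwoPCState.Preparing;            // row 2
                break;
            case TwoPCState.Preparing:
                if (message == "Prepared") return TwoPCState.Prepared;            // row 3
                break;
            case TwoPCState.Prepared:
                if (message == "Persisted") return TwoPCState.PreparedSuccess;    // row 3, back-end persistence
                break;
            case TwoPCState.PreparedSuccess:
                if (message == "Commit") return TwoPCState.Committing;            // row 4
                break;
            case TwoPCState.Committing:
                if (message == "Committed") return TwoPCState.None;               // row 5
                break;
        }
        throw new InvalidOperationException(
            "Illegal 2PC transition: " + current + " on " + message);
    }
}
```

A simulation test can then assert both that the legal path of Table 6 is accepted and that any other (state, message) pair raises the exception.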

Test case AT-2: Phase2 Abort

Description: This test case covers a Coordinator aborting a transaction due to an Aborted vote during the prepare phase. The test setup is a slightly modified version of the generic case, utilizing two child applications with Participants PS2 and PS3, both registering for Durable2PC. PS1 initiates Commit. PS2 votes to commit, while PS3 votes to abort. Back-end sequencing logic simulates the delivery of PS2's Prepared vote before the delivery of PS3's Aborted vote, to ensure a deterministic state transition. The transaction is aborted. A secure communication channel is assumed, meaning all messages will be received and delivered.

Success criteria: The transaction is aborted successfully per the 2PC protocol. CS, PS1, PS2 and PS3 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 7.

Message exchange, state transition and test result (abbreviations: Comp = Completion protocol; 2PC = Durable2PC):

Message                                   State transition                                                  Result
1. PA initiates Comp::Commit to PS1       Comp: Active → Completing                                         OK
2. PS1 sends 2PC::Prepare to PS2          2PC: Active → Preparing                                           OK
3. PS1 sends 2PC::Prepare to PS3          2PC: Active → Preparing                                           OK
4. PS2 sends 2PC::Prepared to PS1         2PC (Participant): Preparing → Prepared → PreparedSuccess (*);
                                          2PC (Coordinator): Preparing → Prepared                           OK
5. PS3 Resource Manager failure;          2PC (Participant): Preparing → None;
   PS3 sends 2PC::Aborted to PS1          2PC (Coordinator): Preparing → None                               OK
6. PS1 sends 2PC::Rollback to PS2         2PC (Participant): PreparedSuccess → None                         OK
7. PS2 sends 2PC::Aborted to PS1          2PC (Coordinator): Prepared → None                                OK
8. PS1 sends Comp::Aborted to PA          2PC: no transition; Comp: Completing → None                       OK

Table 7: Test case AT-2: Phase2Abort

(*) The transition from the Prepared to the PreparedSuccess state indicates the success of back-end persistence, which is omitted from our walkthrough but included in the actual tests.

7.5. Business activity test


The general flow of all business activity transaction test cases is captured by Figure 32.

7.5.1. General message flow

Phase 1 (common to all WS-BA test cases): On behalf of PA, PS1 in test application A initiates a coordinated WS-BA activity by sending an activation request to CS in test application B. In contrast with WS-AT, there is no registration for any completion protocol. PA sends an application message to CA in test application C, propagating also the CoordinationContext. On behalf of CA, PS2 joins the coordinated WS-BA activity by sending a register request to CS. The coordination protocol that PS2 registers for is scenario-specific (e.g. CoordinatorCompletion or ParticipantCompletion). CA sends an application response message to PA, indicating that the first phase is completed.


Figure 32: Business activity: generic test setup

Phase 2 (scenario-specific): PS1, PS2 and CS execute the scenario-specific WS-BA protocol. Protocol message flows are exemplified below. These message flows utilize the endpoint references (as defined in WS-Addressing) exchanged during the enlistment steps to direct the messages to the correct Coordinator or Participant instance.

7.5.2. Test case examples

We list a success-scenario test case and a failure-scenario test case.
Test case BA-1: Close

Description: PS2 registers for the CoordinatorCompletion protocol. The Coordinator initiates Complete and then Close. The transaction should be closed successfully. A secure communication channel is assumed, meaning all messages will be received and delivered.

Success criteria: The transaction is closed successfully per the CoordinatorCompletion protocol. CS, PS1 and PS2 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 8.

Message exchange and state transition:

Message                                                     State transition                 Result
1. PS1 asks CS to complete per proprietary protocol (*);
   CS sends wsba::Complete to PS2                           wsba: Active → Completing        OK
2. PS2 sends wsba::Completed to CS                          wsba: Completing → Completed     OK
3. PS1 asks CS to close PS2 per proprietary protocol;
   CS sends wsba::Close to PS2                              wsba: Completed → Closing        OK
4. PS2 sends wsba::Closed to CS                             wsba: Closing → Ended            OK

Table 8: Test case BA-1: Close

(*) The WS-BA standard does not define a communication API between the initiator and the Coordinator.

Test case BA-2: Compensate

Description: PS2 registers for the CoordinatorCompletion protocol. The Coordinator initiates Complete and then Compensate. The transaction should be compensated successfully. A secure communication channel is assumed, meaning all messages will be received and delivered.

Success criteria: The transaction is compensated successfully per the CoordinatorCompletion protocol. CS, PS1 and PS2 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 9.

Message exchange and state transition:

Message                                                     State transition                   Result
1. PS1 asks CS to complete per proprietary protocol;
   CS sends wsba::Complete to PS2                           wsba: Active → Completing          OK
2. PS2 sends wsba::Completed to CS                          wsba: Completing → Completed       OK
3. PS1 asks CS to compensate PS2 per proprietary protocol;
   CS sends wsba::Compensate to PS2                         wsba: Completed → Compensating     OK
4. PS2 sends wsba::Compensated to CS                        wsba: Compensating → Ended         OK

Table 9: Test case BA-2: Compensate

7.6. Interposed Coordinator test


The general flow of all interposed Coordinator test cases is captured by Figure 33.

7.6.1. General message flow

Phase 1 (common to all interposed Coordinator test cases): On behalf of PA, PS1 initiates the parent transaction scope. PA sends an application message to CA in test application C, propagating the CoordinationContext for the outer scope. On behalf of CA, PS2 joins the parent scope and initiates a new child transaction scope. PS2 drives the inner transaction to a successful termination. Then, CA sends an application response message to PA, indicating that the first phase is completed. The protocol type for the outer scope is one of the two BA protocols, CoordinatorCompletion or ParticipantCompletion. The protocol type for the inner scope is scenario-specific and can be any BA or AT protocol.

Phase 2 (scenario-specific): PS1, PS2 and CS execute the protocol for the outer scope. Protocol message flows are exemplified below. These message flows utilize the endpoint references (as defined in WS-Addressing) exchanged during the enlistment steps to direct the messages to the correct Coordinator or Participant instance.
Figure 33: Interposed Coordinator: generic setup

7.6.2. Test case examples

The two test cases listed below are differentiated by the type of the interposed Coordinator as well as by the parent transaction outcome.
Test case IC-1: Parent Type = Child Type = WS-BA, Close

Description: Both the outer scope and the inner scope run the BA CoordinatorCompletion protocol. The transaction should be closed successfully. A secure communication channel is assumed, meaning all messages will be received and delivered.


Success criteria: The transaction is closed successfully per the CoordinatorCompletion protocol. CS, PS1 and PS2 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 10.

Message exchange and state transition:

Message                                                     State transition                 Result
Precondition: PS2 has successfully completed the inner
WS-BA transaction.
1. PS1 asks CS to complete per proprietary protocol (*);
   CS sends wsba::Complete to PS2                           wsba: Active → Completing        OK
2. PS2 sends wsba::Completed to CS                          wsba: Completing → Completed     OK
3. PS1 asks CS to close PS2 per proprietary protocol;
   CS sends wsba::Close to PS2                              wsba: Completed → Closing        OK
4. PS2 sends wsba::Closed to CS                             wsba: Closing → Ended            OK

Table 10: Test case IC-1: Interposed Coordinator Close

(*) The WS-BA standard does not define a communication API between the initiator and the Coordinator.

Test case IC-2: Parent Type = BA, Child Type = AT, Compensate

Description: The outer scope runs the BA CoordinatorCompletion protocol, while the inner scope runs a WS-AT atomic transaction. The transaction should be compensated successfully. A secure communication channel is assumed, meaning all messages will be received and delivered.

Success criteria: The compensating operation is successfully completed in a new atomic transaction scope. CS, PS1 and PS2 generate/receive events and exhibit correct state transitions in keeping with the requirements in Table 11.

Message exchange and state transition:

Message                                                          State transition                   Result
Precondition: PS2 has successfully committed the inner WS-AT
transaction. CS2 has a corresponding compensating operation,
CS2-Compensate.
1. PS1 asks CS to complete per proprietary protocol;
   CS sends wsba::Complete to PS2                                wsba: Active → Completing          OK
2. PS2 sends wsba::Completed to CS                               wsba: Completing → Completed       OK
3. PS1 asks CS to compensate PS2 per proprietary protocol;
   CS sends wsba::Compensate to PS2                              wsba: Completed → Compensating     OK
4. PS2 creates a new WS-AT transaction for CS2-Compensate,
   but does not join the parent scope this time. PS2 drives
   the compensating AT transaction to a successful commit.       wsat: Active → None                OK
5. PS2 sends wsba::Compensated to CS                             wsba: Compensating → Ended         OK

Table 11: Test case IC-2: Interposed Coordinator Compensate

7.7. Stress testing


We have used stress testing to reveal faults that are attributable to insufficient computational resources or unusually high concurrency. Software-level stress tests have been designed by simulating heavy loads in individual unit test methods. To stress the test environment, we place the execution of individual transaction protocols in WS-AT and WS-BA, as well as interleavings of them, in successive iterations; a sketch of this style of test follows below.

The majority of the stress tests produce correct results. However, some stress tests with Coordinator interposition resulted in abnormal termination of WCF in the Coordination Framework. WCF throws a spurious timeout error which, after being propagated to the client (the Service Container), causes the client's state machine to halt. Logging and debugging suggest that this timeout tends to occur when a BA enlistment request is received by the Coordination Framework while more than one AT two-phase commit is under way. Due to the use of multi-threading, it has been hard to reproduce the error or trace the precise pattern behind the timeout, for which only scanty error information is provided by WCF. Since our tests of multi-typed Coordinator interposition work in a non-stressed setting, we suspect that the WCF service engine running on our test laptop simply gives up when too many concurrent service invocations are handled with the per-call instance management mode.
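A sketch of such a load-in-a-unit-test stress test; the scenario driver methods stand in for the prototype's actual test helpers and are assumptions of this example.

```csharp
using System.Threading;
using NUnit.Framework;

[TestFixture]
public class StressTests
{
    [Test]
    public void InterleavedProtocols_UnderLoad_AllComplete()
    {
        const int iterations = 100;
        for (int i = 0; i < iterations; i++)
        {
            // Interleave AT and BA scenarios on worker threads to provoke
            // concurrency faults in the coordination back-end.
            Thread at = new Thread(RunAtomicCommitScenario);
            Thread ba = new Thread(RunBusinessActivityScenario);
            at.Start(); ba.Start();
            at.Join();  ba.Join();
        }
        // Success criterion: no hangs, crashes or spurious WCF timeouts.
    }

    private static void RunAtomicCommitScenario()     { /* drive a WS-AT commit */ }
    private static void RunBusinessActivityScenario() { /* drive a WS-BA close  */ }
}
```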

Investigations should be made into how WCF handles service throttling under different instance management settings and concurrency modes, as well as into what exactly happens in the WCF inbound and outbound service pipelines. Another possible source of this error is the Coordination Framework's multi-threading logic, although we have carefully designed mutually exclusive access to shared resources such as the CoordinatorMap and ParticipantMap data structures.

In the development iterations we have completed in this project, stress tests are used for fault-revealing, rather than fault-masking or fault-tolerating, purposes. In particular, our exception handling strategy is of a destructive nature, propagating all faults to the top level and provoking the receiver of a service response to a halt. In future development iterations, stress tests should also be designed to be fault-tolerating in addition to fault-revealing. For instance, instead of letting a single failing BA participant contaminate the whole system, we should seek to detect, isolate, contain and mask the fault. We define the following criteria for future stress tests that target fault tolerance, using some of the guidelines mentioned in [Randell, 1978]:

- No single point of failure. For instance, critical components such as the Coordination Framework back-end can be replicated. In the event of a failure, the exception handling mechanism can trigger a switch to a less-stressed or spare component instead of causing an overall system failure (failover). In addition, a performance monitoring mechanism can be used to detect the most stressed component and redistribute the load before an error occurs (load balancing).

- Fault containment. By containing a fault to a restricted scope, we can avoid the unfortunate situation observed in the present stress test, where a failed Coordination Component propagates the fault to the rest of the system. Ideally, mechanisms should be devised to prevent rogue components from spreading the virus to other parts of the system.

- Graceful degradation. The system's overall performance should degrade proportionally to the severity of the failure. Quantitative metrics should be used to measure throughput as a function of system load and failure criticality. Then, either software optimization or hardware replication can be used to attain gracefully degrading performance.

Of course, the benefits of using fault-tolerating measures to mask failures can be offset by the risk of reducing the importance of repairing the failures. For the stress test failures we have observed, the first priority should be to determine whether they are indeed caused by a stressed operating environment.

7.8. Feedback from Ementor Danmark


As a final milestone, we invited Ementor Danmark to an evaluation session on August 23rd, 2007, during which we also gave a presentation of the entire project process as well as of the prototype's architecture and design. The feedback from Ementor Danmark was positive. Encouraging remarks were made about the ambitious project goal, the breadth and depth of the subject areas, and the theoretically and architecturally well-founded design of the prototype. We also received compliments for striking a good balance between the academic and the practical aspects, and for showing professionalism and dedication to the project.

With respect to quality assurance of the prototype, Ementor Danmark gave us many suggestions as to how supplementary tests, e.g. beta tests, pilot tests, robustness tests, etc., could be conducted before launching a framework solution like ours in a real-world project. Among Ementor Danmark's test proposals is the use of programmer tests to assess the reusability and maintainability of the middleware. Such programmer tests could provide the framework designers (i.e. us) with concrete suggestions for improvements, e.g. with regard to how the API between the application layer and the middleware layer could be made more user-friendly and accessible for service programmers.

7.9. Summary: Overall quality evaluation


In this chapter, we have provided a systematic evaluation of the prototype implementation. The evaluation process is a balancing act between quality and time. Roughly speaking, we use two different evaluation approaches: testing and qualitative design evaluation. Testing is used to ensure hard quality factors such as correctness, interoperability and robustness. Qualitative evaluation focuses on soft quality factors such as structure, modularity, comprehensibility, reusability, extensibility and maintainability, which we believe cannot be tested into the software. Instead of being the last phase in the traditional waterfall-style software development life-cycle (analysis → design → implementation → test → evaluation), both testing and qualitative evaluation play an integral part in the entire development process, permeating the analysis, design and implementation phases.

Table 12 summarizes the conclusions we have drawn from the test findings, as well as from the qualitative evaluation of the software design against the quality factors shown in Table 5, p. 123. Our correctness testing covers the mainline equivalence classes in three major test domains: atomic transaction, business activity and Coordinator interposition. These tests produced satisfactory results. After testing the individual domains, a larger-scale test was customized for the CAP application, and our findings show that the requirement specifications are met. Interoperability was taken into account when designing the solution, but we did not have the time to perform a systematic interoperability test. Robustness is evaluated through stress tests. Some of the stress tests resulted in WCF abnormal termination, requiring further debugging efforts. These failures indicate insufficient stability of the prototype.
Quality dimension   Quality factor      Evaluation approach         Testing status                                 Evaluation
Functionality       Correctness         Test                        Mainline equivalence classes tested & passed   OK
Functionality       Interoperability    Design for testability      N/A                                            Design OK; interoperability tests pending
Functionality       Robustness          Software stress test only   Stress test: WCF abnormal termination          Not OK
Engineering         Structure           Design for testability      Easier component and system test               OK
Engineering         Modularity          Design for testability      Easier component and system test               OK
Engineering         Comprehensibility   Design                      N/A                                            OK
Adaptability        Extensibility       Design, Dev.                N/A                                            OK
Adaptability        Reusability         Design, Dev.                N/A                                            OK
Adaptability        Maintainability     Design                      N/A                                            API between app & protocol layer needs improvement

Table 12: Summarized evaluation of quality factors

To achieve structure and modularity, we follow the design-for-testability principle, aiming at a loosely coupled design to make component and system tests easier to construct. We use code inspection, peer review and scenario walkthroughs as alternative evaluation approaches to traditional testing methods. To increase the comprehensibility of the software, we have striven to write self-descriptive and well-documented code. We also use tool support for identifying overly complex methods and classes, and distribute responsibilities between classes using refactoring and design patterns. We are satisfied with the overall structure, modularity and comprehensibility qualities.

To evaluate how well the software is geared for future adaptation, we aim at extensibility, reusability and maintainability. Extensibility and reusability tests are partly integrated in the development process. For instance, after developing the WS-AT protocol component and plugging it into the generic Coordination Framework, it was relatively easy to extend the middleware offering with the WS-BA protocol component. The implementation of the WS-BA state machines was also a reusability test for the State/Flyweight design patterns. In general, we are happy with the middleware's extensibility and reusability. Regarding maintainability, however, we find the programming API for coding new Container services burdensome, in the sense that it assumes protocol knowledge on the part of the service programmer. This is, in our view, a compromise on the protocol layer's transparency and constitutes an area for future improvement.

"Software testing can never prove that a program contains no errors, but it can strengthen one's faith in the program." [Sestoft, 1998]. Likewise, qualitative design evaluation can never prove a bullet-proof design, but it can strengthen our faith in the program. In general, we are confident in the overall quality of our prototype implementation. The failing stress tests can be attributed, in our view, to an implementation error, not a design error.

Software evaluation is a learning process. It has not always been easy to practice constructive thinking (as programmers) and destructive thinking (as testers) at the same time. However, during this journey of discovery, our biggest finding has been that engineering the design process so that the end-product carries fewer potential defects can be just as effective as, if not more effective than, engineering the testing process.


8. Conclusion
This chapter synthesizes the analyses and sub-conclusions presented in the preceding chapters into a concluding response to the two questions framed by our problem definition. The synthesis completes the final step in our method triangulation.

Question 1: How can classical distributed transactional processing models be adapted to meet transaction management requirements in web service enabled system integration?

To answer this question, three successive theoretical analyses were conducted. First, we analyzed the classical transaction model and surveyed four variants of the commit mechanism: single-machine one-phase commit, distributed two-phase commit, distributed three-phase commit and nested two-phase commit. We conclude that the classical model is a good fit for single-machine transactions and short-lived distributed transactions across closely integrated trust domains. The classical transaction model has the strength of providing the safe and strict ACID guarantees. However, its use of exclusive long-duration locks (pessimistic concurrency control) or tentative update versions (optimistic concurrency control) is in disharmony with asynchronous, long-running transactions, or transactions that require interim results to be visible to concurrent users.

As a second step in our theoretical analysis, we explored the extended transaction model. In essence, the extended transaction model is not one single model. Rather, it is a collection of proposals to relax the classical ACID properties. The purpose of ACID relaxation is to adapt the classical model to meet more diversified transactional requirements. In particular, the need for flexible, business-logic-driven commit mechanisms challenges the universal applicability of the traditional all-or-nothing atomic commit, which depends solely upon the success or failure status of the Resource Managers. We have argued that the extended transaction model is the first step toward adapting the classical ACID model to the web service world, albeit with the following limitations: (1) it considers only the synchronous communication paradigm, and (2) it lacks a unified, alternative model of reliability guarantees to replace the ACID model.

As the third step in our theoretical analysis, we aimed to spell out the new challenges SOA poses for the management of distributed transactions. This impact analysis is operationalized into eight criteria for designing a web service transaction management middleware. The most important design criteria include the use of a generic context coordination framework to turn stateless legacy services into stateful Participants, the accommodation of both classical ACID transactions and long-running business transactions, the adoption of counter measures (such as compensating operations) as alternative reliability guarantees, the support for composable transaction models, as well as the leveraging of existing transaction support in legacy systems.

The three-step analysis culminates in the setup of a mental reference model that combines the contributions of pre-SOA transaction models and the eight design criteria for SOA transactions into an overarching mental framework. This mental framework summarizes our answer to the first question in the problem definition.

Question 2: What are the key challenges in translating the adapted, theoretical model into a service-oriented middleware solution whose architecture is aligned with the best-practice design principles?

We seek to answer this question through the hands-on design and implementation of a service-oriented middleware prototype. The whole process is driven by a top-down approach entailing three levels. At the first level, we decompose the mental reference model into a logical architectural design, including a Coordination Framework and a Service Container Framework. At the second level, we map the logical architectural framework design to finer-grained software component design. At the third level, we conduct a systematic evaluation of the prototype implementation, using two complementary approaches: testing and qualitative design evaluation.

We combine object-oriented and component-based design principles with service-orientation. In terms of OO and component-based design, we apply the layering technique, design patterns (State, Command, Flyweight, Façade, etc.) and the Container principle to achieve separation of concerns, modularity, comprehensibility and maintainability. Regarding service-orientation, we adopt a standards-based and contract-driven design approach to attain interoperability. Additionally, we adhere to SOA design principles such as loose coupling, autonomy, reuse, abstraction and composability, as well as the interceptor principle.

The key challenge in the middleware design is to determine the right degree of coupling between the middleware layer and the application layer. In contrast with a traditional middleware layer, which can typically be designed independently of the application layer, the web service transaction middleware layer depends on business rules to determine the transaction outcome. Because of the bidirectional dependencies between the middleware layer and the application layer, we have found it very challenging to design a generally applicable API between the two layers. In our implementation, we have tackled this challenge by utilizing the Command design pattern, formalizing the interweaving between the business workflow at the application layer and the transaction workflow at the middleware layer as synchronization methods; a sketch is given below.

Another design challenge has been the lack of web service standard support for the service contract between a WS-BA completion initiator and the transaction coordinator. We have extended the WS-BA standard with a new service contract in our implementation. However, our ad-hoc extension of the WS-BA standard would not be interoperable with other implementations. The existence of such grey zones in web service standards underlines the fact that just because two SOA middleware solutions both speak XML does not necessarily mean that they are interoperable. With respect to designing interoperable system-level functionality such as transaction management, the standardization challenge requires much joint effort in the SOA community.

Among the other design challenges are the construction of back-end tests for communicating state machines, the necessity of distinguishing between back-end and service-façade layer concurrency design, and the challenge of making the transaction processing middleware fault-tolerant with gracefully degrading performance.
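To make the Command-based interweaving concrete, the sketch below shows the idea under simplified assumptions; the interface and class names are illustrative, not the prototype's exact API.

```csharp
// Synchronization methods between the business workflow (application layer)
// and the transaction workflow (middleware layer), expressed as a Command.
public interface IServiceCommand
{
    void Execute();     // forward business logic, run in the active phase
    void Compensate();  // business-defined counter measure
}

// Example command for the subsidy application scenario (illustrative).
public class ReserveSubsidyCommand : IServiceCommand
{
    public void Execute()    { /* apply the tentative business update */ }
    public void Compensate() { /* undo the update per business rules  */ }
}

// The middleware drives commands without knowing their business content;
// the transaction outcome decides which synchronization method runs.
public static class BusinessActivityScope
{
    public static void Terminate(IServiceCommand command, bool close)
    {
        if (!close)
        {
            // wsba::Compensate path: the middleware calls back into
            // application-layer logic through the Command interface.
            command.Compensate();
        }
        // wsba::Close path: the completed work stands.
    }
}
```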

9. Reflections
It was not part of the project plan to conduct a comprehensive, in-depth survey of related work. By making this choice, we risk repeating others' work or not basing our work sufficiently on earlier contributions. Yet given the time and resources available in this project, we have tried to strike a balance between theoretical research and hands-on implementation of the prototype. Our major target has not been to present breakthrough academic findings, but rather a fruitful learning process and an exploration of an exciting subject area.

Choosing a new technology like WCF also means choosing its immaturity and weaknesses. And because WCF is a high-level programming model, we have to accept all its built-in infrastructure plumbing. This may have a performance impact on the implemented software, since it is difficult to configure the WCF infrastructure at a fine enough granularity to keep only the features we want and nothing more. However, the choice of implementation platform is considered an issue of minor importance in this project. Driven by our ambition to develop a relatively complex prototype supporting generic frameworks and pluggable protocols, we have weighed the rapid application development benefit of WCF above the uncertainty of being early adopters of a relatively new technology.

Transaction management in SOA-based system integration straddles theoretical, technological and business domains. Our choice of theories and reference literature is guided by the assumption that industry practitioners have contributed as much to this interdisciplinary area as academic researchers. To give the subject an all-round coverage, we have consciously included a palette of theories from both industry and academia, including classical distributed system theories, transaction processing theories, platform-dependent technological contributions, SOA theories and best-practice documentation, as well as system integration related theories and industrial experience summaries. Sometimes, we have taken a theory or best practice out of its original application context to support our analysis. For instance, some of the SOA theories we have selected stem from industrial white papers and were originally intended as guidelines for implementation projects. It is extremely exciting to journey through a multi-dimensional theoretical space. But needless to say, the ambition to achieve an analytical convergence between academic theories and industrial best practices, between classical theories and practical experience documentation, has been a great challenge to our academic discipline.

10. Future work


As we aimed to implement a prototype, many possible areas of focus were left out of the present project scope.

Among the counter measures mentioned in Chapter 4 as alternative reliability guarantees to the ACID model, our prototype implements only the compensating operation. Future work can address the more subtle issue of handling cascading compensations, as well as the possibility of incorporating other counter measures (commutative updates, semantic locks, pessimistic views, etc.) into the transaction middleware.

A number of protocols/aspects in the WS-TX standards were intentionally filtered out of the scope of this project and are considered areas for future extension: the WS-AT Volatile2PC protocol and the WS-BA ParticipantCompletion protocol, as well as the web service security models defined in the standards.

The following implementation aspects are of secondary importance in this project and have, therefore, only warranted a minimal, skeleton implementation: the RecoveryManager component, the WorkflowRuleSet component, as well as data persistence and garbage collection in general. A full-blown implementation of these components is imperative to making the prototype fully functional and industrially deployable.

Our hands-on experience shows that the business process workflow and the underlying transaction flow are very closely related, risking spaghetti interactions between them. Hence, another exciting area for evolutionary work is how a service orchestration layer governed by standard-based workflow definitions (such as BPEL) can be integrated with the transaction processing middleware layer without compromising loose coupling.

Resource constraints have made it impossible to test the prototype as thoroughly as its mission-criticality justifies. Pending tests include a subset of protocol sub-domain functionality tests, performance and robustness tests, standard compliance tests, programmer tests, and fault-tolerance tests.

11. References
[Andersen, 2003] Ib Andersen. Den Skinbarlige Virkelighed: om vidensproduktion inden for samfundsvidenskaberne. Samfundslitteratur, Frederiksberg, Denmark.
[AT-Interop, 2004] OASIS. WS-AtomicTransaction Interop Scenarios. November 2004.
[BA-Interop, 2006] OASIS. WS-BusinessActivity Interop Scenarios 1.1 (Working Draft). November 2006.
[Beizer, 1995] Boris Beizer. Black-Box Testing: Techniques for Functional Testing of Software and Systems. Wiley, New York, USA.
[Bernstein, 1997] Philip A. Bernstein and Eric Newcomer. Principles of Transaction Processing. Morgan Kaufmann.
[BTP] Business Transaction Protocol (BTP) Committee Specification (2002). OASIS.
http://www.oasis-open.org/committees/business-transactions

[Cabrera, 2005] Luis Felipe Cabrera and Chris Kurt. Web Services Architecture and Its Specifications: Essentials for Understanding WS-*. Chapter 14: Designated Coordinators Example. Microsoft Press.
[Casavant, 1990] Thomas L. Casavant and Jon G. Kuhl. A Communicating Finite Automata Approach to Modeling Distributed Computation and its Application to Distributed Decision-Making. IEEE Computer Society, Washington, DC, USA.
[Coulouris, 2005] George Coulouris, Jean Dollimore and Tim Kindberg. Distributed Systems: Concepts and Design. Pearson Education Limited, Essex, UK.
[Chappell, 2004] David A. Chappell. Enterprise Service Bus: Theory in Practice. O'Reilly, USA.
[Denzin, 1978] Norman K. Denzin. The Research Act. McGraw-Hill.
[Ementor, 2006-a] Ementor. Arkitekturrelateret kravopfyldelse: Tilbud til Direktoratet for FødevareErhverv.
[Ementor, 2006-b] Ementor. Use Case Model: CAP.
[Ementor, 2006-c] Ementor. Data Model: CAP.
[Ementor, 2006-d] Ementor. Systembeskrivelse: Bruttoliste.

[Ementor, 2006-e] Ementor. Presentation slides to DFFE.
[Erl, 2004] Thomas Erl. Service-Oriented Architecture: A Field Guide to Integrating XML and Web Services. Prentice Hall, New Jersey, USA.
[Erl, 2005] Thomas Erl. Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall, New Jersey, USA.
[Erven, 2007] H. Erven, H. Hicker, C. Huemer and M. Zapletal. The Web Services-BusinessActivity-Initiator (WS-BA-I) Protocol: an Extension to the Web Services-BusinessActivity Specification. Presented at the IEEE International Conference on Web Services, March 2007.
[Fowler, 2003] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Pearson Education Inc., Boston, USA.
[Fowler, 2006] Martin Fowler. Continuous Integration.
http://martinfowler.com/articles/continuousIntegration.html

[Frank, 2006] Lars Frank. Databases with Relaxed ACID Properties. Copenhagen Business School Press.
[Godskesen, 2007] Jens Chr. Godskesen. Formal Concepts in Computer Science. Course compendium, IT University of Copenhagen, 2007.
[GoF, 1995] Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.
[Hagel, 2002] John Hagel III, Scott Durchslag and John Seely Brown. Orchestrating Loosely Coupled Business Processes: The Secret to Successful Collaboration. Harvard Business School Press, USA.
[Hamlet, 1994] Dick Hamlet. Foundations of Software Testing: Dependability Theory. In Proceedings of the Second ACM SIGSOFT Symposium on Foundations of Software Engineering.
[Hasan, 2006] Jeffrey Hasan and Mauricio Duran. Expert Service-Oriented Architecture in C# 2005. 2nd ed. Apress, California, USA.
[Hetzel, 1988] William C. Hetzel. The Complete Guide to Software Testing. 2nd ed. QED Information Sciences, Massachusetts, USA.
[IEEE, 1990] IEEE. Standard Glossary of Software Engineering Terminology (IEEE Std. 610.12-1990). IEEE Computer Society.

[JBoss, 2006] JBoss. JBoss Transaction Service 4.2.2: Web Service Transaction Programmer's Guide.
http://www.redhat.com/docs/manuals/jboss/jboss-eap-4.2/doc/jbossts/JTA_Programmers_Guide.pdf

[Jick, 1989] Todd D. Jick. Mixing Qualitative and Quantitative Methods: Triangulation in Action. Administrative Science Quarterly, Vol. 24, No. 4, Qualitative Methodology (December 1979), pp. 602-611.
[Kandula] Apache Kandula.
http://ws.apache.org/kandula/

[Kaye, 2003] Doug Kaye. Loosely Coupled: The Missing Pieces of Web Services. RDS Press, California, USA.
[Little, 2003-1] Mark Little and Thomas Freund. A Comparison of Web Services Transaction Protocols.
http://www-128.ibm.com/developerworks/webservices/library/ws-comproto/

[Little, 2004] Mark Little, Jon Maron and Greg Pavlik. Java Transaction Processing. Prentice Hall.
[Löwy, 2005] Juval Löwy. Introducing System.Transactions. Microsoft Press.
[Löwy, 2007] Juval Löwy. Programming WCF Services. O'Reilly, USA.
[Lublinsky, 2003] Boris Lublinsky and Dmitry Tyomkin. Dissecting Service-Oriented Architectures. Business Integration Journal, USA.
[Mathiassen, 2000] Lars Mathiassen, Andreas Munk-Madsen, Peter Axel Nielsen and Jan Stage. Object-Oriented Analysis and Design. Marko Publishing ApS, Aalborg, Denmark.
[McConnel, 2004] Steve McConnell. Code Complete. 2nd ed. Microsoft Press, Redmond, USA.
[Moran, 1990] Michael G. Moran (University of Georgia, Athens). Renaissance Survey Techniques and the Mapping of Raleigh's Virginia. The Newberry Library, 1990.
http://www.newberry.org/smith/slidesets/ss19.html

[Myers, 1979] Glenford J. Myers. The Art of Software Testing. Wiley, New York, USA.
[Møller, 2004] Kasper Møller. Integration of Applications. Thesis report, IT University of Copenhagen, Denmark.
[Newcomer, 2004] Eric Newcomer and Greg Lomow. Understanding SOA with Web Services. Chapter 10: Transaction Processing. Addison-Wesley Professional, UK.

[Pan, 1999] Jiantao Pan. Software Testing. Course material, Carnegie Mellon University.
http://www.ece.cmu.edu/~koopman/des_s99/sw_testing/

[Parrington, 1989] Norman Parrington and Marc Roper. Understanding Software Testing. John Wiley & Sons, USA.
[Project Tango] Sun's Project Tango: Web Services Interoperability Technologies (WSIT).
http://wsit.dev.java.net/

[Randell, 1978] Brian Randell, P. A. Lee and P. C. Treleaven. Reliability Issues in Computing System Design. ACM Computing Surveys. Computing Laboratory, University of Newcastle upon Tyne, UK.

[Roberts, 2001] J. Roberts and K. Srinivasan. Tentative Hold Protocol Part 1: White Paper. W3C Note, 28 November 2001.
[Sestoft, 1998] Peter Sestoft. Systematic Software Test. Teaching note, Royal Veterinary and Agricultural University, Denmark.
http://www.itu.dk/people/sestoft/programmering/struktur.pdf

[Silberschatz, 2002] Abraham Silberschatz, Henry F. Korth and S. Sudarshan. Database System Concepts. McGraw-Hill Higher Education, New York, USA.
[Sipser, 1996] Michael Sipser. Introduction to the Theory of Computation. 1st Edition. Course Technology, USA.
[Solingen, 1999] R. van Solingen and E. Berghout. The Goal-Question-Metric Method. McGraw-Hill Publishing Company, USA.
[Tanenbaum, 2006] Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Pearson Prentice Hall, New Jersey, USA.
[Torres-Pomales, 2000] Wilfredo Torres-Pomales. Software Fault Tolerance: A Tutorial. Langley Research Center, Virginia, USA.
[WCF] Microsoft Windows Communication Foundation framework.
http://msdn2.microsoft.com/en-us/netframework/aa663324.aspx

[Webber, 2005] Jim Webber and Mark Little. Introducing WS-Coordination.
http://www2.sys-con.com/ITSG/virtualcd/WebServices/archives/0305/little/index.html

[Weerawarana, 2005] Sanjiva Weerawarana, Francisco Curbera, Frank Leymann, Tony Storey and Donald F. Ferguson. Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS-BPEL, WS-ReliableMessaging and More. Chapter 11: Transactions. Pearson Prentice Hall, New Jersey, USA.
[WS-AT] Web Services Atomic Transaction (WS-AtomicTransaction) v1.1. OASIS, April 2007.
http://docs.oasis-open.org/ws-tx/wsat/2006/06

[WS-BA] Web Services Business Activity (WS-BusinessActivity) v1.1. OASIS, April 2007
http://docs.oasis-open.org/ws-tx/wsba/2006/06

[WS-CAF] Web Services Composite Application Framework Committee Draft (2003). OASIS
www.oasis-open.org/committees/ws-caf

[WS-COOR] Web Services Coordination (WS-Coordination) v1.1. OASIS, April 2007.
http://docs.oasis-open.org/ws-tx/wscoor/2006/06

[WS-TX] Web Services Transaction Technical Committee, OASIS.
http://www.oasis-open.org/committees/ws-tx/

[Yin, 2002] Robert K. Yin. Case Study Research: Design and Methods. 3rd Edition. Sage Publications Inc., California, USA.
[Younas, 2006] Muhammad Younas and Kuo-Ming Chao. A tentative commit protocol for composite web services. Journal of Computer and System Sciences, Volume 72, Issue 7, pp. 1226-1237, November 2006.
[Zhao, 2005] Wenbing Zhao, L. E. Moser and P. M. Melliar-Smith. A Reservation-Based Coordination Protocol for Web Services. In Proceedings of the IEEE International Conference on Web Services (ICWS'05). IEEE Computer Society, 2005.

12. Appendices
A. List of acronyms
.NET: Microsoft's runtime and development environment
2PC: Two-phase commit
ACID: Atomicity, Consistency, Isolation, Durability
ADO: ActiveX Data Objects
API: Application Programming Interface
BMT: Bean-managed transaction demarcation
BPEL: Business Process Execution Language
BTP: Business Transaction Protocol
CAP: Common Agricultural Policy
CMT: Container-managed transaction demarcation
CORBA: Common Object Request Broker Architecture
CRM: Customer Relationship Management
CRUD: Create, Read, Update and Delete
DFFE: Directorate for Food, Fisheries and Agricultural Business (Danish)
DTC: Distributed Transaction Coordinator in .NET
EAI: Enterprise Application Integration
EJB: Enterprise Java Bean
ERP: Enterprise Resource Planning
ESB: Enterprise Service Bus
ESDH: Electronic Case and Document Handling System (Danish)
GQM: Goal-Question-Metric
GUI: Graphical User Interface
IIS: Internet Information Services
IP: Internet Protocol
Java EE: Java Enterprise Edition
JAX-RPC: Java API for XML-based RPC (defined in J2EE 1.4)
JAX-WS: Java API for XML Web Services (defined in Java EE 5)
JDBC: Java Database Connectivity
JMS: Java Message Service
JTA: Java Transaction API
JTS: Java Transaction Service
MOM: Message Oriented Middleware
MQ: Message Queue
OASIS: Organization for the Advancement of Structured Information Standards
OO: Object orientation | Object-oriented
RAID: Redundant Array of Independent/Inexpensive Disks
RM: Resource Manager
RMI: Remote Method Invocation
RPC: Remote Procedure Call
RUP: Rational Unified Process
SDK: Software Development Kit
SLA: Service Level Agreement
SOA: Service-Oriented Architecture
SOAP: Simple Object Access Protocol
TCP: Transmission Control Protocol | Tentative Commit Protocol
THP: Tentative Hold Protocol
WCF: Windows Communication Foundation
WPF: Windows Presentation Foundation
WS-AT: Web Services Atomic Transaction
WS-BA: Web Services Business Activity
WS-BA-I: Web Services Business Activity Initiator
WS-CAF: Web Services Composite Application Framework
WS-CF: Web Services Coordination Framework (part of WS-CAF)
WS-COOR: Web Services Coordination (part of WS-TX)
WS-CTX: Web Services Context (part of WS-CAF)
WSDL: Web Services Description Language
WSIT: Web Services Interoperability Technologies
WS-TX: Web Services Transaction
WS-TXM: Web Services Transaction Management (part of WS-CAF)
X/Open CAE: X/Open's Common Application Environment
XA: X/Open's specification for distributed transaction processing
XML: Extensible Markup Language


B. State machine diagrams: WS-AT

WS-AT Completion protocol abstract state diagram

WS-AT 2PC protocol abstract state diagram
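As a textual companion to the 2PC diagram above, the following C# sketch encodes the core coordinator-to-participant transitions of the Durable 2PC protocol as we read the OASIS WS-AT state tables. The encoding (the enums and the Next method) is our own simplification and deliberately omits the Volatile2PC variant, the ReadOnly and Aborted votes, message retransmission and recovery.

    using System;

    public enum TwoPCState { Active, Preparing, Prepared, Committing, Aborting, Ended }
    public enum CoordinatorMessage { Prepare, Commit, Rollback }

    public static class Durable2PCParticipant
    {
        public static TwoPCState Next(TwoPCState state, CoordinatorMessage msg)
        {
            switch (state)
            {
                case TwoPCState.Active:
                    if (msg == CoordinatorMessage.Prepare)  return TwoPCState.Preparing;
                    if (msg == CoordinatorMessage.Rollback) return TwoPCState.Aborting;
                    break;
                case TwoPCState.Preparing: // left by sending the Prepared vote
                    if (msg == CoordinatorMessage.Rollback) return TwoPCState.Aborting;
                    break;
                case TwoPCState.Prepared:  // vote sent, awaiting the outcome
                    if (msg == CoordinatorMessage.Commit)   return TwoPCState.Committing;
                    if (msg == CoordinatorMessage.Rollback) return TwoPCState.Aborting;
                    break;
            }
            throw new InvalidOperationException(msg + " is invalid in state " + state);
        }
    }

Committing and Aborting end with the participant sending Committed or Aborted, respectively, and moving to Ended.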

C.State machine diagrams: WS-BA

BusinessAgreementWithParticipantCompletion abstract state diagram

BusinessAgreementWithCoordinatorCompletion abstract state diagram
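To complement the diagram above, this C# sketch traces the two outcome paths of BusinessAgreementWithCoordinatorCompletion (close versus compensate) as we read the OASIS WS-BA abstract state diagram. For readability it mixes inbound and outbound notifications in a single transition function, and it omits the Cancel, Exit and Fail branches; the encoding is ours, not the specification's.

    using System;

    public enum BAState { Active, Completing, Completed, Closing, Compensating, Ended }

    public static class BACoordinatorCompletionParticipant
    {
        // Complete, Close and Compensate arrive from the coordinator;
        // Completed, Closed and Compensated are sent by the participant.
        public static BAState Next(BAState state, string notification)
        {
            if (state == BAState.Active       && notification == "Complete")    return BAState.Completing;
            if (state == BAState.Completing   && notification == "Completed")   return BAState.Completed;
            if (state == BAState.Completed    && notification == "Close")       return BAState.Closing;
            if (state == BAState.Completed    && notification == "Compensate")  return BAState.Compensating;
            if (state == BAState.Closing      && notification == "Closed")      return BAState.Ended;
            if (state == BAState.Compensating && notification == "Compensated") return BAState.Ended;
            throw new InvalidOperationException(notification + " is invalid in state " + state);
        }
    }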

D. Example of peer review: walkthrough of WS-AT abort scenario


Date: July 4, 2007. Place: ITU (2A42).

E. Source code
The source code is separately attached, both in print and as a CD. Page references are given below for the top-level namespaces in each C# project.

Common: 1
Common.DataContracts: 2
Common.FaultContracts: 26
Common.Imp: 38
Common.Interfaces: 51
Common.Logging: 89
Common.ServiceContracts: 91
Common.Util: 103
ContainerFramework: 112
ContainerFramework.Participant: 113
ContainerFramework.State: 128
ContainerFramework.Threads: 191
ContainerFramework.Tx: 199
ContainerFramework.Util: 212
ContainerService: 239
ContainerService.BusinessServiceProxy: 240
ContainerService.Commands: 262
ContainerService.ProtocolService: 299
ContainerService.Threads: 307
CoordinationFramework: 316
CoordinationFramework.AT: 317
CoordinationFramework.BA: 328
CoordinationFramework.COOR: 338
CoordinationFramework.State: 352
CoordinationFramework.Threads: 428
CoordinationFramework.Util: 434
CoordinationService: 446
BusinessServices: 460
BusinessServices.domain: 471
