Final year project

Objects in the Cloud

Revision : 185

Author: Geerd-Dietger Hoffmann
Supervisor: Ruth Pitman

May 21, 2009


Abstract

Cloud computing is rapidly gaining the interest of service providers, programmers and the
public, as no one wants to miss the new hype. While there are many theories on how the cloud
will evolve, no real discussion of its programmability has yet taken place. This paper
describes a programming language named objic that enables programs to run in a distributed
manner in the cloud. This is done by creating an object orientated syntax and interpretation
environment that can create objects at various distributed locations throughout a network
and address them in a scalable, fault tolerant and transparent way. This is followed by a
discussion of the problems faced and an outlook into the future.

Legal

Copyright
The copyright is held by Hoffmann Geerd-Dietger, (May 21, 2009)

This paper is licensed under the Creative Commons “Attribution-Share Alike 2.0 UK: England
& Wales” License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/2.0/uk/ or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco, California, 94105, USA.

The code is published under the P-BSD license, which can be found in Appendix B on
page 70.

Clarification
This document reflects solely the opinion and views of the author stated above and does not
represent the views, opinions or standpoint of the University of Bournemouth in any way or
form.
For simplicity this paper always uses the masculine form. This has nothing to do with the
gender of the people that are talked about. Apologies if this insults the reader.

University rights
This report is submitted in partial fulfilment of the requirements for an honours degree at
the University of Bournemouth. The author declares that this report is their own work and
that it does not contravene any academic offence as specified in the university regulations.
Permission is hereby granted to the University to reproduce and to distribute copies of this
report in whole or in part.

Signature:

Hoffmann Geerd-Dietger, Bournemouth May 21, 2009

Word count: 16481

Acknowledgments

I would like to acknowledge and extend my gratitude to Ruth Pitman for her support,
dedication towards the students and ongoing advice. Further, I would like to thank my family;
without their understanding, patience and encouragement I would not be where I am now in
life.

I am also grateful to my girlfriend for her unconditional love, even in rough times.

Further my appreciation goes to Dan, Tom, Edd, Dave, Elliot, Ivan, Cornelius, Laurie, David
and everyone who hoped his name would be here.

Contents

Abstract

Legal

Acknowledgments

1 Introduction
1.1 Statement of Problem
1.2 Objectives
1.2.1 Project Objectives
1.2.2 Personal Objectives
1.3 Methods
1.3.1 Project Aim
1.3.2 Tools Used
1.3.3 Programming Language
1.4 Constraints
1.4.1 Time
1.4.2 Change
1.4.3 Knowledge
1.4.4 Choice of Language
1.4.5 Operating System
1.5 Good Practice
1.5.1 Licensing
1.5.2 Backup Strategy
1.5.3 Documentation
1.6 Development Process
1.7 Layout

2 Literature review
2.1 Definition: “What is the Cloud?”
2.2 Definition: “What is an Object”
2.3 Objects in the Cloud
2.4 Compilers
2.5 Similar Technology
2.5.1 SOAP
2.5.2 CORBA
2.5.3 dSelf
2.6 Solution Discussion

3 Requirements

4 Design of System
4.1 Discussion on Antlr
4.2 Grammar Description
4.3 Standard Variables
4.3.1 ME keyword
4.3.2 ARGS Keyword
4.4 objic Class Structure
4.5 Object Location Specification
4.6 ObjectServer
4.7 Security
4.8 Object Communication
4.9 Object State
4.10 Garbage Collection
4.11 Known Problems
4.11.1 Object Version Problem
4.11.2 Inheritance
4.11.3 Local / Public variables, methods, classes

5 Implementation
5.1 Discussion of Programs
5.1.1 objectServer
5.1.2 objicc
5.1.3 orun
5.2 Object Server Class Structure
5.3 Class descriptions
5.3.1 Initializer
5.3.2 Interpreter
5.3.3 MethObj
5.3.4 ObjConnection
5.3.5 ObjManager
5.3.6 oClass
5.3.7 oObject
5.3.8 oInt
5.3.9 oString
5.3.10 oVm
5.3.11 RequestHandler
5.4 Logging
5.5 Error Management
5.6 VOID
5.7 Duck-Typing
5.8 Abstract Syntax Tree
5.9 Keywords
5.10 Implementation Problems
5.11 Finished Artifact
5.11.1 Documentation
5.11.2 Installation

6 Testing
6.1 Strategy
6.2 Plan
6.3 Code Analysis
6.4 Code Coverage
6.5 Recording Faults

7 Critical Evaluation / The objic Language
7.1 Evaluation of Language
7.1.1 Language Syntax
7.1.2 Language Behaviour
7.2 Research
7.3 Development Methodology
7.4 Implementation
7.5 Testing
7.6 Project Plan
7.7 Personal Performance

8 Future Work
8.1 Short Term
8.2 Long Term

9 Conclusion

10 List of Abbreviations

A Appendix

B License

C Antlr Syntax

D Gantt Chart

E INSTALL

F CD Content

G Backup Script

H For Loop

I SOAP/HTTP Comparison

J Programming the Cloud

K Grammar Description
K.0.1 block
K.0.2 call
K.0.3 classdef and methdef
K.0.4 whileloop
K.0.5 forloop
K.0.6 newvar
K.0.7 paramlist
K.0.8 stat
K.0.9 NAME
K.0.10 Comments

L Man Pages

M Code Example

N Design Diagrams

“The World Wide Computer”
Nicholas Carr

1 Introduction

This chapter will introduce the reader to the central problem discussed in this paper, followed
by the approach and methods used to solve it.

1.1 Statement of Problem


Cloud computing is seen to bring together many services that are provided through the “world
wide computer” [Carr, 2009]. A trend towards multifunctional environments is currently taking
place at the operating system kernel level, encouraged by new virtualization techniques (see
XEN, VMware, OpenBox). At the highest level of abstraction, on the other hand, object
orientated notations and ideas are mostly used [Hayes, 2008]. The general pattern is that
once a cloud provider is chosen, a lock-in to its techniques and libraries occurs. Service
compatibility is then achieved by adding specific output filters to the program (see SOAP,
REST in section 2.5.1 on page 15), which emulate object usage. As a result, every
Software as a Service (SaaS) provider creates its own format. Other programs that want to
communicate with such a service have to retrieve this information, parse it accordingly and
create local object representations. This creates many difficulties, especially when the
format has to change [Emmerich, 2000]. By these methods, both ends of a cloud service stack
have become scalable, or in a nutshell “cloud enabled” [Beard, 2008]. The important layer in
between, that of compilers and interpreters and with it the program constructs, has been
neglected in the past few years. It is therefore still the case that, to use other services
of a cloud provider, the programmer has to include some specific library or write the
interface himself [Haggholm, 2007]. Efforts to make compilers and/or interpreters more “cloud
friendly” have only resulted in incomplete products (see dSelf in section 2.5.3 on page 17)
that are not generally used. As the success of SOAP and of the object orientated paradigm
shows, an object orientated distribution approach bears many advantages for the cloud, but it
has not yet been implemented at the layer of programming languages.
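
The parsing burden described above can be illustrated with a short Python sketch. The two providers, their payload layouts and the `Quote` class are invented purely for illustration; no real service or format is implied. The point is only that a client must write one adapter per provider to arrive at the same local object.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Quote:
    """Local object representation that the client has to rebuild by hand."""
    symbol: str
    price: float


# Hypothetical responses from two services that offer the same information.
PROVIDER_A_JSON = '{"quote": {"symbol": "ABC", "price": "12.50"}}'
PROVIDER_B_XML = "<Quote><Ticker>ABC</Ticker><Last>12.50</Last></Quote>"


def quote_from_provider_a(payload: str) -> Quote:
    """Parse provider A's (assumed) JSON layout into a local object."""
    data = json.loads(payload)["quote"]
    return Quote(symbol=data["symbol"], price=float(data["price"]))


def quote_from_provider_b(payload: str) -> Quote:
    """Parse provider B's (assumed) XML layout into the same local object."""
    root = ET.fromstring(payload)
    return Quote(symbol=root.findtext("Ticker"), price=float(root.findtext("Last")))


if __name__ == "__main__":
    print(quote_from_provider_a(PROVIDER_A_JSON))
    print(quote_from_provider_b(PROVIDER_B_XML))
```

Every additional provider, and every change to an existing format, requires another hand-written adapter of this kind.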

1.2 Objectives

1.2.1 Project Objectives

• The project will create an innovative Turing complete [Abelson, 2001] object orientated
programming language that enables and promotes the distribution of objects throughout
a network. The core principle of the language is that it makes no difference
to the syntax of the code whether an object is initialised locally or on an unknown resource
indicated by a URL (Uniform Resource Locator). The syntax of the language should
seem familiar to any C, Python or Java programmer.

• Provide the basis for a discussion of how and if distributed objects can be used for cloud
programming purposes.
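
The core principle named in the first objective, that calling code looks the same whether an object lives locally or behind a URL, can be sketched in Python. This is not objic syntax and not the implementation described later in this report; `new`, `RemoteProxy`, `FakeObjectServer` and the example URL are hypothetical names, and the "remote" server is simulated in-process so that the sketch runs without a network.

```python
class Counter:
    """Example class that can be instantiated locally or 'remotely'."""

    def __init__(self):
        self.value = 0

    def increment(self, step=1):
        self.value += step
        return self.value


class FakeObjectServer:
    """In-process stand-in for a remote object server (no real networking)."""

    def __init__(self, classes):
        self._classes = classes
        self._objects = {}

    def create(self, class_name):
        object_id = len(self._objects) + 1
        self._objects[object_id] = self._classes[class_name]()
        return object_id

    def call(self, object_id, method, args):
        return getattr(self._objects[object_id], method)(*args)


class RemoteProxy:
    """Forwards every method call to the server that owns the object."""

    def __init__(self, server, object_id):
        self._server = server
        self._object_id = object_id

    def __getattr__(self, name):
        # Only reached for names that are not attributes of the proxy itself.
        def forward(*args):
            return self._server.call(self._object_id, name, args)
        return forward


CLASSES = {"Counter": Counter}
# Hypothetical URL mapped to the simulated server.
SERVERS = {"http://objects.example.org": FakeObjectServer(CLASSES)}


def new(class_name, location=None):
    """Create an object locally, or on the server named by `location`."""
    if location is None:
        return CLASSES[class_name]()
    server = SERVERS[location]
    return RemoteProxy(server, server.create(class_name))


if __name__ == "__main__":
    local = new("Counter")
    remote = new("Counter", location="http://objects.example.org")
    for counter in (local, remote):
        counter.increment()
        print(counter.increment(2))
```

Apart from the `location` argument at creation time, the calling code cannot tell the two objects apart, which is the property the objective asks for.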

1.2.2 Personal Objectives

• Gain a sound understanding of compilers, interpreters and the technology involved.

• Understand the issues and problems associated with distributed computing and try to
find solutions.

• Define cloud computing and gain knowledge about the general topic.

• Become familiar with Python and the tools linked to it.

1.3 Methods

1.3.1 Project Aim

The project tries to create a novel object orientated programming language that acts as a
layer of glue between the hardware cloud providers and the presentation of the user interface,
where objects are already emulated and used. It should be possible to use an array of services
provided in the cloud, through published objects, in an independent and transparent way. The
language should further encourage people to offer a service to other users by letting them
instantiate the objects they have written. In the current situation, if someone has written
a good encryption library, for example, he is forced to use non-standard methods to write
a web service that makes this library usable. The language created in this project should
enable and encourage publishing such a library through a well defined interface, while
securing the intellectual property by keeping all execution on the server. A further aim is to
make it easy to incorporate services provided by different providers in a scalable, fault
tolerant and traceable way. Although no attempt known to the author has been made so far to
implement anything in this way, a discussion of similar techniques is needed to enable an
objective perspective. This evaluation will be followed by an outlook into the future.

1.3.2 Tools Used

(All links were accessed on May 21, 2009)

• ANTLR (http://www.antlr.org)
ANTLR (ANother Tool for Language Recognition) is a parser and lexical analyzer
generation tool. It will be used to generate an AST (Abstract Syntax Tree) from the
source code. To do this, it uses LL(*) parsing, and it has proven highly reliable in many
industry projects. The syntax is specified in an EBNF (Extended Backus-Naur Form)
like notation. Different tree walking algorithms will then be used to optimize and execute
the code statements.

• Subversion (http://subversion.tigris.org)
Subversion (SVN) is a version control system. It can easily manage modifications,
recovery and versioning of files. It is considered to be one of the industry standards next
to the Concurrent Versions System (CVS). It will be used to keep track of the project changes
and synchronise everything across different computer systems. Everything produced in the
process of the final year project will be imported into this system. Once the report is
finished, this will also allow other contributors to add their code and ideas to the project.

• Trac (http://trac.edgewall.org)
Trac is a project management tool that supplies a wiki, issue tracking, a roadmap and an SVN
front-end. It will be used to record the project milestones and to create a wiki of pages that
have influenced decisions, and its timeline will be used to estimate the completion of
tasks. A further application will be to record faults and errors that need to be fixed in
the written report and in the source code.

• Eclipse (http://www.eclipse.org)
Eclipse is an integrated development environment (IDE) which will be used with the
PyDev plug-in to write the source code. Some of the main features that influenced the
decision to use this tool were automatic code completion, error checking, test coverage
checking and platform independence.

• GNU/Linux, CentOS (http://www.kernel.org, http://www.centos.org)
Linux is an Open Source Unix-like operating system that uses the GNU libraries and
programs to create a fully functional operating environment. There are many Linux
flavours, and one of them is CentOS, which is derived from the Red Hat Enterprise
distribution. Linux offers a full development environment and runs all the programs
needed to develop this project.

• LaTeX (http://www.latex-project.org)

LaTeX is a typesetting environment that is based on TeX. The idea behind TeX is that
the author should concentrate on the content and not on the mark-up. It can also
automatically generate indexes, bibliographical references and tables of contents.

• pylint (http://www.logilab.org/857)

Pylint is a static source code analysis tool. It looks at the program syntax and tries
to find errors and coding standard violations. It further looks for code smells [Fowler
et al., 1999a]. A very useful feature is that it rates the code and gives it a mark out of
10. This will be used to analyse the code quality.

• PyChecker (http://pychecker.sourceforge.net)

PyChecker is a dynamic runtime checker. It executes the code and looks for errors that
might occur but are not caught beforehand because Python is a very dynamic language. It is
very useful for development but will not help in evaluating the code.

• Other tools used: Vi, Emacs
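
The pipeline described for ANTLR in the list above, parsing source code into an AST and then walking the tree to execute it, can be sketched in a few lines of Python. As an assumption for illustration only, the sketch uses Python's built-in `ast` module as a stand-in for an ANTLR-generated parser and handles nothing beyond arithmetic expressions; it is not the objic grammar or interpreter.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Recursively walk an expression tree and compute its value."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = evaluate(node.left)
        right = evaluate(node.right)
        return OPERATORS[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def run(source):
    """Parse `source` into an AST, then execute it with the tree walker."""
    tree = ast.parse(source, mode="eval")
    return evaluate(tree)


if __name__ == "__main__":
    print(run("2 + 3 * 4"))
```

Because the parser has already encoded operator precedence in the shape of the tree, the walker itself needs no precedence rules; it only visits the children of a node before combining them.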

All benchmarks obtained for this paper were created by running the program discussed 25
times and using the average execution time (using the atime program). The system used was
an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ with 2GB of main memory running
CentOS 5.3 GNU/Linux.
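
The measurement procedure, running a program 25 times and averaging the execution time, can be sketched as follows. The benchmarks in this paper were taken with the atime program; the Python harness below is an independent stand-in, and the function name and the placeholder workload are assumptions made for illustration.

```python
import statistics
import subprocess
import sys
import time


def average_runtime(command, runs=25):
    """Run `command` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Placeholder workload: start the Python interpreter and do nothing.
    command = [sys.executable, "-c", "pass"]
    print(f"mean over 25 runs: {average_runtime(command):.4f} s")
```

Averaging over repeated runs smooths out one-off effects such as disk caching or scheduler noise, at the cost of hiding the spread between runs.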

1.3.3 Programming Language

Compilers and interpreters used to be written in C [Aho et al., 2006], and all major
programming languages still are (Java, C, Python, etc.). In recent years, however,
alternative languages have been eroding this monopoly because they are taught at
universities [Appel, 2002]. Because of the time constraints, the decision was made to write
the project in Python. Python is a multipurpose, multi-paradigm, high level, object oriented
language. By enabling rapid application development and having many built-in modules (see
http://www.Python.org), Python increases the output a programmer can produce [Ousterhout,
1997].

1.4 Constraints

1.4.1 Time

Given the complexity of the problem and the amount of research needed, time can be
considered one of the toughest constraints. Because of this, the application will merely
be a proof of concept and will not include any run time optimizations. It can be assumed that
the decision to use Python as a programming language will also influence the runtime of the
compiler and interpreter drastically. As the time frame is so narrow, only a Turing complete
language will be created, without many libraries at its disposal.
A Gantt chart was used to document progress and account for slack and tasks ahead (see
Appendix D on page 76).

1.4.2 Change

As the topic is still an active field of research and not well understood, the background reading
and design will change while the report and artefact are developed. It can be expected that some
new techniques will be published while work on the project progresses. For this reason, the
project will be developed using an iterative approach (see “Development Process” in section 1.6
on page 7).

1.4.3 Knowledge

Because many of the areas the compiler touches are not yet fully researched, some of the
hurdles encountered will be difficult to clear. Distributed memory management, for
example, is not well understood but has to be performed. This may impose a
constraint on the project, as some solutions have to be implemented that may not be complete.

1.4.4 Choice of Language

Based on previous research, Python was chosen as the implementation language, which might cause
some problems. Python is known not to be as memory efficient as C, and program execution
time might increase because certain constructs cannot be optimized. As execution time
and memory size are not vital for the success of the project, this is a small constraint.

1.4.5 Operating System

As all the development is done on UNIX / Linux machines it cannot be assumed that the
software will run on any other Operating System.

1.5 Good Practice

It is an aim to comply with the BCS code of good practice (see http://www.bcs.org/upload/pdf/cop.pdf)
throughout the whole project.

1.5.1 Licensing

Great care has been taken throughout this project to use only Open Source Software. No
“non-free” software has been used to create any part of the project. This is also reflected
in the fact that the software produced is licensed under the five-paragraph P-BSD (Pacifist
Berkeley Software Distribution) license. This is a derivative of the original BSD (Berkeley
Software Distribution) license. The BSD license is one of the oldest licenses around (released
1990) and is considered to be quite close to publishing under the public domain. The license
allows proprietary use, and derived source does not have to be made public. The license used
in this project has the following main points:

• Copyright is held by Geerd-Dietger Hoffmann.

• Copyright notice is not allowed to be removed from code.

• Binary form must include copyright notice.

• Advertisement mentioning the software must contain acknowledgment.

• The author accepts no responsibility for any damages caused by the software.

• The software is not allowed to be used to harm any other human being.

The full license is printed in Appendix B on page 70

1.5.2 Backup Strategy

As mentioned above, all work produced was held in an SVN repository. Through this approach,
backups are made with every repository update, as all the data is copied and updated in all
repositories held on the computers that the work was done on. To further strengthen the
backup strategy, the script found in Appendix G on page 80 was used. It
first checks that everything is up to date and exits with a warning if not; otherwise it
copies all the data to the European Organization for Nuclear Research (CERN) and to another
off-site location. By applying this strategy it is possible to hold seven copies of the data in
geographically separate and system independent locations.
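
The control flow of such a backup step (check first, warn and stop if the working copy is not up to date, otherwise copy to every destination) can be sketched in Python. The actual script is the one printed in Appendix G; the version below is a hypothetical stand-in that copies to local directories instead of remote hosts and takes the up-to-date check as a caller-supplied function rather than querying SVN.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def backup(source, destinations, is_up_to_date):
    """Copy `source` to every destination, but only if the check passes.

    `is_up_to_date` is a caller-supplied function (hypothetical; a real
    script would ask the version control system). Returns False when the
    backup was skipped, True otherwise.
    """
    if not is_up_to_date(source):
        print(f"warning: {source} is not up to date, backup skipped", file=sys.stderr)
        return False
    for destination in destinations:
        target = Path(destination) / Path(source).name
        shutil.copytree(source, target, dirs_exist_ok=True)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp, "project")
        project.mkdir()
        (project / "report.tex").write_text("draft")
        mirrors = [Path(tmp, "site_a"), Path(tmp, "site_b")]
        print(backup(project, mirrors, is_up_to_date=lambda path: True))
```

Refusing to copy when the check fails keeps a stale or half-committed state from overwriting a good backup.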

1.5.3 Documentation

Documentation has two roles in this project: Firstly the code should be well documented and
secondly the external project or program documentation should be concise. In both cases it
is important to have precise descriptive comments and all written material should be easy to
understand.

1.6 Development Process


As the topic is of a highly complex nature, an iterative development process was chosen. This
approach [Bittner and Spence, 2006] consists of four main stages and builds upon reworking,
refactoring [Fowler et al., 1999b] and extending the already existing source code.
Methodologies like Waterfall and the V-model were considered but not used, as they were seen
to be too inflexible. The four main stages can be repeated as many times as needed and build
upon each other; the output of one iteration is the input of the next. The SVN tools were very
useful here, as it was possible to log the output of every iteration and see the continuous
change of the project.
The four stages can be defined as:

1. Requirement gathering

In this stage it has to be clarified what part or artefact should be produced in this
iteration. The general guideline was that no iteration should take longer than one
day. After defining the piece of code that should be written, a little description of the
functionality was made as source code comments.

2. Design + Implementation

In this stage the modules that needed changing were identified and comments were
placed in the appropriate locations in the code. Further, files were created and filled with
comments describing the functionality. After reviewing the changes and the implication
on the system, the comments were replaced with code.

3. Testing

As the iterative steps were well defined, testing was done with a little input file that was
extended to include the functionality added.

4. Reworking

After verifying the correctness of execution, Pylint and PyChecker (see section 1.3.2 on
page 4) were used to evaluate the quality of the code, and optimization possibilities were
explored. Further, the code was reviewed before committing it to the source tree. This was
done through a shell script that showed all the changes made to the code relative to its
state before the iteration.

The iterations can be grouped into the following consecutive main steps:

1. Parsing

2. Interpreting

3. Object instantiation

4. Object Server

5. Distributed object instantiation

6. Base object

7. Classes

8. Stacks

9. Conditionals

10. Memory management

11. While

12. Break

13. For

14. User functions

15. Return

16. Everything running in Object server

17. User classes

1.7 Layout
• Chapter 1 on page 1

This chapter gives a brief introduction to the problem domain and the methods used to
solve it.

• Chapter 2 on page 10

This section explores the literature associated with the project and explains the major
terminology used.

• Chapter 3 on page 19

Some very high level requirements are discussed in this chapter.

• Chapter 4 on page 21

This section introduces the design decisions taken and discusses the issues involved.

• Chapter 5 on page 33

The implementation details are introduced and the class structure is described in more
detail in this section of the report.

• Chapter 6 on page 47

In this chapter the testing strategy is introduced and the methods used are described.

• Chapter 7 on page 52

A critical evaluation of the program and the personal performance is carried out in this
chapter.

• Chapter 8 on page 57

In this section an outlook into the near and far future is provided.

“It does not matter how many
books you have, but how good
the books are which you have”
Seneca

2 Literature review
This chapter starts by explaining the terms that make up the paper title, followed by an
introduction to similar technologies and ways of thought, and concludes with a discussion
of the proposed solution.

2.1 Definition: “What is the Cloud?”

Cloud computing is said to be one of the biggest shifts ever seen in the way computers
are used [Carr, 2009], but first it has to be clarified what “the cloud” stands for and how
a cloud can compute. The term “cloud” was coined from the cloud symbol used for the
internet in network diagrams, which represents a large number of anonymous, interlinked
computers [Miller, 2008] (Figure 2.1).

Figure 2.1: A typical network diagram using a cloud

In essence this means that a “cloud” of computers and/or servers acts and reacts as a single
computer [Breitter and Behrendt, 2008]. These computers can be owned by a big company and
as such be housed in big server farms, or they can be personally owned home machines or
virtualized resources [Buyya et al., 2008]. The important thing is that this conglomerate of
machines can be accessed via the internet. Many terms have been associated with the cloud,
such as Utility Computing (UC), Infrastructure as a Service (IaaS), Platform as a Service
(PaaS) and Software as a Service (SaaS) [Armbrust et al., 2009]. To discuss the topic in more
detail, the ambiguous term “cloud computing” has to be divided into two categories:

• Storage

Data storage forms the base of all computing; it is one of the main requirements for being
able to process anything. In terms of cloud computing, “cloud storage” can be defined as
data being saved on multiple third party servers [Beard, 2008]. The storage appears to
the user as one coherent block of space that he has for his use. One of the most used
storage providers is the Amazon S3 (Simple Storage Service)® service, which charges the user
dynamically based on usage, by a metric consisting of upload/download and data held.
For the user, the storage seems unlimited and is only bound by the amount of money he
can pay. The user does not know where the data is housed, and it is known that Amazon
holds redundant copies in different countries (http://aws.amazon.com/). This, of course,
holds some risks for companies, as laws and regulations might change from country to
country, but this issue will not be discussed here. Further, as recently seen with the
Google document error, data uploaded into the cloud might involuntarily become
accessible to the whole world
(http://googledocs.blogspot.com/2009/03/on-yesterdays-email.html). However, the ability to
share all documents that “live” in the cloud is seen as one of the great advantages
[Hayes, 2009].

By uploading data into a cloud storage service, data security (loss, corruption, access)
is outsourced to the storage provider. There are many such storage providers, but they
all have in common that they offer online accessible storage with the actual implementation
hidden from the user [Hayes, 2008].

• Software

Cloud programs are very similar to Software as a Service in that they are hosted online
and mostly accessible through a web browser. However, they differ in that the
underlying hardware is not always provisioned by the creator of the service. Software
as a Service is a well researched area [Menken, 2008], but Utility Computing is just at
the beginning. By having the administration of the services outsourced, maintenance and
software installation are greatly simplified. There are two main schools of thought here.
The first is the way Amazon is taking: it is possible to buy time on a virtual machine,
which can then be installed and configured as needed. If the service needs more computing
power, more time can be bought on that virtual machine. Further, horizontal scaling,
bounded only by money, is possible by adding machines through a web interface. The other
school of thought is the one Google® and Microsoft® are proposing (see
http://googleenterprise.blogspot.com/2009/04/what-we-talk-about-when-we-talk-about.html).
Here the developer has to write his program in a special programming environment, using
vendor specific libraries. This makes maintenance, scalability and installation very easy,
as it is independent of the users (see
http://www.ibm.com/developerworks/linux/library/l-cloud-computing).

So cloud computing can be defined by accessibility on the internet, mostly through a
browser; a nearly infinite pool of resources; horizontal scalability; and dynamic payment
[Foster et al., 2008].
Finally, it has to be stated that cloud computing is not grid or grid computing. Clouds
may have properties similar to the grid and can internally use grid software to manage the
underlying architecture but the cloud consists of a stack of services whereas grid computing
is one layer of this stack [Delic and Walker, 2008].

2.2 Definition: “What is an Object”


In “traditional languages” like C or FORTRAN a data structure is derived and then functions
are created to modify this structure in some predefined way. There is a clear separation
between a data structure and a function [Holmes, 1994]. Procedural programming can be seen
as a list of statements to execute on a data structure. This methodology is changed in the
object orientated way of thinking [Parnas, 1972]. Here the data structure and the functions
are shielded or “encapsulated” by an interface from any other part of the program [Parsons,
1997]; the internal state and structure are “hidden” [Kuechlin and Weber, 2000]. The object
is modified by the publicly invokable methods and remembers the state it is in after those
methods are executed. Objects normally correspond to real life artefacts and try to model
their behaviour [Jacobson, 1992]. A “car” object, for example, might have a “drive” method, but
adding another wheel would not be possible, as there is no method provided to do so. An
object oriented program can be imagined as many objects calling methods on each other
and having more or less “intimate” relationships; an object may also consist of a range of
other objects [Aho et al., 2006]. Many objects have the same “blueprint” or internal structure,
so they can be grouped into a “class”. A class is the definition of an object, which is then
instantiated to be able to hold state. A class might be called “Car”, where the instances
are “Fred’s VW”, “Bob’s Ford” and “Didi’s Aston Martin”.
There is no discussion of Polymorphism or Inheritance included here as this would extend
the topic too much. Good literature about this is [Holmes, 1994] [Abelson, 2001] [Arlow and
Neustadt, 2005] [Pilone and Pitman, 2005].


2.3 Objects in the Cloud


The object oriented view is one of the programming paradigms most used today; further, it is
very suitable for distribution. As it is possible to view the components of a system as objects,
which are the smallest entities of data and functionality that possess a strictly defined interface,
the communication between them can be easily modified. This idea is nothing new [Emmerich,
2000], but moving objects into the cloud is a new idea about which nothing has been published
to date. In the history of distributed objects it was always important where
the objects resided, because the research was strongly driven by big corporate companies. Further,
execution time was seen as a crucial evaluation point. This focus resulted in a limited adoption
of distribution, or adoption only in very specific cases. Distribution ideas were further misused
as output filters for already existing programs, and two such areas of thought were mixed,
resulting in the CORBA technology.

2.4 Compilers
A compiler is a program that reads a well defined source language and outputs a related
target language. This target language can be an executable that can be run directly on an
architecture, or a byte code, an Abstract Syntax Tree or similar, which can be interpreted.
The translation consists of four distinct steps, as seen in Fig. 2.2.

Figure 2.2: Translation of a print statement

1. Lexical Analysis


In this step, the Lexical analyzer or “scanner” reads the source program and creates
meaningful tokens out of the characters. This means it tries to split the input up into
little lexemes.

2. Syntax Analysis

The parser uses the tokens to create a tree-like structure which is normally called a Parse
Tree. This is created based on a set of rules which describe how the syntax is recognized
and how the tree should be created. Such a tree usually has the operator as the root and
the parameters as its children. This can be seen in Fig. 2.2 after the step named “PARSER”.

3. Semantic Analysis

The semantic analyzer checks if the syntax tree has the correct semantic form and
might perform some optimizations. This means that the input is correct and can be
understood by further steps (semantic rules). Some compilers also do type checking and
other changes to the tree like type conversions. The output is then called an Abstract
syntax tree (AST).

4. Code Generation

This is the final step, where the intermediate representation is converted into actual
output code. If this output is some form of Assembly language, the registers are allocated
and the output is generated. This step can vary from implementation to implementation:
some compilers split it into three sub areas (Intermediate Code Generation, Code
Optimization and Code Generation), whereas other compilers optimize based on the syntax
tree.

In this project an Abstract Syntax Tree interpreter will be used. The output of the Lexical
analysis and the Syntax analysis will be optimized through a tree walking algorithm [Appel,
2002] and the optimized tree will be saved in a file. A virtual machine (VM) then loads the
file and executes every statement [Shi et al., 2008]. This approach was chosen because of the
highly distributed nature of the environment in which the code will execute. Thus no usable
binary file could have been created [Rowledge, 2001].
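
The steps described above can be illustrated with a minimal sketch in Python. It is not the objic implementation: the token names, the tuple-based tree and the single supported statement (a print of integer additions) are assumptions made for brevity, and constant folding merely stands in for whatever optimizations the tree walker actually performs.

```python
import re

# Hypothetical token pattern for a tiny language: print(<int> + <int> ...)
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(print)|(.))")


def tokenize(source):
    """Lexical analysis: group characters into (kind, value) tokens."""
    tokens = []
    for number, keyword, symbol in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", int(number)))
        elif keyword:
            tokens.append(("PRINT", keyword))
        elif symbol.strip():  # ignore stray whitespace matches
            tokens.append(("SYM", symbol))
    return tokens


def parse(tokens):
    """Syntax analysis: build a tree with the operator as the root."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        pos += 1
        return tok_value

    expect("PRINT")
    expect("SYM", "(")
    node = ("num", expect("NUM"))
    while pos < len(tokens) and tokens[pos] == ("SYM", "+"):
        pos += 1
        node = ("+", node, ("num", expect("NUM")))
    expect("SYM", ")")
    return ("print", node)


def fold(node):
    """Tree-walking optimization: fold additions of constants."""
    if node[0] == "+":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", left[1] + right[1])
        return ("+", left, right)
    if node[0] == "print":
        return ("print", fold(node[1]))
    return node


def execute(node, out):
    """Interpretation: walk the tree; printed text is collected in out."""
    if node[0] == "num":
        return node[1]
    if node[0] == "+":
        return execute(node[1], out) + execute(node[2], out)
    if node[0] == "print":
        out.append(str(execute(node[1], out)))
        return None
    raise ValueError(f"unknown node {node[0]!r}")
```

In this sketch the optimized tree returned by fold is what would be written to a file and later loaded by the VM; execute plays the role of the VM.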


2.5 Similar Technology

2.5.1 SOAP

SOAP stands for the Simple Object Access Protocol. It was initially based on HTTP and
developed at Microsoft with the two main goals of “providing a standard object invocation
protocol built on internet standards, using HTTP as the transport and XML for data
encoding. And creating an extensible protocol and payload format that can evolve.” [Scribner and
Stiver, 2000]. In summary, the main purpose is to provide a structured packaging protocol for
messages that have to be shared between applications [Snell et al., 2001]. It defines a set of
rules by which data can be encapsulated in XML and transferred over a network. It has a fault
reporting mechanism and routing protocol built in. By using XML as an envelope for all the
data, SOAP is operating system and programming language independent, which is of great
value in the heterogeneous environment that the internet is today. For completeness,
it has to be stated that SOAP can be used for two main applications: RPC (Remote Procedure
Call) and EDI (Electronic Data Interchange). However, only the former usage will be
discussed in this paper. SOAP messages have to obey very strict formatting
rules to enable the understanding of type, encoding and procedure of the information (see
Code 1). In order to make the example easier to understand, the header has
been left out. By convention, every SOAP message should have a header, but it is not required
to. The header stores information for the processing of the message; this includes
keywords like “mustUnderstand”, which tells the parser that all content of the message has to be
fully understood, or “transactionID”, which can be used to keep track of multiple transactions.
There are too many keywords to explain here; a good source is the SOAP specification, which can
be found at http://www.w3.org/TR/2007/REC-soap12-part0-20070427/.
SOAP does not include processing instructions, memory management features, pipelining,
objects by reference or remote object invocation [Scribner and Stiver, 2000]. It is further often
criticized for using port 80, which is normally reserved for HTTP servers. However, it may be
argued that this use could be a valid choice [Mueller, 2001]. Because it uses plain text and XML,
it is quite bloated in comparison with some binary formats, and parsing is slow.
A discussion of SOAP as a protocol for this project can be found in section 4.8 on
page 28.

2.5.2 CORBA

Code 1 Standard SOAP message call

POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

In this section, the discussion will centre around CORBA (Common Object Request Broker
Architecture); this is only done as an example of an object orientated middleware. Microsoft’s
COM and Sun’s Java RMI would have been equally suitable, but the author is more
knowledgeable about CORBA.
CORBA was released in 1991 by the OMG (Object Management Group) and is intended to enable
software components written in an array of different languages to interchange
data with each other [Henning, 2008]. This is done by using an IDL (interface definition
language) to define externally visible interfaces and the mapping to the underlying source
code. It is often seen as a replacement for or extension of RPCs (see RFC707). Each application
initialises an Object Request Broker (ORB), which then takes care of communication details like
reference resolution, access policies etc. CORBA uses stubs and skeleton
code to emulate objects towards the source code on the client or server and then handles
the calls to these. It is a defined goal to hide the distribution as far as possible from the
programmer [Emmerich, 2000]. Every object has a unique reference and is statically typed
as defined by the IDL. CORBA has many benefits: it is fairly language independent, and all
big languages have bindings, even though nearly none implement the full specification. Error
handling is implemented in the form of 25 system exceptions. Because the broker compiles
the code in the source language, CORBA can be used on all operating systems on which the
specific language can run. It tries to be high level and masks as much distribution from the
programmer as possible. Since it is an open standard, many companies have adopted and use
it; most notably, the GNOME project has used it for inter-process communication [Orfali
et al., 1995].
When discussing CORBA, a problem often mentioned is that distributed objects
are handled differently to local instances. Because there were so many companies involved in
creating the standard, many different biases have created an ambiguous description [Emmerich,
2000], which led to incomplete and sometimes error prone implementations. As a result, the
documentation is sometimes quite confusing, and writing CORBA enabled applications can
be very tedious, as the author experienced multiple times. SOAP is often criticized for using
port 80, whereas CORBA is criticized for not using it, as a lot of requests get filtered out
by firewalls. This is still a hot topic and, in the author’s opinion, is unlikely ever to be
resolved [de Jong, and others].

2.5.3 dSelf

dSelf is an object orientated programming language proposed by Kai Knubben in his
“Diplomarbeit” (diploma thesis) in December 2000 [Tolksdorf and Knubben, 2002]. dSelf is the
distributed variant of the language self, which is a classless language (also called delegation
based or prototype based) that was developed at Stanford University and Sun in the 1980s. The
main difference to a traditional object oriented language like Smalltalk is that instead of
classes and their instantiated objects, everything is a self prototype object that consists of
specific “slots”; to create a new instance, a base object is cloned. A slot is a pointer to a
data or method object which can be added and removed dynamically. Through this no class
hierarchy is produced, and the self objects enable a flexibility that could not be achieved
with class based syntax. An interesting fact about self is that the programming is done
graphically; this has been continued in dSelf (see http://www.smalltalk.org.br/movies/).
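
The slot and cloning model described above can be sketched in a few lines of Python. This is only an illustration of the prototype idea and not self or dSelf code; the class name ProtoObject and its method names are hypothetical.

```python
class ProtoObject:
    """A classless object: nothing more than a table of named slots."""

    def __init__(self, **slots):
        self.slots = dict(slots)

    def clone(self):
        # New instances are made by copying an existing object,
        # not by instantiating a class.
        return ProtoObject(**self.slots)

    def add_slot(self, name, value):
        # Slots can be added (or overwritten) at any time.
        self.slots[name] = value

    def remove_slot(self, name):
        del self.slots[name]

    def send(self, name, *args):
        # A slot points to either data or a method; methods are modelled
        # here as callables that receive the object as first argument.
        slot = self.slots[name]
        return slot(self, *args) if callable(slot) else slot


car = ProtoObject(
    wheels=4,
    describe=lambda me: f"{me.send('wheels')} wheels",
)
trike = car.clone()          # a new "instance" with the same slots
trike.add_slot("wheels", 3)  # changed without touching the original
```

Because clone copies the slot table, changing a slot on trike leaves car untouched, which mirrors the flexibility the text attributes to prototype based languages.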

“Distributed” in dSelf means that slots can point to objects that are located in another
dSelf virtual machine that is connected to a network (see Verteilte Implementierung der
objektorientirten Programmiersprache SELF). Connections are always made to virtual machines,
so all objects contained in them become accessible. This enables distributed inheritance and
distributed instantiation. Accessing a remote slot is in no way different to accessing a local
one. Primitive objects like string and integer will be copied to the host VM, whereas
complex objects will be referenced by a pointer. This is done to increase speed but also causes
problems with race conditions and updates not propagating properly. It is worth mentioning
that dSelf is not 100 % compatible with self because of some syntax extensions that needed
to be made (http://www.ag-nbi.de/research/dself/dSelf-Diplomarbeit.ps.gz). Unfortunately,
self and therefore dSelf are not maintained anymore, and dSelf has never left the prototype stage.
Classless languages have never made it “mainstream”, and therefore both projects can be
considered as “completed research”.


2.6 Solution Discussion


The idea behind cloud computation, i.e. not caring about locality and relying on horizontal
scalability, makes it possible to see the topic of distributed objects and computation from a
different angle. Now the location and the execution times can be neglected as they can be
offloaded “into the cloud”. Software as a Service is further gaining more and more importance
through the surge in internet and network speeds. It is well understood that the current
bottleneck is the so called last “copper mile” [Li et al., 2005]. By calculating everything
online and only transferring the output that is actually needed to the end-user, this link can
be utilised more efficiently, as the main “cloud servers” are connected to the internet
backbone, which enables greater transfer speeds. It also enables a service to offer numerous
output filters, which can be understood by an array of devices, while keeping the underlying
calculation the same. The paradigm of object oriented programming has probably become the most
used technique to express procedures
(see http://www.tiobe.com/index.php/content/paperinfo/tpci) and is well understood and
researched. Because objects are the smallest confined entity in a language, it is desirable to
distribute on this level of abstraction [Ostrowski et al., 2008].
These factors enable a completely new discussion of object based distribution, which this
paper is trying to start.

“Engineers are all basically
high-functioning autistics who
have no idea how normal people
do stuff.”
Cory Doctorow

3 Requirements
It would be possible to name numerous requirements for this project, but as it is a current
research topic and not intended for production, only very high-level requirements are listed and
discussed. The ones chosen are the terms most commonly used in connection with distributed
computing and distributed objects [Emmerich, 2000].

• Scalability

Scalability has been a hot topic for a long time, as serving all the data from one computer
has not been possible for many years. As horizontal scalability is one of the cornerstones
of cloud computing, it is a major requirement for the project. However,
this scalability should be hidden from the programmer.

A further idea is that as an object provider notices that a service is nearing its capacity
he should be able to easily add new hosts.

• Openness

Having an open interface definition is vital for the success of the project. It is important
that anyone can see the definitions and can understand the workings. This also affects
the protocol that is used to communicate which has to be well documented and easily
implementable.

• Heterogeneity

The project should enable many systems with different setups to communicate with each
other in a standardised way. The underlying architecture or implementation should not
affect the higher levels like the object interface.

A further idea is that the language created should be able to embed and parse other
source code formats, like Python for example.


• Resource Sharing

It should be possible for one object to be used by many clients (other objects). This
incorporates the requirement of scalability and enables sharing of data in an easy and
efficient way.

• Fault-Tolerance

One major requirement is the graceful handling of faults and errors. This is especially
important as the project is so distributed and the transport medium is not reliable.

Obvious requirements like security, usability and similar have been intentionally left out,
as these are not discussed in this paper. A complete discussion would require them,
but this is not possible to achieve in the time frame given.

“Computer language design is
just like a stroll in the park.
Jurassic Park, that is. ”
Larry Wall

4 Design of System
This chapter will show a high-level view of the design decisions taken and discuss the grammar
and known problems. The development name chosen for the project is objic (objects in the
cloud). This name will be used throughout the paper. A rough overall design, in respect to the
byte-code, can be seen in Figure 4.1.

Figure 4.1: The environment in relation to the byte-code (diagram labels: Source-Code,
Compiler, Optimizer, Byte-Code, Interpreter, VM, Object Server)

4.1 Discussion on Antlr


As described in methods (see section 1.3.2 on page 3), Antlr was chosen as the parser generator.
Other programs were evaluated, such as Coco/R, Yacc and the Python internal parser “spark”,
which was used in the first prototype. While all are powerful in their own domain, Antlr had
some major advantages. First of all, the completeness of the development stack made a rapid
prototyping approach possible. The toolset, especially AntlrWorks, which is a GUI for
writing grammars, integrated very nicely (see Fig. 4.2).
Another main point was the huge adoption in industry and universities, which secures
future development and complete references and manuals [Parr, 2007]. The possibility to
generate numerous output languages means that many implementations of the objic compiler
can be created, which will be helpful in further development. One of the main points,
though, was the ability to create and modify an AST in the parsing stage in a standard and
predefined way. This avoids the error prone and inflexible in-lining technique.

Figure 4.2: The antlrWorks editor

4.2 Grammar Description


The syntax of the language should be easily understandable for anyone with previous
knowledge of programming. The grammar was influenced by C, Python and Java, which the
author considers to be the most commonly known languages. The first draft of the language
was presented in the paper “Programming the Cloud” (see Appendix J on page 83) and was
then extended and modified to meet the project requirements. A full description of the
grammar can be found in Appendix K on page 87, and the listing of the Antlr file
which was used to create the Tokenizer and Parser can be found in Appendix C on page 71.
This is a summary of the most important syntactical rules, in Antlr syntax:

Construct                                         Description

block ⇒ “{” <stat>* “}”                           Everything between “{” and “}” is defined as a
                                                  block and has its own stack frame
class <expr> <block>                              Defines a new class
<stat>                                            Everything that can be contained in a block; a
                                                  complete list can be found in Appendix K.0.8 on
                                                  page 91
return “(” <expr> “)”                             Returns from a block; this can be a method or a
                                                  program
def <expr> <block>                                Defines a new method in a class
call ⇒ <expr> “.” <expr> <parameters>             Calls a method with the given parameters
print “(” <expr> “)”                              Prints the expression given
while “(” <boolean> “)” <block>                   Executes the block while the boolean is true
break                                             Exits from a recursion
if “(” <boolean> “)” <block> (else <block>)?      If the boolean expression is true the first block
                                                  is executed, otherwise the optional else
<expr> “=” new <expr> <parameters> (@ server)?    Creates a new object and assigns a variable name
                                                  to it, with an optional server parameter
<expr> “=” <expr>                                 Assigns one variable to another
<expr> “=” <call>                                 Creates a new method object and assigns it to the
                                                  variable name
for “(” <newvar>; <boolean>; <call> “)” <block>   Creates a for loop as known from C

4.3 Standard Variables

Two variables are defined when an object is created; this is done before the constructor
of the object is called.

4.3.1 ME keyword

The “ME” variable is a pointer to the object itself, similar to “this” in Java or “self” in
Python. This is used when calling methods that reside in the object.
As soon as the VM loads the byte-code for a new object, it also creates a unique name
and path. Then it loads the base object constructor (see section 5.3 on page 36), which
creates the “ME” variable on the lowest level of the stack; this could be called the class level.
Through this, ME is always a valid pointer in the object block, which resides one stack level
higher.


Code 2 ME usage example

class PrintMe{
    def doSomething{
        print("Doing something")
    }
    ME.doSomething()
}

Figure 4.3: Diagram showing the relationship between object and class stack frames

In Figure 4.3 it can be clearly seen that ME points to the object itself and one stack level
above the variable “A” points to some other object in the cloud.

4.3.2 ARGS Keyword

The second variable that is defined by the VM is the “ARGS” pointer. This holds the value of
the parameters passed in to a class or method. A class does not have a specific constructor;
instead it executes everything that is not in a method at creation (see section 4.4). It
is therefore passed parameters too, as shown in example code 4. When a method
is then invoked in this class with parameters, a new pointer “ARGS” is created in a higher
stack frame, so it will be returned first on lookup. This enables every method to have its own
“private” parameters while maintaining the global class parameters. Further, as the variable is
initialised on the stack level of the method, the pointer is lost when the method exits and pops
the stack, which enables efficient memory management. Even when no parameter is
specified, as on line 2 in example 4 where “value” is called, this empty
space will be filled with a default “VOID” token to specify that an empty parameter list was
created. This is the default behaviour for all methods and classes.
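
The lookup behaviour of ME and ARGS across stack frames can be sketched as follows. This is a hedged illustration in Python, not the objic VM: the names FrameStack, create_object and call_method are hypothetical, and a plain string stands in for the “VOID” token.

```python
VOID = "VOID"  # stand-in for the default token of an empty parameter list


class FrameStack:
    """A stack of variable tables; lookups search from the top frame down."""

    def __init__(self):
        self.frames = []

    def push(self, **variables):
        self.frames.append(dict(variables))

    def pop(self):
        return self.frames.pop()

    def lookup(self, name):
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)


def create_object(stack, obj, args=None):
    # Class level: ME and the class-wide ARGS exist before any user code runs.
    stack.push(ME=obj, ARGS=args if args is not None else VOID)


def call_method(stack, method, args=None):
    # Each call gets its own ARGS in a higher frame, shadowing the class ARGS.
    stack.push(ARGS=args if args is not None else VOID)
    try:
        return method(stack)
    finally:
        stack.pop()  # popping the frame discards the method's ARGS
```

Inside a method, a lookup of ARGS finds the method's own parameters first, while ME is still resolved from the class level frame below; once the method returns, the class ARGS become visible again.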


4.4 objic Class Structure


When a class is initialised, all statements that are not encapsulated in a method definition are
executed. This carries the risk of someone creating a new instance of the class itself somewhere
in this code, which will result in an endless loop (see code 3), but the gain in flexibility and
the conformity with Java and Python syntax style outweigh this disadvantage.

Code 3 endless loop example

class LoopMe{
    a = new LoopMe()
}

Further, all methods that are defined in the class are registered with the VM so that calls are
possible. The methods are only registered, not executed. It is also not possible to nest
method definitions in each other.
At creation the class also creates the ARGS variable with a pointer to the passed in
parameters; this is discussed in section 4.3.2.

Code 4 class ARGS example

1 class PrintArgs{
2 print(ARGS.value())
3 }

4.5 Object Location Specification


There are three different ways of specifying where an object should be created: in the code,
where the interpreter is told the location through a specific syntax; in a configuration file,
which is valid for all objects of that type; and the default value, which will normally be the
local server. The ordering is cascading, so specifying the location in the code takes
precedence over the configuration file, which in turn takes precedence over the default. This
is done to enable greater flexibility and with maintainability in mind. The ideal case would be
to have all objects specified in the configuration file, but this might not always be possible.
Sometimes it might also be required to have different objects of the same type reside on
different hosts.
In the following, the object location specifiers are described in more detail:

• In the code


This is the least dynamic way of specifying the server. Here the object location is
appended to the “new” construct by adding an “at” symbol followed by the server. This
will only create the preceding object on that server. This is a good choice if it is known
exactly on which server the object should reside and this is not likely to change. Once
the source code is compiled, this cannot be modified anymore.

Code 5 Object location in Code

a = new Int() @ someserver.org

Future features might include the possibility to specify multiple servers that can be run
as backup mirrors. See the discussion of this in section 8.1 on page 57.

• In the server configuration file

The object server has an “ObjLocFile” file which specifies the object locations globally
for all VMs running in that server. The syntax is very similar to the definition in the
code. This is very useful, especially when deploying the finished program, as changing
one line will influence all objects of a type.

Code 6 ObjLocFile

Int@localhost
String@someserver.net

The first value is the type of object for which the rule should be valid, followed by an “at”
symbol and a server name. Different object types can be assigned to different servers, as can
be seen in example 6, where the object type Int is assigned to localhost and String to
someserver.net.

Future features might include the possibility to specify the object locations with the
scope of a class.

• Default

If none of the above rules apply, the object server will assume “localhost” and look for
the specified object in its class-path. This can be seen as the default behaviour. It is
quite useful, as user-classes will be located in the class-path and so the user does not
have to specify his own computer. Further, as no path is hard-coded and everything is relative
to the running server, deployment of a set of classes can be achieved very easily, and
distributing the objects later only involves adding one line to the ObjLocFile.


In the implementation phase this will be managed by one central object location module
that will read and parse the ObjLocFile and create objects in appropriate places.
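
The cascading lookup described in this section could be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function names parse_obj_loc_file and resolve_location are hypothetical, and the file format is taken to be exactly the Type@server lines shown in Code 6.

```python
def parse_obj_loc_file(text):
    """Parse ObjLocFile content (one Type@server entry per line) into a dict."""
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj_type, _, server = line.partition("@")
        locations[obj_type.strip()] = server.strip()
    return locations


def resolve_location(obj_type, in_code_server=None, file_locations=None,
                     default="localhost"):
    """Return the server for a new object, honouring the cascade."""
    # 1. A server given in the code ("new Int() @ server") always wins.
    if in_code_server:
        return in_code_server
    # 2. Otherwise use the entry for this type in the ObjLocFile, if any.
    if file_locations and obj_type in file_locations:
        return file_locations[obj_type]
    # 3. Otherwise fall back to the local server.
    return default


locations = parse_obj_loc_file("Int@localhost\nString@someserver.net\n")
```

With the entries from Code 6, a String created without an explicit server would be placed on someserver.net, while a type that is not listed falls back to localhost.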

4.6 ObjectServer
The object server is the main runtime environment. A very high level view can be found in
Fig. 4.4.
The object server waits on a specific port for connection requests. There are two main
requests it can receive: “CREATE” which creates a new instance of the requested object and
returns the URL to the VM or “CONNECT” in which case the object server connects the
socket to the existing VM for that object.

Figure 4.4: A high level diagram of the server

4.7 Security
As the server handles all the requests, security is a major issue. It is important that every
process is decoupled from all other processes, as otherwise one process
could read the data of other users or even corrupt it. Therefore the object server should only
handle a minimal amount of communication, as this could be exploited. There is no port
mapping involved, like in RPC, so there has to be a server listening on a specific port. The
solution is to have a separate VM for every connection and have one object reside in this VM.
If this object wants to connect to another object on this server, it has to connect through the
main server port like all other objects in the cloud. So as soon as there is a connection to the
server, a new network thread is started that only handles this one connection and implements
the main functions, notably:

• CREATE This creates a new object in its own virtual machine, which has a connection
handler.

• CONNECT This creates a new VM instance but points it to an already existing object
that was created with CREATE. Based on this mechanism it should be nearly
impossible to access any other object that resides on the server.

4.8 Object Communication

Note: All work was done on TCP/IP. UDP was not examined, as reliability and ordering are
vital. In the future, UDP connections could be incorporated for streaming binary data from
objects.

A first attempt was made to extend SOAP to incorporate the object management features
needed to implement the language, like object instantiation and memory management.
Extensive testing showed that this approach resulted in too much parsing overhead. The time
required to dynamically generate a message that would be understood by the other side did not
meet the performance criteria. Further, parsing a message was very tedious and required
an extensive amount of memory and CPU cycles. Parsing a simple SOAP method invocation request
(see Appendix I on page 82) requires 2334 system calls and about 0.133
seconds. This already involves heavy optimizations, including pre-caching and not
trying to read and understand the whole message. Extending SOAP would have worsened these
problems and would have resulted in a slow and bloated system. More research is needed in this
area, but initial testing shows that SOAP does not perform adequately for the requirements of
distributing objects throughout a big network in a time critical environment, which program
execution is.


Figure 4.5: Comparison between SOAP and objic protocol

Figure 4.6: Parse time comparison

Instead the decision was made to design a new protocol (see Appendix N on page 99).
Initial benchmarking showed that parsing HTTP (Hypertext Transfer Protocol) is very quick
and more memory efficient. HTTP is widely used throughout the web for serving websites and
has been shown to be very reliable. This led to the extension of HTTP for object communication.
As the syntax is very simple, linear parsing time can be achieved with next to no overhead,
unlike SOAP. To further increase speed and memory utilization, HTTP persistent connections
were used, which were introduced in HTTP/1.1 and formalise a keep-alive mechanism. As an
object will normally communicate with a set of other objects by calling methods, lag can be
reduced by not having to connect to these objects repeatedly. This also enabled the design to
incorporate a “ConnectionObject” in the objic VM which keeps a session to the other object
alive. To explain the extension, the CONNECT request is discussed here. To
be able to communicate with an object, the VM first has to connect to the object. The object
resides at a location specified through the URL, which is in the form of


Host/hash
=>
Bigi.home/osdfo7w3r46yoewhfdjpf9384y6rfh

The first part names the host on which the object server holding the object resides.
The “/” notation is borrowed from HTTP. After the slash, a hash that is unique on that server
is specified. This gives a globally unique pointer to this specific object. Sending a connect
request to a server will map the initialised connection to the requested object or fail with the
appropriate HTTP error code. If, for example, no object with the given hash can be found,
a “404 Not Found” error code is returned. Through this approach it is very easy
to write object server clients. As all commands and data are in clear text, it is possible
to connect to the server via telnet or similar and invoke methods. Furthermore, existing HTTP
libraries can be extended for use with objic. There is a current effort to complete and
standardise this format.
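
The addressing scheme can be illustrated with a minimal Python sketch. Only the “Host/hash”
form and the “404 Not Found” answer are taken from the description above. The names
parse_object_url and ObjectRegistry, and the “200 OK” answer for a successful lookup, are
assumptions made for illustration; the actual wire format is the one defined in Appendix N.

```python
def parse_object_url(url):
    """Split an object address of the form 'Host/hash' into (host, hash)."""
    host, sep, obj_hash = url.partition("/")
    if not sep or not host or not obj_hash:
        raise ValueError("expected 'Host/hash', got %r" % (url,))
    return host, obj_hash


class ObjectRegistry:
    """Hypothetical stand-in for the hash table an object server keeps."""

    def __init__(self):
        self._objects = {}

    def register(self, obj_hash, obj):
        self._objects[obj_hash] = obj

    def connect(self, obj_hash):
        """Return an HTTP-style (status, object) pair for a connect request."""
        if obj_hash not in self._objects:
            return "404 Not Found", None
        # Assumption: success is reported with the usual HTTP status line.
        return "200 OK", self._objects[obj_hash]


host, obj_hash = parse_object_url("Bigi.home/osdfo7w3r46yoewhfdjpf9384y6rfh")
registry = ObjectRegistry()
registry.register(obj_hash, "some object")
```

A client would first resolve the host part to reach the right server and then send the hash
part in its connect request.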

4.9 Object State


As thread safety is a big issue in distribution, the decision was made to make all base
objects immutable. This means that once initialised, the internal state cannot be changed.
It is possible for an object to return a pointer to a new object and change the referring pointer
if needed; this is done with the NEWPTR request. If, for example, the “add” method of an Int
object is invoked, it returns a pointer to a new object containing the resulting value.
Further optimizations are possible: as the value never changes, it makes no difference
whether there is one instance of a value or many. The Int object with the value “1” only
has to exist once in the system, as it will always contain one.
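
The combination of immutability and instance sharing can be sketched in a few lines of Python.
This is not the objic implementation of Int; the class below and its instance table are
assumptions that only illustrate the idea that “add” returns a new (or already existing)
object instead of changing the receiver.

```python
class Int:
    """Immutable integer object; equal values share one instance."""

    __slots__ = ("_value",)
    _instances = {}  # value -> the single Int holding that value

    def __new__(cls, value):
        value = int(value)
        existing = cls._instances.get(value)
        if existing is None:
            existing = super().__new__(cls)
            # Bypass our own __setattr__, which forbids later changes.
            object.__setattr__(existing, "_value", value)
            cls._instances[value] = existing
        return existing

    def __setattr__(self, name, value):
        raise AttributeError("Int objects are immutable")

    @property
    def value(self):
        return self._value

    def add(self, other):
        """Return the Int for the sum; the receiver is left unchanged."""
        return Int(self._value + other.value)


one = Int(1)
two = one.add(Int(1))
```

After these calls “one” still holds the value 1, and requesting Int(2) again yields the very
same object as “two”.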

4.10 Garbage Collection


Distributed garbage collection is very hard to perform [Plainfossé and Shapiro, 1995] and there
is little literature on this topic. For this project a counting-based technique was chosen.
The VM holds a reference to all the connections currently associated with all objects. The
counter is updated dynamically when an object requests a connection to be closed or when a
connection fails. If the connection count reaches zero, the object is moved to a temporary
pool called the “old-pool”, in which all objects that have no current references are collected.
If a connection to such an object is requested, it can be reborn into the “active-pool”. When
an object is moved to the old-pool a timestamp is associated with it, which can then be used
to delete the object after a certain time. This reduces search times for active objects, as the
“active-pool” is searched first for a hash. Furthermore, unused objects can be saved in the swap
space of the server so that they do not block up memory. This approach also has some drawbacks:
garbage with circular references will not be collected, and objects might live longer than
necessary, as they need to time out.
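
The two pools and the timeout can be sketched as follows. The class and method names are
hypothetical, and the choice to place a freshly created object in the old-pool until its first
connection arrives is an assumption of this sketch, not a documented property of the VM.

```python
import time


class ObjectPools:
    """Connection-counting collector with an active-pool and an old-pool."""

    def __init__(self, timeout_seconds, clock=time.time):
        self.active = {}   # hash -> object with at least one connection
        self.old = {}      # hash -> (object, time it lost its last connection)
        self.counts = {}   # hash -> number of open connections
        self.timeout = timeout_seconds
        self.clock = clock

    def add(self, obj_hash, obj):
        # Assumption: a new object has no connections yet.
        self.old[obj_hash] = (obj, self.clock())

    def connect(self, obj_hash):
        if obj_hash in self.active:          # the active-pool is searched first
            self.counts[obj_hash] += 1
            return self.active[obj_hash]
        if obj_hash in self.old:             # rebirth from the old-pool
            obj, _ = self.old.pop(obj_hash)
            self.active[obj_hash] = obj
            self.counts[obj_hash] = 1
            return obj
        raise KeyError(obj_hash)

    def disconnect(self, obj_hash):
        """Called on an explicit close request or on connection failure."""
        self.counts[obj_hash] -= 1
        if self.counts[obj_hash] == 0:
            del self.counts[obj_hash]
            self.old[obj_hash] = (self.active.pop(obj_hash), self.clock())

    def sweep(self):
        """Delete old-pool objects whose timeout has passed; return their hashes."""
        now = self.clock()
        expired = [h for h, (_, since) in self.old.items()
                   if now - since >= self.timeout]
        for h in expired:
            del self.old[h]
        return expired
```

The drawback named above is visible here as well: two objects that hold connections to each
other never reach a count of zero and are therefore never moved to the old-pool.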

4.11 Known Problems


In this section problems that were noticed while designing the system are listed. Some attempts
were made to solve them, but because of the tight time schedule the solutions could not be
implemented.

4.11.1 Object Version Problem

As the behaviour of an object can change on the server, there must be some way of maintaining
the version of the object for which a program was written. This is not an issue if the interface
and the return values do not change, that is, if the code has only been refactored. If the
external view of the object changes, however, it can be assumed that some programs will fail,
as they depend on certain conditions to hold. Solving this is not as easy as it might seem. The
first major issue is how to handle this case: should only a warning be displayed, or should it be
possible for the server to invoke old, possibly error-prone objects? The general approach should
be that programs always continue running and are not affected by updates to the
object.

4.11.2 Inheritance

Inheritance is one of the most popular concepts of object orientation. The general
understanding is that a derived object inherits functionality and data from a base class. It is
not clear if there will be enough time to implement this functionality, as it will have to be
distributed, meaning that an object can inherit functionality from an object residing on a
server somewhere else.

4.11.3 Local / Public variables, methods, classes

Locality of data, classes and methods is something every language defines differently and there
are many schools of thought. Whereas Java has clear rules with public and private, Python does
not enforce locality. Especially in a distributed environment it is very important to have
clear guidance, as publishing data globally could be a security risk. Furthermore, companies
might be worried that personal data could be leaked. In objic every object and method is
public; security is gained through the hashed name. Variables are always local to the object
and can never be accessed from outside the object’s scope. This keeps the state of the
object consistent and verified, although it requires more “getters”
and “setters”, which might incur a slight performance hit.

“There is no programming language, no matter how structured, that will prevent programmers
from making bad programs.”
Larry Flon

5 Implementation

In this chapter the three main programs that were implemented and the major features of the
language are presented.

5.1 Discussion of Programs

5.1.1 objectServer

ObjectServer is the program that instantiates all the classes, meaning that it loads
objic byte-code and executes it. It can also act as a client to other servers that “live” in
the cloud. This enables an instantiated object to communicate with objects on other
servers and thus creates a distributed environment on the object level. It further provides
means for memory management and bookkeeping of object states. The definitions of the
base types, like String and Int, are built in, so by default they are the same on all servers;
this might change in future versions.

5.1.2 objicc

Objicc is the objic compiler, which simply means that it takes in source code and generates
byte-code. It also checks whether the code is semantically correct and performs some simple
optimizations. If an error in the code is found, it tries to generate a useful error message
including the line number and the reason the compilation stopped. This is done using many
functions from the Antlr libraries and Python’s cPickle library.
It is possible to run the compiler as a pre-compiler for the object server, turning it into
a source code interpreter, which would make runtime errors more understandable and dynamic
changes to the code during runtime possible. This was done in the first iterations but was
changed in favour of execution speed after seeing that one translation into an AST took about
0.5 seconds for a small file, which is not acceptable for repeated object instantiations.


5.1.3 orun

It is important to understand that everything is executed in the object server environment. No
instance of the interpreter is initialised outside of a server. While this is possible for
debugging purposes, it should never be done. This behaviour is demonstrated by the orun.py
script, which is used to run the binary files. This script performs three main tasks:

• Create an object of the requested type

It connects to the object server, which could be somewhere in the cloud. Through this it
is possible to use a main program that is distributed somewhere and still retrieve the
output.

• Connect to the created object

To be able to communicate with the object a connection has to be established.

• Call the main method with parameters

In principle every object has a main execution method which has to be called. It is
possible to pass parameters, and no restrictions are made on the naming of this method.
Furthermore, this method can return a value and write output to the user’s shell.

This demonstrates that the script running on the client can consist of four lines of Python,
which is intentional. The main reason is that it can be implemented in any language,
so different output devices can easily be created. For testing purposes a small JavaScript
application was created which demonstrated that the object could output to a browser window.
Furthermore, all the processing can be offloaded onto an object server running in the cloud or
can be done on a local server. This creates a very dynamic execution environment.
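
The three tasks can be sketched as follows. To keep the example self-contained the server is
replaced by an in-process stub; StubObjectServer, Hello and the function orun are hypothetical
names, and the real orun.py talks to the object server over the network using the protocol of
Appendix N.

```python
class StubObjectServer:
    """In-process stand-in for a remote object server (illustration only)."""

    def __init__(self, classes):
        self._classes = classes   # class name -> Python class
        self._objects = {}        # hash -> live instance

    def create(self, class_name):
        obj = self._classes[class_name]()
        # Assumption: a counter replaces the unguessable hash of a real server.
        obj_hash = "obj%d" % len(self._objects)
        self._objects[obj_hash] = obj
        return obj_hash

    def connect(self, obj_hash):
        return self._objects[obj_hash]


class Hello:
    def main(self, name):
        return "Hello " + name


def orun(server, class_name, method, *params):
    obj_hash = server.create(class_name)           # 1. create the object
    connection = server.connect(obj_hash)          # 2. connect to it
    return getattr(connection, method)(*params)    # 3. call the main method


result = orun(StubObjectServer({"Hello": Hello}), "Hello", "main", "World")
```

Because the client only issues these three requests, the same flow can be reproduced in any
language that can open a network connection.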

5.2 Object Server Class Structure


The class structure of the object server is quite simple (see Fig. 5.1 on page 36). The
initializer class starts a server through “start_server”, listening on a specific port. In this
development cycle this is port 8080, but it can be changed dynamically; whether port 80 would
be valid needs further discussion in the future [Somogyi and Schneier, 2001]. Every time a
connection is received a new “RequestHandler” thread is used. This is a “proper” operating
system thread with its own stack, which enhances security. Creation might take a little longer,
but as a stateful connection model is used the thread will be running for an appropriate amount
of time. When setting up the request handler the “setup” method is called, which does general
setup for the connection and error checking. When this is completed, the “handle” method
is invoked. This starts an endless loop that waits for data and then processes it accordingly.
The data could be the “CREATE” keyword or similar, which would then create a new instance of
a VM by creating a new oVm object. The constructor of the oVm takes the type of object
to load and, if provided, parameters (see section 5.6 on page 43). The VM then
tries to load the object into its memory. This can be done in two ways. If the object is of a
base class like Int or String, an instance of this class is created. As these classes are
written in Python, no interpretation of byte-code has to take place. The other possibility is
that the requested class is a user class, meaning that the byte-code has to be loaded and
interpreted. This is done through the Interpreter class. When an interpreter object is created
it is pointed at a binary class file, which it starts parsing. Once the object server and VM
have finished loading, the constructor of the class is called. This sets up the “ME” variable,
which points to the class itself, adds the methods to the method lookup table and initialises
the object stack. Every Interpreter instance has an object manager which takes care of
creating new objects. All the data of the running object is saved in the instance of the oClass;
this is what makes the difference between a class and an object. By serialising the data
in the oClass instance it is possible to create an exact replica of the current state of the
object somewhere else.


Figure 5.1: the simplified class structure of the object server

5.3 Class descriptions

5.3.1 Initializer

The initializer class (see Fig. 5.1), on creation, generates a threaded TCP server that listens
on port 8080. When the method start_server is invoked the server starts an infinite while loop
that creates one handler thread for every request made to this port. This is programmed
using the Python socket, SocketServer and threading modules. Such a server has to run
on every system in the cloud to enable distribution. The class also provides a “getServer”
method that is meant for debugging purposes.
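
A threaded TCP server of this kind can be sketched with the standard library alone. The sketch
uses the Python 3 module name socketserver (SocketServer in Python 2). The class names, the
function make_server and the echo-style reply are assumptions for illustration and do not
reproduce the objic initializer.

```python
import socketserver


def reply_for(line):
    """Hypothetical reply: echo the request so the round trip is visible."""
    return "received " + line.strip()


class LineHandler(socketserver.StreamRequestHandler):
    """One instance runs in its own thread for every accepted connection."""

    def handle(self):
        # Keep the connection open and answer one line at a time.
        for raw in self.rfile:
            answer = reply_for(raw.decode("utf-8"))
            self.wfile.write((answer + "\n").encode("utf-8"))


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True   # allow rebinding the port after a restart
    daemon_threads = True        # handler threads end with the process


def make_server(host="127.0.0.1", port=8080):
    """Create the server; serve_forever() on the result starts the loop."""
    return ThreadedServer((host, port), LineHandler)
```

Calling make_server().serve_forever() starts the accept loop; each client connection is then
served by its own operating system thread until the client closes it.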


5.3.2 Interpreter

The interpreter class is responsible for parsing the byte-code and executing
meaningful functions based on it. There are many helper methods, but the main one is the
“parseBlock” method. It is invoked with the root node of a branch of the AST and executes
it. Every time the “parseBlock” method is called a new stack frame is pushed, so a block in an
“if”, for example, has its own stack frame that is popped once its execution is over. Because
the method is called recursively, nesting of statements is enabled by design.
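
The recursive treatment of blocks can be sketched with a very small interpreter. The node
names ASSIGN, PRINT and IF and the tuple representation of the tree are assumptions made for
this example; they are not the node types produced by the objic compiler.

```python
class MiniInterpreter:
    """Executes nested blocks, giving each block its own stack frame."""

    def __init__(self):
        self.frames = []   # stack of dicts, innermost block last
        self.output = []

    def lookup(self, name):
        # Search from the innermost frame outwards.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)

    def parse_block(self, block):
        if block[0] != "BLOCK":
            raise ValueError("expected a BLOCK node")
        self.frames.append({})                  # raise the stack level
        try:
            for node in block[1:]:
                kind = node[0]
                if kind == "ASSIGN":
                    self.frames[-1][node[1]] = node[2]
                elif kind == "PRINT":
                    self.output.append(self.lookup(node[1]))
                elif kind == "IF":
                    if self.lookup(node[1]):
                        self.parse_block(node[2])   # recursion gives nesting
                else:
                    raise ValueError("unknown node %r" % (kind,))
        finally:
            self.frames.pop()                   # block variables drop off


program = ("BLOCK",
           ("ASSIGN", "flag", True),
           ("IF", "flag", ("BLOCK",
                           ("ASSIGN", "x", 42),
                           ("PRINT", "x"),
                           ("PRINT", "flag"))))
vm = MiniInterpreter()
vm.parse_block(program)
```

The inner block can read “flag” from the enclosing frame, whereas “x” disappears as soon as
the inner block has finished.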

5.3.3 MethObj

MethObj is a wrapper object for method calls. When a MethObj is created, an object hash,
a method and parameters are specified. When the MethObj is used the value is retrieved
automatically. This emulates the behaviour of method objects in Python, which enables a
Lisp-like programming style. Furthermore, values can be associated dynamically, which reduces
network load and adds flexibility. It inherits some functionality from the ObjConnection class.

Figure 5.2: The class diagram for the MethObj class

5.3.4 ObjConnection

The ObjConnection class handles all the low-level networking for the connection to an object.
On creation an object hash is passed in, with which the connection associates itself. Messages
can then be sent (“sendMsg”) and answers retrieved. This is done using the Python socket module.
It also provides a wrapper for calling methods.


Figure 5.3: The class diagram for the ObjConnection class

5.3.5 ObjManager

The ObjManager class has two main functions: creating an object on a server and connecting
to it. When initialised it parses the “ObjLocFile”, which is described in code 6 on page 26,
and then knows where to create objects unless specified otherwise. By providing
wrappers for these complicated tasks it simplifies the management of objects. It
returns ObjConnection objects, so a further layer of abstraction is added.

Figure 5.4: The class diagram for the ObjManager class

5.3.6 oClass

The oClass holds the data which makes a class an object. This includes all the variables
and pointers to the methods an object possesses. For the variables a stack-based approach was
chosen. This means that every time a new block is parsed the stack level is raised;
when the block has finished executing, all its variables “drop off” the stack and the objects
pointed to can be garbage collected. It also offers methods to cleanly remove all variables on
the stack, meaning that all connections are closed properly. This class can be serialised
to transfer the object state from machine to machine. This feature is accounted for in the
design, but not implemented.


Figure 5.5: The class diagram for the oClass class

5.3.7 oObject

oObject is the base class for all VM objects. Through this, common behaviour can be collected
in one place and VM-internal type checking is possible, which in turn enhances error checking.
This is a common technique in many languages like Java and Python. It also defines methods
that every object has to override, like the “frep” method, which stands
for representation and returns a nicely formatted string of the object. It further has its own
instance of an object manager that all extending objects inherit, so creating new objects is
possible in all objects. As all objects exposed to the language are immutable, calling min on
an Int will create a new object and return a pointer to it (see Appendix K.0.6 on page 89).

Figure 5.6: The class diagram for the oObject class


oObject

oInt oString oClass

Figure 5.7: The oObject base class

5.3.8 oInt

This is the internal representation of an integer number. It is a base type, meaning that it is
implemented in Python and not in objic. It is possible to pass in a parameter at creation, which
should look like an integer (see section 5.7 on page 43) and which is then set as the value.
In the VM it is defined as a standard that the names of all functions callable from the language
begin with an “f” followed by the actual name; “fmin”, for example, is called
min in objic. All objects extend oObject and as such inherit its behaviour and methods.
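
The naming convention can be sketched as a lookup that prepends “f” to the name used in the
language. The class OInt below is a reduced, hypothetical stand-in for oInt, and the helper
call_language_method is not part of the objic sources.

```python
class OInt:
    """Reduced stand-in for a base type implemented in Python."""

    def __init__(self, value=0):
        self._value = int(value)   # the parameter has to look like an integer

    def fadd(self, other):
        """Callable from the language as 'add'; returns a new object."""
        return OInt(self._value + other._value)

    def frep(self):
        """Callable from the language as 'rep'; formatted representation."""
        return str(self._value)

    def helper(self):
        """No 'f' prefix, so this is not reachable from the language."""
        return self._value


def call_language_method(obj, name, *args):
    """Resolve a language-level name to the Python method 'f' + name."""
    method = getattr(obj, "f" + name, None)
    if not callable(method):
        raise AttributeError("no language method %r" % (name,))
    return method(*args)


total = call_language_method(OInt(2), "add", OInt("3"))
text = call_language_method(total, "rep")
```

With this convention the VM can keep internal helper methods on the same class without exposing
them to programs.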

Figure 5.8: The class diagram for the oInt class

5.3.9 oString

oString implements a String in objic. There are no limitations on length or complexity. Like
oInt it extends from oObject and inherits most of its behaviour.


Figure 5.9: The class diagram for the oString class

5.3.10 oVm

As discussed above, this is the actual virtual machine that encapsulates the object and defines
the interface to the object server. At creation an object type has to be specified, which the
VM then instantiates. If it is a base type it creates a new o{String,Int} object, and if
it is a user class it starts a byte-code interpreter. In addition, a unique name is generated
at creation which consists of digits, lowercase and uppercase letters. No other symbols
are allowed in names as they may conflict with underlying operating systems; “\”, for example,
has a special meaning in file paths.

Figure 5.10: The class diagram for the oVm class

5.3.11 RequestHandler

The request handler is the thread that is created when a new request is received by the object
server. It is an operating system level thread and deals with the server-side networking.
It mediates between the VM and the networking layer and decides what type of connection
it will become. This is defined by the request sent in the first packet. If this is “CREATE”
it assists in creating a new object; if it is “CONNECT” it keeps the connection alive
and delegates all responsibilities to the VM. It also ensures that the connection is closed
gracefully if so requested. This can be seen as the protocol implementation of the server.
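
The decision taken on the first request can be sketched as a small classification function.
Only the keywords CREATE and CONNECT come from the text; the layout of the line (keyword,
space, argument) and the “400 Bad Request” answer for anything else are assumptions of this
sketch.

```python
def classify_request(first_line):
    """Decide the connection type from the first line a client sends."""
    keyword, _, argument = first_line.strip().partition(" ")
    if keyword == "CREATE":
        return ("create", argument)    # argument: class to instantiate
    if keyword == "CONNECT":
        return ("connect", argument)   # argument: hash of the object
    # Assumption: unknown requests are rejected with an HTTP-style error.
    return ("error", "400 Bad Request")


kind, target = classify_request("CONNECT osdfo7w3r46yoewhfdjpf9384y6rfh\r\n")
```

A handler built on this would create a VM for “create” and keep the socket open and forward
all further data to the VM for “connect”.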


Figure 5.11: The class diagram for the RequestHandler class

5.4 Logging
Throughout the whole application there is a logging facility. This is provided by the log class,
which offers an array of different methods to log events. In the header of every class the class
name is registered with the logging facility. A global loglevel is defined in the globalConf file.
This is best visualized by a source code example:

log.debug("This is a debug message")
log.error("This is an error message")

When the loglevel is set to DEBUG the following output is generated:

DEBUG:LogTestClass:This is a debug message
ERROR:LogTestClass:This is an error message

Both messages appear because the log levels build upon each other. The output consists of three
main parts: the first is the error level, the second is the class, which is very useful when
debugging big projects, and the last is the message supplied. The main log levels are FATAL,
CRITICAL, ERROR, WARN, INFO and DEBUG, in decreasing order of severity. Through this approach
it was possible to create different kinds of output depending on the loglevel. While developing,
DEBUG was enabled, whereas in production only messages of level ERROR or worse are output.
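
The same output format and level behaviour can be reproduced with the logging module of the
Python standard library. Whether the objic log class is built on this module is an assumption;
the function make_logger is hypothetical and only serves to show how the level filters
messages.

```python
import io
import logging


def make_logger(class_name, level, stream):
    """Return a logger that writes LEVEL:class:message lines to stream."""
    logger = logging.getLogger(class_name)
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.handlers = [handler]   # replace handlers from earlier calls
    return logger


# Development: everything from DEBUG upwards is written.
verbose = io.StringIO()
log = make_logger("LogTestClass", logging.DEBUG, verbose)
log.debug("This is a debug message")
log.error("This is an error message")

# Production: only ERROR or worse is written.
quiet = io.StringIO()
log = make_logger("LogTestClass", logging.ERROR, quiet)
log.debug("This is a debug message")   # suppressed
log.error("This is an error message")
```

In the first case both lines shown above are written, in the second only the ERROR line.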

5.5 Error Management


During program execution errors or faults are bound to happen on the user side. It should be
the aim of every interpreter or compiler to handle these faults as gracefully as possible. In
objic there are two layers that handle faults. First, the compiler checks for syntactical
errors; then, at runtime, the Interpreter handles errors. When an error is found, program
execution stops and a message describing what has happened is output (see section 5.4).
The aim is always to provide as much information as possible.


5.6 VOID

The “VOID” keyword is quite important. In networking, sending nothing over a line keeps the
other side waiting for data and thus blocks it, so some standard term had to be found
to indicate that there was no data to be sent. The VOID keyword was chosen to represent
nothing. This can be seen when a method is called without parameters: “print()”, for example,
is translated into “print(VOID)” to indicate that nothing was specified in the brackets.
This is a standard throughout the project: a method always returns something, and if nothing is
specified in the code it returns the keyword VOID.
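
Both uses of VOID can be sketched in Python. The helper names and the comma used to separate
several arguments are assumptions for illustration; only the substitution of VOID for “nothing”
is taken from the text.

```python
VOID = "VOID"


def encode_arguments(args):
    """Never send an empty argument list; send the VOID token instead."""
    if not args:
        return VOID
    # Assumption: several arguments are separated by commas.
    return ",".join(str(arg) for arg in args)


def call_and_reply(func, *args):
    """Run func; a method that returns nothing still answers with VOID."""
    result = func(*args)
    return VOID if result is None else result


def prints_only():
    pass   # no return statement, so Python yields None


request = "print(" + encode_arguments([]) + ")"
reply = call_and_reply(prints_only)
```

The receiving side therefore always gets at least one token to read and never blocks on an
empty message.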

5.7 Duck-Typing

In duck typing the type of an object is determined by its properties, which is best described
by the quote “If it walks like a duck and quacks like a duck, I would call it a duck.” This
means that instead of defining the type of an object right from the beginning, as Java does,
the type stays unknown until it is needed. Through this the code can be written in a far
more dynamic way, and especially in a distributed system the type of the object might not be
known. However, it also has disadvantages, of which the loss of type checking at compile time
is one of the biggest. Because of the distributed nature of the objic language, duck typing is
the only way the code could be implemented, as it can never be guaranteed that an Int object
on a remote server has the same properties as the local implementation.

5.8 Abstract Syntax Tree

The compiler generates an abstract syntax tree which is serialised in a byte-code form and can
then be loaded into the interpreter when needed. With this approach valuable data about the
structure of the program is preserved, which enables the interpreter to understand more about
the programming of the modules. Simple optimizations are made on the AST at compile time,
but nothing that would destroy the integrity of the tree (see Appendix K.0.5 on page 89 and
Appendix H on page 81). All further optimization can be done at interpretation time. The AST is
saved in a binary form that can be loaded very quickly and is endian independent, so
multi-architecture run environments are possible. The file size is also smaller, which enables
fast transfers if the class code needs to be distributed.
A simple hello world program as in code 7 is translated into the
AST displayed in Fig. 5.12.


Code 7 A simple “Hello World” program

class SimpleHello{
    print("Hello World")
}

(CLASS SimpleHello (BLOCK (PRINTSTM (OSTRING "Hello World"))))

CLASS

SimpleHello BLOCK

PRINTSTM

OSTRING

Hello World

Figure 5.12: Representation of a simple AST
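
Serialising such a tree can be sketched with the pickle module (cPickle in Python 2, which is
the library named in section 5.1.2). Representing the AST as nested tuples is an assumption of
this sketch; the trees produced by the Antlr runtime are objects of a different type.

```python
import pickle


def to_sexpr(node):
    """Render a nested-tuple tree in the bracketed form used above."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


# The tree of Code 7, written by hand for this example.
hello_ast = ("CLASS", "SimpleHello",
             ("BLOCK",
              ("PRINTSTM",
               ("OSTRING", '"Hello World"'))))

byte_code = pickle.dumps(hello_ast, protocol=2)   # compiler side
restored = pickle.loads(byte_code)                # interpreter side
text = to_sexpr(restored)
```

The restored tree is structurally identical to the original, so the interpreter sees the same
nesting of blocks that the compiler produced.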

5.9 Keywords
Because of the restrictions of the parsing method chosen, it is not possible to give methods
the same name as keywords. For example, the syntax in code 8 is not valid, as print is a
built-in command. When this syntax is parsed, the print is recognized and it is assumed that a
parameter string will follow. Instead a block is seen, which produces error nodes that cannot
be processed.
The following tokens are keywords and cannot be used as variable, method or class names:
“class, >=, ==, new, >, ;, return, =, print, @, for, ., ), }, else, break, {, def, <=, !=, <, if,
(, while”.


Code 8 Invalid method declaration


def print{
    print ("Hello world")
}

5.10 Implementation Problems


• Language choice

By choosing Python as the language to implement the runtime environment some restrictions
propagated into objic. As Python is garbage collected, the memory footprint of the
server could not be optimized as desired. Runtime is also affected, as objic can never
run faster than the hosting language. This can be clearly seen when trying to calculate
a big Fibonacci number recursively. Python cannot do this, as it runs out of stack frames;
because of this objic is also unable to calculate this number.

• Python networking support

Some problems were encountered when using the Python socket library, which offers a
higher level of abstraction over the hardware. While this enables portability it also creates
some problems, as it is not possible to configure the networking socket in exactly the way
needed. For example, when the server crashes the port stays reserved
until it is freed by the operating system. Some improvement in this area is seen when
Python 3 is used.

• Using AST

An AST can be quite big and contain many blocks. As the implementation parses
blocks, the tree has to be held in memory. Furthermore, many iterations over the tree are
needed, some of which could be done at compile time if a register-based byte-code had
been chosen.

• pylint

Some code and comments were added to conform with the rating system that pylint
implements. In most cases this is not a problem, but in some special cases highly complex
code had to be expanded into more statements as otherwise warnings were raised. This
also includes some lines of documentation that had to be added; while not a
problem, these would normally not have been added.


5.11 Finished Artifact

5.11.1 Documentation

Most of the documentation is included in the programs. Calling an executable with the “-h”
parameter displays a help message and exits the program.
Further parameters are:

• “-v”

This prints the version number of the executable and exits. This is very useful
information to include in a bug report, as the exact state of the SVN tree can be reproduced.

• “-d”

This enables the debug mode. This means that all messages that are output through
the log facility and flagged as debug messages will be printed. This is helpful for
development and testing.

Further documentation is automatically generated from the comments in the code and
published online. Through this, programmers can gain a deeper understanding
of the inner workings of the compiler and avoid errors.

5.11.2 Installation

There is a complete package called objic that can be downloaded from the project website and
installed (see http://objic.ribalba.de). The setup is quite simple and only requires Python
and the Antlr libraries to be installed. A public SVN development snapshot can also be
downloaded by people who want to contribute to the project. Because development took place
on *NIX, no graphical installer is available, but a general familiarity with Python and the
*NIX operating system is enough to have a fully functional environment set up in a few steps
(see Appendix E on page 77 for further information).

“A class is a lot like an iceberg: 7/8 is under water, and you can see only the 1/8 that’s
above the surface”
Steve McConnell

6 Testing

Especially in an iterative approach testing is vital [Myers, 1979], as further development relies
on the correctness of the code the changes build on [Hunt and Thomas, 1999]. Because of the
dynamic properties of Python a lot of testing can be done while coding.

6.1 Strategy
Two major testing techniques were used to measure software quality in this project.
Firstly, static and dynamic code analysis was used to check for programming errors,
duplicate code and further code smells, which can be called white-box testing. The tools
used were also adapted to check for compliance with the Python coding convention (see
http://www.python.org/dev/peps/pep-0008/). This enables the code to be put into the
public domain and be understood easily by other programmers. Secondly, every module has a
specific test section in which unit tests are performed (black-box testing); this assumes that
the programs are syntactically correct. The unit tests check the different methods through
their return values for specific inputs. This section might be removed in production code, as
it slows down execution and programming-related error messages might confuse the user,
as the author experienced when giving the program to a friend and an assert failed.

6.2 Plan
As an iterative approach was used, running the test cases was the last stage of every iteration.
While the code analysis was performed throughout the whole development life cycle, test
cases were written and executed at the end. The code analysis was performed automatically
on every file save: the Eclipse editor was modified to start pylint (see section 1.3.2 on page 4)
as soon as the source code changed. This enabled rapid error checking and ensured that the
coding convention was followed under all circumstances. Before every SVN commit the test
cases were run and it was checked that none of them had failed. Through this approach only
tested, convention-conforming and documented code was submitted to the main tree and could be
replicated and reused [McConnell, 1993].

Figure 6.1: Picture of the development environment

Further keywords were defined to identify notes in the code. These are intended for the
programmer. The following tokens are recognized by the editor and highlighted in the code:
“TODO”, “FIXME” and “XXX”. A list at the bottom of the editor window indicates
what has to be completed (see Fig. 6.1, “2”). For completeness, “3” is the source code editor
window and “1” is an interactive Python shell that is used to test small segments of code by
copying and pasting them into this window and executing only this fragment. This does not
take regression testing into account, but this was not considered as important on a project of
this size [Brooks, 1995].

6.3 Code Analysis


Code analysis is the process of automatically checking the code for program correctness [Beizer,
1990]. This can be done with an array of tools and techniques, but the two main areas are
dynamic and static analysis. In the first the code is executed, whereas in the second only the
source code is looked at without interpreting or compiling it, a process called linting
[Spolsky, 2004].


Two programs were used to implement the code analysis:

• pylint

As described in section 1.3.2 on page 4, pylint is a static code analysis tool. It was mainly
used to check that the code conformed with the Python coding convention and did not
contain any code smells, like duplicate code. This could be automated
and was done on every file save. A further feature used was the code rating facility.
This takes the errors and warnings found and calculates a number based on
$10.0 - \left(\frac{5 \cdot error + warning + refactor + convention}{statement}\right) \cdot 10$
to represent a “mark” for the code quality. All code written had to achieve a mark over 9.5
out of 10, otherwise refactoring was done and the errors corrected. The remaining 0.5 points
are slack for some specific cases in which making the code comply would create more confusion
than using KISS. Whether lines of code are a suitable metric is up for discussion [Fenton and
Pfleeger, 1998].

Code 9 The pylint results for the project files


globalConf.py 10.00/10
Interpreter.py 9.87/10
MethObject.py 10.00/10
ObjConnection.py 10.00/10
objicc.py 10.00/10
ObjManager.py 9.62/10
ObjServer.py 9.57/10
oClass.py 9.78/10
oInt.py 10.00/10
oObject.py 9.67/10
orun.py 10.00/10
oString.py 10.00/10
oVm.py 10.00/10

• pychecker

Pychecker works differently from pylint in that it executes the code, but it tries
to find similar problems. This has some advantages, such as being able to understand more
dynamic programs, but it also creates the problem that some modules cannot be tested
as they are not meant for execution; instead they provide helper methods for other
functions. Pychecker was only used when committing the code to the SVN tree, to ensure
that pylint hadn’t missed anything. Another problem is that pychecker follows all the
library imports, which is a problem as the Antlr Python libraries have to be set up in a
specific way and would produce many errors that are not important.
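
The pylint rating formula quoted above can be worked through with a short function. The
function name and the example figures are hypothetical; they are not measurements from the
project files.

```python
def pylint_score(error, warning, refactor, convention, statement):
    """Rating as given in the text: 10 minus ten times the message density."""
    weighted = 5 * error + warning + refactor + convention
    return 10.0 - (float(weighted) / statement) * 10


# Hypothetical module with 100 statements and five convention messages.
five_conventions = pylint_score(0, 0, 0, 5, 100)

# A single error weighs as much as five of the other messages.
one_error = pylint_score(1, 0, 0, 0, 100)
```

Both hypothetical modules end up at the 9.5 threshold, which shows how heavily a single error
is penalised compared with convention messages.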

6.4 Code Coverage


The pydev code coverage tool (see http://pydev.sourceforge.net/codecoverage.html) was used
while developing to see if the test cases executed every line of code. This was not always
possible as some error conditions could not be simulated, especially when it came to network
errors. Therefore, a dynamic approach was chosen in which the tool was executed manually.
The execution generated a report, as can be seen in code 10.

Code 10 A code coverage test result

Name        Stmts   Exec   Cover   Missing
------------------------------------------------------------------------------
oInt.py        68     46   67.6%   71-80,92-93,105-106,118-119,132-133,140
------------------------------------------------------------------------------
TOTAL          68     46   67.6%

At first a coverage of 67% does not seem high, but it should be noted that a large number of
the missing lines belong to “if” conditions that are actually testing the code and did not
produce an error during the run. Also not taken into account are the relatively trivial
checks of the internal state of the object which, if incorrect, would have shown up when
running the object. These should have been tested a second time, but because of the limited
time this was not done. By manually analysing the reports, coding errors could be spotted
quite efficiently, as it could be seen whether the program execution path was as expected
[Kernighan and Pike, 1999].
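The figures in such a report can be checked with a few lines of Python. The following sketch uses hypothetical helper names that are not part of the coverage tool. It expands the “Missing” column into individual line numbers and recomputes the “Cover” column from the statement counts.

```python
def expand_missing(spec):
    """Expand a 'Missing' column such as '71-80,92-93,140' into line numbers."""
    lines = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            lines.extend(range(int(start), int(end) + 1))
        else:
            lines.append(int(part))
    return lines


def coverage_percent(statements, executed):
    """Percentage of statements that were executed at least once."""
    return 100.0 * executed / statements


missing = expand_missing("71-80,92-93,105-106,118-119,132-133,140")
print(missing[:5])                           # first lines to inspect by hand
print("%.1f%%" % coverage_percent(68, 46))   # 67.6%, as in the Cover column
```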

6.5 Recording Faults


As trac (see section 1.3.2 on page 3) was used as an online development tool, faults could
be recorded in an online form (see Fig. 6.2). This required filling out a summary, a
detailed description, the priority, the component and the type of the ticket to raise. With
this approach an error was recorded as soon as it was found and was not forgotten. At the
end of every coding iteration it was checked that all known errors were fixed. Furthermore,
colour coding ensured that high-priority faults were fixed as soon as possible. This also
enabled other people who were using the code to record errors they had found.

Figure 6.2: Trac showing the tickets on Tue 05 May 2009

“Perfection is achieved, not
when there is nothing more to
add, but when there is nothing
left to take away.”
Antoine de Saint-Exupéry

7 Critical Evaluation / The objic Language

This chapter evaluates all of the major steps performed for this project and highlights,
in the opinion of the author, what could have been done better and what was done
satisfactorily.

7.1 Evaluation of Language


As one of the aims was to make the language very similar to Java, Python and C, three
different syntaxes and ideologies were mixed. While this was an advantage in that the best
features could be picked from each, it also had some drawbacks. This can be seen very
clearly in the method object. Because Python is very dynamic, method objects are a useful
addition but can be confusing. Mixing this already confusing principle with a Java-like
class syntax and C-like constructs creates syntax that is very hard to read. This will
require a major change in the objic syntax, and some clear guidelines have to be created.
During the design stage the idea was to give the programmer as much freedom as possible,
and because of this no clear rules on naming conventions were established. This means that
every time the value of an object is needed the corresponding “value()” method has to be
invoked or a method object has to be created. This makes the language very verbose and it
seems “bloated”. After writing many objic programs the realisation is that defining a
standard method call name for the value of an object would have benefited the readability
and the clarity of the language. The same applies to the representation method “rep()”,
used in print.
Objic is a Turing complete language that can compete with existing solutions. One of
the objectives was to create a language that can be understood by Java, C(++) and Python
programmers. For this two things have to hold: the syntax has to be familiar, and the
language has to behave in the way the programmer expects. Runtime and memory
footprint are not analysed as this was not a primary concern and no real optimisation has
taken place. Furthermore, the great advantage of distribution is not discussed, as in
the ideal case the programmer does not notice that he is writing a distributed application.

7.1.1 Language Syntax

To be able to compare the languages a syntactical discussion of the influential languages would
be needed. Since there is not enough space to discuss all three, only Python will be compared,
being the language in which all the implementation has been done. This does not mean there
has been no influence from C and Java. The “for” loop syntax, for example, is a direct copy
from C.
In Python, code does not have to be in the scope of a class, as it is a scripting language
extended to implement classes and methods, whereas objic was designed to be object orientated
from the start, based on the need to be able to distribute objects. One of the biggest
differences is that Python uses indentation to specify blocks. While this forces the
programmer to indent his code and so create code that is more readable, it has also been a
major point of criticism. In objic the decision was made to use “{” and “}” as block
delimiters, as in C-based languages. Another noticeable difference is the “main” method.
This is copied from Java and is a general recommendation for naming the first method to be
called in an object. However, it is not enforced by any part of the language. Furthermore,
in objic it is not possible to let the compiler decide which data type a literal should be,
which is different from all the languages discussed. In Python, if the interpreter sees a
statement like the assignment in line two of code example 11, it will create a new
object of type String and point the variable to it. This behaviour is built into the compiler
or interpreter. In objic this was not possible, as it is not known where these objects will
reside or whether they exist at all. So the “new” keyword has to be used in this context,
simulating a behaviour similar to Java but without the built-in types like “int” [Flanagan,
2005]. A similarity to Python and C++ [Meyers, 2005] is that a method can be referenced like
an object, through a method object in Python or a function pointer in C++. This is not
implemented in Java.

Code 11 Python example

class HelloWorld:
    text = "Hello World"

    def printHello(self):
        print(str(self.text))

objClass = HelloWorld()
objClass.printHello()

Code 12 Objic example

class HelloWorld {
    text = new String("Hello World")

    def printHello {
        print(text)
    }

    def main {
        ME.printHello()
    }
}

From the examples in code 11 and code 12 it can be seen that the syntaxes are very similar,
and it should be fairly easy for a Python, C or Java programmer to understand and learn objic.
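The method objects mentioned in this section can be illustrated on the Python side with a short sketch. The method name get_text and the helper call_twice are hypothetical and only serve the example. The objic notation for method objects is not shown here.

```python
class HelloWorld(object):
    text = "Hello World"

    def get_text(self):
        return self.text


obj = HelloWorld()

# A bound method is itself an object: it can be stored and called later.
method_obj = obj.get_text
print(method_obj())


def call_twice(func):
    """Accept any callable, including a method object, and call it twice."""
    return [func(), func()]


print(call_twice(method_obj))
```

The method object keeps a reference to the instance it was taken from, which is what allows it to be passed around without the instance.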

7.1.2 Language Behaviour

In the objic syntax there is no functionality for the characters “+”, “-”, “*”, etc. As it is
a purely object orientated language this functionality was not implemented. This is a major
difference to its reference languages but was a conscious decision, as it cannot be guaranteed
that the object on which the operation is performed will understand it. Instead there is the
guideline to call addition methods “add” and subtraction methods “min”, as can be seen with
the Int and the String objects. In future versions this might change once overriding is
implemented (see section 8.1 on page 57).
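The method-based guideline can be sketched in Python as follows. The class OInt below is a hypothetical stand-in written for this illustration and is not the project's oInt.py. Only the method names “add”, “min”, “value” and “rep” are taken from the text.

```python
class OInt(object):
    """Illustrative integer object that exposes methods instead of operators."""

    def __init__(self, number):
        self._number = int(number)

    def value(self):
        """Return the raw value held by the object."""
        return self._number

    def rep(self):
        """Return the textual representation used when printing."""
        return str(self._number)

    def add(self, other):
        """Addition by method call; 'other' only needs a value() method."""
        return OInt(self._number + other.value())

    def min(self, other):
        """Subtraction by method call, named 'min' as in the guideline."""
        return OInt(self._number - other.value())


total = OInt(40).add(OInt(5)).min(OInt(3))
print(total.rep())
```

Since every operation is an ordinary method call, the caller does not need to know whether the object it is calling lives locally or on another machine.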
Otherwise it can be assumed that the language will behave very similarly to its reference
languages.
The main question, whether it is possible to write a usable, familiar and distributed cloud
programming language while hiding the underlying networking, can be answered with a yes.
There are still many hurdles to overcome, but the general theory has been proven with this
paper.

7.2 Research
To be able to form the initial idea into a theory a lot of research had to be done. Working
on the topic of cloud computing turned out to be very difficult, as new papers with different
definitions of the “cloud” were published throughout the project, which meant that specific
sections had to be rewritten [Weiss, 2007]. In retrospect the development was hampered by
choosing a cutting-edge research topic that was not properly defined when the project started.
Because of this, the area of abstract syntax tree interpreters was not researched to the full
extent, and some pitfalls in the implementation phase could have been avoided had more time
been spent on this. This is also linked to the realisation that a “normal” compiler would not
be able to achieve the flexibility needed, which was not clear from the beginning but was
accounted for.

7.3 Development Methodology

Choosing an iterative approach seemed the most logical thing to do, based on the clearly
defined iterative steps (see section 1.6 on page 7). However, something that was not
anticipated was the amount of refactoring and rewriting needed to achieve good code quality.
An analysis of the SVN commits shows that about 37 percent of every iteration consisted of
replacing already existing code lines. While some of this was error fixing, it raises the
question of whether a traditional waterfall-based approach would have reduced this.

7.4 Implementation

A very critical decision for the success of a project is the choice of implementation language.
Using Python provided the high-level language features needed for the tight time frame, but
also allowed access to the low-level operating system methods. This made it possible to write
quick, scalable code that can still be easily maintained. The slight increase in execution time
and memory footprint can be neglected, as performance was never an aim of the project.
However, by using the Python socket and TCP server libraries the underlying network could not
be configured exactly as needed, and thus some stability issues came up that could not be
fixed. These could be resolved by writing this part of the system in a lower-level language.

7.5 Testing

As discussed, testing was done as a step in the iterative approach. Because of the time frame,
testing was cut short and only a few properly documented test cases are in the
code. As the main aim of the project was to provide a proof of concept, detailed and thorough
testing was not considered a very important iteration. This would have to be improved in
future versions to achieve industry-grade quality. However, tests were implemented to prove
that the concept works and that the main theory holds.

7.6 Project Plan


The initial Gantt chart (see Appendix D on page 76) was mostly followed, with the
exception of a four-day overrun in the implementation phase because there was no clearly
defined end milestone. This was not a problem as 14 days were planned as slack at the end
of the project, which was put in place for exactly this reason. The technique of updating the
chart with an estimated percentage of stage completion proved very valuable. Through the
experience gained in doing this a clearer picture emerged of where the project was at risk of
running late, and countermeasures could subsequently be taken.

7.7 Personal Performance


Based on the experience with other tasks throughout the author’s career it has become clear
that the design and the tools used are vital to the completion of a project. Using source
control, a fault database and the appropriate language influences the outcome more than the
initial idea. Furthermore, it is important to plan for human error, with regular backups,
automatic saves and recurring checks.
Because a research based project was chosen, there was no clear end point that could
be reached. This was realised and accounted for by following the Gantt chart rigorously. As
can be seen in the chapter on future work (see chapter 8) the project can be
extended to a PhD scale thesis, as indicated by St Andrews. It would have been fully sufficient
to choose only the object communication protocol as the topic and write a proof of concept
around it. This would have enabled a more in-depth discussion of one particular aspect instead
of scratching the surface of many areas, and would have provided a fixed end milestone at
which coding could have stopped.
More time should have been spent on low-level specifications. Development started as
soon as a fairly high-level view of the system was formalised, out of the fear of not
finishing on time. As a result some problems arose in the implementation phase,
especially at the protocol level. While the benchmarks with XML, HTTP and others were
performed at the beginning and the decision to use a self-developed protocol was incorporated
into the design, the protocol should also have been specified in full detail. Furthermore, the
lack of an appropriate design notation for distributed systems made designing the system as a
whole very difficult.

“Debugging is twice as hard as
writing the code in the first
place. Therefore, if you write
the code as cleverly as possible,
you are, by definition, not smart
enough to debug it.”
Brian Kernighan

8 Future Work

There are many more ideas that could extend this project. While some of them can be realised
in the near future, others have not even been properly articulated.

8.1 Short Term


This is a list of the features that will be included in the next official release of the language.

• Fail-over

It is important that a fail-over system is in place if a server is no longer reachable. This
means that an object is never instantiated in only one location. This feature should
be easy to enable and should be used in production environments. “Intelligent”
network error recovery should also be in place so that the object is always in a consistent
state. If an object cannot be reached, the interpreter should determine when the object is
next used, continue with the execution up to this point and then retry. By doing this,
small network errors can be overcome. To enable all this functionality the networking layer
has to be rewritten.

• Standardised protocol

While the protocol derived for this project is fit for its purpose, a standard has to be
found to enable inter-project communication. Work on this has already started in a proposal
that is currently being written by the author. This will hopefully provide an alternative
to SOAP and other existing incomplete implementations.

• Inheritance / Polymorphism

An important cornerstone of object oriented programming is the possibility to inherit
from base objects. It has not been fully examined how this can be performed in the
cloud, and further research is needed to clarify the interface between the objects. When
this is implemented, polymorphism and overloading will naturally be the next step.

• Local caching

To enable execution speedups it has to be researched whether local caching can be performed to
reduce network traffic. This can create problems such as race conditions and changes
not propagating correctly, but at the current state the advantages seem to prevail.

• Effective memory management

The current memory management implementation is intentionally kept very simple and
no optimization has been performed. To achieve comparable execution times and mem-
ory footprint this has to be improved and a new memory model has to be derived in
which the garbage collector is aware of the usage of objects and can communicate with
the objects concerned.

• Object versioning

As already mentioned in section 4.11.1 on page 31, object versioning is a problem in a
distributed environment. A solution to this problem has already been found, by which
objects can communicate their versions and check for compatibility. At compile time
every object checks all the objects it should connect to and saves their versions in the
byte-code. When connecting to such an object at run time the version is requested. The
reply incorporates a “backwards compatible” flag, which has to be set for the object to be
able to connect to the new version. Otherwise a new request is sent with the exact version
needed, which the server should be able to provide.

• Modelling notation

There is no notation to model distributed cloud services. This was a problem in the de-
sign stage because UML (Unified Modeling Language) does not provide a notation to do
this. The requirements for such a language or notation are currently being investigated
by the author and Cornelius Ncube.
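The version check described under “Object versioning” above can be sketched in Python. This is an in-process illustration under stated assumptions: the class ObjectHost, its method names, the version strings and the placeholder implementations are all hypothetical, and no networking or byte-code is involved.

```python
class ObjectHost(object):
    """Hypothetical server side: keeps several versions of one object."""

    def __init__(self, versions, current, backwards_compatible):
        self.versions = versions                  # maps version to implementation
        self.current = current
        self.backwards_compatible = backwards_compatible

    def request_version(self):
        """Answer a version request with the current version and its flag."""
        return self.current, self.backwards_compatible

    def request_exact(self, version):
        """Return the implementation for one exact version, or None."""
        return self.versions.get(version)


def connect(host, compiled_version):
    """Choose the implementation for a client compiled against one version."""
    current, compatible = host.request_version()
    if current == compiled_version or compatible:
        return host.request_exact(current)
    # The new version is not flagged as compatible: ask for the exact one.
    implementation = host.request_exact(compiled_version)
    if implementation is None:
        raise LookupError("version %s is not available" % compiled_version)
    return implementation


host = ObjectHost({"1.0": "impl-1.0", "2.0": "impl-2.0"},
                  current="2.0", backwards_compatible=False)
print(connect(host, "1.0"))
```

In this sketch a client compiled against version 1.0 falls back to the exact version because the newer one is not flagged as backwards compatible.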

8.2 Long Term


These points are a listing of thoughts that might be researched and implemented in the long
term future.

• Enable private clouds

Companies might not want to offload all their internal data to some service
provider in the cloud. It might be desirable to build up a private cloud behind the
corporate firewall. This cloud might want to use some well defined services through the
internet. It should therefore be possible to define a clear interface to the internet and to
other clouds.

• Payment

As the language should enable the program to be distributed as a SaaS a payment system
would have to be implemented. It should be able to charge registered customers on a
method invocation basis. Through this it is possible to “sell” a program in the cloud.
There are many issues involved in this and no research has been carried out.

• Object authentication

Since security becomes a major issue with distribution, there has to be some method of
allowing certain objects access to other objects in the cloud. In the current implementation
this is done by “security through obscurity” [Anderson, 2001], using a very long
hash that is very difficult, but still possible, to guess. There has to be a protocol with
which an object can grant or deny access to its partner objects.

• Service search database

To enable the full potential of using distributed services a platform has to be created in
which providers can publish their objects and define the interfaces and functionalities.
Ideally this could be an automated process in which the programmer only specifies what
is needed and an object-host is found by the interpreter with alternatives and backup
services. In the short term a website is imaginable that lists all the services offered,
on which a few major providers will probably be the best known and most widely adopted.

• Different output run environments

A big aim of cloud computing is to enable applications to run in a device independent
environment. In objic this is enabled through implementing different “run environments”
(see section 5.1.3 on page 34). While some tests have been made with a JavaScript
implementation, a whole array of different clients has to be implemented to achieve true
device independence.

• Encryption

At the current state it is not possible to encrypt the communication between objects. As
more and more personal data moves to the cloud this is a very important feature, and the
requirement for strong encryption will grow. A certificate based approach seems to be the
most feasible, although there are some problems associated with it that have to be evaluated,
such as how to connect certificates to objects and how to handle changes. A lot of
research has been done in this area and hopefully objic can build upon this [Schneier,
1995].

• Verification

In a distributed environment it is very important for an object to verify that the service it
is talking to is really the one wanted. If the service is not verified, spoofing becomes
easily possible and this would make the security strategy worthless. This goes hand in hand
with the point on encryption and should not be too hard to implement, as a lot of research
has been done and libraries are available [Needham and Schroeder, 1978].

• Object / Server migration

When running an object server in a production environment it has to be possible to
migrate the running objects to another server without losing the associated information
and connections. While it is possible to serialise the object and instantiate it in another
server instance, a “hot move” is not possible. This is a difficult topic, as the case in which
an object receives a request while being moved has to be covered to keep it in a consistent
state.
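The “cold” migration described in the last point, serialising an object and instantiating it again elsewhere, can be sketched as follows. The sketch makes several assumptions: the Counter class, the registry and the JSON format are hypothetical and chosen for illustration, and both “servers” are simulated inside one process. How objic serialises its objects is not specified here.

```python
import json


class Counter(object):
    """Hypothetical object whose state should survive a move."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


# The receiving server needs to know which class a payload belongs to.
REGISTRY = {"Counter": Counter}


def export_object(obj):
    """Serialise class name and state on the old server."""
    return json.dumps({"class": type(obj).__name__, "state": obj.__dict__})


def import_object(payload, registry=REGISTRY):
    """Re-create the object on the new server from the serialised payload."""
    data = json.loads(payload)
    cls = registry[data["class"]]
    obj = cls.__new__(cls)              # skip __init__, the saved state replaces it
    obj.__dict__.update(data["state"])
    return obj


original = Counter()
original.increment()
payload = export_object(original)       # requests arriving after this point are lost
moved = import_object(payload)
print(moved.increment())
```

Any request that reaches the original object after export_object has run is not reflected in the moved copy, which is exactly the gap a “hot move” would have to close.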

It can always be argued how relevant research is, but many lessons have been learnt and the
interest from various companies, individuals and universities has shown that the topic needs
more discussion. Proving that it is possible to create a distributed cloud programming
language, even if it is not fully complete, has laid the foundation for further research. The
project has also shown the great need for defined cloud computing standards and appropriate
notations to enable provider independent computation. Based on the particular interest of a
few individuals the project will be continued as an open source development effort.

“There is no greater mistake
than the hasty conclusion that
opinions are worthless because
they are badly argued”
Thomas Henry Huxley
9 Conclusion

In conclusion it can be said that the project was a success in that it is possible to distribute
objects over numerous machines in a cloud-like setup by using a Turing complete, Java-like
object orientated language. By doing this, it has been proven that it is possible to apply
distributed objects to a cloud setup and hide the distribution from the programmer.
Through this project the author has also deepened his knowledge of compilers and
interpreters and learned about distributed systems. As a high code standard was required,
Python was adopted very quickly, as were the tools supporting its development, which was also
an objective. This has helped the author develop his professional portfolio and experience the
issues involved in going through all the steps of creating a project with about ten thousand
lines of code.
Choosing a topic that has not even been properly defined turned out to be a problem,
as the boundaries of the project kept changing due to new papers being published, but this
caused the author to gain a deeper knowledge of the domain that would otherwise not have
been required. Because of the misconception of the scope and the limited word count for
the report, many areas could not be handled in the detail that was anticipated. Nevertheless,
a high-level view was gained, with many areas of further research opening up and being
documented. This enables many new areas of future work to be explored.
Based on the feedback and the interest from various sources a discussion is now starting
about how this idea can be used to enable “the cloud” to become more programmable. The
author is very proud that two papers about this prototype have already been accepted
at international conferences (see Appendix J on page 83) and that it may become a topic for a
PhD. This confirms the author’s view of the relevance of this topic. Therefore, the project
objectives seem to be fully met and the results even exceed the expectations hoped for.

List of Figures

2.1 A typical network diagram using a cloud
2.2 Translation of a print statement

4.1 The environment in relation to the byte-code
4.2 The antlrWorks editor
4.3 Diagram showing the relationship between object and class stack frames
4.4 A high level diagram of the server
4.5 Comparison between SOAP and objic protocol
4.6 Parse time comparison

5.1 The simplified class structure of the object server
5.2 The class diagram for the methobj class
5.3 The class diagram for the ObjConnection class
5.4 The class diagram for the ObjManager class
5.5 The class diagram for the oClass class
5.6 The class diagram for the oObject class
5.7 The oObject base class
5.8 The class diagram for the oInt class
5.9 The class diagram for the oString class
5.10 The class diagram for the oVm class
5.11 The class diagram for the RequestHandler class
5.12 Representation of a simple AST

6.1 Picture of the development environment
6.2 Trac showing the tickets on Tue 05 May 2009

K.1 A diagram of the syntax for a new block of code
K.2 A diagram of the syntax for a call statement
K.3 A diagram of the syntax for a new class definition
K.4 A diagram of the syntax for a new method definition
K.5 A diagram of the syntax for a while loop
K.6 A diagram of the syntax for a new for loop
K.7 A diagram of the syntax for a new variable declaration
K.8 A diagram of the syntax for the parameters passed in to methods
K.9 A diagram of what can be included in a block
K.10 A diagram of the syntax for the NAME token

10 List of Abbreviations

SVN Subversion

CVS Concurrent Versions System

CERN European Organization for Nuclear Research

IDE Integrated development environment

RSS Really Simple Syndication

XML eXtensible Mark-up Language

URL Uniform Resource Locator

VM Virtual machine

BSD Berkeley Software Distribution

CORBA Common Object Request Broker Architecture

ORB Object Request Broker

EBNF Extended Backus-Naur-Form

HTTP Hypertext Transfer Protocol

KISS Keep it Short and Simple

UML Unified Modelling Language

AST Abstract syntax tree

Bibliography

Harold Abelson. Struktur Und Interpretation Von Computerprogrammen (Springer-Lehrbuch).
Springer-Verlag Berlin and Heidelberg GmbH & Co. K, 2001.

Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles,
Techniques, and Tools. Addison Wesley, 2006.

Ross J. Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems
(Wiley Computer Publishing). John Wiley & Sons, 2001.

Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press,
2002.

Jim Arlow and Ila Neustadt. UML 2 and the Unified Process: Practical Object-Oriented
Analysis and Design (Addison-Wesley Object Technology Series). Addison Wesley, 2005.

Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew
Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia.
Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28,
EECS Department, University of California, Berkeley, Feb 2009. URL
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html.

Haley Beard. Cloud Computing Best Practices for Managing and Measuring Processes for
On-demand Computing, Applications and Data Centers in the Cloud with SLAs. Emereo
Pty Limited, 2008.

Boris Beizer. Software Testing Techniques. Itp - Media, 1990.

Kurt Bittner and Ian Spence. Managing Iterative Software Development Projects (Addison-
Wesley Object Technology). Addison Wesley, 2006.

Joshua Bloch. Effective Java (Java Series). Prentice Hall, 2001.

Gerd Breitter and Michael Behrendt. Cloud computing concepts. Informatik Spektrum, 31
(6), 2008.

Frederick P. Brooks. The Mythical Man Month and Other Essays on Software Engineering.
Addison Wesley, 1995.

Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud com-
puting: Vision, hype, and reality for delivering it services as computing utilities. CoRR,
abs/0808.3558, 2008.

N Carr. The Big Switch: Rewiring the World from Edison to Google. W. W. Norton & Co.,
2009.

Thomas M. Connolly and Carolyn E. Begg. Database Systems: A Practical Approach to
Design, Implementation and Management (4th Edition). Addison Wesley, 2004.

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction
to Algorithms, Second Edition. The MIT Press, 2001.

Irmen de Jong (and others). Web services/soap and corba. OMG Whitepapers, 2002. doi:
http://www.omg.net/news/whitepapers/CORBA_vs_SOAP1.pdf.

Kemal A. Delic and Martin Anthony Walker. Emergence of the academic computing clouds.
Ubiquity, 9(31):1–1, 2008. doi: http://doi.acm.org/10.1145/1414663.1414664.

Jean Dollimore, Tim Kindberg, and George Coulouris. Distributed Systems: Concepts and
Design (4th Edition). Addison Wesley, 2005.

Wolfgang Emmerich. Engineering Distributed Objects. John Wiley & Sons, 2000.

Norman E. Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous Approach.
PWS, 1998.

David Flanagan. Java in a Nutshell (In a Nutshell (O’Reilly)). O’Reilly Media, Inc., 2005.

I. Foster, Yong Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree
compared. Grid Computing Environments Workshop, 2008. GCE ’08, pages 1–10, Nov.
2008. doi: 10.1109/GCE.2008.4738445.

Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring:
Improving the Design of Existing Code (Object Technology Series). Addison Wesley, 1999a.

Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring:
Improving the Design of Existing Code (Object Technology Series). Addison Wesley, 1999b.

Petter Haggholm. Pyremote: Object mobility in the python programming language. OMG
Whitepapers, 2007. doi: http://www.cs.ubc.ca/grads/resources/thesis/Nov07/Haggholm_
Petter.pdf.

Brian Hayes. Cloud computing. Commun. ACM, 51(7):9–11, 2008. ISSN 0001-0782. doi:
http://doi.acm.org/10.1145/1364782.1364786.

James Hayes. Clout of the cloud. Engineering and Technology, 2009.

Michi Henning. The rise and fall of corba. Commun. ACM, 51(8):52–57, 2008. ISSN 0001-0782.
doi: http://doi.acm.org/10.1145/1378704.1378718.

Jim Holmes. Object-Oriented Compiler Construction. Pearson US Imports & PHIPEs, 1994.

Andrew Hunt and David Thomas. The Pragmatic Programmer. Addison Wesley, 1999.

Ivar Jacobson. Object-oriented Software Engineering: A Use CASE Approach (ACM Press).
Addison Wesley, 1992.

Brian W. Kernighan and Rob Pike. The Practice of Programming (Addison-Wesley Profes-
sional Computing Series). Addison Wesley, 1999.

Wolfgang Kuechlin and Andreas Weber. Einfuehrung in die Informatik. Objektorientiert mit
Java (Springer-Lehrbuch). Springer-Verlag Berlin Heidelberg, 2000.

Ningning Hu, Li Erran Li, Zhuoqing Morley Mao, Peter Steenkiste, and Jia Wang. A
measurement study of internet bottlenecks. In Proc. IEEE INFOCOM, pages 1689–
1700. IEEE Press, 2005.

Steven C. McConnell. Code Complete: A Practical Handbook of Software Construction.
Microsoft Press, U.S., 1993.

Ivanka Menken. SaaS - The Complete Cornerstone Guide to Software as a Service Best
Practices Concepts, Terms, and Techniques for Successfully Planning, Implementing and
Managing SaaS Solutions. Emereo Pty Limited, 2008.

Scott Meyers. Effective C++: 55 Specific Ways to Improve Your Programs and Designs
(Addison-Wesley Professional Computing Series). Addison Wesley, 2005.

Michael Miller. Cloud Computing: Web-Based Applications That Change the Way You Work
and Collaborate Online. QUE, 2008.

John Paul Mueller. Special Edition Using Soap. QUE, 2001.

Glenford J. Myers. The Art of Software Testing (Business Data Processing). John Wiley &
Sons, 1979.

Roger M. Needham and Michael D. Schroeder. Using encryption for authentication in large
networks of computers. Commun. ACM, 21(12):993–999, 1978. ISSN 0001-0782. doi:
http://doi.acm.org/10.1145/359657.359659.

Robert Orfali, Dan Harkey, and Jeri Edwards. The Essential Distributed Objects Survival
Guide. John Wiley & Sons, 1995.


Krzysztof Ostrowski, Ken Birman, Danny Dolev, and Jong Hoon Ahnn. Programming with
live distributed objects. In ECOOP ’08: Proceedings of the 22nd European conference on
Object-Oriented Programming, pages 463–489, Berlin, Heidelberg, 2008. Springer-Verlag.
ISBN 978-3-540-70591-8. doi: http://dx.doi.org/10.1007/978-3-540-70592-5_20.

John K. Ousterhout. Scripting: Higher level programming for the 21st century. IEEE Com-
puter, 31:23–30, 1997.

D. L. Parnas. A technique for software module specification with examples. Commun. ACM,
15(5):330–336, 1972. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/355602.361309.

Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages (Prag-
matic Programmers). Pragmatic Bookshelf, 2007.

David Parsons. Object Oriented Programming (Computing Programming Textbooks). Thomson
Learning, 1997.

Dan Pilone and Neil Pitman. UML 2.0 in a Nutshell (In a Nutshell (O’Reilly)). O’Reilly
Media, Inc., 2005.

David Plainfossé and Marc Shapiro. A survey of distributed garbage collection techniques.
In IWMM ’95: Proceedings of the International Workshop on Memory Management, pages
211–249, London, UK, 1995. Springer-Verlag. ISBN 3-540-60368-9.

Tim Rowledge. A tour of the squeak object engine. 2001. URL
http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/oe-tour-sept19.pdf.

Bruce Schneier. Applied Cryptography: Protocols, Algorithms and Source Code in C. John
Wiley & Sons, 1995.

Kenn Scribner and Mark Stiver. Understanding SOAP: Simple Object Access Protocol (Sams
professional). Sams, 2000.

Yunhe Shi, Kevin Casey, M. Anton Ertl, and David Gregg. Virtual machine showdown: Stack
versus registers. ACM Trans. Archit. Code Optim., 4(4):1–36, 2008. ISSN 1544-3566. doi:
http://doi.acm.org/10.1145/1328195.1328197.

James Snell, Doug Tidwell, and Pavel Kulchenko. Programming Web Services with SOAP.
O’Reilly Media, Inc., 2001.

Stephan Somogyi and Bruce Schneier. Inside risks: The perils of port 80. Commun. ACM, 44
(10):168, 2001. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/383845.383875.


Joel Spolsky. Joel on Software: And on Diverse and Occasionally Related Matters That Will
Prove of Interest to Software Developers, Designers, and Managers, and to Those ... or
Ill-Luck, Work with Them in Some Capacity. APRESS, 2004.

Robert Tolksdorf and Kai Knubben. Programming distributed systems with the delegation-
based object-oriented language dself. In SAC ’02: Proceedings of the 2002 ACM symposium
on Applied computing, pages 927–931, New York, NY, USA, 2002. ACM. ISBN 1-58113-
445-2. doi: http://doi.acm.org/10.1145/508791.508971.

Aaron Weiss. Computing in the clouds. netWorker, 11(4):16–25, 2007. ISSN 1091-3556. doi:
http://doi.acm.org/10.1145/1327512.1327513.

A
Appendix

B
License

Copyright (c) 2009, Hoffmann Geerd-Dietger
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
- Neither the name of the copyright owner nor the
  names of its contributors may be used to endorse or promote products
  derived from this software without specific prior written permission.
- All advertising materials mentioning features or use of this software
  must display the following acknowledgement:
  This product includes software developed by Hoffmann Geerd-Dietger
  and contributors.
- The Program and its derivative work will neither be modified or
  executed to harm any human being nor through inaction permit
  any human being to be harmed.

THIS SOFTWARE IS PROVIDED BY Geerd-Dietger Hoffmann ''AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <copyright holder> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

C
Antlr Syntax

This is a listing of the syntax file that Antlr uses to create the tokenizer and parser. It is
essentially a list of rules describing how the language created in this project should be
structured. The syntax is very similar to EBNF (Extended Backus-Naur Form).

grammar Expr;

// Some options for the generator
options {
    language=Java;
    output=AST;
    ASTLabelType=CommonTree;
}

// Tokens for the AST (see bottom for more)
tokens {
    BLOCK;
    EQ;
    PARAMS;
    NEWOBJ;
    CALL;
    IFTEST;
    CLASS;
    PTRASS;
    BOOL;
    WHILE;
    BREAK;
    DOWHILE;
    ELSE;
    PRINTSTM;
    OSTRING;
    SERVERNAME;
    METHDEF;
    DEBUGFLAG;
    RETURNSTM;
}

// Every program has one class
prog : classdef ;

// A class is defined through the class keyword, name and a block
// Ex : class test { print ("hello") }
classdef
    : 'class' NAME block -> ^(CLASS NAME block )
    ;

// A block is always encapsulated between { and } and has n statements
block
    : '{' ( stat )* '}' -> ^(BLOCK stat*)
    ;

// A list of all the different statements there can be in a block
stat
    : iftest
    | methdef
    | forloop
    | newvar
    | call
    | block
    | whileloop
    | loopbreak
    | printstm
    | returnstm
    | NEWLINE
    ;

// Handles a return in the code
// Ex : return(a)
returnstm
    : 'return' '(' NAME ')' -> ^( RETURNSTM NAME)
    ;

// Defines a new method in a class
// Ex : def printHelp { print ("HELP") }
methdef
    : 'def' NAME block -> ^(METHDEF NAME block)
    ;

// A simple method call
// Ex : object.add(otherobject)
call
    : a=NAME '.' b=NAME paramlist -> ^(CALL $a $b paramlist)
    ;

// The print statement
// Ex : print("This is printed")
printstm
    : 'print' '(' printparams ')' -> ^(PRINTSTM printparams)
    ;

// A list of things that can be printed
printparams
    : call
    | NAME
    | STRINGTPL -> ^(OSTRING STRINGTPL)
    ;

// The definition of a while loop
// Ex : while (a.value() == a.value()){print ("looping")}
whileloop
    : 'while' '(' boolexp ')' a=block -> ^(WHILE boolexp $a)
    ;

// The break statement to stop executing a loop
loopbreak
    : 'break' -> ^(BREAK)
    ;

// The if definition
// Ex : if (a.value() == b.value()){
//          print("equal")
//      }else{
//          print("not equal")}
iftest : 'if' '(' boolexp ')' a=block ('else' b=block)? ->
        ^(IFTEST boolexp $a ( ELSE $b)?)
    ;

// Defines a new variable ptr, 3 main different possibilities
// Ex : a = new Int()
newvar
    : a=NAME '=' 'new' b=NAME paramlist ('@' servername)? ->
        ^(EQ $a NEWOBJ (servername)? $b paramlist)
    | a=NAME '=' b=NAME -> ^(EQ $a PTRASS $b )
    | NAME '=' call -> ^(EQ NAME call )
    ;

// Defines a for loop. It directly creates the AST for a while loop,
// so the byte-code does not need to know what a for is
forloop
    : 'for' '(' a=newvar ';' boolexp ';' b=newvar ')' block ->
        ^(BLOCK $a ^(WHILE boolexp ^(BLOCK block $b)))
    ;

// A boolean expression used in while and if
boolexp
    :
    | a=booltype cmp_operator b=booltype ->
        ^(BOOL cmp_operator $a $b)
    ;

// A list of comparison operators that are valid
cmp_operator
    : '=='
    | '!='
    | '<='
    | '>='
    | '<'
    | '>'
    ;

// A boolean type can be a call or an object name
booltype
    : call
    | NAME
    ;

// Defines how parameters should look like
paramlist
    : '(' ( atom (',' atom)* )? ')' -> ^(PARAMS atom* )
    ;

// Defines the syntax for a server
// Ex : ptrtovoid.net
servername
    : a=NAME '.' b=NAME -> ^(SERVERNAME $a '.' $b)
    ;

// The smallest entity
atom
    : INT
    | NAME
    | STRINGTPL
    ;


// MORE TOKENS

INT : ('+' | '-')? '0'..'9'+
    ;


NAME : ('a'..'z'|'A'..'Z'|'_'|INT)+
    ;

STRINGTPL
    : ('"' (~'"')* '"')
    ;

NEWLINE : '\r'? '\n' ;


COMMENT
    : '\*' ( options {greedy=false;} : . )* '*/' {skip();}
    ;

LINE_COMMENT
    : '\\' ~('\n'|'\r')* '\r'? '\n' {skip();};


WS : (' '|'\t'|'\n'|'\r')+ {skip();} ;

D
Gantt Chart

E
INSTALL

This is the INSTALL file that is provided with the objic distribution.

This is the install document for the objic language.

1) First of all you have to check if you have python > 2.4 installed. This is
   best done by running:

   $ python -V

   This has to return a version higher than Python 2.4.0.

   Then you have to check for the antlr python libraries.

   $ python -c 'import antlr3'

   If either of these commands fails you have to install the missing packages.
   Most distributions supply packages; consult man apt-get or man yum for more
   information. Otherwise

   http://www.python.org/
   http://www.antlr.org/

   will help.

2) After checking for the libs you can extract the code and install it.

   The source code can be found under objic.ribalba.de

   After downloading the latest version

   $ tar -xvzf objic.tar.gz

   will unpack the archive with all you need.

3) It is advised to edit the configuration file. This should be relatively
   self-explanatory and the default settings are normally OK for testing.

4) Then you have to set up your execution path to include the executable files.
   This is done by extending your PATH to include the src folder.

   $ export PATH=/path/to/source:$PATH

   will normally do the trick.


Please email me if you have any further questions under didi@ribalbaNOSPAM.de
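
The two checks from step 1 of the INSTALL file can also be scripted. The following is a minimal sketch and not part of the objic distribution: the helper names `python_is_new_enough` and `module_available` are hypothetical, and the version comparison simply mirrors the "higher than Python 2.4.0" wording above.

```python
import sys


def python_is_new_enough(version=None, minimum=(2, 4, 0)):
    """Return True if the given (or running) version is above the minimum."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return tuple(version) > tuple(minimum)


def module_available(name):
    """Return True if the module imports, like python -c 'import <name>'."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical report format; antlr3 is the module named in INSTALL.
    print("python ok:", python_is_new_enough())
    print("antlr3 ok:", module_available("antlr3"))
```

Run as a script, it prints one line per check; a failed check means the corresponding package from step 1 still has to be installed.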

F
CD Content

Directory Layout
/ ............................................................... The CD root directory
code .............................................The dir where all the code resides
JavaTreeGen ................................A program to generate parse trees
LexParse ...................................................The objic program
excipsecode .....................The eclipse project holding all source files
output ...................................................Output from Antlr
design ............................................... High level design documents
diagrams ............................................. Diagrams used in the paper
documents .....................................................License and Paper
examples ....................................................Example source code
man .....................................Manuals like INSTALL and the man pages
managment ......................................................The Gantt charts
objic .................................................The global configuration file
paper .....................................................The actual paper source
images ...............................................Images used in the report
includes ...................................Source examples used in the report
syntaximg ................................................Images of the syntax
proposal ............................................................The proposal
scripts ..........................Management scripts, like backup and word count
dump .......................................... SVN dump and other unhelpful files

Further information
Operating system: Linux, with Python 2.4; see INSTALL for further detail

Documentation: In plain ASCII; can be viewed with any appropriate program

Libraries: Listed in the INSTALL file

G
Backup Script

#!/bin/sh

# Set some colour
export GREP_OPTIONS='--color=auto' GREP_COLOR='1;32'

# Check for "bad words" (first- and second-person pronouns in the paper)
grep -i ' me ' ./paper/*.tex
grep -i ' I ' ./paper/*.tex
grep -i ' we ' ./paper/*.tex
grep -i ' you ' ./paper/*.tex

# Check that everything is committed (svn st prints nothing when clean)
if [ -n "$(svn st)" ]; then
    echo "Please commit all changes"
    exit 1
fi

# Go to CERN and update svn
ssh ribalba@lxplus.cern.ch 'svn up /afs/cern.ch/user/r/ribalba/fyp/'

# Build the archive name once from day and month, with spaces removed
ARCHIVE="fyp$(date '+%e%b' | tr -d ' ').tar.bz2"

# Tar up everything
tar -cjf "$ARCHIVE" *

# Upload to uni
scp "$ARCHIVE" ghoffman@decweb.bmth.ac.uk:/home/ghoffman/fyp/

# And delete
rm "$ARCHIVE"

H
For Loop

This code example will be translated into the following parse tree.
class ForLoop{
    a = new Int(3)

    for (c = new Int(); c.value() <= a.value(); c = c.add(1)){
        print(c.value())
    }
}

The output of the program is

0
1
2
3

I
SOAP/HTTP Comparison

SOAP message

<?xml version="1.0"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
    soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">

    <soap:Body xmlns:m="http://www.example.org/stock">
        <m:GetStockPrice>
            <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
    </soap:Body>

</soap:Envelope>

Time needed to parse: 0.137 seconds
System calls: 2334

objic

CALL:GetStockPrice
PARAM:IBM

Time needed to parse: 0.031 seconds
System calls: 891
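
To make the difference in parsing effort more tangible, the following Python sketch extracts the method name and parameter from both messages. It is an illustration only and not the code that produced the measurements above: the function names are hypothetical, and the objic wire format is assumed to be one `KEY:value` pair per line, as the two-line example suggests.

```python
import xml.etree.ElementTree as ET

SOAP_MESSAGE = """<?xml version="1.0"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
    soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
  <soap:Body xmlns:m="http://www.example.org/stock">
    <m:GetStockPrice>
      <m:StockName>IBM</m:StockName>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>"""

OBJIC_MESSAGE = "CALL:GetStockPrice\nPARAM:IBM"

SOAP_NS = "{http://www.w3.org/2001/12/soap-envelope}"


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.split("}", 1)[-1]


def parse_soap(message):
    """Return (method, params) from the first element inside soap:Body."""
    root = ET.fromstring(message)
    body = root.find(SOAP_NS + "Body")
    call = list(body)[0]
    return local_name(call.tag), [child.text for child in call]


def parse_objic(message):
    """Return (method, params) from 'KEY:value' lines (assumed format)."""
    method, params = None, []
    for line in message.splitlines():
        key, _, value = line.partition(":")
        if key == "CALL":
            method = value
        elif key == "PARAM":
            params.append(value)
    return method, params
```

Both functions return the same pair for the messages above; the SOAP variant needs a full XML parser with namespace handling, whereas the line-based variant needs only string splitting.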

J
Programming the Cloud

This paper was written as a research proposal before the project was started. It
should clarify what the vision of a cloud-enabled language looks like.

Author: Hoffmann Geerd-Dietger


Contact: didi@ribalba.de
Date: 2009-04-20
Web site: http://www.ribalba.de
Version: 0.2
Dave is sitting at home in front of his TV and is trying to remember all the people who he
should send a Christmas card to. Being annoyed by the fact that he has this problem every
year, and always forgets someone, he decides to write a little application he can use through
the year to record these people. Thinking about the problem he identifies two scenarios. The
first one is that he has to add people from his address book and the second is that he has to
manually enter the data. So he grabs the keyboard that is next to the couch, changes the TV
over to his Desktop and creates a new application. First he defines a class he calls Model,
that defines the data the program should hold. Experienced readers can tell that Dave knows
the concept of Model-View-Controller.
people = new List<Person>
On a very high level the only thing that he needs is a list of people with their names and
addresses. So he searches for a “person” object hosting service that suits his needs¹ and finds
one offered by Bluesphere.org. This service can be used by private users for free, so he defines
that all Person objects should be hosted on Bluesphere.org. The guys² at Bluesphere are nice
enough to also offer a backup server, so that if the first one is not reachable objects can still be
instantiated and used:
Person @ bluesphere.org, backup.bluesphere.org
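
How a runtime would honour such a location list is not specified in this proposal. The following Python sketch shows one possible reading, in which the servers are tried in the order given. All names (`instantiate`, `ServerUnreachable`, `fake_connect`) are hypothetical, and the network call is replaced by a stand-in that pretends the primary host is down.

```python
class ServerUnreachable(Exception):
    """Raised by a (hypothetical) connector when a host cannot be reached."""


def instantiate(class_name, servers, connect):
    """Try each server in order; return (server, object) for the first success."""
    errors = []
    for server in servers:
        try:
            return server, connect(server, class_name)
        except ServerUnreachable as exc:
            errors.append((server, str(exc)))
    raise ServerUnreachable("no server could host %s: %r" % (class_name, errors))


def fake_connect(server, class_name):
    """Stand-in for a network call: pretend the primary host is down."""
    if server == "bluesphere.org":
        raise ServerUnreachable("timeout")
    return "<%s on %s>" % (class_name, server)


# Corresponds to: Person @ bluesphere.org, backup.bluesphere.org
host, person = instantiate(
    "Person", ["bluesphere.org", "backup.bluesphere.org"], fake_connect
)
```

In this sketch the caller never notices the failure of the first host; it only learns which server ended up hosting the object.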
If this were an important application he would also specify that all objects are mirrored,
to be extra safe. This would only involve specifying where to mirror them; the language would
take care of all the underlying work. Because his desktop machine is quite low spec he decides
to host the list object on one of his rented object servers:


List @ Flipcube.net

He has a contract with Flipcube that lets him create 1 million list objects a day, each with
fewer than 10,000 entries. If he goes over this limit they will charge him according to the
methods he invokes, so creating a new list will cost one credit but a complex ordering will
cost 5. But this will easily fit into his limit, so now he has to set up the data structure he
needs. As he wants to use the application throughout the year he has to save the data. He does
this by defining a save() and a retrieve() method in his model, which he can pretty much copy
and paste:

myDataStore = new ObjectStore("peopleToRememberToSendCardsTo")

def save(){
    myDataStore.save(people)
}

def retrieve(){
    for person in myDataStore.getItems(){
        people.add(person)
    }
}

He has rented 2 TB of storage from the company InodeBird, so he tells the program to
initialize all ObjectStore objects there:

ObjectStore @ InodeBird.com, dave.homeserver.net

Further he has set up a private backup server at dave.homeserver.net which is a mirror of
InodeBird. Dave has done this so he still has physical access to his data; all his friends call
him paranoid and old fashioned because of this. Their opinion is that he can never reach the
99.9% availability of the InodeBird server farm, which is located in an old nuclear bunker. All
data transmission is encrypted and signed; when signing up to the services it is required
to transmit a personal “profile” file which includes keys and other relevant data.
Now Dave is satisfied that the model can handle all the data it needs to. So he starts
programming the actual functionality in a class named Controller. The first thing the controller
must do is retrieve the saved data, so he defines the constructor.

def Controller(stdinParam, stdoutParam){
    stdin = stdinParam
    stdout = stdoutParam
    Model.retrieve()
}

First he adds the method to add a person from his address book. The variables stdin and
stdout are defined by the object View and are passed in every time a Controller is created.
But he does not really care how he is going to access the application, as there are predefined
views for his phone, TV and computer.

def addFromAddressBook(){
    var myAddressBook = new AddressBook() @ addressbookserv.net

    #ToDo check if successful
    myAddressBook.login(stdin, stdout)

    stdout.print("Enter Name ... : ")
    personToFind = stdin.read()

    personToAdd = myAddressBook.find(personToFind)

    Model.people.add(personToAdd)

    Model.save()
}
The method to add a person by hand is written quite easily too.

def addByHand(){
    personToAdd = new Person()
    personToAdd.propagate(stdin, stdout)
    Model.people.add(personToAdd)
    Model.save()
}

Remember that the Person object is hosted on bluesphere.org; it has a propagate method
that asks for all the data it needs through stdin and stdout. Finally Dave saves the
updated list of persons. He then adds a similar method for deleting people from the list.
He could make the program more compact and write the addByHand() method in a short
form.

def addByHand(){
    Model.people.add(new Person().propagate(stdin, stdout))
}
But Dave likes to have nicely structured and readable code. Furthermore, to write it this
way he would have to update the model specification to auto-save whenever the data is modified.
What is not directly visible from the code is that the list is actually only a list of pointers. So
if he updates an address in his address book the list will have the up-to-date data. Thread
safety is an issue here, but Dave can assume that the programmer of the interface has taken
care of this. Instead of going through the process of logging in to the AddressBook server he
could have added a certificate to the program, so that as soon as it runs it has read rights. He is
also relying on Bluesphere keeping the objects accessible for as long as he needs them.
In the long run he will pay them for hosting the objects, but for the development stage this
risk is acceptable.
Now the only thing he has to do is to create a front-end to his service. For this he can
use some predefined objects, so that he can access his application through all the devices
he has, like phone, computer, netbook, TV, etc. So he defines a class View that extends from
viewport.org and registers the methods he has created in the controller. He doesn’t need
to care about the display of the Person objects as there is a method display(stdout) that
knows how to display the object correctly depending on what type stdout is. Stdout always
has a type field that tells the render function what to output. For example if Dave accesses
his application through a web browser the type will be XHTML, but he can easily get an XML
feed by using a stdout object with the type XML.
As all the objects are hosted on servers distributed around the world, Dave never really knows
or needs to know where the data is. While adding the List object to his application, Dave
chuckled. He was reminded of how the sort function had a really funny error: when 100
references were the same the ordering would not be correct. This was a really weird error
and only a few people noticed it, but because all the computation was done by the servers
the fix was instant and some people still didn’t know about it. He also remembered how it
used to be, when you had to download updated versions of every software application all the
time, and what a pain that was.
Dave uploaded his application to his application hosting service and can now use it from
wherever he is in the world. After some time, Dave’s friends found out about his application
and wanted to use it too. The only thing he had to do was add some user verification; he
also decided to add some error catching, and then allowed his friends to use the app. Because
every part of the system is hosted somewhere it doesn’t matter that 20 people now use it, as
the servers he uses are all part of a scalable system. Dave is thinking about selling access to
his application now. This is also quite easy: the only thing he has to do is include a payment
object from the company payfriend.com, which he already uses to pay for his storage, and then
he can start making money.

¹ He is not bound to one service; he can use any one he thinks will host the objects to his
satisfaction.
² Of course there are girls working at Bluesphere too.

K
Grammar Description
In this section the main grammar elements are described and visualized. Small examples are
also provided to clarify the descriptions. There are more elements but these are not vital to
the understanding of the language and the design of the system.

K.0.1 block
A block is everything between ’{’ and ’}’. It may contain as many “stat” expressions
(see K.9 on page 91) as wanted, and empty blocks are allowed. Blocks can, and have to be
able to, be nested. An example of a commonly used block is a method definition (see Code 13),
which might have a loop embedded, which is a block in a block.
In the implementation a block further means that the stack level is raised, so all
variables defined in this block are local to the block.

Figure K.1: A diagram of the syntax for a new block of code

Code 13 Method declaration

def printMe{
    print ("Hello world")
}
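
The raised stack level can be pictured as a stack of symbol tables. The following Python sketch is an illustration under assumptions and not the objic implementation: the class `ScopedSymbolTable` and its method names are hypothetical, and it is assumed that assigning to a name already visible from an enclosing block updates that name rather than shadowing it.

```python
class ScopedSymbolTable:
    """A stack of dictionaries; entering a block pushes one, leaving pops it."""

    def __init__(self):
        self._levels = [{}]  # level 0 holds the outermost names

    def enter_block(self):
        self._levels.append({})

    def leave_block(self):
        if len(self._levels) == 1:
            raise RuntimeError("cannot leave the outermost level")
        self._levels.pop()

    def assign(self, name, pointer):
        # Assumption: a name visible from an enclosing block is updated in
        # place; otherwise it becomes local to the innermost block.
        for level in reversed(self._levels):
            if name in level:
                level[name] = pointer
                return
        self._levels[-1][name] = pointer

    def lookup(self, name):
        # Search from the innermost block outwards.
        for level in reversed(self._levels):
            if name in level:
                return level[name]
        raise KeyError(name)


table = ScopedSymbolTable()
table.assign("a", "ptr-1")
table.enter_block()          # '{'
table.assign("b", "ptr-2")   # local to the block
table.assign("a", "ptr-3")   # updates the outer 'a'
table.leave_block()          # '}' discards 'b'
```

After the block is left, `a` still resolves while `b` no longer does, which is the locality described above.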

K.0.2 call
A call is a method invocation on an object with some parameters. The syntax is copied from
Java. The “NAME” token in front of the dot is the reference to a name in the symbol table
which is linked to a pointer to an object. The “NAME” after the dot is the method to invoke.
The syntax of “paramlist” can be found under K.8 on page 91.

Figure K.2: A diagram of the syntax for a call statement

Code 14 Simple method call

ME.printMe("Hello world")

K.0.3 classdef and methdef


Class and method declarations share the same structure. Both consist of a keyword (“class”
or “def”, see Appendix C), a “NAME” token which identifies the instance, and a block that is
executed. The only restriction is that a method definition has to be in the block of a class
definition, otherwise it cannot be addressed. Stand-alone functions are not possible in objic:
the aim is to have objects accessible from remote objects, and a function outside a class
would not be in the published scope. The syntax is very similar to the way Python declares
such elements.

Figure K.3: A diagram of the syntax for a new class definition

Figure K.4: A diagram of the syntax for a new method definition

Code 15 Method and Class declaration

class PrintMeClass{
    def printMe{
        print ("Hello world")
    }
}

K.0.4 whileloop
A while loop is one of the most important constructs for a programming language to be Turing
complete (see objectives). The construct contains a Boolean expression and a block. While
the Boolean expression evaluates to true the block is executed. Before every new round of
execution the condition is checked again. The Boolean expression is the same as in an if
statement. One major piece of functionality needed to be able to use the while loop is a “break”
keyword; when invoked it stops the execution of the loop immediately. This is often used in
conjunction with an “if” statement to check for some condition and, if it holds, exit the loop.

Figure K.5: A diagram of the syntax for a while loop

Code 16 While loop

while (a.value() <= b.value()){
    print("In while")
}

K.0.5 forloop
A “for” loop is a short syntax for a while loop; every “for” loop can be expressed as a while
loop. This is done by creating the while AST when parsing the “for” syntax.

for (i = new Int(); i.value() == b.value(); i = i.add(1)){}

is equal to

i = new Int()
while (i.value() == b.value()){
    i = i.add(1)
}

So the actual interpreter never sees the “for” keyword. The construct is provided to offer a
familiar syntax to Java and C programmers, and the grammar is therefore copied from C. The
first statement creates the loop variable, which acts as a counter. The second parameter is the
Boolean expression that has to evaluate to true for the loop body to execute. The third
parameter is the loop variable modifier, which is executed at the end of every loop iteration,
after the body (see the forloop rule in Appendix C).

Figure K.6: A diagram of the syntax for a new for loop

The actual transition can be seen in Appendix H on page 81
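
The rewrite performed by the forloop rule, `^(BLOCK $a ^(WHILE boolexp ^(BLOCK block $b)))`, can be sketched in Python with nested tuples standing in for AST nodes. This is an illustration and not the generated parser: `desugar_for` and `run` are hypothetical names, and the leaves are plain Python callables instead of objic statements so that the tree can be executed directly.

```python
def desugar_for(init, condition, update, body):
    """Build the while-shaped tree for: for (init; condition; update) body."""
    return ("BLOCK", init, ("WHILE", condition, ("BLOCK", body, update)))


def run(node, env):
    """Execute a tree whose leaves are callables taking the environment."""
    if callable(node):
        return node(env)
    kind = node[0]
    if kind == "BLOCK":
        for child in node[1:]:
            run(child, env)
    elif kind == "WHILE":
        _, condition, body = node
        while run(condition, env):
            run(body, env)
    else:
        raise ValueError("unknown node %r" % (kind,))


# Mirrors Appendix H: a = 3; for (c = 0; c <= a; c = c + 1) print(c)
env = {"a": 3, "out": []}
tree = desugar_for(
    init=lambda e: e.update(c=0),
    condition=lambda e: e["c"] <= e["a"],
    update=lambda e: e.update(c=e["c"] + 1),
    body=lambda e: e["out"].append(e["c"]),
)
run(tree, env)
```

The evaluator only knows BLOCK and WHILE, yet `env["out"]` ends up holding 0, 1, 2 and 3, the same values as the output listed in Appendix H.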

K.0.6 newvar
The “new” keyword could also be named equal. In objic it is not possible to create a new
object without assigning it to a variable name on the stack; as a consequence anonymous
classes [Bloch, 2001] are not possible. Further, all three cases create a new instance of the
object to the right of the “=” token. This is done so that other references to this object are
not modified, which reduces the risk of race conditions. In some conditions the object might do
this by itself: by returning the NEWPTR message, the variable pointer on the local stack gets
updated to the new object reference (see 4.8 on page 28). This is done when adding a number
to an Int object by calling the “add” method. The method creates a new object with
the new value and returns an updated pointer; the calling object then updates its reference to
the new object and thus holds the correct value.

Code 17 Pointer update

1 a = new Int() @ bigi.home
2 a = a.add(1)

In line 1 “a” points to an instance of an Int object with the value 0. When the add method
is invoked on line 2 with the parameter 1, the original Int takes its value 0, adds the
parameter 1 and thus creates a new object with the value 1. It returns the updated pointer,
which is then assigned to the variable “a”.
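
The pointer update can be illustrated with a short Python sketch. It is a simplification under assumptions: the NEWPTR message is modelled as a tuple, the symbol table as a dictionary, and `call_and_assign` is a hypothetical helper; none of this is taken from the objic source.

```python
class Int:
    """Immutable integer object: 'add' never changes the receiver."""

    def __init__(self, value=0):
        self._value = int(value)

    def value(self):
        return self._value

    def add(self, amount):
        # Parameters arrive as strings in objic, so convert before adding.
        return ("NEWPTR", Int(self._value + int(amount)))


def call_and_assign(symbols, target, receiver, method, *params):
    """Run 'target = receiver.method(params)' against a symbol table (a dict)."""
    reply = getattr(symbols[receiver], method)(*params)
    if isinstance(reply, tuple) and reply[0] == "NEWPTR":
        symbols[target] = reply[1]  # rebind only this name
    return reply


symbols = {"a": Int()}            # line 1: a = new Int()
alias = symbols["a"]              # another reference to the same object
call_and_assign(symbols, "a", "a", "add", "1")   # line 2: a = a.add(1)
```

After the call, `a` points to a new object with the value 1, while `alias` still sees the untouched original with the value 0. This is the property that keeps other references from being modified.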
There are three ways of specifying the location at which a new object is instantiated. A
detailed discussion can be found under the heading “Object location specification” (4.5 on
page 25). One way can be seen on line 1, where the programmer tells the VM to initialize the
Int object on the server that can be found under bigi.home.

Figure K.7: A diagram of the syntax for a new variable declaration

K.0.7 paramlist
Parameter transfer is always a problem in distributed systems [Orfali et al., 1995]. There is
no way of knowing that the side where the method is invoked will understand the parameter
given. Further it cannot be assumed that the implementation is identical to the local system.
This causes a lot of confusion and is a regular source of errors. In objic this problem is solved
by serialising all data into strings. With this technique the object receiving the parameter
deals with the data typing, so the same call can be used in multiple ways. The convention is
that parameters have to be separated by commas. It is possible to have 0..n parameters.
As objic is dynamically typed, serialisation can be performed very easily and efficiently. It also
offloads the error checking onto the called object, which increases the chances of catching a
type error, as the client does not know or care about the implementation on the server side.
However this also creates some difficulties, for example that type safety cannot be checked at
compile time as in Java or C; but languages like Python use duck typing very successfully
and do not seem to have too many problems with this approach. Further, a minor speed impact
results from having to check the type of every parameter, which is acceptable for the gained
security and reduced communication overhead.

Figure K.8: A diagram of the syntax for the parameters passed in to methods
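
The division of work described in K.0.7, where the caller only serialises and the called object does the typing, can be sketched in Python as follows. The names `serialise_params` and `RemoteInt` are hypothetical and the network transport is left out entirely; the sketch only shows where the type error is raised.

```python
def serialise_params(*params):
    """Caller side: turn every parameter into a string."""
    return [str(p) for p in params]


class RemoteInt:
    """Callee side: decides by itself whether a string parameter is usable."""

    def __init__(self, value="0"):
        self._value = self._to_int(value)

    @staticmethod
    def _to_int(text):
        try:
            return int(text)
        except ValueError:
            raise TypeError("Int cannot use parameter %r" % (text,))

    def add(self, amount):
        return RemoteInt(str(self._value + self._to_int(amount)))

    def value(self):
        # Results travel back as strings as well.
        return str(self._value)


wire = serialise_params(1)               # the caller sends ['1']
result = RemoteInt("41").add(*wire)      # the callee converts and checks
```

A parameter such as "x" would be rejected by `RemoteInt` itself with a TypeError, so the caller needs no knowledge of what the remote implementation accepts.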

K.0.8 stat

There are numerous statements that can be included in a block, as a block is an indefinite
repetition of stats (see K.0.1 on page 87). This list is a collection of everything allowed.

Figure K.9: A diagram of what can be included in a block

K.0.9 NAME

The NAME token is the main identifier for variables and as such for pointers and method
names. A name can be constructed out of capital and lowercase letters, the underscore “_”
and numbers, which may carry a “+” or “-” sign. The numbers and the “+” and “-” signs are
taken from the INT token, which is not described here.


Figure K.10: A diagram of the syntax for the NAME token
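
For readers more familiar with regular expressions than with Antlr lexer rules, the NAME rule from Appendix C can be approximated in Python as shown below. This is a sketch that checks the rule in isolation; it makes no claim about how the generated lexer resolves input that would also match INT, and `is_name` is a hypothetical helper.

```python
import re

# INT  : ('+' | '-')? '0'..'9'+
# NAME : ('a'..'z' | 'A'..'Z' | '_' | INT)+
# A run of digits is just several one-digit INTs in a row, so matching one
# digit per repetition accepts the same strings without nested quantifiers.
NAME_RE = re.compile(r"(?:[A-Za-z_]|[+-]?[0-9])+")


def is_name(text):
    """Return True if the whole text matches the NAME rule."""
    return NAME_RE.fullmatch(text) is not None
```

With this pattern `printMe`, `my_var2` and `a-1` are accepted, whereas `a-` and `a.b` are rejected: a sign must be followed by a digit, and the dot is reserved for calls.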

K.0.10 Comments
In objic there are two types of comments: line comments that end at the newline, and comment
blocks that can enclose whole regions of the code. The line comment can be placed anywhere
and causes the parser to ignore everything until the end of the line, regardless of the length
of the line. Both Unix/Linux and Windows newline characters are understood, which should
make porting easier.
A block comment has a defined start tag (see the COMMENT rule in Appendix C), which causes
the parser to ignore everything until the close tag “*/” is seen. This representation
and behaviour is a direct copy from C and Java, which should make the code easier to
understand for people that already know these languages. As an optimization, comments
are not represented in the byte code. This is based on the decision that comments might be
helpful for debugging purposes but are not vital, whereas the size of the byte code is an
important factor for a language, especially if it can be assumed that the binary is transferred
over the network.

L
Man Pages

A man page (manual page) is a documentation text for a program, mostly used in the
*NIX world. The command to view such a page is the “man” command followed by the
program name. This then displays information like NAME, SYNOPSIS, DESCRIPTION and
EXAMPLES. The man page has become the de facto documentation standard in the *NIX
environment.


ORUN(1) GNU/LINUX ORUN(1)

NAME

orun – objic runner

SYNOPSIS

orun [-d] [-h] [-v] class params server

OPTIONS

-d Debug Mode
-h Print help and exit
-v Echo version and exit

DESCRIPTION

This program initializes the class on the server and executes the main method.

EXAMPLES

Example 1: ./orun fib 10


This will run the fib class and pass in the parameter 10

VERSION

This documentation describes orun version 1

SEE ALSO

globalConf.py objicc ObjServer


objic.ribalba.de site

AUTHOR

Hoffmann Geerd-Dietger
didi@ribalba.de

Mon, May 18, 2009 1 v1
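The synopsis above can be mirrored with a short argument-parsing sketch. This is not the actual orun source; it is a hypothetical Python illustration using the standard `argparse` module. Two assumptions are made: the positional `class` is stored as `class_name` because `class` is a Python keyword, and `server` is treated as optional because Example 1 omits it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser shaped like: orun [-d] [-h] [-v] class params server."""
    parser = argparse.ArgumentParser(prog="orun", description="objic runner")
    # -h is added automatically by argparse (print help and exit).
    parser.add_argument("-d", action="store_true", help="debug mode")
    parser.add_argument("-v", action="version", version="orun version 1",
                        help="echo version and exit")
    parser.add_argument("class_name", metavar="class",
                        help="class whose main method is executed")
    parser.add_argument("params", help="parameter passed to the class")
    # Assumption: server may be omitted, as in "./orun fib 10".
    parser.add_argument("server", nargs="?", default=None,
                        help="server to run the class on")
    return parser
```

Parsing `["fib", "10"]` with this parser yields the class `fib`, the parameter `10` and no server.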

OBJSERVER(1) LINUX OBJSERVER(1)

NAME

ObjServer – Initializes the main server

SYNOPSIS

ObjServer [-h] [-v] [-d]

OPTIONS

-h Prints out a help message and exits
-d Enables debug output, also called verbose mode
-v Prints out the version number

DESCRIPTION

Starts the main server loop waiting for connections and executes the objects

VERSION

This documentation describes ObjServer version 1

SEE ALSO

orun objicc
http://objic.ribalba.de site

AUTHOR

Hoffmann Geerd-Dietger
didi@ribalba.de

Mon, May 18, 2009 1 v1


OBJICC(1) DARWIN – MAC OS X OBJICC(1)

NAME

objicc – the objic compiler

SYNOPSIS

objicc [-d] [-h] [-v]

OPTIONS

-d Enables debug mode


-h Prints a help message
-v Prints the version of the compiler

DESCRIPTION

The compiler for the objic language

VERSION

This documentation describes objicc version 1

SEE ALSO

ObjServer orun
http://objic.ribalba.de site

AUTHOR

Hoffmann Geerd-Dietger
didi@ribalba.de

Mon, May 18, 2009 1 v1

M
Code Example

class multi {

    \\The multiplier method
    def mulitply{
        argv = ARGS.value()

        argsint = new Int(argv)

        retval = argsint.mul(argv)

        retMeth = retval.value()

        return(retMeth)
    }

    \\The main method
    def main {

        argint = ARGS.value()

        a = new Int(argint)

        returnBuffer = new String()

        for (c = new Int(); c.value() < a.value(); c = c.add(1)){

            cval = c.value()

            val = ME.mulitply(cval)

            returnBuffer = returnBuffer.add(val)

            returnBuffer = returnBuffer.add(" ")

        }

        returnBudderMeht = returnBuffer.value()

        return (returnBudderMeht)

    }
}
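For readers more familiar with a mainstream language, the listing can be read roughly as the following Python sketch. It is an interpretation, not a translation produced by any objic tool. It assumes that `new Int()` starts at zero, that `mul` multiplies, and that `String.add` appends the text form of its argument. The function names are chosen for this sketch, and it does not model any distribution of objects across servers.

```python
def multiply(value: int) -> int:
    """Mirror of the listing's multiplier method: the argument times itself."""
    return value * value


def main(limit: int) -> str:
    """Mirror of the listing's main method."""
    return_buffer = ""
    c = 0  # assumption: `new Int()` starts at zero
    while c < limit:
        # Append the squared counter followed by a single space.
        return_buffer += str(multiply(c))
        return_buffer += " "
        c += 1
    return return_buffer
```

Under these assumptions the program builds a space-separated list of the squares of all integers below its argument.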

N
Design Diagrams

These are early sketches of how the object communication should work. The implementation
phase mostly followed them. Unfortunately, the program used to create them corrupted the
file, so these printouts are the last surviving versions.

APPENDIX N. DESIGN DIAGRAMS

An early class diagram of the server. This was later modified so that each VM has its own connection:
