
CSCE-604 Project Report

TSXML: Type Safe XML Generator for Scala


Pawan Kumar Singh, Prannay Jain, Saptaparni Kumar
Texas A&M University
December 9, 2014
Abstract
XML is the standard format for data exchange between inter-enterprise applications
on the Internet. To facilitate data exchange, industry groups define public document
type definitions (DTDs) or XML Schemas (XSD) that specify the format of the XML
data to be exchanged between their applications. In this work, we address the problem
of automating XML generation in Scala with type safety, in contrast to existing library
functionality, which provides no type checking at all during XML generation. We named
the library TSXML: Type Safe XML Generator, a generic, dynamic, and efficient library
for type-safe XML generation in Scala. It uses predefined DTDs/XSDs and XML tag
elements for XML creation. We develop a hierarchical tree structure to represent the
XML object. Applications express the data they need as a set of objects and values.
TSXML uses the underlying nesting-checking mechanism provided by Scala for the generation.

Introduction

The Extensible Markup Language (XML) is a language that specifies a set of rules for encoding
documents in a format that is both human-readable and machine-readable. It is defined by free
open standards. XML's design goals emphasize simplicity, generality, and usability across the Internet.
Although the design of XML focuses on documents, it is widely used to represent
arbitrary data structures, especially those used in web services.
XML can serve many purposes: as a more expressive markup language than HTML;
as an object-serialization format for distributed object applications; or as a data-exchange
format. In this work, we focus on the generation of type-safe XML documents from persistent
data that is sent over a network to an application. Numerous industry groups, including
health care and telecommunications, are working on document type definitions (DTDs) and
XML Schemas (XSD) that specify the format of the XML data to be exchanged between their
applications.
The existing XML generation mechanism in most modern programming languages is
based on string literals and is not type-safe. We set aside the idea of treating XML tags as
string literals and instead implement an object-oriented model for generating XML documents.
This makes TSXML an efficient and generic library for type-safe XML generation in Scala.

Motivation

The Web has so far been incredibly successful at delivering information to human users.
XML and its various extensions (data models, query languages) are a primary step toward
making this information processable by applications as well. Unfortunately, the Web is not
yet a well-organized repository of nicely structured XML documents but rather a conglomerate
of volatile HTML pages from which structure has to be extracted. A large amount of data
still resides in legacy systems, and migrating it is a primary task for many enterprises.
Most modern programming languages use XML as the standard markup language for storing
documents. They treat XML commands and tags as string literals, and almost no type
checking is done before the XML is generated. These languages may provide libraries for
checking XML syntax, but they do not provide type checking. Type systems built directly into
compilers cannot easily be extended to keep track of run-time invariants and domain-specific
abstractions. This work presents library techniques for extending the type system of Scala to
support such domain-specific abstractions.
For type safety, one has to depend on external APIs or parsers, because these languages lack
native support for type-safe XML. The XML is created by the language and then passed to
an external XML parser for parsing. As a result, if the generated XML contains an error, it has
to be regenerated and re-parsed until a correct XML document is produced. We feel that this
method is time-consuming and that there should be a mechanism, provided by the native
language, for type checking at the time of XML generation itself.

A previous paper [1] addressed this problem for C++. We based our work on their idea
and, using the Object-Relational Mapping model, we create XML objects instead of string
literals.
There are two important reasons why Scala was chosen to build this solution. First, the
interoperability of Scala with Java is extremely robust: Scala can use Java libraries, and
libraries built in Scala can in turn be imported in Java. Second, Scala is popular. The graph
below depicts the popularity of programming languages on GitHub and Stack Overflow. Java is
at the top of the popularity graph, and Scala ranks highest among the functional languages. By
building a solution in Scala, we can reach Scala users together with Java users.

XML Validation

XML validation is the process of checking whether a document written in XML conforms to
its rules, that is, whether it is both well-formed and valid. A well-formed document follows
the basic syntactic rules of XML, which are the same for all XML documents. A valid document
also respects the rules dictated by a particular DTD or XML schema, according to the
application-specific choices for that particular application. In addition, extended tools such
as those based on the OASIS CAM standard specification provide contextual validation of
content and structure that is more flexible than basic schema validation. Examples of
validation tools are xmllint on the Linux command line and XLint in Java.
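The difference between well-formed and valid can be illustrated with the XML parser that ships with the JDK, which Scala can call directly. The following is a minimal sketch and not part of TSXML: the names `WellFormedness` and `isWellFormed` are hypothetical, and the check covers well-formedness only, so no DTD or XSD is consulted.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.{ErrorHandler, InputSource, SAXException, SAXParseException}

// Hypothetical helper (not part of TSXML): syntax-only check with the JDK parser.
object WellFormedness {
  // Turn parser complaints into exceptions instead of printing them to stderr.
  private object FailFast extends ErrorHandler {
    def warning(e: SAXParseException): Unit = ()
    def error(e: SAXParseException): Unit = throw e
    def fatalError(e: SAXParseException): Unit = throw e
  }

  def isWellFormed(xml: String): Boolean = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setValidating(false) // well-formedness only, no DTD/XSD validity
    val builder = factory.newDocumentBuilder()
    builder.setErrorHandler(FailFast)
    try {
      builder.parse(new InputSource(new StringReader(xml)))
      true
    } catch {
      case _: SAXException => false
    }
  }
}
```

A document such as `<book><title>Scala</book></title>` fails this check because its tags are not properly nested; a document that passes can still be invalid with respect to a particular DTD or schema.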

3.1 Document Type Definition

A document type definition (DTD) is a set of markup declarations that define a document
type for XML. It defines the legal building blocks of an XML document, that is, the
document structure with a list of legal elements and attributes. DTDs persist in applications
that need special publishing characters, such as character entity references.
DTDs can be checked with a validating XML parser, for example XLint.
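As a hedged illustration of DTD validation from Scala, the sketch below enables validation on the JDK's SAX parser and checks a document against the DTD declared in its own `<!DOCTYPE>`. The names `DtdCheck`, `isValid` and `noteDoctype` are hypothetical. A custom handler is needed because `DefaultHandler` does nothing on validity errors by default.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXException, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper (not part of TSXML): DTD validation with the JDK SAX parser.
object DtdCheck {
  // DefaultHandler silently ignores validity errors, so rethrow them here.
  private final class StrictHandler extends DefaultHandler {
    override def error(e: SAXParseException): Unit = throw e
    override def fatalError(e: SAXParseException): Unit = throw e
  }

  def isValid(xml: String): Boolean = {
    val factory = SAXParserFactory.newInstance()
    factory.setValidating(true) // check the document against its DTD
    val parser = factory.newSAXParser()
    try {
      parser.parse(new InputSource(new StringReader(xml)), new StrictHandler)
      true
    } catch {
      case _: SAXException => false
    }
  }

  // Example internal DTD: a <note> holds exactly one <to> followed by one <body>.
  val noteDoctype: String =
    """<!DOCTYPE note [
      |  <!ELEMENT note (to, body)>
      |  <!ELEMENT to (#PCDATA)>
      |  <!ELEMENT body (#PCDATA)>
      |]>""".stripMargin
}
```

Both `<note><to>Ana</to><body>Hi</body></note>` and `<note><body>Hi</body></note>` are well-formed, but only the first is valid against this DTD, since the second lacks the required `<to>` element.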

3.2 XML Schema Definition

An XML Schema describes the structure of an XML document. Developers can use it to
verify each piece of content in an XML document. Unlike most other schema languages, XSD
was designed with the intent that determining a document's validity would produce a
collection of information adhering to specific data types. Such a post-validation
infoset is very useful in the development of XML document processing software.
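Unlike a DTD, an XSD can constrain the data type of element content. The sketch below uses the JDK's `javax.xml.validation` API from Scala to validate a document against a schema held in a string. The names `XsdCheck`, `isValid` and `itemXsd` and the example schema are hypothetical, chosen only to show a data-type constraint (`xs:decimal`).

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

// Hypothetical helper (not part of TSXML): XSD validation with javax.xml.validation.
object XsdCheck {
  def isValid(xsd: String, xml: String): Boolean = {
    val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    // Throws SAXException if the schema itself is malformed.
    val schema = factory.newSchema(new StreamSource(new StringReader(xsd)))
    val validator = schema.newValidator()
    try {
      // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
      validator.validate(new StreamSource(new StringReader(xml)))
      true
    } catch {
      case _: SAXException => false
    }
  }

  // Example schema: <price> must hold an xs:decimal, not arbitrary text.
  val itemXsd: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:element name="item">
      |    <xs:complexType>
      |      <xs:sequence>
      |        <xs:element name="name" type="xs:string"/>
      |        <xs:element name="price" type="xs:decimal"/>
      |      </xs:sequence>
      |    </xs:complexType>
      |  </xs:element>
      |</xs:schema>""".stripMargin
}
```

With this schema, `<item><name>Pen</name><price>1.50</price></item>` is valid, while the same document with `<price>cheap</price>` is rejected because `cheap` is not a decimal.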

Document Model

Document modeling deals with the structure and patterns of a written work: it breaks the
work into branches and labels them.

4.1 Semi-Structured Database

XML is a semi-structured database, i.e., a form of structured data that does not conform
to the formal structure of data models associated with relational databases or other forms
of data tables. However, it contains tags or other markers to separate semantic elements
and enforce hierarchies of records and fields within the data, and it is therefore called a
self-describing structure. In semi-structured data, entities belonging to the same class may
have different attributes even though they are grouped together, and the order of attributes
is not important. Semi-structured data is increasingly common: full-text documents and
databases are no longer the only forms of data, and different applications need a medium
for exchanging information.

4.2 XML Generation Based on ORM (Object-Relational Mapping)

Object-relational mapping converts data between incompatible type systems in object-oriented
programming languages, creating a virtual object database that can be used from within the
programming language. In object-oriented programming, data-management tasks act on
objects. We use this ORM model to represent XML tags as nested objects and thereby
create a well-formed, type-safe XML file.
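The ORM idea of producing XML elements from typed objects rather than from hand-written strings can be sketched in Scala with a type class. This is a swapped-in technique for illustration, not TSXML's implementation; `XmlEncoder`, `toXml`, `Book` and `Shelf` are hypothetical names.

```scala
// Hypothetical domain classes for this sketch only.
final case class Book(title: String, year: Int)
final case class Shelf(label: String, books: List[Book])

// A type class mapping a Scala type to an XML element (assumed design).
trait XmlEncoder[A] {
  def encode(a: A): String
}

object XmlEncoder {
  // Escape characters that would otherwise break well-formedness.
  def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  def element(tag: String, body: String): String = s"<$tag>$body</$tag>"

  implicit val bookEncoder: XmlEncoder[Book] = new XmlEncoder[Book] {
    def encode(b: Book): String =
      element("book", element("title", escape(b.title)) + element("year", b.year.toString))
  }

  // Nested objects become nested elements.
  implicit val shelfEncoder: XmlEncoder[Shelf] = new XmlEncoder[Shelf] {
    def encode(s: Shelf): String =
      element("shelf", element("label", escape(s.label)) + s.books.map(bookEncoder.encode).mkString)
  }

  // Compiles only for types that have an XmlEncoder instance.
  def toXml[A](a: A)(implicit enc: XmlEncoder[A]): String = enc.encode(a)
}
```

Because `toXml` requires an `XmlEncoder` instance, passing a value of an unsupported type, such as a plain `String`, is rejected by the compiler instead of producing malformed XML.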

Algorithm 1: TSXML: Type-Safe XML generation for Scala

Input:
    Objects of type TSXMLFile, TSRoot, TSNode as a tree structure
    XSD or DTD
Output:
    A type-safe, well-formed, valid XML document
Derived object:
    Parent Set: contains the tag names of all tags above the current level
Part 1: Check nesting
    Perform a level-order traversal
        For each node, check whether it is present in its Parent Set
            If true
                throw an error
            Else
                add all tags of this level to the Parent Set
    Continue to Part 2
Part 2: XML validation
    Use a stack to verify object types based on the XSD or DTD
Part 3: Create XML tags
    Using liftElement, create XML string literals in the form of valid XML tags for each
    node in the input tree structure
    Using Scala's string interpolation, embed each nested XML string literal in its
    parent tag element
    Use macros to generate correct indentation based on the nesting level
Part 4: XML file generation
    Using the nesting and indentation information from Part 3, generate the XML file
    with Scala's native XML generator
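Part 1 of Algorithm 1 can be sketched as follows. `TagNode`, `NestingError` and `NestingCheck.check` are hypothetical names: the tree type stands in for TSXML's TSRoot/TSNode objects, and the Parent Set is taken literally as the set of tag names on all levels above the current one.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a TSRoot/TSNode tree.
final case class TagNode(tag: String, children: List[TagNode] = Nil)

final class NestingError(msg: String) extends Exception(msg)

object NestingCheck {
  // Level-order traversal; parentSet holds tag names from every level
  // above the one currently being checked.
  def check(root: TagNode): Unit = {
    val parentSet = mutable.Set.empty[String]
    var level: List[TagNode] = List(root)
    while (level.nonEmpty) {
      // Reject the first node whose tag already occurred on a higher level.
      level.find(n => parentSet.contains(n.tag)).foreach { n =>
        throw new NestingError(s"<${n.tag}> already appears on a higher level")
      }
      // The whole level passed, so its tags join the Parent Set.
      parentSet ++= level.map(_.tag)
      level = level.flatMap(_.children)
    }
  }
}
```

Siblings with the same tag on one level are accepted, because a level's tags are added to the Parent Set only after the whole level has been checked. Read literally, the rule also rejects a tag that occurs on any higher level, even in an unrelated branch; checking only each node's own ancestors would be the less strict alternative.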

TSXML: The Library

TSXML changes the traditional way in which XML is created. It is a new scheme for generating
XML, namely one based on the ORM model. The user provides the objects to be put in the XML,
and the library takes care of everything else: nesting, type checking, well-formedness, and
indentation. This relieves the user of the unnecessary hassle of generating XML via strings.
If the data entered by the user contain an error, the user is informed at compile time,
rather than having to run a parser and re-create the XML.

5.1 Classes Exposed to Developers

TSXMLFile: An object of this class is responsible for creating an XML file on disk or
appending to an existing one.
TSRoot: For each TSXMLFile object, only one TSRoot can be created. It maintains the
first tag of the XML file and provides methods to add subsequent tags inside it. It uses
string interpolation to accomplish this task.
TSNode: An object of this class represents an XML element tag. It contains methods
to add attributes, text, and child tags. The tag-creation method makes sure that a
child element is not of its parent's type and otherwise throws a custom error.
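The behavior described for TSNode can be sketched as a small builder. `NodeSketch` and `SameTypeChildError` are hypothetical: the method names mirror the description above (attributes, text, child tags) but are assumptions rather than TSXML's actual signatures, and indentation is produced here by plain recursion instead of macros.

```scala
import scala.collection.mutable.ListBuffer

// Custom error: a child element may not have the same type (tag) as its parent.
final class SameTypeChildError(tag: String)
    extends Exception(s"<$tag> cannot be a direct child of <$tag>")

// Hypothetical stand-in for TSNode; names and signatures are assumptions.
final class NodeSketch(val tag: String) {
  private val attributes = ListBuffer.empty[(String, String)]
  private val children = ListBuffer.empty[NodeSketch]
  private var text: Option[String] = None

  def addAttribute(name: String, value: String): NodeSketch = { attributes += (name -> value); this }
  def addText(t: String): NodeSketch = { text = Some(t); this }

  def addChild(child: NodeSketch): NodeSketch = {
    if (child.tag == tag) throw new SameTypeChildError(tag)
    children += child
    this
  }

  private def esc(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Builds each tag with string interpolation, indenting two spaces per level.
  def render(depth: Int = 0): String = {
    val pad = "  " * depth
    val q = '"'
    val attrs = attributes.map { case (k, v) => s" $k=$q${esc(v)}$q" }.mkString
    val open = s"$pad<$tag$attrs>"
    if (children.isEmpty) s"$open${text.map(esc).getOrElse("")}</$tag>"
    else {
      val textLine = text.map(t => s"\n$pad  ${esc(t)}").getOrElse("")
      val body = children.map(_.render(depth + 1)).mkString("\n")
      s"$open$textLine\n$body\n$pad</$tag>"
    }
  }
}
```

For example, a `note` node with attribute `id="1"` and children `to` ("Ana") and `body` ("Hi & bye") renders as three indented lines between `<note id="1">` and `</note>`, with the ampersand escaped; calling `addChild` with another `note` throws `SameTypeChildError`.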

5.2 Behind the Scenes

There are primarily six classes that implement the algorithm in the previous section:
customXmlTypeSafe, Lift, Liftable, ImplementLiftable, XMLNodes, and MacroImplementLiftable.
customXmlTypeSafe is the main class, which uses the other background classes. The Lift class
generates the tree structure of the user classes. The Liftable class performs the level-order
traversal of the tree. ImplementLiftable reads the DTD/XSD and performs the type checking.
Once the type-safety checks pass, the XMLNodes class creates the XML tags and attributes.
Lastly, MacroImplementLiftable uses string interpolation to nest the XML tags together with
appropriate indentation.

Conclusion

The TSXML library redefines the way programming languages view XML generation.
It brings an object-oriented approach, based on the Object-Relational Mapping model, to XML
generation. The main idea is that if an XML document is created, it had better be perfect;
otherwise, it should not be created at all.

Future Work

The current mechanism for raising errors when an XML violation is detected is weak. In
particular, with a high level of tag nesting (more than 15 levels), errors become difficult to
detect. The underlying logic needs to be modularized.
For reading XSD/DTD files, we currently use an external library. We plan to write
our own XSD/DTD reader.

References
[1] Yuriy Solodkyy and Jaakko Järvi, Extending type systems in a library: Type-safe
XML processing in C++, Science of Computer Programming, Vol. 76, 2011, pp. 290-306.
[2] Mary Fernandez, Wang-Chiew Tan, and Dan Suciu, SilkRoute: Trading between
Relations and XML, International World Wide Web Conference (WWW), 2000.