
Parsing with Transformers

Olivier Gournet, Alexandre Borghi and Nicolas Pierron

Technical Report no 0512, March 2006


revision 920

To disambiguate C++, an essential step for the Transformers project, a new approach using attribute grammars has been adopted. An attribute grammar system, based on Stratego/XT, has been developed from scratch to meet our needs. However, it can still be greatly improved, particularly our attribute syntax.

Attribute grammars and their Transformers implementation, sdf-attribute, will be described and (hopefully) demystified. Although this paper focuses mainly on attribute grammars, preprocessing is not forgotten.

Keywords
Transformers, C++, program transformation, parsing, disambiguation, attribute grammar

Laboratoire de Recherche et Développement de l’Epita


14-16, rue Voltaire – F-94276 Le Kremlin-Bicêtre cedex – France
Tél. +33 1 53 14 59 47 – Fax. +33 1 53 14 59 22
olivier.gournet@lrde.epita.fr – http://transformers.lrde.epita.fr/

Copying this document


Copyright © 2006 LRDE.
Permission is granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License, Version 1.2 or any later version published by the Free
Software Foundation; with the Invariant Sections being just “Copying this document”, no Front-
Cover Texts, and no Back-Cover Texts.
A copy of the license is provided in the file COPYING.DOC.
Contents

Introduction 6

1 Preprocessor headache 8
1.1 How things work today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Transformers’ AG implementation 11
2.1 Evaluator generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 attrsdf2table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 str-lazy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.4 str-ref . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.5 attr-defs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.6 Attribute propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.7 Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.8 Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Transformers’ AG Usage 22
3.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Basic Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Inherited Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Attribute Declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Automatic rules propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4 Transformers’ AG syntax proposal 28


4.1 Default value/code for horizontal propagation . . . . . . . . . . . . . . . . . . . . 28
4.2 Vertical propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4 Ambiguous synthesizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5 The woes of this AG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5 Other AG systems 32
5.1 UUAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.1 Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.2 Writing AG in Utrecht University Attribute Grammar System (UU-AG) . 32
5.1.3 Incorporate it in Transformers . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.1.4 A comparative example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 FNC-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.1 Evaluators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.2 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.3 OLGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.4 Incorporate it in Transformers . . . . . . . . . . . . . . . . . . . . . . . . . 38

Conclusion 39

6 Bibliography 40
List of Figures

2 Namespace definition ambiguity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1 Preprocessing phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.1 Parse table generation, for c-grammar. . . . . . . . . . . . . . . . . . . . . . . . . . 12


2.2 Parse table generation, attrsdf2table details. . . . . . . . . . . . . . . . . . . . . . . 13

3.8 Left hand attribute traversal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


3.9 A list, seen from AsFix viewpoint. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Introduction

Almost every introductory paper about Transformers explains that this project aims to be a program transformation framework (Anisko et al., 2003), from source code to source code. It is composed of a parser for several languages (with ISO C++ as our main target), and of tools working on grammars to ease transformations. It relies heavily on the generic rewriting framework Stratego/XT, which offers a solid foundation to the project.

However, in spite of our earlier optimism, Transformers is not able to parse C++ yet. We are still working on the first stage of the parsing process, namely the disambiguation process1 .

Attribute grammars were described a long time ago by Knuth (Knuth, 1968). They are a way to attach semantics to an AST. Given a context-free grammar, two sets of symbols, called attributes, are attached to each of its non-terminals. Attributes can be “synthesized”, i.e., defined solely in terms of attributes of the descendants of the corresponding non-terminal symbol, while other attributes are “inherited”, i.e., defined in terms of attributes of the ancestors of the non-terminal symbol.
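The distinction between synthesized and inherited attributes can be illustrated with a minimal sketch. This is an invented example on a tiny expression tree, not Transformers' implementation: "value" plays the role of a synthesized attribute (computed from descendants), "depth" of an inherited one (passed down from ancestors).

```python
# Hypothetical illustration of Knuth-style attributes on a tiny tree.
# All class and method names are invented for this sketch.

class Num:
    def __init__(self, n):
        self.n = n
    def synthesize(self):           # value: computed from the node itself
        return self.n
    def inherit(self, depth):       # depth: passed down from the parent
        self.depth = depth

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def synthesize(self):           # value: defined from the children's values
        return self.left.synthesize() + self.right.synthesize()
    def inherit(self, depth):       # depth: defined from the parent's depth
        self.depth = depth
        self.left.inherit(depth + 1)
        self.right.inherit(depth + 1)

tree = Add(Num(1), Add(Num(2), Num(3)))
tree.inherit(0)
assert tree.synthesize() == 6       # synthesized bottom-up
assert tree.right.left.depth == 2   # inherited top-down
```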

This paper is mainly about attribute grammars. We will first describe how to use the dedicated Attribute Grammar (AG) system2 of Transformers, then its internal workings.

Ambiguity as leitmotif

Our real goal is the ability to use C and C++ as both source and target languages. We already have C and C++ context-free grammars, inspired by the ISO standards (ISO/IEC (1999) and ISO/IEC (2003) respectively), but unfortunately their syntaxes are highly ambiguous. For instance, for the namespace definition as stated in the standard, listing 1 yields two derivation trees. This comes from the fact that the standard distinguishes between a namespace seen for the first time and an extension of an already defined namespace. However, nothing distinguishes them syntactically. The resulting tree can be seen in figure 2.

NamedNSDefinition -> NSDefinition

OriginalNS -> NamedNSDefinition

ExtensionNS -> NamedNSDefinition

"namespace" Identifier "{" NSBody "}" -> OriginalNS

"namespace" OriginalNS "{" NSBody "}" -> ExtensionNS

Listing 1: C++ namespace ambiguity

1 Some suggested the temporary renaming of Transformers to Parsers.


2 If somebody finds a good name for it, mail us!

[Figure: two derivation trees for the same namespace definition, one reducing through OriginalNS, the other through ExtensionNS.]

Figure 2: Namespace definition ambiguity.

Several methods to disambiguate these languages have been tried (Durlin, 2007, 2008). First, there was ASF+SDF, which gave bad results: it was too declarative and did not have the expressive power we wanted. Then came Stratego/XT, which took a different approach to the problem: transformations are done using the paradigm of traversals and rewriting rules. We started using it to disambiguate C++, but it proved too slow, not extensible enough, and not maintainable enough. But that was a long time ago, and the tools we were using have greatly evolved since then. Finally, a compromise between the two has been found, raising attribute grammars to the top of our priorities.
Chapter 1

Preprocessor headache

A quote from the Boost Preprocessor Library (?) website:

Problem #3

"I’d like to see Cpp abolished." - Bjarne Stroustrup

Solution

The C/C++ preprocessor will be here for a long time.

In practice, preprocessor metaprogramming is far simpler and more portable than template metaprogramming [Czarnecki].

The use of a preprocessor is a tricky legacy of the C language, allowing programmers to use macros, sometimes in strange ways. Since generic tools are used, it makes parsing and unparsing more difficult for Transformers by introducing holes in the grammar. But as it is still in use today, it must be handled properly, otherwise Transformers would be unusable.

The following requirements should be seen as a basis for Transformers' ability to handle preprocessor directives. However, some of them should be considered optional, depending on Transformers' fate.

• Handle control lines (PP directives). Of course, this is the main objective.
• Pretty-printing. Some effort must be made to rewrite text as close as possible to the original.
• Keep the file structure. Because of the #include directive, a huge number of files can be parsed for a single compilation unit. It would be nice to respect the file structure when pretty-printing.
• Do substitutions for all possible branches.
• Optimization. A little experimentation with Transformers shows that it is already slow enough; there is no need to add another “launch and take a coffee” process to the pipeline.
• Error handling. As another front-end process is added to the pipeline, errors arising further down must be correctly translated and reported to the user (for example: locations).

Things have not greatly evolved since the last attempt to manage it. Nevertheless, a quick summary of the whole picture will be given, so that one can easily pick up the subject and do things better. First, the current implementation will be reviewed (more details can be found in Gournet (2004)), then ideas collected over time will be presented.

1.1 How things work today

[Figure: the source file is fed to cpp; a Perl script splits the preprocessed source into chunks; sglr parses each chunk into an AsFix parse tree; asfix-concat joins the trees into one AsFix parse forest.]

Figure 1.1: Preprocessing phase

Currently, the system is able to parse and expand any PP token, thanks to SGLR. However, the current system is not able to unparse these expanded macros, nor can it recreate the file hierarchy. The system pipeline is presented in figure 1.1. First, a source file is given to cpp, with the necessary flags. These flags set cpp's input language, keep comments, and remove tokens specific to GNU extensions (tokens that are not in the standard). It produces source code representing the entire compilation unit.

Then, a tiny hand-written Perl script is applied to split this monolithic output into small chunks. After that, SGLR processes these chunks one after another. The reason it cannot parse the cpp output at once is that SGLR stops if it encounters too many ambiguities, as discussed below. At this stage, we have a bunch of small parse forests that we have to concatenate together. parse-c takes care of this, with a small complication: we must take care of rule dependencies, because as soon as two parse forests are concatenated, some evaluation rules must be added to link the two chunks. This is actually the subject of a deeper study, namely partial evaluation, but right now it is solved in an easy way. “Cuts” (between two preprocessed source chunks) are always made on the same grammar symbol, so the glue is extracted from a simple snippet of code (see listing 1.2), then pasted between our two parse forests to keep the whole thing consistent.
int i ;
int i ;
Listing 1.2: Sample code used to retrieve the evaluation rules needed to glue AsFix trees. These two lines are provided specifically to retrieve the evaluation rules that lie between them. These rules are then pasted in to re-join the cut parse forest.
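The split-parse-concatenate flow described above can be sketched schematically. This is an invented model, not parse-c: parse() stands in for SGLR, and the "glue" value is a placeholder for the evaluation rules extracted from the two-line snippet of listing 1.2.

```python
# Schematic model of the split-and-reassemble pipeline. All names are
# invented for illustration; parse() is a stand-in for SGLR.

def split_chunks(preprocessed, max_lines=2):
    """Split monolithic cpp output into small chunks (the Perl script's job)."""
    lines = preprocessed.splitlines()
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]

def parse(chunk):
    """Stub for SGLR: one 'parse forest' per chunk."""
    return ("forest", chunk)

def concat_forests(forests, glue):
    """Stub for the concatenation: interleave glue rules between forests."""
    out = [forests[0]]
    for f in forests[1:]:
        out.append(glue)   # re-attach the evaluation rules cut by the split
        out.append(f)
    return out

source = "int a;\nint b;\nint c;\nint d;"
forests = [parse(c) for c in split_chunks(source)]
result = concat_forests(forests, glue=("glue-rules",))
assert len(result) == 3    # two forests joined by one glue node
```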

Splitting the output and then reassembling it was the solution originally retained, but there is another one: patch SGLR so that it accepts more than 10,000 ambiguities (500,000 should be a good choice). That was the solution chosen for the biggest benchmarks, one of which has about 60,000 ambiguities in a single chunk. Actually, neither of these two methods is perfect, so they should be used jointly.

At this stage, we have one big AsFix parse forest, ready for disambiguation and transformation. The stage after transformation, preprocessing symbol refactoring, is deliberately skipped, for the simple reason that nothing is implemented.

1.2 Directions

This section is a placeholder for directions that have been considered over time. They may or may not be used to tackle this problem.

• Improve our current technique by patching cpp to store more information. Then, write a program (any language should do) to do the pretty-printing.
• Build yet another C preprocessor from scratch, and handle macros ourselves. This is a fallback to the first technique, in case cpp proves too difficult to patch and maintain.
• Use SGLR with a so-called island grammar (Largillier, 2005), to fetch only the tokens we are interested in. The standard defines a grammar for preprocessor directives, so it could be derived from it. Then, use a Stratego filter to do the same job as cpp and pretty-print the result into one source file. Eventually, it could be reparsed with our regular C/C++ grammar and managed as if there were no PP directives. Thomas Largillier is currently working in this direction to preprocess and unpreprocess files using a grammar based on the one used by ASF+SDF. More details about his work can be found in ?.
Chapter 2

Transformers’ AG implementation

The evaluator is under development, and neither the design nor the implementation is frozen yet. This means that parts of this chapter may or may not reflect reality, and might become obsolete soon.

Two parts are clearly separated: the first concerns the compilation of our AG system (the compilation), and the second the actual attribute computation (the execution).

2.1 Evaluator generation

2.1.1 Overview

The Syntax Definition Formalism (SDF) modules are packed according to AttrSdf, which extends SDF with our attribute syntax. This is performed by esdf, which is best described in the paper Demaille et al. (2005). Then, a parse table and an evaluator are generated by the whole attrsdf2table chain, composed of several steps. Besides the parse table and the evaluator, a pretty-print table is generated, as well as a signature file that will be used in transformation filters. Figure 2.1 gives an overview of all these steps. Most of the tools used in this pipeline are standard Stratego/XT tools, so they won't be explained in detail.

One tool that needs more consideration is attrsdf2table; it will be thoroughly explained below.

[Figure: a pipeline chaining cat, pack-esdf, delete-desugared-cons, sdf-cons, boxed-desugar and pp-attrsdf into an extended SDF definition, then parse-attrsdf-definition, attrsdf2table, sdf-strip, pp-sdf, sdf2rtg, rtg2sig, boxed2pp-table, pp-pp-table, parse-pp-table, strc and gcc (with extra Stratego sources), producing the SDF signature, the Stratego evaluator, the parse table and the pretty-print table.]

Figure 2.1: Parse table generation, for c-grammar.

2.1.2 attrsdf2table

The helper program, attrsdf2table, is composed of several filters (see figure 2.2):

• parse-attrsdf-definition: Produces the AsFix corresponding to our grammar. It takes the .edef file produced by esdf as input.
• attrs-desugar-ns: For attributes that are not explicitly placed in a namespace, sets them in the current namespace (as explained in section 3.1.1).
• embed-attributes: There are two problems with our attribute section when translating to a parse table:

[Figure: from the extended SDF definition, parse-attrsdf-definition, attrs-desugar-ns, embed-attributes, sdf-strip and pp-attrsdf feed sdf2table (from.tbl); then deembed-attributes (deembed.tbl), sdf-labelize (labelized.tbl), attr-defs (desugared.tbl), clean-tbl-attributes and attrc, together with pp-stratego and extra Stratego sources, produce the parse table and the evaluator Stratego source.]

Figure 2.2: Parse table generation, attrsdf2table details.

– The original SDF syntax accepts ATerms in annotation lists, but not capitalized symbols, as those are seen as SDF non-terminals. To avoid this problem, a filter prepends a lower-case letter to all constructors for the parse table generation.
– Non-terminal aliases (labels) in production rules are not kept, but they are referenced as attribute names and used during the evaluation. Thus, to ensure the translation between sorts and label names, an alias table is added to the annotations.
• sdf-strip: This tool comes from the Stratego/XT distribution; it does some pre-processing needed by sdf2table.

• pp-attrsdf: Transforms the input AsFix into text. This is unfortunately needed, as sdf2table cannot take an ATerm as input.
• sdf2table: Translates our grammar into a parse table. From this point on, work is done directly on the parse table.

• deembed-attributes: Cleans up the gory hacks introduced by embed-attributes.



• sdf-labelize: During parse table creation, some rules may be added (refer to 3.10 and 3.11), with a high probability that the same non-terminal will appear twice in a production. Moreover, as said previously, sdf2table does not care about labels. To be able to distinguish these symbols, labels are created. More generally, it generates a label for every non-terminal in each production, to ease further filters and the evaluation.
• attr-defs: In a single traversal, this filter provides the checker (as explained in section 2.1.7) and attribute propagation.

• clean-tbl-attributes: At this stage, the parse table has been made heavy by annotation rules. But these annotations are not required anymore, so they can be safely removed. If they were kept in the parse table, sources processed by SGLR with this parse table would also contain these annotation rules, which would enlarge the output AsFix 1 . Labels are also removed, as they will not be useful anymore.

• attrc: The attribute grammar evaluator is generated by this filter. The previous version of the evaluator (used in Transformers 0.3) was based on the data-flow paradigm and was not generated: everything was dynamic and used only local information. It was very slow, so an evaluator generator was created from scratch by Valentin David in mid-2005. The current version (used in Transformers 0.4) is generated from information contained in the grammar, so it can be much more efficient. The design of the generated evaluator relies on lazy evaluation to resolve attributes easily in the right order: attributes are computed on demand, and a dependency graph is maintained while evaluating.

2.1.3 str-lazy

As said previously, attribute evaluation relies on laziness to ensure the correctness of the eval-
uation flow. But Stratego has no lazy evaluation, so a special tool has been designed to suit our
needs.

str-lazy simulates laziness by maintaining a dependency graph. When an attribute must be computed, its direct dependencies are added to the graph; these added attributes add their own dependencies, and so on. When all of an attribute's dependencies are in the graph, the resolution is performed, and the dependency graph is updated as it goes along. After the initial attribute is computed, the evaluation continues with the next attribute needing resolution. The implementation of the graph amounts to about 180 lines of C (for performance reasons), and the interface with Stratego is about 20 lines.

The main difference between true laziness and what is implemented in str-lazy is that it is possible to define multiple ways to compute a value. This difference is the main point that makes disambiguation with AGs transparent to the user. For the value of an attribute to be set, all valid values computed during the evaluation must be identical; otherwise the attribute cannot be computed, and the value is invalid.
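The multiple-formulas idea can be sketched in Python. This is a hedged model of str-lazy's semantics, not its implementation (which is in C with a Stratego interface): several formulas may compete to define one value, and the value is only defined if all valid results agree.

```python
# Sketch of str-lazy's core idea: a value with several competing formulas.
# Class and method names are invented for this illustration.

class LazyValue:
    def __init__(self):
        self.formulas = []          # competing ways to compute the value

    def formula(self, fn):          # register one way to compute the value
        self.formulas.append(fn)

    def getv(self):
        # Evaluate every formula on demand, keep only valid (non-None)
        # results, and require that they all agree.
        results = [r for r in (fn() for fn in self.formulas) if r is not None]
        if not results or any(r != results[0] for r in results):
            raise ValueError("attribute has no unique valid value")
        return results[0]

x = LazyValue()
x.formula(lambda: 42)               # computed through one derivation
x.formula(lambda: 42)               # ... and through an ambiguous sibling
assert x.getv() == 42               # all valid values agree

y = LazyValue()
y.formula(lambda: 1)
y.formula(lambda: 2)                # disagreement: the attribute is invalid
try:
    y.getv()
    assert False
except ValueError:
    pass
```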

The interface of str-lazy provides three functions: new-value, getv and formula.

• new-value creates a pointer to a structure that contains an empty list of strategies and an ATerm referencing nothing. Values are stored in an ATerm blob to ensure that they can be garbage collected.
• formula registers a formula in the list of methods to compute a value. Formulas are registered along with the input term, which is used to pass attribute dependencies. In the C code, the strategy is transferred as a static closure, from which the strategy pointer and the static link are extracted.
1 Which is already too fat.

• getv evaluates all formulae registered previously and folds all values with the equality operation to check that all valid values are identical.

lazy-add = (getv, getv); add

main =
  new-value => arg1
  ; new-value => arg2
  ; new-value => result
  ; formula( lazy-add; debug(!"result = ") | result, (arg1, arg2) )
  ; formula( debug(!"arg2 = ") | arg2, 2 )
  ; formula( debug(!"arg1 = ") | arg1, 1 )
  ; <debug(!"begin = ")> 0
  ; <getv> result => 3
Listing 2.3: str-lazy: lazy sum

Listing 2.3 is an example of this laziness. In this example, three values are created with the new-value strategy. Formulae are declared by specifying the strategy that will be used to compute the value, followed by the term that contains the value to define and the input term of the strategy to evaluate. The formulas at lines 8 and 9 just print their input term and set it as their value. The strategy used to compute the result at line 7 uses another strategy named lazy-add, which takes the input term (arg1, arg2) and evaluates each member of the pair with getv (line 1) before summing them. The result starts to be evaluated at line 11. This program outputs the sequence from 0 to 3.

Due to the delayed evaluation, strategies registered with a formula cannot use symbols from the strategy that declared them, unless they are evaluated in the same scope. The reason is that the stack may have been removed or replaced since the strategy was registered. Therefore, all references to a symbol not declared in the scope of the formula are invalid.

2.1.4 str-ref

This library2 provides references for manipulating graphs. It is inspired by the implementation of Kalleberg and Visser (2006), which used the meta-stratego compiler to add some sugar over the notation. This library provides references without backtracking possibilities, which is not safe but can be more efficient than a pure Stratego implementation.

str-ref provides four C functions to create, destroy, bind and dereference a pointer. These functions use an integer stored in a Ref constructor so as not to access invalid memory.

The Stratego library provides strategies based on these C functions that manipulate the content of a reference, such as apply-on-ref and where-on-ref; in the case of apply-on-ref, the result is bound back to the reference after the rewriting. Other common strategies are provided, such as create-bind-ref, which creates a new reference and binds the current term to it, and deref-destroy-ref, which moves the content of the reference to the current term and deletes the reference.

attr-defs uses this library for two purposes: first, to manipulate a queue with constant-time operations; second, to manipulate the AG graph.
2 In French, “library” is translated as “bibliothèque”.

2.1.5 attr-defs

This program is responsible for checking the correctness of the grammar. Said differently, it checks whether all attributes used are correctly defined everywhere. The checker can be disabled, but then inconsistent evaluation rules will be given to attrc, which will in turn produce a flawed evaluator. Needless to say, this is not advised. This section discusses its internal mechanisms.

attr-defs was originally written in Stratego3 and then a second time in C++. The reasons it was rewritten in C++ were that, according to its author Olivier Gournet, he was not fluent enough in Stratego, and speed. The new filter was about 200 times faster than the original (although Olivier Gournet never considered this a good argument, because the old checker was badly written and relied solely on linear searches in big lists). On the other hand, some parts get very, very ugly in C++.

The latest version of this tool is again written in Stratego, with the help of str-ref to manipulate graphs quickly. The idea is to represent each production, symbol, attribute rule and attribute as a node of a graph, with relations between the nodes (Pierron, 2007). This version uses the parse table as a table of pointers that can be referenced by other nodes, which speeds up traversal of the grammar. There is not much difference in speed between the C++ version and the Stratego version, but the two systems cannot really be compared, since we do not know precisely what algorithm was implemented in C++.

Attribute fetching makes heavy use of the ATerm library, and especially the ATmatch function. One thing to note is flexibility: using the ATerm library as was done in C++ is less powerful than using Stratego pattern matching and all the other goodies this language offers. But in this case we only have to handle a parse table. The parse table AsFix is very simple and has a very particular form, as shown here:

parse-table(
4
, 0
, [ label(prod(...))
, label(prod(...))
, label(prod(...))
]
, states([...])
, priorities([...])
)

Here, [] denotes a list, and ... something that was clipped. As a reminder, run pp-aterm < C.tbl | less to see the content of the parse table, and run attrsdf2table with -k 2 to keep the intermediate tables during the build. Remember that the final parse table does not contain evaluation rules, so don't search for them. We only care about the list of labels, which contains all the production rules; the other parts are not modified.

In the parse table, each prod constructor corresponds to an SDF production rule. Here is an example of an SDF production, followed by its corresponding prod constructor in the parse table, where some attributes have been propagated.

"sizeof" UnaryExpression -> UnaryExpression

prod(
[lit("sizeof"), cf(opt(layout)), cf(sort("UnaryExpression"))]
, cf(sort("UnaryExpression"))
3 You can find it in sdf-attribute/src/desugar/attr-desugar-magic.str.

, attrs(
[ term(cons("sizeof"))
, term(labels([label(3, "rule_3")]))
, term(
attribute-rule(
None
, attribute(
Sym("rule_3"),
Some(namespace-head(namespace-name("decl"))),
"lr_table_in"
)
, Build(
attribute(
root,
Some(namespace-head(namespace-name("decl"))),
"lr_table_in"
)
)
)
)
, term(
attribute-rule(
None
, attribute(
root,
Some(namespace-head(namespace-name("decl"))),
"lr_table_syn"
)
, Build(
attribute(
Sym("rule_3"),
Some(namespace-head(namespace-name("decl"))),
"lr_table_syn"
)
)
)
)
]
)
)

attr-defs makes heavy use of labels. Note that the labels constructor holds a list of pairs: the first element is the index of the symbol, and the second a string to which we can later refer, making it independent of the order of the production's symbols. Only context-free sections are listed. The work of adding labels is carried out by sdf-labelize, because sdf2table does not do it.

To solve the problem of big-list manipulation, str-ref is used to replace all productions in the parse table by references. A traversal registers all symbols, recording whether they are used or produced, and by which productions. This conversion step is reversible and can be used on any parse table, with or without attributes in it. A strategy named apply-on-graph4 is used to manipulate the parse table as a graph in Stratego and to restore the parse table to its original state. The same idea is applied to manipulate attributes and attribute rules: all attribute rules are replaced by references, and all attributes are linked to their attribute rules. This second graph is linked to the grammar, making it possible to walk the entire attribute grammar. A strategy named apply-on-attr-graph is used to manipulate an attribute grammar as a graph instead of lists.
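The reversible list-to-graph conversion can be illustrated abstractly. This is an invented sketch under the assumption that a production is just (body symbols, result symbol); the real AsFix representation and the apply-on-graph strategy are richer.

```python
# Sketch of a reversible conversion from a flat production list to an
# indexed graph recording, per symbol, who produces/uses it. Data layout
# is invented for illustration.

def to_graph(productions):
    """Index productions and record which ones produce or use each symbol."""
    nodes = dict(enumerate(productions))
    produced, used = {}, {}
    for i, (body_symbols, result_symbol) in nodes.items():
        produced.setdefault(result_symbol, []).append(i)
        for s in body_symbols:
            used.setdefault(s, []).append(i)
    return nodes, produced, used

def from_graph(nodes):
    """Restore the production list in its original order (reversibility)."""
    return [nodes[i] for i in sorted(nodes)]

# Three toy productions, in SDF style: body -> result
prods = [(["lit"], "Star"), (["Star"], "E"), (["E"], "S")]
nodes, produced, used = to_graph(prods)
assert used["Star"] == [1]          # production 1 consumes Star
assert produced["Star"] == [0]      # production 0 produces Star
assert from_graph(nodes) == prods   # the round-trip is lossless
```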

attr-defs contains the code for attribute propagation and cycle detection, both of which use the graph to manipulate the attribute grammar. Attribute propagation and cycle detection are described in sections 2.1.6 and 2.1.7 respectively. attr-defs also provides the --statistics option to check the number of generated attribute rules in the AG.
4 The Stratego module to manipulate a parse table as a graph in Stratego can be fetched at sdf-attribute/src/check/table-graph.str

2.1.6 Attribute propagation

This process is executed before the checker, thus errors in propagation could be detected. The
propagation occurs in two fix-point phases(Demaille et al., 2008; Pierron, 2007):

• The first step propagates flags. Theses flags are used to have a choice which is local to a
grammar production. They are used to show if an attribute could be synthesized through
a symbol. Therefore this flags are only propagated for Bottom-Up and Left-to-Right at-
tributes.
• The second step propagates attributes where they are needed in the grammar to obtain
the shortest path of attributes. All copy rules are added for Top-Down and Left-to-Right
propagations. All concatenations rules that concatenates the result of their children are
added for Bottom-Up but only for list manipulation for the moment.

The fix-point is implemented with a queue realized with str-ref. It is used to propagate flags and attributes and to check for circularity. The fix-point allows us to revisit all productions related to the previous modifications.
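The queue-driven fix-point can be sketched abstractly. This is a hedged model under invented data: productions are reduced to a map from a production name to the names of its children, and propagation marks every production that can reach an attribute-carrying child.

```python
# Sketch of a worklist fix-point for attribute propagation. The data
# layout is invented; in Transformers the queue is built with str-ref.

from collections import deque

def propagate(productions, seeds):
    """Mark productions reachable from attribute-carrying children.

    productions: {name: set of child production names}
    seeds: production names initially carrying the attribute.
    """
    have = set(seeds)
    queue = deque(productions)
    while queue:                     # fix-point: run until the queue drains
        p = queue.popleft()
        if p not in have and any(c in have for c in productions[p]):
            have.add(p)              # a copy rule would be added here
            # re-enqueue productions related to this modification
            queue.extend(q for q, cs in productions.items() if p in cs)
    return have

prods = {"S": {"E"}, "E": {"Star"}, "Star": set()}
assert propagate(prods, {"Star"}) == {"S", "E", "Star"}
```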

Each propagation has to satisfy a set of equations that ensure that an attribute is declared before it is used, and that describe the path followed by the attribute in a production, depending on the presence of the corresponding flags. The equations ensure that the AG is well-defined. The current attribute propagator can be extended to add specific propagations.

The propagation, which adds a lot of attributes in the grammar, is not a run-time problem
since copy rules are aliased in the generated evaluator.

The attribute propagation can be disabled if you run attrsdf2table with the option --no-proba.

2.1.7 Checker

The checker performs some verification of the grammar. It ensures that there is no circularity in the grammar by computing a transitive closure of the dependencies between attributes. This transitive closure is realized on all productions and uses a fix-point to propagate modifications between productions. This algorithm is conservative, as the tests in the sdf-attribute test suite show. It is inspired by the article of Rodeh and Sagiv (1999, section 5).
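The closure-based circularity check can be sketched on a toy dependency relation. This is an illustration only, with invented sample data, not Transformers' representation or the exact algorithm of Rodeh and Sagiv.

```python
# Sketch of cycle detection via transitive closure of attribute
# dependencies. A cycle exists iff some attribute ends up depending on
# itself after the closure. Sample data is invented for illustration.

def has_cycle(deps):
    """deps: {attribute: set of attributes it directly depends on}."""
    closure = {a: set(ds) for a, ds in deps.items()}
    changed = True
    while changed:                   # fix-point over the whole relation
        changed = False
        for a in closure:
            extra = set()
            for b in closure[a]:
                extra |= closure.get(b, set())
            if not extra <= closure[a]:
                closure[a] |= extra
                changed = True
    return any(a in closure[a] for a in closure)

acyclic = {"syn": {"inh"}, "inh": set()}
cyclic = {"syn": {"inh"}, "inh": {"syn"}}
assert not has_cycle(acyclic)
assert has_cycle(cyclic)
```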

Other kinds of checks are performed, such as producing warnings when the user writes something that could be generated by the attribute propagator. To do this, when the propagator is about to add a rule, it looks whether it already exists and reports to the user if their code is similar.

The checker can be disabled by running the tool attrsdf2table with the
option --no-cycle.

Unlike the earlier implementation in C++, the checker no longer verifies that the attribute
grammar is well-defined, because attribute propagation already does so for each attribute in the
grammar.

The transitive closure computed on each production by the checker could probably be reused
to automatically add where clauses that introduce dependencies between attributes. This point
is totally useless in other AG systems, because they are not bothered by ambiguities. It could
also become useless in Transformers if we enabled synthesizing more than one value through
an ambiguity.

2.1.8 Compiler

The compiler, named attrc, compiles a parse tree with attribute rules into Stratego code.
The evaluator can be generated for either parse trees or Abstract Syntax Trees (ASTs) by specifying the option
--ast. At the moment the code is a bit messy and should be organized into modules to separate
the goals of each step. The best option would be to reuse the code written to manipulate the parse
table as a graph.
1  E -> S {cons("StartSymbol"),
2    attributes (process:
3      E.lr_value_inh := !0
4      root.trash := !E.lr_value_syn
5    )}
6
7  Star -> E {cons("Expr"),
8    attributes (process:
9      local.tmp := !root.lr_value_inh
10     root.lr_value_syn := <add> (local.tmp, 42)
11   )}
12
13 "*" -> Star {cons("Star"),
14   attributes (process:
15     local.input := !root.lr_value_inh
16   )}
Listing 2.4: Small compilable AG

Listing 2.4 is an example which will support the explanation and help to understand the
relation between the generated evaluator and the AG. The syntax is described in section 3.1.

The generated file is separated in four parts:

• In the first part, the imports and constructors required by the evaluator are added. The
list of imports can be extended with the option --imp of attrsdf2table. This option
enables the user to add extra imports containing the strategies used in his AG.

• The second part contains the list of grammar productions that create the binding traversal.
The binding traversal is a top-down traversal which annotates each production of the
grammar with its attributes.
1  compute-E5253sort45cf(|
2      p_attrib-root-process-lr_value_syn
3    , p_attrib-root-process-lr_value_inh) =
4    ( (?underamb(<id>) < ?is-amb + id)
5    ; ( appl(
6          prod(?[cf(sort("Star"))], ?cf(sort("E")), id)
7        , id
8        )
9      ; where(!"" => amb-name)
10     )
11   ; where(
12       ![ syn(p_attrib-root-process-lr_value_syn)
13        , inh(p_attrib-root-process-lr_value_inh)]
14       ; if !is-amb then duplicate-attributes else copy-attributes end
15       ; ?[attrib-root-process-lr_value_syn
16         , attrib-root-process-lr_value_inh]
17     )
18   )
19   ; where(
20       !attrib-root-process-lr_value_inh
21       ; ?attrib-rule_1-process-lr_value_inh@attrib-local-process-tmp@_)
22   ; add-attributes(
23       [ amb-compute(
24           compute-Star5253sort45cf(|attrib-rule_1-process-lr_value_inh)
25         | amb-name
26         )
27       ]
28     | [ (("process", "lr_value_inh"), attrib-root-process-lr_value_inh)
29       , (("process", "lr_value_syn"), attrib-root-process-lr_value_syn)
30       , (("process*", "tmp"), attrib-local-process-tmp)
31       ]
32     ) => tree
33   ; where(
34       formula(
35         b_0
36       | attrib-root-process-lr_value_syn
37       , (amb-name, tree, [attrib-local-process-tmp])
38       )
39     )
Listing 2.5: Small compiled AG

Listing 2.5 shows the Stratego code extracted from the generated evaluator corresponding to
the production Star -> E of listing 2.4.
Strategies are named after the symbol name of the grammar (here, compute-E5253sort45cf
at l.1). This implies that two productions that produce the same symbol have the same
strategy name. Generating many strategies with the same name relies on the extending-def-
inition property of Stratego. These strategies are generated to add the attributes and the attribute
rules.
Attribute rules are added to the tree in each strategy corresponding to a production of
the grammar. The correspondence is made by matching the production in the AsFix for-
mat (here, at l.6). Attributes are declared with the new-value strategy of the library
str-lazy, and attributes that are only copied are aliased (here, at l.21). The attribute
root.lr_value_syn located at Line 10 of listing 2.4 cannot be aliased. The value
of this attribute is computed with a formula that is registered at Line 34.
Aliases are determined with a topological sort on attribute dependencies in the tree, where
only copy rules are kept. All attributes that are on the produced symbol (here, at l.2)
are considered as already bound to a lazy value and cannot be aliased together. Aliasing
removes approximately 70% of the rules on the C and C++ grammars.
The top-down traversal is executed by the strategy add-attributes (here, at l.22), which
puts the attributes on the tree (here, at l.28) and continues adding attributes on the children of
the production (here, represented by a list of strategy calls at l.24).
• The third part contains all attribute rules converted to strategies. Listing 2.6 contains
the only formula used in listing 2.5, at Line 35. This code is exactly the same as the
code from Line 10 of listing 2.4, except that a header has been added. The header
part of the strategy is used to make the debug information accessible in the amb-name
variable and to retrieve the value of the given attribute with the getv strategy. The current
term of the user code is the attributed tree created at Line 32 of listing 2.5.
1  b_0 =
2    ( id, id, [getv])
3    ; ?(amb-name, <id>, [attr_rule_var_0])
4    ; <add> (attr_rule_var_0, 42)
Listing 2.6: Small compiled AG

In our system, attribute rules are not lazy at all, because all lazy values are stripped from
the input term before the rule written by the user is evaluated (listing 2.6 at l.2).

• The last part contains the hard-coded strategies that are used to evaluate the whole tree and
to handle ambiguities, such as the strategies add-attributes and amb-compute that appear
at Line 22 of listing 2.5. This part exists for both the parse-tree version of the evaluator
and the AST version. It should be turned into an import to remove useless pretty-printing.
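As an illustration of the alias computation mentioned in the second part, attributes connected only by copy rules can be grouped so that they share a single cell. The following Python sketch uses a union-find instead of the evaluator's topological sort, and all names are hypothetical:

```python
def alias_classes(copy_rules, bound):
    """copy_rules: list of (dst, src) pure copies; bound: attributes
    already tied to a lazy value, which cannot be aliased together.
    Returns the groups of attributes that can share one cell."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for dst, src in copy_rules:
        rd, rs = find(dst), find(src)
        if rd == rs:
            continue
        # never merge two classes that both contain a bound attribute
        if (any(find(b) == rd for b in bound)
                and any(find(b) == rs for b in bound)):
            continue
        parent[rd] = rs
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [g for g in groups.values() if len(g) > 1]
```

In the sketch, a chain of copies collapses into one class, while two attributes that are each already bound to their own lazy value stay separate, mirroring the constraint on the produced symbol's attributes.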
Chapter 3

Transformers’ AG Usage

Like everything new in software engineering, documentation is sparse and interfaces are chang-
ing rapidly. This chapter explains the usage of our system in its current version.

3.1 Syntax

3.1.1 Basic Syntax

1  l:Line+ -> Document
2    {cons(Document),
3     attribute (process:
4       l.use_dt := !True()
5       root.tags := <filter(?Tag(_))> l.doc:content
6    )}
Listing 3.1: Snippet of attribute rules

In listing 3.1, line 1 is a classical production rule, written in SDF. Lines 2 to 6 are
SDF annotations. Line 2 is a well-known SDF annotation, used to specify a constructor. Lines 3
to 6 form a custom annotation which specifies the attribute evaluation rules.

Each attribute annotation is put in a namespace (here, the namespace is process, at l.3). At-
tributes are specified using a custom notation, symbol-name.[namespace:]name. The
symbol-name is a non-terminal name in the production rule. The namespace refers
to the place where an attribute lives and can be accessed, as attributes can be put in different
namespaces. If it is not specified, the attribute is fetched from the current namespace (here,
process). The name is the attribute name and can be whatever the user wants.

Using an SDF alias to specify the symbol name is useful when the non-terminal is not a valid Stratego
identifier. For instance, here we used l as an alias for Line+, because "+" is not a valid character
for an identifier. The attribute symbol root (l.5) is special: it refers to the non-terminal that
the production rule produces. Here, root must be written instead of Document.

On the left-hand side of ":=", there must be an attribute that will contain the result of the rule
evaluation. On the right-hand side, Stratego code with the optional addition of references to
attribute names is allowed. Almost all Stratego constructs can be used to specify an attribute
rule. Because of the desugaring from this syntax to plain Stratego, some limitations exist. For
example, Stratego keywords cannot be used as attribute names.

A node can set or retrieve its own attributes (using root) or its children's attributes (using the
child non-terminal or an alias, like l in the example). It cannot access its parent's attributes; that
is, it is the parent's responsibility to give attributes to, or retrieve them from, its children.

To factor some code, a local attribute can be used. Local attributes have a scope restricted
to the current production and cannot be retrieved from another node. They are used by
writing local instead of root. The local symbol name exists to ensure that the attribute cannot
be used by another production. The root symbol name can be used to create local attributes
as well, but then the system cannot ensure that they are not used by another node.

3.1.2 Inherited Patterns

1  context-free meta-syntax
2    scope <diff, merge> =
3      Open A Close -> B
4      { attribute (process:
5          root.lr_table_syn :=
6            <diff> (A.lr_table_syn, root.lr_table_inh) => diff_table
7            ; <merge> (root.lr_table_inh, diff_table)
8      )}
9
10 context-free syntax
11   "{" Stm* "}" -> Stm {
12     inherit (scope<id, Fst>)
13   }
Listing 3.2: Declaration of production pattern

Listing 3.2 contains two sections. The first part, from Line 1 to 9, declares pat-
terns. The second part, from Line 10 to 13, is a usual SDF section declaring the grammar.

Each pattern has a name (here, scope, at l.2) and a list of arguments that are
enclosed between the symbols < and > and followed by an equal sign. Arguments are sep-
arated by commas. The enclosure can be omitted if there are no arguments. Arguments are
strategies written in Stratego code.

Production patterns (like the one at l.3) are identical to SDF productions, up to the names. When a pro-
duction pattern is used with an inherit annotation (like at l.12), the annotated production and
the pattern production are unified, and symbol names are substituted in the production annotations of the
pattern. After this renaming step, all annotations of the pattern are included as annotations of the derived
production.

Therefore, in listing 3.2, the production at Line 11 defines a scope for the attribute table.

3.1.3 Attribute Declaration

Attribute declaration is an optional section: it is not necessary to declare attributes. Usu-
ally, declarations are helpful to ensure properties and to avoid encoding the attribute propagation
in the name of the attribute.
1  attributes
2    table {
3      propagation-type LR
4      assertion check-content (
5        map( (is-string, id) )
6      )
7    }
8
9  context-free syntax
10   "{" Stm* "}" -> Stm {attribute (process:
11     root.table_syn := root.table_inh
12   )}
Listing 3.3: Overview of attribute declaration

Listing 3.3 shows the declaration of the attribute table, which is then used to create a
scope.

Attributes are declared in an attributes section which contains the list of declared
attributes. An attribute declaration starts with the name of the attribute (here, table,
at l.2) and is followed by a list of properties enclosed in braces.

At the moment there are only two kinds of properties. The propagation-type property
(l.3) can be followed by TD, BU or LR, defining the attribute as Top-Down, Bottom-Up or
Left-to-Right respectively. In the LR case, the suffixes inh and
syn have to be specified (as at l.11). The assertion property (l.4) is followed by the name
of the assertion and a strategy enclosed in parentheses. Assertions check
the value of the produced attribute and produce an error message if the value does not satisfy
the assertion.
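As an illustration, the check-content assertion of listing 3.3 amounts to verifying that the table value is a list of pairs whose first component is a string. The following Python sketch is hypothetical and only mirrors the intent of map((is-string, id)):

```python
def check_content(value):
    """Assertion for the `table` attribute of listing 3.3: the value
    must be a list of pairs whose first component is a string."""
    return (isinstance(value, list)
            and all(isinstance(entry, tuple) and len(entry) == 2
                    and isinstance(entry[0], str)
                    for entry in value))

def check_attribute(name, value, assertion):
    """Run an assertion on a produced attribute value and report a
    failure, as the AG system does with an error message."""
    if not assertion(value):
        raise ValueError("attribute %r does not satisfy its assertion" % name)
```

In the real system the assertion is an arbitrary Stratego strategy; here it is simply any Python predicate.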

3.2 Automatic rules propagation

3.2.1 Tables

Some things are recurrent. A repetitive task1 is to pass attributes up and down without modi-
fying them. This is for instance heavily used for context tables, because the same context table
must be accessible from a lot of nodes in the AST. In example 3.4, an attribute, Foo.ids, is
added to an existing table, initialized at A -> Start, then returned to it. The attribute rules
attached to C -> B and B -> A are superfluous, because they express exactly what a programmer
would expect if they were not present.

Like all cut-and-paste jobs, it is better to find a way to avoid this, either by factoring or, in this
case, by allowing attribute writers to use some kind of "magic". Example 3.5 shows the
cleaned-up way. Here, an attribute with a special prefix (lr_) is used. In each production rule,
it exists in two flavors: lr_<attr-name>_in, which represents the inherited attribute, and
lr_<attr-name>_syn for the synthesized one.

Despite the name "magic", there is nothing weird about this propagation. When an attribute
is inherited by a non-terminal, or synthesized from another non-terminal, evaluation rules will
1 In the C grammar, about 80% of the rules fall into this category.

Foo -> C
{ attribute (ns:
    root.up := <conc> (root.down, Foo.ids)
)}
C -> B
{ attribute (ns:
    C.down := root.down
    root.up := C.up
)}
B -> A
{ attribute (ns:
    B.down := root.down
    root.up := B.up
)}
A -> Start
{ attribute (ns:
    A.down := ![]
    root.res := A.up
)}
Listing 3.4: Redundant attributes

Foo -> C
{ attribute (ns:
    root.lr_table_syn := <conc> (root.lr_table_in, Foo.ids)
)}
C -> B
B -> A
A -> Start
{ attribute (ns:
    A.lr_table_in := ![]
    root.res := A.lr_table_syn
)}
Listing 3.5: Less attributes, more fun

be added to all the production rules below. This process stops whenever the user explicitly uses
this attribute in evaluation rules. It is interesting to see how listing 3.6 is expanded into listing 3.7.

A B C -> D

D -> Start
{ attribute (ns:
    D.lr_table_in := ![]
)}
Listing 3.6: Before expansion

The lr_ prefix was not randomly chosen: it means the attribute performs a left-to-right traversal of the
tree, as in figure 3.8.

A B C -> D
{ attribute (ns:
    A.lr_table_in := root.lr_table_in
    B.lr_table_in := A.lr_table_syn
    C.lr_table_in := B.lr_table_syn
    root.lr_table_syn := C.lr_table_syn
)}

D -> Start
{ attribute (ns:
    D.lr_table_in := ![]
)}
Listing 3.7: After expansion

(Figure: the inherited attribute flows down from Start through D and then left to right through
the children A, B and C; the synthesized attribute flows back up along the same path, the steps
being numbered 1 to 10.)
Figure 3.8: Left hand attribute traversal.
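As an illustration, the left-to-right threading expanded in listing 3.7 can be modelled as a fold over the children. The Python sketch below is hypothetical; compute stands for whatever rule a production attaches, here a copy everywhere except on A:

```python
def thread_lr(children, inherited, compute):
    """Thread an lr_ attribute left to right: each child receives the
    synthesized value of its left sibling as its inherited value."""
    value = inherited          # root.lr_..._in
    for child in children:
        value = compute(child, value)  # becomes the child's lr_..._syn
    return value               # becomes root.lr_..._syn

# Expansion of listing 3.7: B and C only copy the table,
# while A appends its own entry.
def compute(child, table):
    return table + [child] if child == "A" else table
```

The expansion of listing 3.7 is exactly this fold written out as four copy rules.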

3.2.2 Lists

Another interesting and useful2 way to extend automatic propagation rules would be to extend
them to lists. Lists are only SDF sugar; they are actually implemented as a binary tree (see figure
3.9).

Let’s take an example, shown in listing 3.10. After sdf2table, it is desugared and additional
rules are introduced, as in listing 3.11.

Id -> L
L+ -> Start
Listing 3.10: A tiny SDF grammar using a list.

The problem is that when attributes are passed from the rule L+ -> Start to Id -> L (and
vice versa), we have to explicitly desugar this list and add rules for all eight productions listed in 3.11.

Take a breath, and try to exhibit exactly what the desired behavior would be:

• Bottom-up traversal: all attributes are fetched from the leaves and concatenated together as
the lists are joined. Only propagation of synthesized attributes would be required. This is
the most useful feature to be developed, as it is often needed. For instance, for the C
grammar, it would be used to fetch a list of declaration specifiers, as they are totally unrelated
and do not need the context.
2 From our experience writing C-grammar attributes.

(Figure: under Start, the L+ list is a binary tree of L+ nodes whose leaves are L nodes, each
deriving an Id.)
Figure 3.9: A list, seen from AsFix viewpoint.

Id -> L
      -> L*
L+    -> L*
L* L+ -> L+
L+ L* -> L+
L+ L+ -> L+ {left}
L     -> L+
L+    -> Start
Listing 3.11: Example of listing 3.10 desugared.

• Top-down traversal: used when some information must be sent to all leaves, with the
requirement that this information is independent from the other leaves. Only propagation of
inherited attributes would be required. Albeit easy to implement, its usefulness remains
to be proven.
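As an illustration, the bottom-up behavior over the desugared lists can be modelled as a concatenation over the binary tree of figure 3.9. The Python sketch below is hypothetical and represents internal L+ nodes as pairs:

```python
def collect_bu(node):
    """Bottom-up propagation over an AsFix-style list: a leaf yields a
    one-element list, an internal L+ node concatenates its children."""
    if isinstance(node, tuple):          # internal node: (left, right)
        left, right = node
        return collect_bu(left) + collect_bu(right)
    return [node]                        # leaf: a single L

# A five-element list as the parser would build it: a binary tree.
tree = ((("a", "b"), "c"), ("d", "e"))
```

The concatenation rules that the propagator would have to add to the eight productions of listing 3.11 are precisely the two branches of this recursion.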
Chapter 4

Transformers’ AG syntax proposal

The attribute syntax used in Transformers is not perfect for what we must do, in spite of every-
thing already done to improve it. Some improvements will be proposed to make the work of
developers easier. Considerations about improvements of the current attribute syntax are
split in two parts: those which stay entirely within the spirit of the current syntax, and
those which consist of deeper syntax modifications.

Some problems and hints on how to develop these extensions will be discussed in the next
chapter; the author leaves them as an open case for his successors.

4.1 Default value/code for horizontal propagation

The first improvement comes from a simple observation. We can see in listing 4.1 that all pro-
duction rules have TypeSpecifier on their right-hand side. To manage the particular case of the
class typedef, the key attribute must be propagated upwards, which is done by the first attribute
rule. The rules which have TypeSpecifier on their left-hand side propagate the key attribute to
detect when a class typedef is used, and some computations are done on upper attributes in the
grammar. So the key attribute must be set for every possible TypeSpecifier, even if there is no class
typedef. Consequently, all production rules with TypeSpecifier on their right-hand side must
assign their key attribute. Only three of those rules are written here, but there are three more
in the C++ grammar. This pattern can be found at several locations in the C++ grammar and
requires adding a lot of attribute rules which we do not want to write. We only want to write
the interesting attribute rule (the first one) and to know what the default value is if no value is
explicitly set.

A possible answer to this problem is a new section named attributes, used to specify the default
attribute value or code. Listing 4.2 shows the template for a grammar file including the
new section. All default values are brought together in a place that is easily reachable from
the production rules using them, because they are in the same file. The interesting attributes stay in
the grammar, directly at the location where they are important and really used. In the future,
if we find that some other information about attributes should be specified for some reason,
this section could play that role too. For instance, attributes could be explicitly specified as
synthesized or inherited, à la UU-AG.

The TypeSpecifier qualifier should be optional. If it is not specified, all attributes named key would
have a default value of None().

%% 7.1.5 [dcl.type]
ClassSpecifier -> TypeSpecifier
{ attributes (disamb:
    root.key := ClassSpecifier.key
)}

SimpleTypeSpecifier -> TypeSpecifier
{ attributes (disamb:
    root.key := None()
)}

EnumSpecifier -> TypeSpecifier
{ attributes (disamb:
    root.key := None()
)}

[...]
Listing 4.1: Default value/code for horizontal propagation

module TypeSpecifiers

imports
  [...]

exports

  sorts
    TypeSpecifier
    [...]

  attributes
    default [TypeSpecifier.]key = None()
    [...]

  context-free syntax
    ClassSpecifier -> TypeSpecifier
    [...]
Listing 4.2: Default value - proposed solution

4.2 Vertical propagation

We must propagate attribute values from node to node very often. This is vertical
propagation in the sense of the grammar, i.e. of the AST. Listing 4.3 is an example of this
phenomenon. This is another polluting pattern, because the interesting attribute rules are not
those which propagate but those which provide information. So the proposal here is to write the inter-
esting attributes only and to automatically add propagation rules, in the same way as for the table
attributes previously seen in subsection 3.2.1.

Note that research has been done to avoid this kind of heavy process. For instance, FNC-2
automatically detects when rules are missing and adds them implicitly. Some people extend
canonical attribute grammars for several purposes, and Reference Attribute Grammars (RAGs)
(Hedin, 1999) make it possible to avoid our explicit and inelegant attribute propagations (even if they
were not created for this). With those attribute grammars, attributes may be references to other
attributes (or nonterminals), so propagations become useless, and this is probably a better so-
lution. Nevertheless, using RAGs in Transformers is not so easy because of some techniques
used, such as branch pruning, which imply that at some point some references reference either
nothing or several things at a time. So handling reference attributes implies some extra run-time
computations to suit our system.

ClassSpecifier -> TypeSpecifier
{ attributes (disamb:
    root.key := ClassSpecifier.key
)}

TypeSpecifier -> DeclSpecifier
{ attributes (disamb:
    root.key := TypeSpecifier.key
)}

list:DeclSpecifier+ -> DeclSpecifierSeq
{ attributes (disamb:
    root.key := list.key
)}

ds:DeclSpecifierSeq? il:InitDeclList? ";" -> SimpleDecl
{ attributes (disamb:
    [...]
)}
Listing 4.3: Vertical propagation

4.3 Tables

Tables are a very debatable subject. They are used everywhere and have an arcane syntax. A
short example of what must be written to use tables is shown in listing 4.4.

The OLGA language, which is a language designed for attribute handling, does not sup-
port tables natively but allows writing types and functions to manipulate them as we want.
Bruno Vivien (1997) wrote an E-LOTOS compiler with FNC-2 and implemented symbol tables
in OLGA. This is a huge amount of work which is representative of the potential of this system.
The table implementation is very intuitive and composed of a set of functions to manipulate
data easily. So there is no magic nor universal solution. We must use the constructs proposed
by the language and put sugar above them if necessary. Several styles can be chosen: the first proposal
for a table syntax (4.5) uses a style based on operators, whereas the second proposal (4.6) is
more functional.

A.lr_table_in := ![ B.str | C.lr_ns_in ]

A.lr_table_in := ![ (B.name, B.str, "a") | C.lr_table_syn ]

B.string := !A.lr_table_in
  ; filter(("a", "b", ?str))
  ; str
Listing 4.4: Tables

A.lr_table_in := B.str + C.lr_ns_in

A.lr_table_in := (B.name, B.str, "a") + C.lr_table_syn

A.lr_table_in [ ("a", "b", ?str) ]

B.str := str
Listing 4.5: First proposal for table syntax

A.lr_table_in := insert C.lr_ns_in B.str

A.lr_table_in := insert C.lr_table_syn (B.name, B.str, "a")

lookup A.lr_table_in ("a", "b", ?str)

B.str := str
Listing 4.6: Second proposal for table syntax
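Both proposals desugar to the same operations on association lists. The following Python sketch is a hypothetical model of insert and lookup, with names chosen to echo proposal 4.6 rather than an existing API:

```python
def insert(table, entry):
    """Prepend an entry; newer entries shadow older ones on lookup
    order, like consing onto the head of an association list."""
    return [entry] + table

def lookup(table, pred):
    """Return the entries satisfying `pred`, mirroring the
    filter(("a", "b", ?str)) idiom of listing 4.4."""
    return [entry for entry in table if pred(entry)]

table = insert([], ("a", "b", "s1"))
table = insert(table, ("a", "c", "s2"))
```

Whatever surface syntax is chosen, the underlying operations stay this simple; the debate is only about which sugar reads best in attribute rules.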

4.4 Ambiguous synthesizing

When synthesizing attributes on an ambiguous node, the attributes of the branches can differ, and there is no
way to determine which is the right one.

This was a problem for a long time, forcing the user to return a special value, invalid(), from
the branches he knew the value should not come from. This was cumbersome and
added a lot of uninteresting code to the attributes.

However, on an ambiguity node only one branch is valid, so only attributes from this branch
should be candidates for passing upward. The AG system now works this way, and the user does
not have to care about it anymore.
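As an illustration, the rule that only the valid branch contributes its synthesized attribute can be modelled as follows. The Python sketch is hypothetical: each branch is represented as an (ok, value) pair, as in the UUAG example of section 5.1.4:

```python
def synthesize_amb(branches):
    """Pick the synthesized value of the single valid branch of an
    ambiguity node; zero or several valid branches mean the node
    could not be disambiguated."""
    valid = [value for ok, value in branches if ok]
    if len(valid) != 1:
        raise ValueError("ambiguity not resolved: %d valid branches"
                         % len(valid))
    return valid[0]
```

With this scheme the user never writes invalid() by hand: a branch whose ok flag is false simply never contributes its value upward.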

4.5 The woes of this AG

Quoting Valentin David, the main author of the Transformers AG system: “If you don’t know how it
works internally, you can’t use it.”

That is, some parts are very tricky, and the overall desired behavior can easily be modified
by some undocumented "feature". To write a good specification, it is better to know how the
system works internally.

Moreover, debugging tools are almost non-existent1. The compilation process is better than
it was in the beginning: missing attributes are checked and reported, a stack trace is printed to
provide error locations, and typos are reported, to cite a few improvements.

1 Hey, who said that God needed debugging tools to create Earth ? :)
Chapter 5

Other AG systems

This chapter presents a quick overview of two other AG systems already developed and run-
ning, and highlights differences with Transformers' AG. Other technical reports, such as
Vasseur (2004) and Borghi (2005), provide additional information about this subject.

5.1 UUAG

5.1.1 Presentation

In his slides (joint work with S. Doaitse Swierstra and Baars, 2005), Andres Löh presents UUAG
(Baars et al., 1999) as “a simple preprocessor on top of Haskell that allows to structure Haskell
programs as attribute grammars. It allows a form of aspect-oriented programming: one can
program data-oriented (defining all the attributes for one data-type in one place) and attribute-
oriented (defining one attribute for all data-type in one place), and also choose an arbitrary
path in between, which allows to keep semantically related code together in the code. Further-
more, copy rules take care of propagating attributes automatically and allow to omit a lot of
uninteresting code.”

5.1.2 Writing AG in UU-AG

Writing production rules is very similar to writing Haskell code. Each nonterminal is seen as
a data type, and its productions correspond to constructor definitions. So desugaring UU-AG
syntax into Haskell is very easy, as we can see in listings 5.1 and 5.2.

Expr = Var
| Expr Expr
| Var Expr
| Decls Expr
Listing 5.1: UUAG code
The same principle is used for all the UU-AG syntax. A few modifications are made to the
UU-AG attribute grammar source code to produce Haskell code, which is directly inserted into
the rest of the system, which acts as a kind of template.

A special construct is used to declare attributes, and they are explicitly declared as inherited, syn-
thesized or both.

DATA Expr = var Var


| fun:Expr arg:Expr
| lam Var Expr
| let Decls Expr
Listing 5.2: UUAG code desugared in Haskell code

Attribute rules, like all constructs available in UU-AG, have a special form, and pure Haskell
code can be used in them.

Attribute declarations and attribute rule definitions are shown in listing 5.3.

TYPE Docs = [ Doc ]

DATA Doc
  | Section title : String  body : Docs
  | Pcdata  string : String

ATTR Doc Docs [ | | html : String ]

SEM Doc
  | Section lhs.html = "<bf>" ++ @title ++ "</bf>\n" ++ @body.html
  | Pcdata  lhs.html = "<P>" ++ @string ++ "</P>"

SEM Docs
  | ...
Listing 5.3: Example of attribute usage in UU-AG

UU-AG enables the programmer to omit straightforward propagation. A synthesized at-
tribute is by default propagated to the declaration node from the leftmost child that provides
an attribute of the same name. If instead the results should be combined in a uniform way, a
USE construct can be employed, taking a constant which becomes the default for a leaf and a
binary operator which becomes the default combination operator.

Because UU-AG relies on desugaring to Haskell through a few modifications and is meant
to be rather lightweight, no typechecking is done at the UU-AG level. This can lead to ugly error
messages at the Haskell compilation/interpretation level about errors at the UU-AG level. So
the generation process must be understood by the programmer.

5.1.3 Incorporating it in Transformers

The grammar (i.e. the production rules, the attributes and the attribute rules) would have to be entirely
rewritten to fit the UU-AG system.

The grammar would have to be altered because ambiguity nodes must be declared in the grammar,
whereas the standard C++ grammar does not use this convention (except for a few trivial cases).

Furthermore, all attributes would have to be declared and qualified as synthesized or inherited (in
contrast to the system currently used), and attribute rules would have to be rewritten in Haskell.

Finally, attributes and attribute rules are not embedded in the grammar with UU-AG. Trans-
formers clearly associates each production rule with its attributes and attribute rules. They are
at the same location in the code, and we draw many benefits from this: debugging is much easier
if most of the related information is located in the same place.

Reaching the efficiency of Transformers would require a lot of modifications. Moreover, if we
consider using a dedicated tool set, we do not want to add a layer on top of it to keep our subtleties,
particularly when this layer involves heavy work.

5.1.4 A comparative example

In this section, a comparative study of two AG systems used for disambiguation (the UUAG
one and ours) will be carried out. A piece of ambiguous grammar (listing 5.4) will be used as
a running example. It follows the Core language proposed by Vasseur (2004).

EnumName <- Identifier Identifier
TypeName <- Identifier Identifier
Use      <- TypeName | EnumName
Root     <- Use
Listing 5.4: Simple grammar presenting an ambiguity.

The ambiguity is simple to see. When a tree is derived from grammar 5.4,
nothing helps us determine whether Use should have TypeName or EnumName as its child.
Thus, it will have both.

Assume that the lines shown in listing 5.5 (not represented in our grammar)
were parsed before. Now, suppose that the first identifier of EnumName or TypeName is a
type, and the second is a variable. Then, when the first Identifier (the type) is T, it should
be a TypeName; if it is E, it should be an EnumName; otherwise this is an error.

type T
enum E
Listing 5.5: Contextual information.

An example input is illustrated in listing 5.6. Here, g will be a TypeName, and r an EnumName.

T g
E r
Listing 5.6: Example input.

With Transformers

The grammar is pretty simple, and does not need thorough explanations. It is presented in
listing 5.7.

With UUAG

A special node, not existing in the former grammar, AmbTE, has been added in 5.8.

Here, three attributes are declared (listing 5.9), each for a given subset of nodes: val and ok are synthe-
sized; table is both inherited and synthesized.
SEM Root
| Root child.table = { [("E", "enum-name"), ("T", "type-name")] }

type:Identifier var:Identifier -> TypeName
{ attributes (disamb:
    root.ok :=
      <lookup> ((type.string, TypeName()), root.lr_table_in)
    root.lr_table_syn :=
      !root.ok
      ; ![(type.string, var.string) | root.lr_table_in]
)}

type:Identifier var:Identifier -> EnumName
{ attributes (disamb:
    root.ok :=
      <lookup> ((type.string, EnumName()), root.lr_table_in)
    root.lr_table_syn :=
      !root.ok
      ; ![(type.string, var.string) | root.lr_table_in]
)}

TypeName -> Use

EnumName -> Use

dus:Use+ -> Root
{ attributes (disamb:
    dus.lr_table_in := ![("T", TypeName()), ("E", EnumName())]
)}
Listing 5.7: SDF grammar using attributes.
This example is not very clean: two different things get mixed in lr_table, type identification
and variable identification.

SEM AmbTE
| AmbTE lhs.val = { if @first.ok then @first.val else if @second.ok then @second.val else "error" }

SEM TypeName
| TypeName lhs.ok = { filter (\(t, v) -> v == "type-name" && t == @type) @lhs.table /= [] }
           lhs.val = { @val ++ ":typedef" }

SEM EnumName
| EnumName lhs.ok = { filter (\(t, v) -> v == "enum-name" && t == @type) @lhs.table /= [] }
           lhs.val = { @val ++ ":enum" }

In this part of the UU-AG code, semantic rules are attached to nodes. The context table was initial-
ized earlier, pretending the lines listed in 5.5 had been parsed.

Listing 5.11 shows the main program. Haskell code between braces is passed verbatim to
Haskell and contains the entry point. Two partial parse forests, example1 and example2, are cre-
ated, containing the interesting ambiguities, then processed by the AG system with the function
sem_Root.

5.2 FNC-2

FNC-2 (Parigot, 1988) is an AG-processing system developed at INRIA. It accepts all strongly
non-circular attribute grammars, but other AG classes too, such as Dynamic Attribute Grammars
(DAGs), a class of AGs subsuming circular ones (Parigot et al., 1996a,b).

DATA Root
  | Root child:Use

DATA Use
  | Use child:AmbTE

DATA AmbTE
  | AmbTE first:TypeName second:EnumName

DATA TypeName
  | TypeName type:String val:String

DATA EnumName
  | EnumName type:String val:String
Listing 5.8: Node declaration

ATTR Root Use AmbTE TypeName EnumName [ | | val:String ]
ATTR Use AmbTE TypeName EnumName [ | table:{[(String, String)]} | ]
ATTR TypeName EnumName [ | | ok:Bool ]
Listing 5.9: Attributes declaration

5.2.1 Evaluators

FNC-2 uses generated evaluators based on the “visit-sequence” paradigm. This paradigm exploits as much information as possible from the grammar to reach a high level of performance. Furthermore, it makes it possible to apply very efficient space-optimization techniques. Some results show that the generated evaluators are as efficient in time and space as hand-written programs using a tree as internal data structure.
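The visit-sequence idea can be sketched as follows. This is an illustrative Python model, not FNC-2 output; `make_evaluator` and the plan encoding are assumptions. The point is that the evaluator does not discover an evaluation order at run time: it replays, for each node type, a plan precomputed from the grammar.

```python
def make_evaluator(plans):
    """plans maps a node type to a precomputed list of steps; a step is
    ("visit", i) to recurse into child i, or ("compute", attr, rule) to
    store rule(node) as a synthesized attribute on the node."""
    def evaluate(node):
        for step in plans[node["type"]]:
            if step[0] == "visit":
                evaluate(node["children"][step[1]])
            else:
                _, attr, rule = step
                node[attr] = rule(node)
    return evaluate

# A toy grammar: Leaf synthesizes its value; Add sums its children's val.
plans = {
    "Leaf": [("compute", "val", lambda n: n["value"])],
    "Add":  [("visit", 0), ("visit", 1),
             ("compute", "val",
              lambda n: n["children"][0]["val"] + n["children"][1]["val"])],
}
evaluate = make_evaluator(plans)

tree = {"type": "Add", "children": [
    {"type": "Leaf", "value": 1, "children": []},
    {"type": "Leaf", "value": 2, "children": []}]}
evaluate(tree)
# tree["val"] is now 3
```

Because the plan is fixed per node type, no dependency analysis happens during evaluation, which is where the performance of this paradigm comes from.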

FNC-2 is highly versatile and suitable for many situations.

Actually, the generated evaluators offer several evaluation modes:

• exhaustive (evaluating all attributes),


• incremental (evaluating only a set of attributes and saving the current state of evaluation
to resume later),
• parallel (in order to use several computers with a shared memory to achieve a common
evaluation).

Furthermore, these evaluators can be generated in several languages:

• C,
• Le_Lisp (on Centaur system),
• fSDL (on The CoSy Compilation System),

• Caml.

{
-- The string "T g" was parsed
example1 :: Use
example1 = Use (AmbTE (TypeName "T" "g") (EnumName "T" "g"))

-- The string "E g" was parsed
example2 :: Use
example2 = Use (AmbTE (TypeName "E" "g") (EnumName "E" "g"))

main :: IO ()
main = do putStr "result1: "
          print (sem_Root (Root example1))
          putStr "result2: "
          print (sem_Root (Root example2))
}
Listing 5.11: Main program

5.2.2 Tools

FNC-2 eases the use of attribute grammars with its flexibility, but also provides many tools to
ease the life of developers:

• a generator of abstract tree constructors driven by parsers (ATC), implemented in two
ways: one on top of SYNTAX, and one on top of Lex and Yacc,
• a generator of unparsers of attributed abstract trees (PPAT), based on the TEX-like notion
of nested boxes of text,
• an interactive circularity trace system (XVISU),

• source-level optimizations of AGs resulting from descriptional composition,


• ...

5.2.3 OLGA

FNC-2 uses OLGA, a language designed to handle attribute grammars. It is a strongly typed
functional language, similar to Caml, but also designed to support the writing of semantic
rules (also known as attribute rules) attached to their nodes in the abstract tree. Consequently,
attributes are typed, like all OLGA identifiers. As we have seen before, many evaluation
schemes can be used, but this is independent of the writing of attributes. Performance is
achieved without loss of expressiveness.

Moreover, many helpful features ease the work of developers, such as default copy rules for
attributes, an abbreviated syntax for the root node of the local tree, and predefined attributes
(for instance, to record the location of symbols in the input source).
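The default-copy-rule convenience can be sketched as follows. This is hypothetical Python, not OLGA; `inherit` and the explicit-rule table are illustrative names. The idea is that an inherited attribute flows from parent to child automatically unless the grammar writer supplies an explicit rule.

```python
def inherit(node, attr, explicit_rules):
    """Compute attr for each child of node: use the explicit rule if one
    is registered for (node type, child index, attr); otherwise fall back
    to copying the parent's value (the default copy rule)."""
    values = []
    for i, _child in enumerate(node["children"]):
        rule = explicit_rules.get((node["type"], i, attr))
        values.append(rule(node) if rule else node[attr])
    return values

node = {"type": "A", "env": 42, "children": [{}, {}]}

# No explicit rules: both children receive a copy of the parent's env.
copied = inherit(node, "env", {})

# An explicit rule for the second child overrides the default copy.
overridden = inherit(node, "env", {("A", 1, "env"): lambda n: n["env"] + 1})
```

This kind of convention drastically shortens AG specifications, since in practice most inherited attributes are pure pass-through.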

The attribute declarations, located in the header, are clearly set apart from the grammar
rules and their associated attribute rules, located in the body. The focus is to highlight the
grammar aspect of the language. In this context, the examples do not use control-flow
statements, type declarations, functions, or any other advanced constructs.

An example of an attribute grammar header in OLGA is shown in Listing 5.12. Attribute
declarations follow the keyword attribute. Each attribute is typed and qualified as synthesized
or inherited. Moreover, the nonterminals that own an attribute are specified at the attribute
declaration.

attribute
  synthesized $s (A, B) : int;
  [...]
Listing 5.12: OLGA header

The body of the OLGA file holds the production rules. Listing 5.13 shows the declaration of
a production rule and its associated attribute rules.

where A -> B1 B2 B3 use
  $s := $s(B1) + $s(B2) + $s(B3);
  [...]
end where
Listing 5.13: OLGA rules

5.2.4 Incorporate it in Transformers

The attribute rules would have to be entirely rewritten in OLGA if we considered using FNC-2.
Even if OLGA is another domain-specific language to learn, it is designed to handle attributes nicely.

Nevertheless, FNC-2 shares some problems with UU-AG when it comes to transposing this
system into the Transformers world.

FNC-2 and Transformers share some syntactic abbreviations, such as automatic propagation
and a special syntax for the root node, but in FNC-2 attributes and attribute rules are not
embedded in the grammar itself, which is a strong advantage of Transformers.

A very important point for FNC-2 is its flexibility in evaluations and evaluators. The multi-visit
paradigm uses information from the grammar to evaluate attributes efficiently, both in time
and in memory usage. In fact, the type of the dependency node can be deduced from the type
of the node being processed, so the order of evaluation can be computed at compile-time.

The incremental mode could be very useful if associated with a cache system for the include
mechanism of C/C++. The results of the evaluation of standard headers, and more generally
of headers included more than once, could be saved in the cache and reused when the same file
is included again. However, this system cannot reach its full potential because of the high
context sensitivity of the C/C++ grammar: the disambiguation of the code held by two includes
can depend on the order of the includes in question.
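Such a header cache could be sketched as follows. This is a hypothetical Python model, not an existing Transformers component; the point is that, precisely because of context sensitivity, the cache key must include the context table at the point of inclusion, so the same header disambiguated under two different contexts yields two cache entries.

```python
cache = {}

def disambiguate_header(name, context, do_work):
    """Return the cached disambiguation of header `name` under the given
    context table, recomputing with do_work(name, context) on a miss.
    The context is part of the key: in C/C++, the meaning of a header
    depends on what was included before it."""
    key = (name, frozenset(context))
    if key not in cache:
        cache[key] = do_work(name, context)
    return cache[key]

calls = []
def work(name, context):
    calls.append(name)           # record each actual (expensive) evaluation
    return len(context)          # stand-in for a real disambiguation result

disambiguate_header("vector", {("T", "type")}, work)
disambiguate_header("vector", {("T", "type")}, work)            # cache hit
disambiguate_header("vector", {("T", "type"), ("E", "enum")}, work)  # miss
```

With identical contexts the expensive work runs once; a different incoming context forces recomputation, which is why include-order sensitivity limits how much such a cache can save.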

Sadly, the FNC-2 project died in 1998, when its main author left the team. There is no more
support, so if we find bugs in the system, we will be on our own to solve them. Even worse,
the sources of the project cannot be found anywhere. However, there are many very interesting
and highly detailed technical papers describing new approaches to problems using special
kinds of attribute grammars and specific evaluation schemes (mixing data-flow and multi-visit
evaluations by determining the maximal sub-multi-visit parts of the grammar, . . . ). Moreover,
many projects used FNC-2, and OLGA seemed to be a very good way to write attribute grammars.
Conclusion

In this paper, we only talked about the attribute grammar system, its usage, implementation,
and possible extensions. We believe that our approach is sound, but some work remains for
the Transformers project.

On the one hand, our experience shows that attribute writing should be improved. Several
possibilities were explored, for lists and context tables. On the other hand, the implementation
itself is quite good today, but needs some work to deal with problems like the explosion in time
and memory of the compilation process. Some aspects and solutions were developed in this
paper, although there are many others.

Anyway, we should not forget that disambiguation is not the main objective of Transformers;
it is just a means. More references to other works on Transformers can be found in the
(somewhat outdated) Transformers technical report (Anisko et al., 2003), or in David (2004).
Moreover, real transformations 1 should not be forgotten, like Despres (2004) and ?.

1 Cf. introduction footnote.


Chapter 6

Bibliography

Anisko, R., David, V., and Vasseur, C. (2003). Transformers: a C++ program transformation
framework. Technical report, LRDE.
Baars, A., Swierstra, D., and Löh, A. (1999). UU-AG System. http://www.cs.uu.nl/wiki/
Center/AttributeGrammarSystem.
Borghi, A. (2005). Attribute grammar: A comparison of techniques. Technical report, LRDE.
David, V. (2004). Attribute grammars for C++ disambiguation. Technical report, LRDE.
Demaille, A., Durlin, R., Pierron, N., and Sigoure, B. (2008). Automatic attribute propagation
for modular attribute grammars. Submitted.
Demaille, A., Largillier, Th., and Pouillard, N. (2005). ESDF: A proposal for a more flexible
SDF handling. Communication to Stratego Users Day 2005.
Despres, N. (2004). C++ transformations panorama. Technical report, LRDE.
Durlin, R. (2007). Semantics driven disambiguation. Technical report, EPITA Research and
Development Laboratory (LRDE).
Durlin, R. (2008). Semantics driven disambiguation: A comparison of different approaches.
Technical report, EPITA Research and Development Laboratory (LRDE).
Gournet, O. (2004). Progress in C++ source preprocessing. Technical report, LRDE.
Hedin, G. (1999). Reference attributed grammars.
ISO/IEC (1999). ISO/IEC 9899:1999 (E). Programming languages - C.
ISO/IEC (2003). ISO/IEC 14882:2003 (E). Programming languages - C++.
Löh, A. (2005). Attribute Grammars in Haskell with UUAG. Joint work with S. D. Swierstra
and A. Baars. www.cs.ioc.ee/~tarmo/tsem04/loeh-slides.pdf.
Kalleberg, K. T. and Visser, E. (2006). Strategic graph rewriting: Transforming and traversing
terms with references. In Proceedings of the 6th International Workshop on Reduction Strategies in
Rewriting and Programming, Seattle, Washington. Accepted for publication.
Knuth, D. E. (1968). Semantics of context-free languages. Journal of Mathematical System Theory,
pages 127–145.
Largillier, T. (2005). Proposition d’un support pour les notions de programmation avancées en
C++. Technical report, LRDE.
Parigot, D. (1988). FNC-2 Attribute Grammar System. http://www-rocq.inria.fr/
oscar/www/fnc2/.

Parigot, D., Roussel, G., Jourdan, M., and Duris, E. (1996a). Dynamic attribute grammars.
Technical Report 2881, INRIA.

Parigot, D., Roussel, G., Jourdan, M., and Duris, E. (1996b). Dynamic attribute
grammars.
Pierron, N. (2007). Formal Definition of the Disambiguation with Attribute Grammars. Tech-
nical report, EPITA Research and Development Laboratory (LRDE).

Rodeh, M. and Sagiv, M. (1999). Finding circular attributes in attribute grammars. J. ACM,
46(4):556–ff.
Vasseur, C. (2004). Semantics driven disambiguation: a comparison of different approaches.
Technical report, LRDE.

Vivien, B. (1997). Étude et réalisation d’un compilateur E-LOTOS à l’aide du générateur de
compilateurs SYNTAX/FNC-2.
