Principles of Compiler Design

- The Brainf*ck Compiler -

Clifford Wolf - www.clifford.at


http://www.clifford.at/papers/2004/compiler/

Clifford Wolf, December 22, 2004 http://www.clifford.at/papers/2004/compiler/ – p. 1/56


Introduction
● Introduction
● Overview (1/2)
● Overview (2/2)
● Aim

Brainf*ck

Lexer and Parser Introduction


Code Generators

Tools

Complex Code Generators

The BF Compiler

Stack Machines

The SPL Project

LL(regex) parsers

URLs and References



Introduction
- My presentation at 20C3 about CPU design, featuring a Brainf*ck CPU,
  was a big success.

- My original plan for 21C3 was to build a Brainf*ck CPU with tubes.

- But:

    The only thing more dangerous than a hardware guy with a code patch
    is a programmer with a soldering iron.

- So this is a presentation about compiler design featuring a
  Brainf*ck compiler.


Overview (1/2)
In this presentation I will discuss:

- A little introduction to Brainf*ck
- Components of a compiler, overview
- Designing and implementing lexers
- Designing and implementing parsers
- Designing and implementing code generators
- Tools (flex, bison, iburg, etc.)


Overview (2/2)
- Overview of more complex code generators
    - Abstract syntax trees
    - Intermediate representations
    - Basic block analysis
    - Backpatching
    - Dynamic programming
    - Optimizations
- Design and implementation of the Brainf*ck Compiler
- Implementation of and code generation for stack machines
- Design and implementation of the SPL Project
- Design and implementation of LL(regex) parsers


Aim
After this presentation, the audience ...

- ... should have a rough idea of how compilers work.
- ... should be able to implement parsers for complex configuration files.
- ... should be able to implement code generators for stack machines.
- ... should have a rough idea of code generation for register machines.


Brainf*ck


Overview
- Brainf*ck is a very simple Turing-complete programming language.
- It has only 8 instructions and no instruction parameters.
- Each instruction is represented by one character:

    < > + - . , [ ]

- All other characters in the input are ignored.
- A Brainf*ck program has an implicit byte pointer which is free to move
  around within an array of 30000 bytes, initially all set to zero. The
  pointer itself is initialized to point to the beginning of this array.

Some languages are designed to solve a problem.
Others are designed to prove a point.


Instructions
Each instruction is listed with its description and its C equivalent:

    >   Increment the pointer.
            ++p;
    <   Decrement the pointer.
            --p;
    +   Increment the byte at the pointer.
            ++*p;
    -   Decrement the byte at the pointer.
            --*p;
    .   Output the byte at the pointer.
            putchar(*p);
    ,   Input a byte and store it in the byte at the pointer.
            *p = getchar();
    [   Jump forward past the matching ] if the byte at the pointer is zero.
            while (*p) {
    ]   Jump backward to the matching [ unless the byte at the pointer is zero.
            }
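The instruction table maps almost one-to-one onto a small interpreter. The following is a minimal sketch, not the bfrun program shipped with the compiler: the function name bf_run, its buffer-based input/output interface and the choice to store 0 on end of input are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BF_TAPE_SIZE 30000

/* Hypothetical helper: run `prog`, reading bytes from the string `input`
 * and writing output bytes to `out` (at most `out_cap` bytes).
 * Returns the number of bytes written, or -1 on unbalanced brackets or
 * when the pointer leaves the tape. */
long bf_run(const char *prog, const char *input, char *out, size_t out_cap)
{
    static unsigned char tape[BF_TAPE_SIZE];
    size_t p = 0;                 /* the implicit byte pointer */
    size_t n_out = 0;
    size_t len = strlen(prog);

    memset(tape, 0, sizeof tape); /* all cells start at zero */

    for (size_t pc = 0; pc < len; pc++) {
        switch (prog[pc]) {
        case '>': if (++p >= BF_TAPE_SIZE) return -1; break;
        case '<': if (p == 0) return -1; --p; break;
        case '+': ++tape[p]; break;
        case '-': --tape[p]; break;
        case '.': if (n_out < out_cap) out[n_out++] = (char)tape[p]; break;
        case ',': /* assumption: end of input stores 0 */
            tape[p] = *input ? (unsigned char)*input++ : 0;
            break;
        case '[':
            if (tape[p] == 0) {   /* skip forward to the matching ] */
                int depth = 1;
                while (depth > 0) {
                    if (++pc >= len) return -1;
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[p] != 0) {   /* jump back to the matching [ */
                int depth = 1;
                while (depth > 0) {
                    if (pc == 0) return -1;
                    --pc;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default:
            break;                /* all other characters are ignored */
        }
    }
    return (long)n_out;
}
```

Calling bf_run("++++++++[>++++++++<-]>+.", "", out, sizeof out) computes 8 * 8 + 1 = 65 in the second cell and writes the single byte 'A'. Searching for the matching bracket on every jump is simple but slow; a faster runtime would look the targets up in a precomputed table.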


Implementing "while"
- Implementing a while statement is easy, because the Brainf*ck [ .. ]
  statement is a while loop.
- So while (x) { <foobar> } becomes:

    <move pointer to x>
    [
        <foobar>
        <move pointer to x>
    ]


Implementing "x=y"
- Implementing assignment (copy) instructions is a bit more complex.
- The straightforward way of doing that resets y to zero:

    <move pointer to y> [ -
    <move pointer to x> +
    <move pointer to y> ]

- So, a temporary variable t is needed:

    <move pointer to y> [ -
    <move pointer to t> +
    <move pointer to y> ]

    <move pointer to t> [ -
    <move pointer to x> +
    <move pointer to y> +
    <move pointer to t> ]
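To make the "<move pointer to ...>" placeholders concrete, here is a small sketch that emits plain Brainf*ck for the copy with a temporary. It assumes that variables are identified by their cell index, that the pointer starts at cell 0, and that x and t are zero beforehand; the names gen_copy, move_to and emit are hypothetical and are not taken from the BF compiler.

```c
#include <stddef.h>
#include <string.h>

/* Append `s` to the NUL-terminated string in `out` (capacity `cap` >= 1). */
static void emit(char *out, size_t cap, const char *s)
{
    strncat(out, s, cap - strlen(out) - 1);
}

/* Emit '>' or '<' until the tracked pointer position equals `target`. */
static void move_to(char *out, size_t cap, int *pos, int target)
{
    while (*pos < target) { emit(out, cap, ">"); ++*pos; }
    while (*pos > target) { emit(out, cap, "<"); --*pos; }
}

/* Generate Brainf*ck code for "x = y" using the temporary cell t.
 * Assumes cells x and t hold zero and the pointer starts at cell 0. */
void gen_copy(char *out, size_t cap, int x, int y, int t)
{
    int pos = 0;
    out[0] = '\0';

    /* first loop: move y into t (y becomes zero) */
    move_to(out, cap, &pos, y); emit(out, cap, "[-");
    move_to(out, cap, &pos, t); emit(out, cap, "+");
    move_to(out, cap, &pos, y); emit(out, cap, "]");

    /* second loop: move t into both x and y (t becomes zero again) */
    move_to(out, cap, &pos, t); emit(out, cap, "[-");
    move_to(out, cap, &pos, x); emit(out, cap, "+");
    move_to(out, cap, &pos, y); emit(out, cap, "+");
    move_to(out, cap, &pos, t); emit(out, cap, "]");
}
```

With x in cell 0, y in cell 1 and t in cell 2, gen_copy produces >[->+<]>[-<<+>+>]. Each loop body returns the pointer to the cell it started on, which is what allows the position to be tracked at code-generation time.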


Implementing "if"
- The if statement is like a while loop, but it should run its block
  only once. Again, a temporary variable is needed to implement
  if (x) { <foobar> }:

    <move pointer to x> [ -
    <move pointer to t> +
    <move pointer to x> ]
    <move pointer to t> [
        [ -
        <move pointer to x> +
        <move pointer to t> ]
        <foobar>
    <move pointer to t> ]


Functions
- Brainf*ck has no construct for functions.
- The compiler has support for macros which are always inlined.
- The generated code may become huge if macros are used intensively.
- So recursion must be implemented using explicit stacks.


Lexer and Parser


Lexer
- The lexer reads the compiler input and transforms it into lexical tokens.
- E.g. the lexer reads the input "while" and returns the numerical
  constant TOKEN_WHILE.
- Tokens may have additional attributes. E.g. the textual input "123" may
  be transformed to the token TOKEN_NUMBER with the integer value 123
  attached to it.
- The lexer is usually implemented as a function which is called by the
  parser.
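A hand-written lexer of this kind can be sketched in a few lines of C. The token names TOKEN_WHILE and TOKEN_NUMBER follow the slide; everything else (struct lexer, next_token, the extra token kinds and the 31-character name limit) is an assumption made for this example.

```c
#include <ctype.h>
#include <string.h>

enum token { TOKEN_EOF, TOKEN_WHILE, TOKEN_NAME, TOKEN_NUMBER, TOKEN_CHAR };

struct lexer {
    const char *src;  /* remaining input */
    int numval;       /* attribute of TOKEN_NUMBER */
    char text[32];    /* attribute of TOKEN_NAME and TOKEN_CHAR */
};

/* Called by the parser whenever it needs the next token. */
enum token next_token(struct lexer *lx)
{
    /* white space produces no token at all */
    while (isspace((unsigned char)*lx->src))
        lx->src++;

    if (*lx->src == '\0')
        return TOKEN_EOF;

    if (isdigit((unsigned char)*lx->src)) {
        lx->numval = 0;
        while (isdigit((unsigned char)*lx->src))
            lx->numval = lx->numval * 10 + (*lx->src++ - '0');
        return TOKEN_NUMBER;          /* value attached in numval */
    }

    if (isalpha((unsigned char)*lx->src) || *lx->src == '_') {
        size_t n = 0;
        while (isalnum((unsigned char)*lx->src) || *lx->src == '_') {
            if (n < sizeof lx->text - 1)
                lx->text[n++] = *lx->src;   /* overlong names are truncated */
            lx->src++;
        }
        lx->text[n] = '\0';
        /* keywords are recognized after scanning the whole word */
        return strcmp(lx->text, "while") == 0 ? TOKEN_WHILE : TOKEN_NAME;
    }

    lx->text[0] = *lx->src++;         /* any other character is its own token */
    lx->text[1] = '\0';
    return TOKEN_CHAR;
}
```

For the input "while (x) 123" successive calls return TOKEN_WHILE, TOKEN_CHAR for "(", TOKEN_NAME with the text "x", TOKEN_CHAR for ")", TOKEN_NUMBER with numval 123, and finally TOKEN_EOF.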


Parser
- The parser consumes the lexical tokens (terminal symbols) and reduces
  sequences of terminal and non-terminal symbols to non-terminal symbols.
- The parser creates the so-called parse tree.
- The parse tree never exists as such as a memory structure.
- Instead the parse tree just defines the order in which so-called
  reduction functions are called.
- It is possible to create tree-like memory structures in these reduction
  functions which look like the parse tree. These structures are called
  "abstract syntax trees".


BNF
BNF (Backus-Naur Form) is a way of writing down parser definitions. A BNF
for parsing a simple assign statement (like "x = y + z * 3") could look
like this (yacc-style syntax):

    assign:     NAME '=' expression;

    primary:    NAME | NUMBER
              | '(' expression ')';

    product:    primary
              | product '*' primary
              | product '/' primary;

    sum:        product
              | sum '+' product
              | sum '-' product;

    expression: sum;


Reduce Functions
- Whenever a sequence of symbols is reduced to a non-terminal symbol, a
  reduce function is called. E.g.:

    %union {
        int numval;
    }
    %type <numval> sum product

    %%

    sum: product
       | sum '+' product { $$ = $1 + $3; }
       | sum '-' product { $$ = $1 - $3; };

- The attributes of the symbols on the right side of the reduction can be
  accessed using $1 .. $n. The attribute of the resulting symbol can be
  accessed with $$.


Algorithms
- A huge number of different parser algorithms exist.
- The two most important algorithms are LL(N) and LALR(N).
- Other algorithms are LL(k), LL(regex), GLR and ad-hoc.
- Most hand-written parsers are LL(1) parsers.
- Most parser generators create LALR(1) parsers.
- A detailed discussion of various parser algorithms can be found in
  "The Dragonbook" (see references on last slide).
- The design and implementation of LL(1) parsers is also discussed in the
  section about LL(regex) parsers.


Conflicts
- Sometimes a parser grammar is ambiguous.
- In these cases, the parser has to choose one possible interpretation of
  the input.
- LALR parsers distinguish between reduce-reduce and shift-reduce
  conflicts.
- Reduce-reduce conflicts should be avoided when writing the BNF.
- Shift-reduce conflicts are resolved by shifting by default (in yacc and
  bison, unless precedence declarations say otherwise).


Code Generators


Overview
- Writing the code generator is the most complex part of a compiler
  project.
- Usually the code generation is split up into different stages, such as:
    - Creating an abstract syntax tree
    - Creating an intermediate code
    - Creating the output code
- A code generator which creates assembler code is usually much easier to
  write than a code generator creating binaries.


Simple Code Generators
- Simple code generators may generate code directly in the parser.
- This is possible if no anonymous variables exist (BFC) or the target
  machine is a stack machine (SPL).

Example:

    if_stmt:
        TK_IF TK_ARGS_BEGIN TK_STRING TK_ARGS_END stmt
        {
            $$ = xprintf(0, 0, "%s{", debug_info());
            $$ = xprintf($$, $5, "(#tmp_if)<#tmp_if>[-]"
                    "<%s>[-<#tmp_if>+]"
                    "<#tmp_if>[[-<%s>+]\n", $3, $3);
            $$ = xprintf($$, 0, "]}");
        }


Tools


Overview
- There are tools for writing compilers.
- Most of these tools cover the lexer/parser step only.
- Most of these tools generate C code from a declarative language.
- Use those tools but understand what they are doing!


Flex / Lex
- Flex (Fast Lex) is a free successor of Lex.
- The lex input file (*.l) is a list of regular expressions and actions.
- The "actions" are C code which is executed when the lexer finds a match
  for the regular expression in the input.
- Most actions simply return the token to the parser.
- It is possible to skip patterns (e.g. white space) by not providing an
  action at all.


Yacc / Bison
- Bison is the GNU successor of Yacc (Yet Another Compiler Compiler).
- Bison is a parser generator.
- The bison input (*.y) is a BNF with reduce functions.
- The generated parser is an LALR(1) parser.
- Bison can also generate GLR parsers.


Burg / iBurg
- iBurg is the successor of Burg.
- iBurg is a "Code Generator Generator".
- The code generator generated by iBurg implements the "dynamic
  programming" algorithm.
- It is a bit like a parser for an abstract syntax tree with an extremely
  ambiguous BNF.
- The reductions have cost values applied and an iBurg code generator
  chooses the cheapest fit.


PCCTS
- PCCTS is the "Purdue Compiler-Compiler Tool Set".
- PCCTS is a parser generator for LL(k) parsers in C++.
- The PCCTS toolkit was written by Terence J. Parr of the MageLang
  Institute.
- His current project is ANTLR 2, a complete redesign of PCCTS, written in
  Java, that generates Java or C++.
- PCCTS is now maintained by Tom Moog, Polhode, Inc.


Complex Code Generators


Overview
- Unfortunately it's not possible to cover code generation in depth in
  this presentation.
- However, I will try to give a rough overview of the topic and explain
  the most important terms.


Abstract syntax trees
- With some languages it is hard to create intermediate code directly from
  the parser.
- In compilers for such languages, an abstract syntax tree is created from
  the parser.
- The intermediate code generation can then be done in different phases
  which may process the abstract syntax tree bottom-up and top-down.


Intermediate representations
- Most compilers create intermediate code from the input and generate
  output code from this intermediate code.
- Usually the intermediate code is some kind of three-address-code
  assembler language.
- The GCC intermediate language is called RTL and is a wild mix of
  imperative and functional programming.
- Intermediate representations which are easily converted to trees (such
  as functional approaches) are better for dynamic programming, but are
  usually not optimal for ad-hoc code generators.


Basic block analysis
- A stretch of code that is only entered at its first instruction and only
  left at its last (roughly: the code from one jump target to the next
  jump or jump target) is called a "basic block".
- Optimizations in basic blocks are an entirely different class of
  optimization than those which can be applied to a larger code block.
- Many compilers create intermediate language trees for each basic block
  and then create the code for it using dynamic programming.


Backpatching
- It is often necessary to create jump instructions without knowing the
  jump target address yet.
- This problem is solved by outputting a dummy target address and fixing
  it later.
- This procedure is called backpatching.
- The Brainf*ck compiler doesn't need backpatching because Brainf*ck
  doesn't have jump instructions and addresses.
- However, the Brainf*ck runtime bundled with the compiler uses
  backpatching to optimize the runtime speed.
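How the bundled runtime does this is not shown in the slides, so the following is only a sketch of the general idea applied to Brainf*ck brackets: when a [ is read, its jump target is still unknown, so a dummy is stored and patched as soon as the matching ] shows up. The function name bf_link_jumps and the fixed nesting limit are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 256

/* Fill jump[i] with the index of the matching bracket for every '[' and
 * ']' in `prog`; `jump` must have at least strlen(prog) entries.
 * Returns 0 on success, -1 on unbalanced brackets or too deep nesting. */
int bf_link_jumps(const char *prog, size_t *jump)
{
    size_t open[MAX_DEPTH];  /* positions of '[' still waiting for a target */
    size_t depth = 0;
    size_t len = strlen(prog);

    for (size_t i = 0; i < len; i++) {
        if (prog[i] == '[') {
            if (depth == MAX_DEPTH)
                return -1;
            jump[i] = 0;            /* dummy target, fixed later */
            open[depth++] = i;
        } else if (prog[i] == ']') {
            if (depth == 0)
                return -1;
            size_t start = open[--depth];
            jump[i] = start;        /* backward jump: target already known */
            jump[start] = i;        /* backpatch the forward jump */
        }
    }
    return depth == 0 ? 0 : -1;
}
```

For the program +[>[-]<] the table ends up with jump[1] = 7, jump[7] = 1, jump[3] = 5 and jump[5] = 3. An interpreter can then jump in constant time instead of scanning for the matching bracket on every loop iteration.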


Dynamic programming
- In code generation, dynamic programming is an algorithm for generating
  assembler code from intermediate language trees.
- Code generator generators such as Burg and iBurg implement the dynamic
  programming algorithm.
- Dynamic programming uses two different phases.
- In the first phase, the tree is labeled to find the cheapest matches in
  the rule set (bottom-up).
- In the second phase, the code for the cheapest solution is generated
  (top-down).
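The two phases can be illustrated with a toy instruction selector. This is not how iBurg is implemented (iBurg works from a rule file with several non-terminals and per-rule costs); it is a hand-written sketch with a single non-terminal "register", four made-up rules, and invented mnemonics li, ld, add and addi, each costing 1.

```c
#include <stdio.h>
#include <string.h>

enum kind { N_CONST, N_VAR, N_ADD };
enum rule { R_LI, R_LD, R_ADD, R_ADDI };

struct node {
    enum kind kind;
    int value;                  /* constant value or variable number */
    struct node *left, *right;  /* operands of N_ADD */
    int cost;                   /* filled in by label() */
    enum rule rule;             /* cheapest rule, filled in by label() */
};

/* Phase 1 (bottom-up): record the cheapest rule and its cost per node. */
int label(struct node *n)
{
    switch (n->kind) {
    case N_CONST: n->rule = R_LI; n->cost = 1; break;
    case N_VAR:   n->rule = R_LD; n->cost = 1; break;
    case N_ADD: {
        int l = label(n->left);
        int r = label(n->right);
        n->rule = R_ADD;             /* add reg, reg */
        n->cost = l + r + 1;
        if (n->right->kind == N_CONST && l + 1 < n->cost) {
            n->rule = R_ADDI;        /* add reg, immediate: no load needed */
            n->cost = l + 1;
        }
        break;
    }
    }
    return n->cost;
}

/* Phase 2 (top-down): follow the recorded rules and append the code to
 * `out`. Returns the number of the register holding the result. */
int emit_code(const struct node *n, int *next_reg, char *out, size_t cap)
{
    char line[48];
    int a, b, r;

    switch (n->rule) {
    case R_LI:
        r = (*next_reg)++;
        snprintf(line, sizeof line, "li r%d, %d\n", r, n->value);
        break;
    case R_LD:
        r = (*next_reg)++;
        snprintf(line, sizeof line, "ld r%d, var%d\n", r, n->value);
        break;
    case R_ADDI:
        a = emit_code(n->left, next_reg, out, cap);
        r = a;
        snprintf(line, sizeof line, "addi r%d, %d\n", a, n->right->value);
        break;
    default: /* R_ADD */
        a = emit_code(n->left, next_reg, out, cap);
        b = emit_code(n->right, next_reg, out, cap);
        r = a;
        snprintf(line, sizeof line, "add r%d, r%d\n", a, b);
        break;
    }
    strncat(out, line, cap - strlen(out) - 1);
    return r;
}
```

For the tree (var0 + 3) + var1, labeling picks addi for the inner addition (cost 2 instead of 3) and the total cost is 4. The emit phase then produces ld, addi, ld, add, never loading the constant 3 into a register.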


Optimizations
- Most optimizing compilers perform different optimizations in different
  compilation phases.
- So most compilers don't have a separate "the optimizer" code path.
- Some important optimizations are:
    - Global register allocation
    - Loop detection and unrolling
    - Common subexpression elimination
    - Peephole optimizations
- The Brainf*ck compiler does not optimize.
- The SPL compiler has a simple peephole optimizer.


The BF Compiler


Overview
- The project is split up into an assembler and a compiler.
- The assembler handles variable names and manages the pointer position.
- The compiler reads BFC input files and creates assembler code.
- The assembler has an ad-hoc lexer and parser.
- The compiler has a flex-generated lexer and a bison-generated parser.
- The compiler generates the assembler code directly from the parser
  reduce functions.


Assembler
- The operators [ + and - are unmodified.
- The ] operator sets the pointer back to the position where it was at [.
- A named variable can be defined with (x).
- The pointer can be set to a named variable with <x>.
- A name space is defined with { ... }.
- A block in single quotes is passed through unmodified.
- Larger spaces can be defined with (x.42).
- An alias for another variable can be defined with (x:y).


Compiler
- Variables are declared with var x;.
- C-like expressions for =, +=, -=, if and while are available.
- Macros can be defined with macro x() { ... }.
- All variables are passed using call-by-reference.
- The compiler can't evaluate complex expressions.
- Higher-level operations (such as comparison and multiplication) are
  implemented using built-in functions.


Running
- The compiler and the assembler are both filter programs.
- So compilation is done by:

    $ ./bfc < hanoi.bfc | ./bfa > hanoi.bf
    Code: 53884 bytes, Data: 275 bytes.

- The bfrun executable is a simple Brainf*ck interpreter:

    $ ./bfrun hanoi.bf


Implementation
Code review of the assembler.

... and the compiler.

... and the built-ins library.

... and the hanoi example.


Stack Machines


Overview
- Stack machines are a computer architecture, like register machines or
  accumulator machines.
- Every instruction pops its arguments from the stack and pushes the
  result back on the stack.
- Special instructions push the content of a variable on the stack or pop
  a value from the stack and write it back to a variable.
- Stack machines are great for virtual machines in scripting languages
  because code generation is very easy.
- However, stack machines are less efficient than register machines and
  are harder to implement in hardware.


Example
    x = 5 * ( 3 + y );

    PUSHC "5"
    PUSHC "3"
    PUSH "y"
    IADD
    IMUL
    POP "x"
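A stack machine that runs this instruction sequence fits in a few lines. The sketch below is not the SPL virtual machine: it reuses the mnemonics from the slide, but the encoding (integer constants and numbered variable slots instead of strings), the names struct insn and vm_run, and the fixed stack size are assumptions for illustration.

```c
#include <stddef.h>

enum opcode { OP_PUSHC, OP_PUSH, OP_POP, OP_IADD, OP_IMUL };

struct insn {
    enum opcode op;
    int arg;   /* constant for PUSHC, variable slot for PUSH and POP */
};

#define STACK_SIZE 64

/* Execute `n` instructions against the variable array `vars`.
 * Returns 0 on success, -1 on stack overflow or underflow. */
int vm_run(const struct insn *code, size_t n, int *vars)
{
    int stack[STACK_SIZE];
    size_t sp = 0;   /* number of values currently on the stack */

    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_PUSHC:                      /* push a constant */
            if (sp == STACK_SIZE) return -1;
            stack[sp++] = code[i].arg;
            break;
        case OP_PUSH:                       /* push a variable's content */
            if (sp == STACK_SIZE) return -1;
            stack[sp++] = vars[code[i].arg];
            break;
        case OP_POP:                        /* pop into a variable */
            if (sp < 1) return -1;
            vars[code[i].arg] = stack[--sp];
            break;
        case OP_IADD:
        case OP_IMUL: {                     /* pop two operands, push result */
            if (sp < 2) return -1;
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = code[i].op == OP_IADD ? a + b : a * b;
            break;
        }
        }
    }
    return 0;
}
```

Running the six instructions from the slide with y = 4 leaves 5 * (3 + 4) = 35 in x. The instruction order is simply a post-order walk of the expression tree, which is why code generation for stack machines is so easy.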


The SPL Project


Overview
- SPL is an embeddable scripting language with C-like syntax.
- It has support for arrays, hashes, objects, Perl regular expressions,
  etc.
- The entire state of the virtual machine can be dumped at any time and
  execution of the program resumed later.
- In SPL there is a clear separation of compiler, assembler, optimizer and
  virtual machine.
- It's possible to run pre-compiled binaries, program directly in the VM
  assembly, use multi-threading, step-debug programs, etc.
- SPL is a very small project, so it is a good example for implementing
  high-level language compilers for stack machines.


WebSPL
- WebSPL is a framework for web application development.
- It creates a state over the stateless HTTP protocol using the
  dump/restore features of SPL.
- I.e. it is possible to print out an updated HTML page and then call a
  function which "waits" for the user to do something and then returns.
- WebSPL is still missing some bindings for various SQL implementations,
  XML and XSLT bindings, the WSF (WebSPL Forms) library and some other
  stuff.
- Right now I'm looking for people who want to participate in the project.


Example
    object Friend {
        var id;
        <...>
        method winmain(sid) {
            title = name;
            .sid = sid;
            while (1) {
                template = "show";
                bother_user();
                if ( defined cgi.param.edit ) {
                    template = "edit";
                    bother_user();
                    name = cgi.param.new_name;
                    phone = cgi.param.new_phone;
                    email = cgi.param.new_email;
                    addr = cgi.param.new_addr;
                    title = name;
                }
                if ( defined cgi.param.delfriend ) {
                    delete friends.[id].links.[cgi.param.delfriend];
                    delete friends.[cgi.param.delfriend].links.[id];
                }
                if ( defined cgi.param.delete ) {
                    delete friends.[id];
                    foreach f (friends)
                        delete friends.[f].links.[id];
                    &windows.[winid].finish();
                }
            }
        }
    }


LL(regex) parsers


Overview
- LL parsers (recursive descent parsers) are straightforward
  implementations of a BNF.
- Usually parsers read lexemes (tokens) from a lexer.
- An LL(N) parser has access to N lookahead symbols to decide which
  reduction should be applied.
- Usually LL(N) parsers are LL(1) parsers.
- LL(regex) parsers are LL parsers with no lexer but a regex engine.
- LL(regex) parsers are very easy to implement in Perl.


Left recursions
- Often a BNF contains left recursion:

    <...>
    product: primary
           | product '*' primary
           | product '/' primary;
    <...>

- Left recursion causes LL parsers to run into an endless recursion.
- There are algorithms for converting left recursion to right recursion
  without affecting the organization of the parse tree.
- But the resulting BNF is much more complex than the original one.
- Most parser generators (e.g. bison) cope with left recursion
  automatically.
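In a hand-written recursive descent parser, a common way around the problem is to turn each left-recursive rule into a loop, which keeps the left-to-right grouping of the operands. The sketch below does this for the sum and product rules of the expression grammar and evaluates the input while parsing. It is not llregex.pl: it is written in C, handles numbers only (no NAME), has almost no error handling, and all function names are made up for this example.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cur;   /* current position in the input */

static void skip_ws(void)
{
    while (isspace((unsigned char)*cur))
        cur++;
}

static long parse_sum(void);

/* primary: NUMBER | '(' expression ')' */
static long parse_primary(void)
{
    skip_ws();
    if (*cur == '(') {
        cur++;
        long v = parse_sum();
        skip_ws();
        if (*cur == ')')
            cur++;
        return v;
    }
    char *end;
    long v = strtol(cur, &end, 10);   /* yields 0 if no digits follow */
    cur = end;
    return v;
}

/* product: primary { ('*' | '/') primary }   (loop instead of recursion) */
static long parse_product(void)
{
    long v = parse_primary();
    for (;;) {
        skip_ws();
        if (*cur == '*') {
            cur++;
            v *= parse_primary();
        } else if (*cur == '/') {
            cur++;
            long d = parse_primary();
            v = d ? v / d : 0;        /* sketch: division by zero yields 0 */
        } else {
            return v;
        }
    }
}

/* sum: product { ('+' | '-') product } */
static long parse_sum(void)
{
    long v = parse_product();
    for (;;) {
        skip_ws();
        if (*cur == '+') { cur++; v += parse_product(); }
        else if (*cur == '-') { cur++; v -= parse_product(); }
        else return v;
    }
}

/* expression: sum */
long parse_expression(const char *text)
{
    cur = text;
    return parse_sum();
}
```

parse_expression("10 - 4 - 3") evaluates to 3, i.e. (10 - 4) - 3, so the loop preserves the left associativity that the left-recursive rule expressed. A naive rewrite into right recursion would compute 10 - (4 - 3) instead.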


Example
Code review of llregex.pl.

http://www.clifford.at/papers/2004/compiler/llregex.pl


URLs and References


URLs and References
- My Brainf*ck Projects:
  http://www.clifford.at/bfcpu/
- The SPL Project:
  http://www.clifford.at/spl/
- Clifford Wolf:
  http://www.clifford.at/
- "The Dragonbook"
  Compilers: Principles, Techniques and Tools
  by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman
  Addison-Wesley 1986; ISBN 0-201-10088-6
- LINBIT Information Technologies
  http://www.linbit.com/

http://www.clifford.at/papers/2004/compiler/
