
Compiler Algorithm Language (CAL): An Interpreter and Compiler CS499 - B.Tech. Project Report

BTP Supervisor: Prof. Sanjeev K. Aggarwal

Abhinav Bhatele

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur, INDIA 208 016 Email: bhatele@cse.iitk.ac.in

Shubham Satyarth

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur, INDIA 208 016 Email: shubhams@cse.iitk.ac.in

Abstract - Compiler Algorithm Language (CAL) has been designed to provide compiler writers with a language which is quite close to actual algorithms. In this project we have provided an interpreter for CAL which can be used for algorithm testing. We have also provided a compiler which can translate CAL programs to C code which can be plugged in elsewhere. We have provided a Graphical User Interface to make it convenient for the user to use our interpreter and compiler. Another feature is web-enabling of the entire project so that a remote user does not have to take the trouble of downloading the entire code. Altogether, we have tried to save compiler writers from the trouble of writing lengthy programs for their algorithms!

Index Terms - ICAN, Compiler Algorithms, Interpreter, Compiler, Syntax directed evaluation, Virtual Machine, GUI

I. INTRODUCTION

A. Problem Statement

The first part of the problem is to define a language similar to the Informal Compiler Algorithm Notation (ICAN) [1] for writing compiler algorithms and then to develop an interpreter for the same. The second part of the problem is to write a compiler to convert a program or a piece of code¹ written in this language to equivalent C code. Additions to these two main parts of the problem include providing a debugger, a GUI and a web-enabled version of the compiler.

¹ The smallest block which can be translated would be strictly defined.

B. Motivation

An important part of compiler design is the development of faster and more efficient algorithms for parsing, optimization, translation and code generation. Researchers continue to invent new and better techniques for solving these conventional compiler design problems. Compiler writers in institutions or other scientific forums write these algorithms in a pseudo code which is easy to understand.

Optimization is the heart of advanced compiler design. Optimization in a compiler requires an analysis of the programs which are to be compiled. The analysis includes (1) Control Flow Analysis, (2) Data Flow Analysis, (3) Dependence Analysis, and (4) Alias Analysis. Each such analysis in turn requires writing algorithms that give concrete solutions to a given problem. These algorithms are very complex and involve data structures like sets, tuples and records, so converting them to programs becomes an even more difficult task. One would like to quickly test the algorithms without going to the trouble of writing the corresponding program.

To relieve researchers of the task of expressing their algorithms in languages like C, we propose to provide a language which is very close to the notation used for writing algorithms. We would provide an interpreter for this language so that algorithms can be quickly checked and debugged. The language syntax would be very similar to that of algorithms but would help in clearly defining and homogenising the structure of the code which is the input to the interpreter. A debugger would be an adjunct and would help in the analysis of the algorithms themselves.

Sometimes compiler writers may want to plug their new algorithms into already existing code. For this we would provide a compiler which takes an algorithm as input and produces the corresponding C code. To save the user the trouble of downloading and compiling the tool, we would also web-enable the tool. A user would be able to convert an algorithm in the aforementioned language to C code by the mere click of a mouse!

C. Related Work

The Stark algorithm language was developed by Richard Stark in 1968 as a language for showing provable algorithms. It was called an instructional algorithmic language [3]. INTERCAL was developed by Woods & Lyon in 1972. It has various implementations like INTERCAL-72, C-INTERCAL etc. [3]. Another instructional language was developed in 1974 at Bowling Green State University. It was a hybrid of FORTRAN, PL/I and Algol 68 and was called LINUS (Language for INstructional USe). INTERCAL was the only one developed for general use like our language CAL. SETL (for Set Language) is the language which comes closest to CAL. It is described by Robert B. K. Dewar in The SETL Programming Language (1979) [4]. SETL allows a large variety of programming problems to be solved in an efficient manner. It is a very high-level language whose syntax and semantics are based on the standard set-theoretic dictions of mathematics.

II. PROJECT DETAILS

THE LANGUAGE: The language which we have defined is called Compiler Algorithm Language, or CAL for short. It is similar to ICAN (Informal Compiler Algorithm Notation), a notation commonly used for writing compiler algorithms. It derives features from many languages such as C and Pascal and extends them with natural notations for objects like sets, tuples, sequences, functions and arrays.

A CAL file consists of a series of type definitions, followed by a series of variable declarations, procedure declarations and an optional main program. The syntax is designed so that every compound statement includes an ending delimiter. As a result, separators are not needed between statements. Tabs, comments and blanks are treated as white space. The generic simple types are boolean, integer, real, and character. The language supports most of the control flow statements like if, case, while, and for.

INTERPRETER: The interpreter which we have provided will save the user the trouble of converting an algorithm to a high-level language in order to check its correctness. The interpreter will do a line by line interpretation of the program and give the desired output. The front end of the interpreter consists of a lexical analyzer and a syntax analyzer. The back end of the interpreter, also called the executor [2], does a syntax directed evaluation of the program.

COMPILER: The second phase of the project was to develop a compiler which would convert any algorithm written in CAL to a piece of C code. It would be possible to plug this C code into a larger program. The compiler uses the front end of the interpreter as it is. The back end does the translation to C code.

GUI and WEB ENABLING: The most effective interface for a user would be a graphical one which encompasses both the interpreter and the compiler. The GUI would have an editor and a prompt for the interpreter. There would be options for debugging and compilation.

A user may find it cumbersome to download the compiler code and install it. For such users we propose to have a web-enabled interface. A user will be able to compile their code through a script running on an HTTP server.
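To make the idea of syntax directed evaluation concrete, the following is a minimal sketch in C, not the project's Java executor. Each parsing function returns the value of the construct it has just recognised, so the input is evaluated in a single pass and no syntax tree is built. The names Parser, parse_expr, parse_term, parse_factor and evaluate are hypothetical, and the tiny arithmetic grammar is only a stand-in for CAL expressions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical one-pass evaluator: parsing and evaluation happen together. */
typedef struct {
    const char *pos; /* next unread character of the input */
} Parser;

static void skip_blanks(Parser *p) {
    while (*p->pos == ' ' || *p->pos == '\t') p->pos++;
}

static long parse_expr(Parser *p);

/* factor := NUMBER | '-' factor | '(' expr ')' */
static long parse_factor(Parser *p) {
    skip_blanks(p);
    if (*p->pos == '-') { p->pos++; return -parse_factor(p); }
    if (*p->pos == '(') {
        p->pos++;
        long inner = parse_expr(p);
        skip_blanks(p);
        if (*p->pos == ')') p->pos++;
        return inner;
    }
    long v = 0;
    while (isdigit((unsigned char)*p->pos)) {
        v = v * 10 + (*p->pos - '0');
        p->pos++;
    }
    return v;
}

/* term := factor { ('*' | '/' | '%') factor } */
static long parse_term(Parser *p) {
    long v = parse_factor(p);
    for (;;) {
        skip_blanks(p);
        char op = *p->pos;
        if (op != '*' && op != '/' && op != '%') return v;
        p->pos++;
        long rhs = parse_factor(p);
        if (op == '*') v *= rhs;
        else if (rhs == 0) return 0; /* sketch only: division by zero yields 0 */
        else if (op == '/') v /= rhs;
        else v %= rhs;
    }
}

/* expr := term { ('+' | '-') term } */
static long parse_expr(Parser *p) {
    long v = parse_term(p);
    for (;;) {
        skip_blanks(p);
        char op = *p->pos;
        if (op != '+' && op != '-') return v;
        p->pos++;
        long rhs = parse_term(p);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Evaluates an arithmetic expression while it is being parsed. */
long evaluate(const char *text) {
    assert(text != NULL);
    Parser p = { text };
    return parse_expr(&p);
}
```

For instance, evaluate("2 + 3 * 4") computes the product while parsing the second term and only then performs the addition. A real executor would consult a symbol table for variables at the point where this sketch reads a number.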

III. COMPILER ALGORITHM LANGUAGE AND ITS FEATURES

A CAL program consists of a series of type definitions, followed by a series of variable declarations, procedure declarations and an optional main program. A typical program looks like:

<type definitions>
<variable declarations>
<procedure declarations>
<optional main program>

A type definition consists of a type name followed by an equals sign and a type expression, such as

intset = set of integer

A. Data Types

The language has four generic simple data types: 1. Integers, 2. Reals, 3. Characters, 4. Booleans. The constructed data types which are an integral part of this language are 1. Arrays, 2. Sets, 3. Functions.

A variable declaration consists of the name of the variable, followed by an optional initialization, followed by a colon and the variable's type, e.g.,

is := {1,3,3}: intset

B. Operators

An expression is either a constant, a variable, nil, a unary operator followed by an expression, or two expressions separated by a binary operator. The operands and operators should be of compatible types. Here, we provide a list of operators in the language.

1) Unary Operators for basic data types:

!    Negation of booleans
-    Negation of integers and reals

2) Basic Operators for basic data types:

+    Addition
-    Subtraction
*    Multiplication
/    Division
%    Remainder
     Exponentiation
&    And
     Or
!    Not

3) Unary Operators for composite data types:

     Gives an arbitrary element of a set

4) Binary Operators for composite data types:

UNION           Union
INTERSECTION    Intersection
DIFF            Difference
INSET           Belongs to the set
NOTINSET        Does not belong to the set
                Specifies the mapping of the function

5) Conditional Operators:

Equal, Not equal to, Greater than, Less than, Greater than or equal to, Less than or equal to

The language also provides two quantifiers: 1. FORALL and 2. EXISTS.

C. Simple Statements

The language provides basic statements like assignment, procedure call, return, goto and I/O statements.

D. Control flow statements

The language supports all basic control flow statements like:

if-then-else
switch-case
for loop
while loop
repeat-until loop

E. Procedures

A procedure declaration consists of the procedure's name, its parameter list in parentheses, its optional return type, followed by its parameter declarations and the body. The procedure body starts with a begin, has some statements in between and finishes with an end. A procedure call follows the same pattern as in other high-level languages.

F. Keywords

The various keywords in CAL are:

array, begin, boolean, by, case, character, default, do, each, elif, else, end, esac, false, for, goto, if, in, inout, integer, nil, od, of, out, procedure, real, repeat, return, returns, set, to, true, until, where, while

Some additional keywords (apart from the ones in ICAN) are:

DIFF, EXISTS, FORALL, NOTINSET, NULL, INSET, OUTPUT, UNION, INPUT, INTERSECTION

IV. INTERPRETER - IMPLEMENTATION DETAILS

We have provided an interpreter for this language. It is a one pass interpreter that does syntax directed evaluation. The concept of the interpreter is similar to a virtual machine that simulates a processor and memory. For procedure calls and loops, the virtual machine uses different objects of the executor.

A. Symbol Table

We use a symbol table which contains references to the locations of the actual values. Each global variable encountered is stored directly in the symbol table, while each procedure has an entry in the symbol table with a pointer to a file which stores the actual code. This is done to avoid a second pass on the procedures. The symbol table for a procedure is initialized as soon as the procedure is encountered. When there is a procedure call in main(), the code for that procedure is parsed again and this time executed.

There are two levels of the symbol table. Outer level entries just have the lexeme and a pointer to the location where the actual value is stored. At the inner level there are different linked lists for different data types, like ArrList, SetList etc. Each entry in the symbol table points to a particular entry in one of these linked lists.

Fig. 1. Symbol Table

Fig. 2. Symbol Table
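The two-level organisation described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the project's SymbolTable.java: the names Entry, IntNode, ArrNode, SymbolTable, declare_int, declare_array, table_lookup and table_free are hypothetical, the outer level is a fixed-size array, and only two value lists are modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Inner level: one linked list per data type holds the actual values.
 * IntNode and ArrNode are hypothetical stand-ins for lists such as ArrList. */
typedef struct IntNode { long value; struct IntNode *next; } IntNode;
typedef struct ArrNode { long *items; size_t length; struct ArrNode *next; } ArrNode;

typedef enum { KIND_INT, KIND_ARRAY } Kind;

/* Outer level: only the lexeme and a pointer to where the value lives. */
typedef struct {
    char lexeme[32];
    Kind kind;
    void *slot; /* points at an IntNode or an ArrNode */
} Entry;

#define MAX_ENTRIES 64

typedef struct {
    Entry entries[MAX_ENTRIES];
    size_t count;
    IntNode *ints;   /* head of the integer value list */
    ArrNode *arrays; /* head of the array value list */
} SymbolTable;

void table_init(SymbolTable *t) {
    t->count = 0;
    t->ints = NULL;
    t->arrays = NULL;
}

/* Returns the outer entry for a lexeme, or NULL if it is not declared. */
Entry *table_lookup(SymbolTable *t, const char *lexeme) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].lexeme, lexeme) == 0) return &t->entries[i];
    return NULL;
}

static Entry *add_entry(SymbolTable *t, const char *lexeme, Kind kind, void *slot) {
    assert(t->count < MAX_ENTRIES);
    assert(strlen(lexeme) < sizeof t->entries[0].lexeme);
    Entry *e = &t->entries[t->count++];
    strcpy(e->lexeme, lexeme);
    e->kind = kind;
    e->slot = slot;
    return e;
}

/* Stores the value in the integer list and links the outer entry to it. */
Entry *declare_int(SymbolTable *t, const char *lexeme, long value) {
    IntNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = t->ints;
    t->ints = n;
    return add_entry(t, lexeme, KIND_INT, n);
}

/* Stores a zero-filled array in the array list and links the outer entry to it. */
Entry *declare_array(SymbolTable *t, const char *lexeme, size_t length) {
    assert(length > 0);
    ArrNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->items = calloc(length, sizeof *n->items);
    assert(n->items != NULL);
    n->length = length;
    n->next = t->arrays;
    t->arrays = n;
    return add_entry(t, lexeme, KIND_ARRAY, n);
}

/* Releases every value node; the outer entries become invalid. */
void table_free(SymbolTable *t) {
    while (t->ints != NULL) {
        IntNode *next = t->ints->next;
        free(t->ints);
        t->ints = next;
    }
    while (t->arrays != NULL) {
        ArrNode *next = t->arrays->next;
        free(t->arrays->items);
        free(t->arrays);
        t->arrays = next;
    }
    t->count = 0;
}
```

A lookup therefore takes two steps: table_lookup finds the outer entry by lexeme, and the caller casts its slot pointer according to kind to reach the value in the matching list.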

B. Implementation

We have used Java as our implementation language. This has allowed us to use the rich Java collections framework for data storage. The use of Java has also simplified type handling, since Object serves as the universal type and values are typecast to the required type. Java also aids GUI development and web-enabling. The lexical analyzer generator and parser generator we have used are JFlex [5] and CUP [6] respectively. The front end of the interpreter consists of the Lexical Analyser (lexer.java) and the Syntax Analyser (parser.java). The back end, or the executor, consists of the Symbol Table cum Scratch Memory (SymbolTable.java), the Virtual Machine (VirtualMachine.java) and the Interpreter (MainProg.java).

V. COMPILER - IMPLEMENTATION DETAILS

The compiler converts a CAL program into C code which is not a complete program but a set of functions which can be used in another program. The compiler has been built using the grammar developed for the interpreter. It converts simple data types like char, int and real into basic data types in C, and the rest into structures like linked lists, arrays etc. We have created a header file, setheader.h, containing the data structures required to initialise complex data types and various functions to perform operations on these data structures. Each procedure is converted into a C function with the same name, while the main() procedure is converted into a function called main(). It is this function, together with the previously declared functions, which can be plugged into some other C program.
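As an illustration of how a CAL set of integers could be mapped onto a linked list in C, here is a minimal sketch. It does not reproduce the contents of setheader.h, which are not shown in this report; the names IntSet, SetNode, set_add, set_contains, set_size, set_union, set_intersection, set_diff and set_free are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct SetNode { int value; struct SetNode *next; } SetNode;
typedef struct { SetNode *head; } IntSet;

/* INSET: true if value is a member of s. */
bool set_contains(const IntSet *s, int value) {
    for (const SetNode *n = s->head; n != NULL; n = n->next)
        if (n->value == value) return true;
    return false;
}

/* Adds value unless it is already a member, so {1,3,3} holds two elements. */
void set_add(IntSet *s, int value) {
    if (set_contains(s, value)) return;
    SetNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = s->head;
    s->head = n;
}

size_t set_size(const IntSet *s) {
    size_t count = 0;
    for (const SetNode *n = s->head; n != NULL; n = n->next) count++;
    return count;
}

/* UNION: every element of a or b. */
IntSet set_union(const IntSet *a, const IntSet *b) {
    IntSet r = { NULL };
    for (const SetNode *n = a->head; n != NULL; n = n->next) set_add(&r, n->value);
    for (const SetNode *n = b->head; n != NULL; n = n->next) set_add(&r, n->value);
    return r;
}

/* INTERSECTION: elements of a that are also in b. */
IntSet set_intersection(const IntSet *a, const IntSet *b) {
    IntSet r = { NULL };
    for (const SetNode *n = a->head; n != NULL; n = n->next)
        if (set_contains(b, n->value)) set_add(&r, n->value);
    return r;
}

/* DIFF: elements of a that are not in b. */
IntSet set_diff(const IntSet *a, const IntSet *b) {
    IntSet r = { NULL };
    for (const SetNode *n = a->head; n != NULL; n = n->next)
        if (!set_contains(b, n->value)) set_add(&r, n->value);
    return r;
}

void set_free(IntSet *s) {
    while (s->head != NULL) {
        SetNode *next = s->head->next;
        free(s->head);
        s->head = next;
    }
}
```

Membership testing on an unsorted list is linear, which keeps the sketch short; a generated program that handles large sets might prefer a sorted list or a hash table.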

A. Implementation

The front end of the compiler again consists of the Lexical Analyser (lexer.java) and the Syntax Analyser (translator.java). The back end uses files like editor.java and FileCreator.java to output the file with the C program. The file generated is called 'prog.c'.

VI. EASY-TO-USE INTERFACES

The idea behind interfaces like Web and GUI is to save the user of our interpreter and translator from the trouble of downloading and compiling the source code.

A. Web-enabling

We have used the Apache Tomcat server to run our interpreter and translator on the server side. This receives requests (ServletUtilities.java) from servlets when a remote user posts data on the form. The data is passed on to the interpreter and/or the translator, which then produces the output (ShowParameters.java), which is passed back to the client side.

B. Graphical User Interface

The GUI has been made in Java with the help of NetBeans and the standard Swing and AWT APIs. The various functionalities available with the GUI (MainPanel.java) are:

1. File Handling - New, Open, Save, Save As
2. File Editing - Copy, Cut, Paste, Delete, Find, Find Next and Replace
3. Interpreter - Execute
4. Compiler - Translate
5. Other Options - Go to line, Select All, Set Background Color
6. Help - Help Contents and About CAL Interpreter

The GUI (Fig. 3) has an editor pane for the file and an output box wherein we can see the output of the interpreter or the translated C program.

Fig. 3. GUI for CAL Interpreter and Compiler

VII. SOME ILLUSTRATIVE EXAMPLES

Before we move on, let us look at a few examples of how easy it is to write programs in CAL. The CAL code for finding the closure of a set of items (used by a parser) looks like this:

G: set of Productions
procedure closure(I) returns set of Items
    I: in set of Items
begin
    J: set of Items
    X: Items
    Y: Production
    J := I
    for each X INSET J do
        for each Y INSET G do
            if .Y NOTINSET J then
                J := J + .Y
            fi
        od
    od
    return J
end

However, the corresponding C code for the above algorithm is rather involved and looks something like this:

typedef struct Items {
    /* Code for Items */
    struct Items *next;
} Items;

typedef struct Production {
    /* Code for Production */
    struct Production *next;
} Production;

typedef struct setofItems {
    Items *head;
} setofItems;

typedef struct setofProductions {
    Production *head;
} setofProductions;

setofProductions *G;    /* Initialize G */

setofItems *closure(setofItems *I)
{
    setofItems *J = I;
    setofProductions *K;
    K = G;
    Items *X;
    Items *X1;
    Production *Y;
    X = J->head;
    while (X != NULL) {
        Y = G->head;
        while (Y != NULL) {
            Production *Z = K->head;
            while (Z != NULL) {
                /* .Y is the notation carried over from the CAL code above */
                if (.Y == Z) {
                    X1 = Z;
                    X1->next = J->head;
                    J->head = X1;
                }
                Z = Z->next;
            }
            Y = Y->next;
        }
        X = X->next;
    }
    return J;
}

Let us look at another example, which also illustrates input and output:

procedure exam(x,y) returns boolean
    x, y: out integer
begin
    is: in intset
    INPUT is;
    tv := true: boolean
    z: integer
    for each z INSET is (z > 0) do
        if x = z then
            return tv
        fi
    od
    return y INSET is
end || exam

The corresponding C code is:

#include <stdbool.h>
#include <stdio.h>

bool exam(int x, int y)
{
    int n;
    printf("Enter the size of the set");
    scanf("%d", &n);
    int is[n];
    printf("Enter the set");
    for (int i = 0; i < n; i++)
        scanf("%d", &is[i]);

    bool tv = true;
    /* for each z INSET is (z > 0) */
    for (int i = 0; i < n; i++)
        if (is[i] > 0 && is[i] == x)
            return tv;

    /* return y INSET is */
    for (int i = 0; i < n; i++)
        if (is[i] == y)
            return true;
    return false;
}

VIII. CONCLUSION

We faced a lot of challenges while doing this project because it was a totally self-implemented idea. Due to the one pass nature of the interpreter, handling of loops and procedures in syntax directed evaluation was difficult and required special handling.

The size of the code for the various phases is quite large. The parser for the interpreter runs into 7000 lines, whereas the one for the translator runs into 5000 lines. The code for the GUI is about 700 lines. The lexical analyser recognises more than 30 tokens and the grammar for our language has above 200 productions. The grammar is provided in the appendix.

We have tested our interpreter and compiler on a large and diverse set of programs. We have tested all the implemented data types, loops and other structures. Some sample programs which have been tested are also listed in the appendix.

We have completed the interpreter and compiler for CAL. We have also developed a GUI for both the interpreter and the compiler, and we have web-enabled the entire project. We had also intended to provide a debugger as an adjunct but could not do that.

Future work on the project is possible in terms of implementing more data types in our language and providing a debugger in addition.

ACKNOWLEDGMENT

The authors would like to thank Prof. Sanjeev K. Aggarwal, their BTP guide for his constant help and guidance throughout the term of the project. Without his invaluable support, they could not have completed the project.

REFERENCES

[1] Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, Inc., 1997.
[2] Ronald Mak, Writing Compilers and Interpreters, John Wiley and Sons, Inc., 1996.
[3] http://hopl.murdoch.edu.au/findlanguages2.prx?NodeID=1125000&which=byMyCat
[4] Robert B. K. Dewar, The SETL Programming Language, 1979.
[5] http://jflex.de/index.html
[6] http://www.cs.princeton.edu/~appel/modern/java/CUP/

APPENDIX

Some sample programs which were used for testing are:

Program 01:

procedure abhinav(a, b) returns boolean
    a: in integer
    b: in integer
begin
    i: integer
    while (a>=b) do
        a:=a-b
    od
    if (a = 0) then
        return true
    else
        return false
    fi
end

procedure main()
begin
    cond: boolean
    cond := abhinav(8, 2)
    OUTPUT cond
end
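For comparison, a hand-written C rendering of Program 01 might look like the sketch below. It is not the output of the CAL compiler: the name cal_main is hypothetical, the unused local i is dropped, and the sketch assumes a >= 0 and b > 0, since otherwise the loop would not terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hand translation of procedure abhinav(a, b): repeated subtraction
 * leaves a == 0 exactly when b divides a. */
bool abhinav(int a, int b)
{
    assert(a >= 0 && b > 0); /* assumption of this sketch, see lead-in */
    while (a >= b) {
        a = a - b; /* a is passed by value, so the caller's copy is untouched */
    }
    if (a == 0) {
        return true;
    } else {
        return false;
    }
}

/* Stand-in for the CAL main program; cal_main is a hypothetical name. */
void cal_main(void)
{
    bool cond = abhinav(8, 2);
    printf("%s\n", cond ? "true" : "false"); /* OUTPUT cond */
}
```

The in parameters of CAL map naturally onto C's pass-by-value arguments, which is why the assignment to a inside the loop needs no extra copy.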

Program 02:

procedure main()
begin
    ab: array[1..5] of integer
    i : integer
    i:= 0
    while(i<5) do
        ab[i]:=i
        i:=i+1
    od
    INPUT ab
    OUTPUT ab
end

Program 03:

procedure main()
begin
    ab: set of integer
    ac: set of integer
    ad: set of integer
    i : integer
    i:= 0
    ac := {2,3,5}
    ad := {5,7,9}
    INPUT ab
    if (i INSET ab) then
        OUTPUT ac
    else
        OUTPUT ad
    fi
    OUTPUT ab
end

We now present the entire grammar here which we have developed for our language:

Syntax of whole CAL programs:

program

typedef list vardecl list procdecl list

mainprog

procbody

typedef list vardecl list procdecl list mainprog

Syntax of CAL type definitions:

typedef list typedef typedef list NIL typedef typename list typeexpr typename list ID EQUAL typename list ID EQUAL

typeexpr comp typeexpr other typeexpr other typeexpr simple typeexpr constr typeexpr LPAREN typeexpr RPAREN NULL simple typeexpr BOOLEAN INTEGER REAL CHARACTER constr typeexpr enum typeexpr array typeexpr set typeexpr seq typeexpr record typeexpr comp typeexpr tuple typeexpr union typeexpr func typeexpr enum typeexpr ENUM LBRACE id list RBRACE array typeexpr ARRAY LBRACK arraybound list RBRACK OF typeexpr arraybound list arraybound COMMA arraybound list arraybound arraybound expr LIMIT expr LIMIT set typeexpr SET OF typeexpr seq typeexpr SEQUENCE OF typeexpr tuple typeexpr other typeexpr MULT tuple typeexpr other typeexpr MULT other typeexpr record typeexpr RECORD LBRACE rec decllist RBRACE rec decllist rec decl COMMA rec decllist rec decl rec decl id list COLON typeexpr union typeexpr other typeexpr UNION union typeexpr other typeexpr UNION other typeexpr

tuple typeexpr ARROW lft

func typeexpr lft typeexpr

Syntax of CAL declarations:

procdecl list procdecl procdecl list procdecl PROCEDURE ID paramlist RETURNS typeexpr paramdecl list procbody PROCEDURE ID paramlist paramdecl list procbody

paramlist LBRACK id list RBRACK LBRACK RBRACK paramdecl list paramdecl list paramdecl

paramdecl id list COLON IN typeexpr id list COLON OUT typeexpr id list COLON INOUT typeexpr procbody BEGIN vardecl list statement list END vardecl list vardecl vardecl list NIL vardecl varlist COLON typeexpr varlist var COMMA varlist var var ID EQTO constexpr ID

constexpr

NUM

Syntax of CAL statements:

statement list statement list statement statement assignstmt procfuncstmt returnstmt gotostmt ifstmt casestmt whilestmt forstmt repeatstmt ID COLON statement assignstmt leftsidelist EQTO expr leftsidelist binaryop EQUAL expr leftsidelist leftside EQTO leftsidelist leftside

leftside ID arrayelt seqelt tupleelt recordelt funcelt arrayelt ID arrlist arrlist LBRACK explist RBRACK arrlist LBRACK explist RBRACK seqelt ID ELOFSEQ seqlist seqlist expr ELOFSEQ seqlist expr tupleelt ID AT tuplelist tuplelist expr AT tuplelist expr recordelt ID DOT recordlist recordlist expr DOT recordlist expr

funcelt

ID funclist

funclist

arglist funclist arglist

procfuncstmt procfuncexpr returnstmt RETURN expr RETURN gotostmt GOTO ID ifstmt IF expr THEN statement list eliflist lft elif lft elif ELSE statement list FI FI eliflist ELIF statement list eliflist casestmt CASE expr OF caselist DEFAULT COLON

statement list ESAC CASE expr OF caselist ESAC caselist ID COLON statement list caselist ID COLON statement list whilestmt WHILE expr DO statement list OD

forstmt FOR iterator DO statement list OD iterator ID EQTO expr TO expr EACH id list INSET expr ID EQTO expr TO expr LPAREN expr RPAREN EACH id list INSET expr LPAREN expr RPAREN EACH id list INSET typeexpr LPAREN expr RPAREN repeatstmt REPEAT statement list UNTIL expr

Syntax of CAL expressions:

expr other expr comp expr arrayeltexpr other expr simpleconst LPAREN expr RPAREN unaryop expr arrayexpr setexpr procfuncexpr sizeexpr quantexpr ID comp expr other expr binaryop expr other expr INSET

typeexpr unaryop NOT MINUS TILDE ; binaryop EQUAL NOTEQ AND OR PLUS MINUS MULT FSLASH MOD LT LTEQ GT GTEQ UNION INTERSECTION INSET NOTINSET ELOFSEQ SEQCONCAT AT arrayexpr LBRACK explist RBRACK seqexpr LBRACK explist RBRACK setexpr LBRACE explist RBRACE tupleexpr LT explist GT recordexpr LT idexplist GT procfuncexpr ID arglist arglist LPAREN explist RPAREN LPAREN RPAREN arrayeltexpr other expr LBRACK explist RBRACK comp expr LBRACK explist RBRACK quantexpr EXISTS ID INSET l fact quant FORALL ID INSET l fact quant l fact quant expr LPAREN expr RPAREN typeexpr LPAREN expr RPAREN sizeexpr OR expr OR idexplist ID COLON expr COMMA idexplist ID COLON expr simpleconst intconst boolconst charconst explist expr COMMA explist expr id list ID COMMA id list ID intconst NUM charconst ASCI boolconst TRUE FALSE