Compiler Design - Software Design Project

Dedication
We dedicate this work ..
to the great persons Founded through the history giving

everything to their nations and they didn’t take any things.!
Also to any person around this world that searching about

the pure love and pure Peace among the civilizations and
he think this love can save this world from it’s narrow
disputation.!
SE4 compiler design workgroup
SE4 Compiler Design Documentation Page 2

Contents
Chapter One
1 Introduction ………………………………………………………………..… 7
1.1 Preface ………………………………………………………………… 7
1.2 Project proposal …………………………………………………………7
1.3 The plan ……………………………………………………………..7
1.4 Setting Milestones ………………………………………………………8
1.5 Report Overview ……………………………………………………….. 8
Chapter Two
2 Compiler Analysis …………………………………………………………..10
2.1 What Is a Compiler ……………………………………………………11
2.2 History of a Compiler ………………………………………………….11
2.3 Compiler Vs Interpreter ……………………………………………….12
2.4 Compiler Phases ……………………………………………………….12
2.4.1 Front End Phases ………………………………………………..13
2.4.2 Back End Phases …………………………………………………14
2.5 Object Oriented development …………………………………………15
2.6 Compilation Processes …………………………………………………15
2.6.1 Lexical Analyzer Phase ……………………………………………16
2.6.2 Syntax Analyzer Phase ……………………………………………16
2.6.2.1 Context Free Grammar ( CFG ) …………………………...17
2.6.2.2 Derivation the Parse Tree …………………………………17
2.6.3 Symantec Analyzer Phase …………………………………………18
2.6.3.1 Types and Declarations ……………………………………19
2.6.3.2 Type Checking ……………………………………………..19
2.6.3.3 Scope Checking ……………………………………………19

2.6.4 Intermediate Code Generator Phase ……………………………...19
2.6.5 Code Optimizer Phase ……………………………………………..20
2.6.5.1 Types of optimizations …………………………………….21
2.6.6 Code Generator Phase ……………………………………………..21
2.6.7 Error Handling methods …………………………………………..22
2.6.7.1 Recovery from the lexical phase errors ……………………23
2.6.7.2 Recovery from the Symantec phase errors …………….…23
2.6.8 Symbol Table & Symbol Table Manger …………………………..23

Chapter Three
3 Compiler Design

3.1 Compiler Design ……………………………………………………….25
3.2 The Context of the System……………………………………………..25
3.2.1 Compiler Architectural Design……………………………………25
3.2.1.1 Compiler Structuring………………………………………..25
3.2.1.1.1 Compiler Organization……………………………..26
3.2.1.2 Compiler Control Modeling………………………………….27
3.2.1.3 Compiler Modular Decomposition…………………………..28
3.2.1.4 Compiler Domain Specific Architectures…………………….28
3.3 Compiler USE Case Models……………………………………………..28
3.3.1 Lexical Analyzer USE Case Diagram………………………………29
3.3.2 Syntax Analyzer USE Case Diagram………………………………29
3.3.3 Symantec Analyzer USE Case Diagram…………………………..30
3.3.4 I.R Code Generator USE Case Diagram…………………………..30
3.3.5 I.R Code Optimizer USE Case Diagram…………………………..31
3.3.6 Code Generator USE Case Diagram……………………………..32
3.3.7 Symbol Table Manager USE Case Diagram……………………...32
3.3.8 Error Handler USE Case Diagram………………………………..33


3.4 Compiler’s Subsystems Activity Diagrams………………………........33
3.4.1 Lexical Analyzer Activity Diagram………………………………34
3.4.2 Syntax Analyzer Activity Diagram………………………………35
3.4.3 Symantec Analyzer Activity Diagram………………………..…36
3.4.4 I.R Code Generator Activity Diagram………………………..…37
3.4.5 I.R Code Optimizer Activity Diagram………………………..….38
3.4.6 Code Generator Activity Diagram………………………………39
3.4.7 Symbol Table Manager Activity Diagram……………………….40
3.3.8 Error Handler USE Case Diagram………………………………..41
3.5 Compiler Class Diagram………………………………………………..42
3.6 Compiler sequence Diagram……………………………………….….44
3.6.1 Lexical analyzing sequence diagram………………………..….44
3.6.2 syntax analyzing sequence diagram……………………………45
3.6.3 Symantec Analyzing sequence diagram……………………..…46
3.6.4 I.R Code Generation sequence diagram………………………..47
3.6.5 I.R Code Optimizing sequence diagram…………………………48
3.6.6 Code Generation sequence diagram…………………………...49
3.6.7 Additional sequence diagram…………………………………...50
Glossary ………………………………………………………51
Bibliography ………………………………………………..…54


Introduction

1.1 Preface

Compiler design and construction is an exercise in engineering design. The compiler writer must
choose a path through a decision space that is filled with diverse alternatives, each with distinct costs,
advantages, and complexity. Each decision has an impact on the resulting compiler. The quality of the end
product depends on informed decisions at each step of way.

This project or a short study tries to explore the design space – to present some of the ways that used
to design or construct a compiler, this is done side by side with the principles of the software engineering
phases, and we have a main of advantage that we doesn't find in any book that illustrate the compilers
construction, this advantage is the UML ( unified modeling language ) diagrams that illustrate the compiler
development process from many sides and also describe the small procedures that making the subsystems of
the compiler and the communication between them that lead to produce a target file of the compiler.

1.2 Project proposal
In the early days, the approach taken to compiler design used to be directly affected by the complexity of
the processing, the experience of the person(s) designing it, and the resources available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of
software. When the source language is large and complex, and high quality output is required the design may
be split into a number of relatively independent phases. Having separate phases means development can be
parceled up into small parts and given to different people. It also becomes much easier to replace a single
phase by an improved one, or to insert new phases later (eg, additional optimizations).
So, Our Project is to design a small functional compiler by take in consideration the above hints. Implement
the software design principles in our work and establishing it step by step till the end of this development.
1.3 The plan

We planned to make some common tasks that every software developer does on his/her work plan (
information gathering, analysis this information, designing the basic diagrams, and implement his/her project
).


1.4 Setting Milestones
In order to establish a sentiment of progression towards the goals of the project, certain milestones have
been set :
Implementation of the Object Oriented Strategy in the analysis phase of the development.
Implementation of the Object Oriented Strategy in the design phase of the development.
Implementation of the UML ( unified modeling language ) Diagrams in the Design phase as
follow …
o Implementation of the Use Case Diagrams.
o Implementation of the Activity Diagrams.
o Implementation of the Class Diagrams and the relations between this classes.
o Implementation of the Sequence Diagrams.
1.5 Report Overview

Chapter 1 A brief overview of what will follow in the report is given here.
Chapter 2 ( Compiler Analysis ) Describe why we use the object oriented technique in the analysis phase.
how the Compiler is produced, perception by giving the illustration of the theoretical concepts and principles
of compiler phases analysis.
Chapter 3 ( Compiler Design ) describes the main design aspects of the project commencing with the
Requirements Analysis and concluding with specific designing of project parts, and giving the main diagrams
during the design phase to simplify the operation of implementation that done by the programmers.


Chapter Two
Compiler
Analysis

2.1 What Is a Compiler ?

A compiler is a special type of computer program that translates a human readable text file into a form that the
computer can more easily understand. At its most basic level, a computer can only understand two things, a 1 and a 0.
At this level, a human will operate very slowly and find the information contained in the long string of 1s and 0s
incomprehensible. A compiler is a computer program that bridges this gap.

In the beginning, compilers were very simple programs that could only translate symbols into the bits, the 1s
and 0s, the computer understood. Programs were also very simple, composed of a series of steps that were originally
translated by hand into data the computer could understand. This was a very time consuming task, so portions of this
task were automated or programmed, and the first compiler was written. This program assembled, or compiled, the
steps required to execute the step by step program.

These simple compilers were used to write a more sophisticated compiler. With the newer version, more rules
could be added to the compiler program to allow a more natural language structure for the human programmer to
operate with. This made writing programs easier and allowed more people to begin writing programs. As more people
started writing programs, more ideas about writing programs were offered and used to make more sophisticated
compilers. In this way, compiler programs continue to evolve, improve and become easier to use.

Compiler programs can also be specialized. Certain language structures are better suited for a particular task
than others, so specific compilers were developed for specific tasks or languages. Some compilers are multistage or
multiple pass. A first pass could take a very natural language and make it closer to a computer understandable
language. A second or even a third pass could take it to the final stage, the executable file.

The intermediate output in a multistage compiler is usually called pseudo‐code, since it not usable by the
computer. Pseudo‐code is very structured, like a computer program, not free flowing and verbose like a more natural
language. The final output is called the executable file, since it is what is actually executed or run by the computer.
Splitting the task up like this made it easier to write more sophisticated compilers, as each sub task is different. It also
made it easier for the computer to point out where it had trouble understanding what it was being asked to do.

Errors that limit the compiler in understanding a program are called syntax errors. Errors in the way the
program functions are called logic errors. Logic errors are much harder to spot and correct. Syntax errors are like
spelling mistakes, whereas logic errors are a bit more like grammatical errors.

Cross compiler programs have also been developed. A cross compiler allows a text file set of instructions that is
written for one computer designed by a specific manufacturer to be compiled and run for a different computer by a
different manufacturer. For example, a program that was written to run on an Intel computer can sometimes be cross
compiled to run a on computer developed by Motorola. This frequently does not work very well. At the level at which
computer programs operate, the computer hardware can look very different, even if they may look similar to you.

Cross compilation is different from having one computer emulate another computer. If a computer is emulating
a different computer, it is pretending to be that other computer. Emulation is frequently slower than cross compilation,
since two programs are running at once, the program that is pretending to be the other computer and the program
that is running. However, for cross compilation to work, you need both the original natural language text that describes
the program and a computer that is sufficiently similar to the original computer that the program can function on to
run on a different computer. This is not always possible, so both techniques are in use.

2.2 History of a Compiler
Software for early computers was primarily written in assembly language for many years. Higher level
programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs
started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early
computers also created many technical problems when implementing a compiler.

Towards the end of the 1950s, machine‐independent programming languages were first proposed.
Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in
1952, for the A‐0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having
introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple
architectures, in 1960.

In many application domains the idea of using a higher level language quickly caught on. Because of the
expanding functionality supported by newer programming languages and the increasing complexity of computer
architectures, compilers have become more and more complex.

Early compilers were written in assembly language. The first self‐hosting compiler — capable of compiling its
own source code in a high‐level language — was created for Lisp by Tim Hart and Mike Levin at MIT in 1962. Since the
1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C
have been popular choices for implementation language. Building a self‐hosting compiler is a bootstrapping problem—
the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in
Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter.

2.3 Compiler Vs Interpreter

We usually prefer to write computer programs in languages we understand rather than in machine
language, but the processor can only understand machine language. So we need a way of converting our
instructions (source code) into machine language. This is done by an interpreter or a compiler.

An interpreter reads the source code one instruction or line at a time, converts this line into machine
code and executes it. The machine code is then discarded and the next line is read. The advantage of this is
it's simple and you can interrupt it while it is running, change the program and either continue or start again.
The disadvantage is that every line has to be translated every time it is executed, even if it is executed many
times as the program runs. Because of this interpreters tend to be slow. Examples of interpreters are Basic on
older home computers, and script interpreters such as JavaScript, and languages such as Lisp and Forth.

A compiler reads the whole source code and translates it into a complete machine code program to
perform the required tasks which is output as a new file. This completely separates the source code from the
executable file. The biggest advantage of this is that the translation is done once only and as a separate
process. The program that is run is already translated into machine code so is much faster in execution. The
disadvantage is that you cannot change the program without going back to the original source code, editing

that and recompiling (though for a professional software developer this is more of an advantage because it
stops source code being copied). Current examples of compilers are Visual Basic, C, C++, C#, Fortran, Cobol,
Ada, Pascal and so on.

You will sometimes see reference to a third type of translation program: an assembler. This is like a compiler,
but works at a much lower level, where one source code line usually translates directly into one machine
code instruction. Assemblers are normally used only by people who want to squeeze the last bit of
performance out of a processor by working at machine code level.

2.4 Compiler Phases
In the early days, the approach taken to compiler design used to be directly affected by the
complexity of the processing, the experience of the person(s) designing it, and the resources available.

A compiler for a relatively simple language written by one person might be a single, monolithic piece
of software. When the source language is large and complex, and high quality output is required the design
may be split into a number of relatively independent phases. Having separate phases means development
can be parceled up into small parts and given to different people. It also becomes much easier to replace a
single phase by an improved one, or to insert new phases later (eg, additional optimizations).
2.4.1 Front End Phases

The front end analyzes the source code to build an internal representation of the program, called the
intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in
the source code to associated information such as location, type and scope. This is done over several phases,
which includes some of the following:
1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers
require a phase before parsing, which converts the input character sequence to a canonical form
ready for the parser. The top‐down, recursive‐descent, table‐driven parsers used in the 1960s typically
read the source one character at a time and did not require a separate tokenizing phase. Atlas
Autocode, and Imp (and some implementations of Algol and Coral66) are examples of stropped
languages whose compilers would have a Line Reconstruction phase.

2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single
atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is
typically a regular language, so a finite state automaton constructed from a regular expression can be

used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis
is called a lexical analyzer or scanner.
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro
substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or
semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than
syntactic forms. However, some languages such as Scheme support macro substitutions based on
syntactic forms.

4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program.
This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree
structure built according to the rules of a formal grammar which define the language's syntax. The
parse tree is often analyzed, augmented, and transformed by later phases in the compiler.

5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and
builds the symbol table. This phase performs semantic checks such as type checking (checking for type
errors), or object binding (associating variable and function references with their definitions), or
definite assignment (requiring all local variables to be initialized before use), rejecting incorrect
programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that
this phase logically follows the parsing phase, and logically precedes the code generation phase,
though it is often possible to fold multiple phases into one pass over the code in a compiler
implementation.

2.4.2 Back End Phases

The term back end is sometimes confused with code generator because of the overlapped
functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis
and optimization phases in the back end from the machine‐dependent code generators.

The main phases of the back end include the following:

1. Analysis: This is the gathering of program information from the intermediate representation derived
from the input. Typical analyses are data flow analysis to build use‐define chains, dependence
analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the basis for any
compiler optimization. The call graph and control flow graph are usually also built during the analysis
phase.


2. Optimization: the intermediate language representation is transformed into functionally equivalent
but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination,
constant propagation, loop transformation, register allocation or even automatic parallelization.

3. Code generation: the transformed intermediate language is translated into the output language,
usually the native machine language of the system. This involves resource and storage decisions, such
as deciding which variables to fit into registers and memory and the selection and scheduling of
appropriate machine instructions along with their associated addressing modes (see also Sethi‐Ullman
algorithm).

2.5 Object Oriented development

Object‐oriented strategy is a development strategy where system analyzers ,designers and
programmer think in terms of ‘things’ instead of operations or functions. The executing system is made up of
interacting objects that maintain their own local state and provide operations on that state information .
They hide information about the representation of the state and hence limit access to it. An object‐oriented
development shorty described as the following hints ...
• Object‐oriented analysis is concerned with developing an object‐oriented model of the
application domain. The identified objects reflect entities and operations that are associated with the
problem to be solved. And here with our project , we use this stage of development to analysis our compiler
because we are seeing that the object oriented analysis stage is the best stage to collect our data from the
previous compilers or from any references books that talk about the compiler , because as we know the
compiler is a software system and here the analysis stage of our project doesn’t completed without using the
correctly if we use another strategy to analyses it.
After this shortly illustration of the object oriented development process , we will give the full data
about the terminologies in our work, and illustrate each one carefully, because this is our outline information
that we can to start our compiler design depending to it.
• Object‐oriented design is concerned with developing an object‐oriented model of a software
system to implement the identified requirements. The objects in an object‐oriented design are related to the
solution to the problem that is being solved. There may be close relationships between some problem objects
and some solution objects but the designer inevitably has to add new objects and to transform problem
objects to implement the solution.
• Object‐oriented programming is concerned with realising a software design using an object‐
oriented programming language. An object‐oriented programming language, such as Java, supports the direct
implementation of objects and provides facilities to define object classes.

2.6 Compilation Processes

Source Code
Compilation refers to the compiler's process of translating a high‐level
language program into a low‐level language program. This process is very Lexical Analyzer Phase
complex; hence, from the logical as well as an implementation point of view, it
is customary to partition the compilation process into several phases, which
Syntax Analyzer Phase
are nothing more than logically cohesive operations that input one
representation of a source program and output another representation.
Symantic Analyzer Phase
A typical compilation, broken down into phases, is shown in Figure 1.1.
Intermediate Code

Generator Phase
2.6.1 Lexical Analyzer Phase

Code Optimizer Phase
Lexical analyzer is the part that process where the stream of
characters making up the source program is read from left‐to‐right and Code Generator Phase
grouped into tokens. `
Tokens are sequences of characters with a collective meaning. There are Object Code
usually only a small number of tokens for a programming language: constants
(integer, double, char, string, etc.), operators (arithmetic, relational, logical), Figure 2.1
punctuation, and reserved words. A simple example of lexical analyzer is shown
in Figure 1.2.

Figure 2.2

The lexical analyzer takes a source program as input, and produces a stream of tokens as
Output, this process could the Tokenization process. The lexical analyzer might recognize particular instances
of tokens such as 3 or 255 for an integer constant token, "Fred" or "Wilma" for a string constant token,
numTickets or queue for a variable token Such specific instances are called lexemes. A lexeme is the actual
character sequence forming a token, the token is the general class that a lexeme belongs to. Some tokens
have exactly one lexeme (e.g., the > character); for others, there are many lexemes (e.g., integer constants).


2.6.1.1 Regular Expression Notation
Regular expression notation can be used for specification of tokens because tokens constitute a
regular set. It is compact, precise, and contains a deterministic finite automata (DFA) that accepts the
language specified by the regular expression.
The DFA is used to recognize the language
specified by the regular expression notation,
making the automatic construction of recognizer
of tokens possible. Therefore, the study of
regular expression notation and finite automata
becomes necessary. And here is a simple Regular
Expression that used to separate any kinds of
tokens as shown in the Figure 1.3.

Figure 2.3
2.6.2 Syntax Analyzer Phase

In the syntax‐analysis phase, a compiler verifies whether or not the tokens generated by the lexical
analyzer are grouped according to the syntactic rules of the language. If the tokens in a string are grouped
according to the language's rules of syntax, then the string of tokens generated by the lexical analyzer is
accepted as a valid construct of the language; otherwise, an error handler is called. Hence, two issues are
involved when designing the syntax‐analysis phase of a compilation process:

1. All valid constructs of a programming language must be specified; and by using these
specifications, a valid program is formed. That is, we form a specification of what tokens the
lexical analyzer will return, and we specify in what manner these tokens are to be grouped so that
the result of the grouping will be a valid construct of the language.

2. A suitable recognizer will be designed to recognize whether a string of tokens generated by the
lexical analyzer is a valid construct or not.

Therefore, suitable notation must be used to specify the constructs of a language. The notation for
the construct specifications should be compact, precise, and easy to understand. The syntax‐structure

specification for the programming language (i.e., the valid constructs of the language) uses context‐free
grammar (CFG), because A regular expression is not powerful enough to represent languages which require
parenthesis matching to arbitrary depths. And for certain classes of grammar, we can automatically construct
an efficient parser that determines if a source program is syntactically correct. Hence, CFG notation is
required topic for study.

2.6.2.1 Context Free Grammar ( CFG )
CFG notation specifies a context‐free language that consists of terminals, nonterminals, a start
symbol, and productions. The terminals are nothing more than tokens of the language, used to form the
language constructs. Nonterminals are the variables that denote a set of strings. For example, S and E are
nonterminals that denote statement strings and expression strings, respectively, in a typical programming
language. The nonterminals define the sets of strings that are used to define the language generated by the
grammar. They also impose a hierarchical structure on the language, which is useful for both syntax analysis
and translation. Grammar productions specify the manner in which the terminals and string sets, defined by
the nonterminals, can be combined to form a set of strings
defined by a particular nonterminal. For example, as shown
in Figure 1.4 consider the production S → aSb. This production
specifies that the set of strings defined by the nonterminal S are
obtained by concatenating terminal a with any string belonging
to the set of strings defined by nonterminal S, and then with
terminal b. Each production consists of a nonterminal on the left‐hand side, and a Figure 2.4
string of terminals and nonterminals on the right‐hand side. The left‐hand side of a
production is separated from the right‐hand side using the "→" symbol, which is used to identify a relation on
a set (V T)*.
Therefore context‐free grammar is a four‐tuple denoted as:
G = (V,T,P,S)
where:

1. V is a finite set of symbols called as nonterminals or variables,
2. T is a set a symbols that are called as terminals,
3. P is a set of productions, and
4. S is a member of V, called as start symbol.

2.6.2.2 Derivation the Parse Tree

When deriving a string w from S, if every derivation is considered to be a step in the tree construction,
then we get the graphical display of the derivation of string w as a tree. This is called a "derivation tree" or a

"parse tree" of string w. Therefore, a derivation tree or parse tree is the display of the derivations as a tree.
Note that a tree is a derivation tree if it satisfies the following requirements:
1. All the leaf nodes of the tree are labeled by terminals of the grammar.
2. The root node of the tree is labeled by the start symbol of the grammar.
3. The interior nodes are labeled by the nonterminals.
4. If an interior node has a label A, and it has n descendents with labels X1, X2, …, Xn from left to right,
then the production rule A → X1 X2 X3 …… Xn must exist in the grammar.
For example, consider a grammar whose list of productions is:
E→ E+E
E→ E*E
E→ id
The tree shown in Figure 1.5 is a derivation tree for a string id + id * id.

Figure 2.5
2.6.3 Symantec Analyzer Phase

After completing the lexical and Syntax functions without errors, Now we’ll move forward to
semantic analyzer, where we delve even deeper to check whether they form a sensible set of instructions in
the programming language. Whereas any old noun phrase followed by some verb phrase makes a
syntactically correct English sentence, a semantically correct one has subject‐verb agreement, proper use of
gender, and the components go together to express an idea that makes sense. For a program to be
semantically valid, all variables, functions, classes, etc. must be properly defined, expressions and variables
must be used in ways that respect the type system, access control must be respected, and so forth. Semantic
analyzer is the front end’s penultimate phase and the compiler’s last chance to weed out incorrect programs.
We need to ensure the program is sound enough to carry on to code generation.
A large part of semantic analyzer consists of tracking variable/function/type declarations and type
checking. In many languages, identifiers have to be declared before they’re used. As the compiler encounters
a new declaration, it records the type information assigned to that identifier. Then, as it continues examining
the rest of the program, it verifies that the type of an identifier is respected in terms of the operations being
performed. For example, the type of the right side expression of an assignment statement should match the
type of the left side, and the left side needs to be a properly declared and assignable identifier. The
parameters of a function should match the arguments of a function call in both number and type. The
language may require that identifiers be unique, thereby forbidding two global declarations from sharing the
same name. Arithmetic operands will need to be of numeric—perhaps even the exact same type (no
automatic int‐to‐double conversion, for instance). These are examples of the things checked in the semantic
analyzer phase.

and here we will discover some functions that handled by Context Sensitive Grammer ( CSG ), that
done by the Symantec analyzer ..

2.6.3.1 Types and Declarations
We begin with some basic definitions to set the stage for performing semantic analysis. A type is a set of values
and a set of operations operating on those values. There are three categories of types in most programming languages:
• Base types int, float, double, char, bool, etc. These are the primitive types provided directly by the
underlying hardware. There may be a facility for user‐defined variants on the base types (such as C enums).
• Compound types arrays, pointers, records, structs, unions, classes, and so on. These types are
constructed as aggregations of the base types and simple compound types.
• Complex types lists, stacks, queues, trees, heaps, tables, etc. You may recognize these as abstract data
types. A language may or may not have support for these sort of higher‐level abstractions.

In many languages, a programmer must first establish the name and type of any data object (e.g., variable,
function, type, etc). In addition, the programmer usually defines the lifetime. A declaration is a statement in a program
that communicates this information to the compiler. The basic declaration is just a name and type, but in many
languages it may include modifiers that control visibility and lifetime (i.e., static in C, private in Java). Some languages
also allow declarations to initialize variables, such as in C, where you can declare and initialize in one statement.

2.6.3.2 Type Checking
Type checking is the process of verifying that each operation executed in a program respects the type system of
the language. This generally means that all operands in any expression are of appropriate types and number. Much of
what we do in the semantic analyzer phase is type checking. Sometimes the rules regarding operations are defined by
other parts of the code (as in function prototypes), and sometimes such rules are a part of the definition of the
language itself (as in "both operands of a binary arithmetic operation must be of the same type"). If a problem is found,
e.g., one tries to add a char pointer to a double in C, we encounter a type error. A language is considered stronglytyped
if each and every type error is detected during compilation. Type checking can be done compilation, during execution,
or divided across both.

2.6.3.3 Scope Checking

To understand how this is handled in a compiler, we need a few definitions. A scope is a section of program
text enclosed by basic program delimiters, e.g., {} in C, or begin‐end in Pascal. Many languages allow nested scopes that
are scopes defined within other scopes. The scope defined by the innermost such unit is called the current scope. The
scopes defined by the current scope and by any enclosing program units are known as open scopes. Any other scope is
a closed. As we encounter identifiers in a program, we need to determine if the identifier is accessible at that point in
the program. This is called scope checking. If we try to access a local variable declared in one function in another
function, we should get an error message. This is because only variables declared in the current scope and in the open
scopes containing the current scope are accessible.

2.6.4 Intermediate Code Generator Phase
Intermediate code generator is the part of the compilation phases that by which a compiler's code
generator converts some internal representation of source code into a form (e.g., machine code) that can be
readily executed by a machine (often a computer).

The input to the Intermediate code generator typically consists of a parse tree or an abstract syntax
tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as
three address code. Further stages of compilation may or may not be referred to as "code generation",
depending on whether they involve a significant change in the representation of the program. (For example,
a peephole optimization pass would not likely be called "code generation", although a code generator might
incorporate a peephole optimization pass.

Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi‐
stage process is used because many algorithms for code optimization are easier to apply one at a time, or
because the input to one optimization relies on the processing performed by another optimization. This
organization also facilitates the creation of a single compiler that can target multiple architectures, as only
the last of the code generation stages (the backend) needs to change from target to target.

2.6.5 Code Optimizer Phase
the code optimizer is take the Intermediate Representation Code that would be generated by
the Intermediate Code Generator as it’s input and doing the Compilation Optimization Process that
finally produce the Optimized Code to be the input to the code generator. And here we can define
the compiler optimization process that the process of tuning the output of the intermediate
representation code to minimize or maximize some attribute of an executable computer program.
The most common requirement is to minimize the time taken to execute a program. a less common
one is to minimize the amount of memory occupied. The growth of portable computers has created a
market for minimizing the power consumed by a program.

Code optimization refers to the techniques used by the compiler to improve the execution efficiency
of the generated object code. It involves a complex analysis of the intermediate code and the performance of
various transformations; but every optimizing transformation must also preserve the semantics of the
program. That is, a compiler should not attempt any optimization that would lead to a change in the
program's semantics.

Optimization can be machine‐independent or machine‐dependent. Machine‐independent
optimizations can be performed independently of the target machine for which the compiler is generating
code; that is, the optimizations are not tied to the target machine's specific platform or language. Examples
of machine‐independent optimizations are: elimination of loop invariant computation, induction variable
elimination, and elimination of common subexpressions.

On the other hand, machine‐dependent optimization requires knowledge of the target machine. An attempt to
generate object code that will utilize the target machine's registers more efficiently is an example of machine‐
dependent code optimization. Actually, code optimization is a misnomer; even after performing various optimizing
transformations, there is no guarantee that the generated object code will be optimal. Hence, we are actually

performing code improvement. When attempting any optimizing transformation, the following criteria
should be applied:

1‐ The optimization should capture most of the potential improvements without an unreasonable
amount of effort.

2‐ The optimization should be such that the meaning of the source program is preserved.

3‐ The optimization should, on average, reduce the time and space expended by the object code.

2.6.5.1 Types of optimizations
Techniques used in optimization can be broken up among various scopes which can affect anything
from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to
implement than global ones but result in smaller gains. Some examples of scopes include :

• Local optimizations : These only consider information local to a function definition. This
reduces the amount of analysis that needs to be performed (saving time and reducing storage
requirements) but means that worst case assumptions have to be made when function calls
occur or global variables are accessed (because little information about them is available).

• Loop optimizations : These act on the statements which make up a loop, such as a for loop
(eg, loop‐invariant code motion). Loop optimizations can have a significant impact because
many programs spend a large percentage of their time inside loops.

• Peephole optimizations : Usually performed late in the compilation process after machine
code has been generated. This form of optimization examines a few adjacent instructions (like
"looking through a peephole" at the code) to see whether they can be replaced by a single
instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2
might be more efficiently executed by left‐shifting the value or by adding the value to itself.
(This example is also an instance of strength reduction.)

2.6.6 Code Generator Phase
It’s the last phase in the compiler operations, and this phase is being a machine‐dependent phase, it is
not possible to generate good code without considering the details of the particular machine for which the
compiler is expected to generate code. Even so, a carefully selected code‐generation algorithm can produce code
that is twice as fast as code generated by an ill‐considered code‐generation algorithm.

And the clearly operations of this phase is the representation of the logical addresses of our compiled
program in the physical addresses, allocating the registers the target machine, also linking the contents of the
symbol table and the optimized code that generated by the code optimizer to produce finally the target file or the
object file.
2.6.7 Error Handling methods

One of the important tasks that a compiler must perform is the detection of and recovery from errors.
Recovery from errors is important, because the compiler will be scanning and compiling the entire program,
perhaps in the presence of errors; so as many errors as possible need to be detected.

Every phase of a compilation expects the input to be in a particular format, and whenever that input is
not in the required format, an error is returned. When detecting an error, a compiler scans some of the
tokens that are ahead of the error's point of occurrence. The fewer the number of tokens that must be
scanned ahead of the point of error occurrence, the better the compiler's error‐detection capability. For
example, consider the following statement :

if a = b then x: = y + z;

The error in the above statement will be detected in the syntactic analysis phase, but not before the syntax
analyzer sees the token "then"; but the first token, itself, is in error.

After detecting an error, the first thing that a compiler is supposed to do is to report the error by producing a
suitable diagnostic. A good error diagnostic should possess the following properties.

The message should be produced in terms of the original source program rather than in terms of some
internal representation of the source program. For example, the message should be produced along with the
line numbers of the source program.

1‐ The error message should be easy to understand by the user.

2‐ The error message should be specific and should localize the problem. For example, an error message
should read, "x is not declared in function fun," and not just, "missing declaration".

3‐ The message should not be redundant; that is, the same message should not be produced again and
again.

2.6.7.1 Recovery from the lexical phase errors
The lexical analyzer detects an error when it discovers that an input's prefix does not fit the
specification of any token class. After detecting an error, the lexical analyzer can invoke an error recovery
routine. This can entail a variety of remedial actions.

The simplest possible error recovery is to skip the erroneous characters until the lexical analyzer finds
another token. But this is likely to cause the parser to read a deletion error, which can cause severe
difficulties in the syntaxanalysis and remaining phases. One way the parser can help the lexical analyzer can
improve its ability to recover from errors is to make its list of legitimate tokens (in the current context)
available to the error recovery routine. The error‐recovery routine can then decide whether a remaining
input's prefix matches one of these tokens closely enough to be treated as that token.
2.6.7.2 Recovery from the Symantec phase errors

A Symantec analyzer detects an error when it has no legal move from its current configuration. It is capable of
announcing an error as soon as they read an input that is not a valid continuation of the previous input's prefix. This is
earliest time that a left‐to‐right parser can announce an error. But there are a variety of other Symantec analyzer s that
do not necessarily have this property.

The advantages of using a Symantec analyzer with a valid‐prefix‐property capability is that it reports an error as soon as
possible, and it minimizes the amount of erroneous output passed to subsequent phases of the compiler.

2.6.8 Symbol Table & Symbol Table Manger

A symbol table is a data structure used by a compiler to keep track of scope/ binding information about names.
This information is used in the source program to identify the various program elements, like variables, constants,
procedures, and the labels of statements. The symbol table is searched every time a name is encountered in the source
text. When a new name or new information about an existing name is discovered, the content of the symbol table
changes. Therefore, a symbol table must have an efficient mechanism for accessing the information held in the table as
well as for adding new entries to the symbol table.

For efficiency, our choice of the implementation data structure for the symbol table and the organization its
contents should be stress a minimal cost when adding new entries or accessing the information on existing entries.
Also, if the symbol table can grow dynamically as necessary, then it is more useful for a compiler. So, in the advanced
compilers there is a symbol table manager that helping in arranging the data in the symbol table and provide a
specified access to the other component of the compiler that want to access the symbol table to does anything .

This is all the required information that wanted to construct a complete picture about the compiler in the
mind of any interested person that have a little knowledge about the compiler and the automation language. And
know we are ready to go to the next chapter or the important chapter – Compiler Design – because the project or
study is a practice of the principles of the Software Design subject.


Chapter Three
Compiler
Design

3.1 Compiler Design

As e mentioned above about the object oriented development strategy, and why we are use this
strategy in our work and implement it in all stages. We also used this strategy in the design process of our
compiler .
In this chapter, we will show how to design the compiler by using the object oriented strategy, and also using
the standard notations and diagrams of the UML (unified modeling language ) that helping the programmer to
implement this system.
3.2 The Context of the System

When we look to the compiler system as a subsystem, we can observe that, if we considered any low
or high language as a system, we can extract or determine that we have many subsystems such as the compiler,
the linker, the loader, the libraries and report generator( if it is a high
language ) as a subsystems. As shown in the Figure 2.1. this figure
showing that our system is a subsystem in a large system, and it is
interact with the others subsystems to complete theirs function
without any conflict.
By using the object oriented design, now we will show the
subsystems of our compiler, and how this subsystem is interconnect
with each other. i.e. we will establish the architectural design of our
compiler.
Figure 3.1
3.2.1 Compiler Architectural Design

Here, in this process we will define the subsystems that making up our main system, and how this
subsystems are interact to produce the target file that the compiler if found to generate it.
3.2.1.1 Compiler Structuring

The main output from the System Structuring is the definition of the subsystems, and how this
subsystems would be communicate. So When we look to the Figure 3.2, we can easily optimized the

subsystems and determine the multi stage in the compiler from the beginning by receiving the source code
through the lexical analyzer, syntax, Symantec, …etc, till the
producing the target file. The lines that linking between this
subsystems is represent the relations between them.
3.2.1.1.1 Compiler Organization

Also from the Figure 3.2, we can determine the organization models
Figure 3.2
that used in our compiler. This models can help to illustrate how this subsystems
share it’s data, and communicate with each others.
We can find easily the following models in our compiler…
• Repository Model : found in the symbol table manager that is connect with the all others
subsystems to dealing with any subsystem’s data to check, update, and save it to the symbol
table. See the Figure 3.2 to understand this. From this model we can say that our compiler has
a subsystem share the data with all the others, So all of our subsystems has a flexible interface
or the same interface when getting or setting the data with the symbol table manager. Also
the repository model appears in the error handler subsystem that communicate with all the
other subsystems to report the errors if found in any stage.

• Abstract Machine ( Layered Model ) : this is the main model that the compiler depends to it.
Because it handle the main function of the compiler. The compilation process goes to end
through the Layered model, at the beginning the lexical analyzer receive the source code from
the editor screen if the language has a UI or from the File that the source code written to it.
When the Lexal analyzer finishes it’s function of separation the token that the source have, it
save the stream of tokens in the symbol table by helping from the symbol table manager. after
that the syntax analyzer receive this stream of tokens and doing the phrasing operation to this
tokens and produce the phrase tree to the symbol table, also this is done with helping with the
symbol table manager. Then the Symantec analyzer receive this phrase tree and checking the

logical errors then produce the abstract syntax tree. After that
the next phases is begin with the Intermediate representation
code generator that receive the abstract phrase tree and doing
the low level operation as generate the dependent machine
instruction and the instruction set architecture, then it produce
the intermediate representation code. After that this
intermediate representation code was receive by the I.R code
optimizer that doing the additional effort in the optimizing this
code such as reducing the time needed from the CPU to
executed this program and also reduce the size of memory that is
wanted to complete the execution, this is done by using many
functions that work in the low level environment that solving
many problems such as loop optimization and the parallel
optimization and … etc. the output from this stage is the
optimized code that would be received by the code generator
that work with it and the content of the symbol table to execute
many operation such as controlling the run time errors and
transform the logical addresses in the program that we want to
compile to the physical addresses in the memory, also allocating
the registers to produce finally the target code. We can convey
the meaning of this illustration in the following Figure 3.3 that
represent this layered models in our system.
Figure 3.3
3.2.1.2 Compiler Control Modeling

In this sub level we work with the controls to answers many questions such as how the
sequence of control flows on this subsystems?, when this sequence change from subsystem to
another ? and etc …

After many times to determine that, we can says that the control of our compiler is the Event – Based
Control because the timing in our system is very important. i.e. the compilation process doesn’t take a few
second, it’s take a little bit, and the compilation process goes from any subsystem to another depending to the
output of this subsystem, we have the ordering way to do that. So we can determine the model of controls of our
system as the Broadcast Model because any subsystem is interest to doing it’s job, it is first do the monitoring

operation to determine the time to start depending to any event the broadcast to all subsystems. And we can say
that this is an efficient way has been done in the integrating subsystems. And here subsystem doesn’t know when
an event will be handled. !
3.2.1.3 Compiler Modular Decomposition

In this field e can show the strategy that used in the decomposition of our subsystems component. It
is very important step because if this step completed successfully, we gives a big help to the programmers that
will be written the codes of our compiler. As we mentioned above that we use the object oriented strategy to
analysis and design our system, we also use the Object Model to doing the modular decomposition in the
subsystems component. By identifying the needed classes and identifying theirs attributes and operations. We will
illustrate this step when we reach to the UML class diagram in this chapter.

3.2.1.4 Compiler Domain Specific Architectures

Here, we can determine the domain specific model of our compiler. This is one of the important step of
the architectural design of any systems that we want to build. In the SE4 compiler design we can say we use to
analysis the domain specific of our compiler the Generic Model that appear from the analysis the previous
compilers and extracting the good features to implement it in our compiler. i.e. our compiler is come from the
study of the existing compilers that have the same objectives with our compiler.

3.3 Compiler USE Case Models

Use case models or diagrams is important to describe our subsystems functions, because the
programmers need a diagrams nether that the natural language to implement our system. So we develop the use
case models for each of our subsystems of the compiler.
We start with the Lexical analyzer use case diagram .!

3.3.1 Lexical Analyzer USE Case Diagram

In this diagram shows in the Figure 3.4, we specify the lexical analyzer use case diagram whose main
function is to generate the stream of tokens. As we see this diagram it also have many additional functions .
Figure 3.4
3.3.2 Syntax Analyzer USE Case Diagram

Syntax analyzer use case diagram is shown in the Figure 3.5. it’s main use case is the phrasing this stream
of token that generated by the lexical analyzer.

Figure 3.5

3.3.3 Symantec Analyzer USE Case Diagram

Symantec analyzer doing the functions of determining the logical error by comparing the phrase tree
that generated from the syntax analyzer with it’s CSG (context sensitive grammar) . and also produce the AST
(abstract syntax tree) as shown in the Figure 3.6 .
Figure 3.6
3.3.4 I.R Code Generator USE Case Diagram

In the intermediate representation code generator we can say this subsystem’s main use case is to produce
the simple intermediate representation code. We shown it’s use case diagram in the Figure 3.7 .

Figure 3.7

3.3.5 I.R Code Optimizer USE Case Diagram

In the Figure 3.8 we would show the use case diagram of the I.R code optimizer that doing it’s operation on
the simple I.R code that was generated by the I.R code generator to finally produce Simple Optimized Code.
Figure 3.8

3.3.6 Code Generator USE Case Diagram

The main function of this subsystem and the system is to produce the target file, this function is done with
many environmental functions that are shown in the Figure 3.9.

Figure 3.9

3.3.7 Symbol Table Manager USE Case Diagram

with this subsystem, we can access to the symbol table manager. So this subsystem have many functions
applied on the symbol table. It’s use case diagram is shown in the Figure 3.10.

Figure 3.10

3.3.8 Error Handler USE Case Diagram

The function of the error handler is to report any occurred errors and recognize any types of errors occurred
during the compilation process. This is shown in the Figure 3.11.

Figure 3.11

3.4 Compiler’s Subsystems Activity Diagrams

The activity diagram the diagrams that translate the use case to the sequence of operations and allowed the
conditions through this sequence. So it is describe any function from the beginning till the end of it.

3.4.1 Lexical Analyzer Activity Diagram

The activity diagram of the lexical analyzer subsystem is illustrate the process of tokenization and how this
process is happened, also what is the probabilities of the result. Here the result May be the stream of tokens or
the error that occurred. Figure 3.12 is shown this diagram.
Figure 3.12

3.4.2 Syntax Analyzer Activity Diagram

The syntax analyzer activity diagram is illustrate how the syntax analyzer is internally interact to produce in
the final, the phrase tree or occurring the syntax error that doesn’t allow this stage to complete it’s sequences to
produce the phrase tree. Figure 3.13 show us the activity diagram of the syntax analyzer subsystem.

Figure 3.13

3.4.3 Symantec Analyzer Activity Diagram

The Symantec analyzer sequences to produce the abstract phrase tree if doesn’t error occurred is shown in
the Figure 3.14.

Figure 3.14

3.4.4 I.R Code Generator Activity Diagram

From this subsystem till the end of the compilation process, the work is done in the low level meaning that
this work is done dependently to the machine it self or from the other side if the compilation process reach to this
stage without any error, the user finishes it’s requirements and the compiler is interact to finish this operation and
produce the target file with less negative effect to the user’s machine. Figure 3.15 shown how the I.R code
Generator internally interact to produce the simple I.R code.

Figure 3.15

3.4.5 I.R Code Optimizer Activity Diagram

I.R code Optimizer activity diagram shown in the Figure 3.16. it’s main function is to produce the Optimized
code from the I.R simple code produced by the I.R code generator.

Figure 3.16

3.4.6 Code Generator Activity Diagram

The activity diagrams of this subsystem is shown in the Figure 3.17. this figure shown the sequence that used
in this subsystem to produce the target file or code.

Figure 3.17


3.4.7 Symbol Table Manager Activity Diagram

In the activity diagram of the symbol table manager, we illustrate the sequence of this subsystem operations
that are shown in the Figure 3.18.

Figure 3.18


3.4.8 Error Handler Activity Diagram

In the error handler activity diagram, we illustrate the operations that are executed when any type of error
are happened in any stage of the compilation process. The Figure 3.19 is shown this diagram.

Figure 3.19


3.5 Compiler Class Diagram

In this subsection of the design process, we show the generated class that needed to implement our
compiler. This class attributes is extracted from the use case and activity diagrams that are discussed above.
In the determination of this classes we also extract any redundant class that our main class in need it to
complete it’s functions. The following large Figure 3.20 is show this classes of our compiler.

Figure 3.20


3.6 Compiler sequence Diagram

The sequence diagram is the diagram that describe the interaction among many actors of any system. i.e. it is
describe how the subsystems will be communicated. And what is the sequence from many paths of the
communications between them.
3.6.1 Lexical analyzing sequence diagram

In the lexical analyzing stage, we generate the sequence diagram that is interact among our interface, the
lexical analyzer, symbol table manager, symbol table and error handler actors. This sequence diagram is shown in
the Figure 3.21.

Figure 3.21

3.6.2 syntax analyzing sequence diagram

In the syntax analyzing stage, we generate the sequence diagram that is interact among our interface, the
syntax analyzer, symbol table manager, symbol table and error handler actors. This sequence diagram is shown in
the Figure 3.22.
Figure 3.22

3.6.3 Symantec Analyzing sequence diagram

In the Symantec analyzing stage, we generate the sequence diagram that is interact among our interface, the
Symantec analyzer, symbol table manager, symbol table and error handler actors. This sequence diagram is shown
in the Figure 3.23.

Figure 3.23

3.6.4 I.R Code Generation sequence diagram

In the I.R Code Generation stage, we generate the sequence diagram that is interact among our Intermediate
Representation code generator, symbol table manager, and the symbol table . This sequence diagram is shown in
the Figure 3.24.

Figure 3.24

3.6.5 I.R Code Optimizing sequence diagram

In the I.R Code Optimizing stage, we generate the sequence diagram that is interact among our Intermediate
Representation code optimizer, symbol table manager, and the symbol table . This sequence diagram is shown in
the Figure 3.25.
Figure 3.25

3.6.6 Code Generation sequence diagram

In the Code generation stage, we generate the sequence diagram that is interact among our code generator ,
symbol table manager, and the symbol table . This sequence diagram is shown in the Figure 3.26.
Figure 3.26

3.6.7 Additional sequence diagram

The sequences diagrams that are shown in the Figure 3.27, is represent the sequence diagrams that are
executed once at the first time of installing our compiler. To feeding our subsystems with the grammars.
Figure 3.27


Glossary
Compiler
A compiler is a special type of computer program that translates a human readable text file into a
form that the computer can more easily understand.
Interpreter
An interpreter reads the source code one instruction or line at a time, converts this line into
machine code and executes it.
Object-oriented strategy
Is a development strategy where system analyzers ,designers and programmer think in terms of
‘things’ instead of operations or functions.
Lexical Analyzer
Lexical analyzer is the part where the stream of characters making up the source program is read
from left‐to‐right and grouped into tokens.
Tokens
Tokens are sequences of characters with a collective meaning. There are usually only a small
number of tokens for a programming language: constants (integer, double, char, string, etc.),
operators (arithmetic, relational, logical), punctuation, and reserved words.
A lexeme
is the actual character sequence forming a token.
Regular Expression
A regular set is a set of strings for which there exists some finite automata that accepts that set.
That is, if R is a regular set, then R = L(M) for some finite automata M. Similarly, if M is a finite
automata, then L(M) is always a regular set.
Syntax Analyzer
The syntax‐analysis is the part of a compiler that verifies whether or not the tokens generated by
the lexical analyzer are grouped according to the syntactic rules of the language.

Context Free Grammar ( CFG )

CFG notation specifies a context‐free language that consists of terminals, nonterminals, a start
symbol, and productions.
The terminals
The terminals are nothing more than tokens of the language, used to form the language
constructs.
Nonterminals
The nonterminals define the sets of strings that are used to define the language generated by the
grammar.
Derivation
Derivation refers to replacing an instance of a given string's nonterminal, by the right‐hand side
of the production rule, whose left‐hand side contains the nonterminal to be replaced.
Symantec Analyzer
Is the part of compiler where delve even deeper to check whether they form a sensible set of
instructions in the programming language.
Types and Declarations

A type is a set of values and a set of operations operating on those values.

Type Checking
Type checking is the process of verifying that each operation executed in a program respects the type system
of the language.

A scope
A scope is a section of program text enclosed by basic program delimiters.

Scope Checking
Is the determination if the identifier is accessible at that point in the program.
Intermediate Code Generator
Is the part of the compilation phases that by which a compiler's code generator converts some internal
representation of source code into a form that can be readily executed by a machine.


Code Optimizer
Refers to techniques a compiler can employ in order to produce an improved object code for a given source
program.
Code Generator
It’s the last phase in the compiler operations, that convert the optimized code to depended machine
code>
Error Handling
One of the important tasks that a compiler must perform is the detection of and recovery from errors.

Symbol Table
A symbol table is a data structure used by a compiler to keep track of scope/ binding information
about names. This information is used in the source program to identify the various program
elements, like variables, constants, procedures, and the labels of statements.

Symbol Table Manger

Is the part of the compiler that provide to other compiler components the access to symbol
table.

Bibliography

[1] O.G. Kakde , “ Algorithms for Compiler Design ” , Charles River Media 2002.
[2] Alfred v.Aho ,Ravi Sethi and Jeffrey D.Ullman.“ Compilers Principles ,Techniques and Tools “.
[3] Ian Sommerville , “Software Engineering “.
[4] Y.N. Srikant, Priti Shankar, “The Compiler Design Handbook 2nd Edition “Dec.2007
[5] Cohen, “ Introduction to Computer Theory”, New York: Wiley, 1986.
[6] Hopcroft, J. Ullman, “Introduction to Automata Theory, Languages, and Computation,
Reading” MA: Addison‐Wesley, 1979.
[7] Bennett , “Introduction to Compiling Techniques. Berkshire”, England: McGraw‐Hill, 1990.

Websites

www.wikipedia.com


Compiler Design - Software Design Project

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Compiler Design - Software Design Project

Hochgeladen von

Copyright:

Verfügbare Formate

We dedicate this work ..

to the great persons Founded through the history giving

Also to any person around this world that searching about

SE4 Compiler Design Documentation Page 2

SE4 Compiler Design Documentation Page 3

SE4 Compiler Design Documentation Page 4

SE4 Compiler Design Documentation Page 5

SE4 Compiler Design Documentation Page 6

1.3 The plan

SE4 Compiler Design Documentation Page 7

1.5 Report Overview

SE4 Compiler Design Documentation Page 8

2.1 What Is a Compiler ?

SE4 Compiler Design Documentation Page 10

SE4 Compiler Design Documentation Page 11

2.4.1 Front End Phases

SE4 Compiler Design Documentation Page 12

2.4.2 Back End Phases

SE4 Compiler Design Documentation Page 13

2.5 Object Oriented development

SE4 Compiler Design Documentation Page 14

2.6 Compilation Processes

2.6.1 Lexical Analyzer Phase

SE4 Compiler Design Documentation Page 15

2.6.1.1 Regular Expression Notation

2.6.2 Syntax Analyzer Phase

SE4 Compiler Design Documentation Page 16

SE4 Compiler Design Documentation Page 17

2.6.3 Symantec Analyzer Phase

SE4 Compiler Design Documentation Page 18

2.6.3.1 Types and Declarations

SE4 Compiler Design Documentation Page 19

SE4 Compiler Design Documentation Page 20

SE4 Compiler Design Documentation Page 21

2.6.7 Error Handling methods

SE4 Compiler Design Documentation Page 22

2.6.7.1 Recovery from the lexical phase errors

2.6.7.2 Recovery from the Symantec phase errors

2.6.8 Symbol Table & Symbol Table Manger

SE4 Compiler Design Documentation Page 23

3.1 Compiler Design

3.2 The Context of the System

3.2.1 Compiler Architectural Design

3.2.1.1 Compiler Structuring

SE4 Compiler Design Documentation Page 25

3.2.1.1.1 Compiler Organization

SE4 Compiler Design Documentation Page 26

3.2.1.2 Compiler Control Modeling

SE4 Compiler Design Documentation Page 27

3.2.1.3 Compiler Modular Decomposition

3.2.1.4 Compiler Domain Specific Architectures

3.3 Compiler USE Case Models

SE4 Compiler Design Documentation Page 28

3.3.1 Lexical Analyzer USE Case Diagram

3.3.2 Syntax Analyzer USE Case Diagram

SE4 Compiler Design Documentation Page 29

3.3.3 Symantec Analyzer USE Case Diagram

3.3.4 I.R Code Generator USE Case Diagram

SE4 Compiler Design Documentation Page 30

3.3.5 I.R Code Optimizer USE Case Diagram

SE4 Compiler Design Documentation Page 31

3.3.6 Code Generator USE Case Diagram