CDGI'S
CHAMELI DEVI SCHOOL OF ENGINEERING, INDORE
DEPARTMENT OF COMPUTER SCIENCE
COURSE FILE CONTENT

Year:          2013-14
Class/Sem:     Sem-VII, CS-A & B
Branch:        CSE
Subject:       Compiler Design
Faculty Name:  Mr. Ajay Jaiswal

Content
1. Scope of the course
2. Disciplines involved in it
3. Abstract view for a compiler
4. Front-end and back-end tasks
5. Modules
6. List of Practicals
7. LINUX O/S
8. C++ / JAVA program backup



CHAMELI DEVI GROUP OF INSTITUTES, INDORE.
Chameli Devi School Of Engineering.

Department of Computer Science & Information Technology

Compiler Design Laboratory

Compiler Design [CS-701] Practical


( Year

2013-2014)

Name: ________________________________________
Roll No.: _______________________________________
Branch: _______________________________________
Semester:_______________________________________
Section: ________________________________________
Subject: _______________________________________

Certified by:
Total Practical :
Practicals performed:
Faculty Name/Signature



CHAMELI DEVI GROUP OF INSTITUTES, INDORE
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
COMPILER DESIGN LABORATORY (CS-701)
PRACTICAL LIST
S.No.   Name of Practical

1. Implement a program to count the characters of a given string, both without
   counting spaces and with counting spaces. (A handle of a string is a substring
   that matches the right side of a production rule.)

2. Create a file (Compiler.cc) and implement a program to read all the content of
   Compiler.cc (how many lines, how many words and how many characters are in the
   file).

3. Write a program for implementation of a Deterministic Finite Automaton (DFA)
   for the strings accepted by (abbb, abb, ab, a).

4. Construction and minimization of a Deterministic Finite Automaton for the given
   diagram, and recognize the string (aa + b)*ab(bb)*.

5. Construct a program to compute the FIRST() and FOLLOW() symbols for an LL(1)
   grammar, given a context-free grammar for the LL(1) construction.

6. Construct an operator precedence parser for the given grammar and also compute
   the LEADING() and TRAILING() symbols of the grammar.

7. Program using LEX to count the number of characters, words, spaces and lines in
   a given input file.

8. Program using LEX to count the number of comment lines in a given C program.
   Also eliminate them and copy the resulting program into a separate file.



9.  Program using LEX to recognize a valid arithmetic expression and to recognize
    the identifiers and operators present. Print them separately.

10. Program using LEX to recognize whether a given sentence is simple or compound.

11. Program using LEX to recognize and count the number of identifiers in a given
    input file.

12. Implement a YACC (Yet Another Compiler Compiler) program to recognize a valid
    arithmetic expression that uses the operators +, -, * and /.

13. Implement a YACC (Yet Another Compiler Compiler) program to recognize a valid
    variable, which starts with a letter, followed by any number of letters or
    digits.

14. YACC (Yet Another Compiler Compiler) program to recognize the strings aaab,
    abbb, ab and a using the grammar (a^n b^n, n >= 0).

15. Program to recognize the context-free grammar (a^n b^n, n >= 10), where a and b
    are input symbols of the grammar.

16. Write a C program to implement the syntax-directed definition of "if E then S1"
    and "if E then S1 else S2".


CHAMELI DEVI GROUP OF INSTITUTES, INDORE


Chameli Devi School Of Engineering
Department of Computer Science & Information Technology
Compiler Design Laboratory

Practical List
S.No.   Practical       Date of Experiment   Date of Submission   Signature & Remarks

1.      Practical 1
2.      Practical 2
3.      Practical 3
4.      Practical 4
5.      Practical 5
6.      Practical 6
7.      Practical 7
8.      Practical 8
9.      Practical 9
10.     Practical 10
11.     Practical 11
12.     Practical 12
13.     Practical 13
14.     Practical 14
15.     Practical 15
16.     Practical 16
17.     Practical 17
18.     Practical 18
19.     Practical 19
20.     Practical 20
21.     Practical 21
22.     Practical 22
23.     Practical 23
24.     Practical 24
25.     Practical 25

Head of Department                                                  Faculty

LAB MANUAL

SUBJECT NAME: ________________          SUBJECT CODE: ________________
CLASS: _______________________          SEMESTER: ____________________

FACULTY NAME / SIGNATURE: ________________    FACULTY NAME / SIGNATURE: ________________


Course scope
Aim:
To learn techniques of a modern compiler

Main reference:
Compilers: Principles, Techniques and Tools, Second Edition, by Alfred V. Aho, Ravi Sethi and
Jeffrey D. Ullman

Supplementary references:
Modern Compiler Implementation in Java, 2nd edition
Advanced Compiler Design and Implementation by Muchnick

Subjects
Lexical analysis (Scanning)
Syntax Analysis (Parsing)
Syntax Directed Translation
Intermediate Code Generation
Run-time environments
Code Generation
Machine Independent Optimization

Compiler learning

Isn't it an old discipline?

Yes, it is a well-established discipline.
Algorithms, methods and techniques were researched and developed in the early stages of
computer science.
There are many compilers around and many tools to generate them automatically.
So, why do we need to learn it?
Although you may never write a full compiler, the techniques we learn are useful in many
tasks, like writing an interpreter for a scripting language, validation checking for forms, and
so on.


Terminology
Compiler:
A program that translates an executable program in one language into an executable program
in another language. We expect the program produced by the compiler to be better, in some
way, than the original.

Interpreter:
A program that reads an executable program and produces the results of running that
program. Usually, this involves executing the source program in some fashion. Our course is
mainly about compilers, but many of the same issues arise in interpreters.

Disciplines involved

Algorithms

Languages and machines

Operating systems

Computer architectures

Why Study Compilers?


General background information for a good software engineer.
Increases understanding of language semantics.
Seeing the machine code generated for language constructs helps understand performance
issues for languages.
Teaches good language design.
New devices may need device-specific languages.
New business fields may need domain-specific languages.

Applications of Compiler Technology & Tools

Processing XML/other to generate documents, code, etc.

Processing domain-specific and device-specific languages.

Implementing a server that uses a protocol such as http or imap

Natural language processing, for example, spam filter, search, document


comprehension, summary generation

Translating from a hardware description language to the schematic of a circuit


Automatic graph layout (graphviz, for example)

Extending an existing programming language

Program analysis and improvement tools

Abstract view
Compilers translate from a source language (typically a high-level language) to a functionally
equivalent target language (typically the machine code of a particular machine or a
machine-independent virtual machine).
Compilers for high-level programming languages are among the larger and more complex
pieces of software.

Original languages included Fortran and Cobol

Often multi-pass compilers (to facilitate memory reuse)

Compiler development helped in better programming language design

Early development focused on syntactic analysis and optimization

Commercially, compilers are developed by very large software groups

Current focus is on optimization and smart use of resources for modern RISC (reduced
instruction set computer) architectures.

Source code  -->  Compiler  -->  Machine code
                  (errors)

A compiler:
Recognizes legal (and illegal) programs
Generates correct code
Manages storage of all variables and code
Agrees on a format for object (or assembly) code


Principles of Compiler Design Syllabus


Introduction to Compiler:
Translator issues, why write a compiler, the compilation process in brief, front-end and
back-end model, compiler construction tools, interpreters and related issues, cross compilers,
incremental compilers, bootstrapping.

1.

Lexical Analysis
Review of lexical analysis: alphabet, token, lexical error, Block schematic of lexical
analyser, Automatic construction of lexical analyser (LEX), LEX specification details.

2. Syntax Analysis
Introduction: Role of parsers, Parsing technique: Top down-RD parser, Predictive LL
(k) parser, Bottom up-shift-Reduce, SLR, LR(k), LALR etc. using ambiguous grammars,
Error detection and recovery, Automatic construction of parser (YACC), YACC
specifications.
Semantic Analysis
Need of semantic analysis, type checking and type conversion.
3. Syntax directed translation
Syntax directed definitions, construction of syntax trees, bottom-up evaluation of S-attributed
definitions, L-attributed definitions, top-down translation, bottom-up evaluation of
inherited attributes.
Intermediate code Generation: Intermediate code generation for declaration,
assignment, iterative statements, case statements, arrays, structures, conditional
statements, Boolean expressions, procedure calls, Intermediate code Generation using
YACC
4. Run Time Storage Organisation
Storage allocation strategies, static, dynamic storage allocation, allocation strategies for
block structured and non-block structured languages; O.S. support required for IO
statements. (e.g. printf, scanf) and memory allocation deallocation related statement.
(e.g. new, malloc)

5. Code Generation: Introduction: Issues in code generation, target machine description,


Basic blocks and flow graphs, next use representation of basic blocks, Peephole optimisation,


DAG generating code from a DAG, Dynamic programming, Code generator-generator
concept.
6. Code Optimisation: Introduction, classification of optimisation, principal sources of
optimisation, machine-dependent optimisation, machine-independent optimisation, optimisation
of basic blocks, loops in flow graphs. Optimising transformations: compile-time evaluation,
common sub-expression elimination, variable propagation, code movement, strength
reduction, dead code elimination and loop optimisation, local optimisation, DAG-based
local optimisation. Global optimisation: control and data flow analysis, control flow
analysis concepts and definitions, data flow analysis, computing data flow information,
meet over paths, data flow equations. Iterative data flow analysis:
available expressions, live range identification.

Definition
A compiler is a computer program (or set of programs) that transforms source code written in
a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code).

The Analysis-Synthesis Model of Compilation


There are two parts to compilation:
Analysis determines the operations implied by the source program which are recorded in a
tree structure
Synthesis takes the tree structure and translates the operations therein into the target
program

Other Tools that Use the Analysis-Synthesis Model


Editors (syntax highlighting)
Pretty printers (e.g. Doxygen)
Static checkers (e.g. Lint and Splint)

Interpreters

Text formatters (e.g. TeX and LaTeX)

Silicon compilers (e.g. VHDL)

Query interpreters/compilers (Databases)


Grouping of phases

Incremental compiler
The term incremental compiler may refer to two different types of compiler.

Imperative programming
Interactive Programming
In imperative programming and software development, an incremental compiler is one that
when invoked, takes only the changes of a known set of source files and updates any
corresponding output files (in the compiler's target language, often bytecode) that may
already exist from previous compilations. By effectively building upon previously compiled
output files, the incremental compiler avoids the wasteful recompilation of entire source files,


where most of the code remains unchanged. For most incremental compilers, compiling a
program with small changes to its source code is usually near instantaneous. It can be said
that an incremental compiler reduces the granularity of a language's traditional compilation
units while maintaining the language's semantics, such that the compiler can append and
replace smaller parts.

Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than
the one on which the compiler is run.
Cross compiler tools are used to generate executables for embedded system or multiple
platforms.
It is used to compile for a platform upon which it is not feasible to do the compiling, like micro
controllers that don't support an operating system.
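For example (illustrative only): on an x86 Linux host with an ARM cross toolchain installed, a
command such as arm-linux-gnueabihf-gcc -o hello hello.c produces an executable for the ARM
target; the resulting binary will not run on the host, but it can be copied to the target board
or embedded device and executed there.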

Phases of a Compiler

Source Program
      |
Lexical Analyzer
      |
Syntax Analyzer
      |
Semantic Analyzer
      |
Intermediate Code Generator
      |
Code Optimizer
      |
Code Generator
      |
Target Program

(The Symbol-table Manager and the Error Handler interact with all of the phases above.)
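As an illustrative trace (not part of the original figure), the assignment a = a + 2 from the
lexical-analysis example later in this manual passes through the phases roughly as follows:

Lexical analysis:       id(a)  =  id(a)  +  num(2)  ;
Syntax analysis:        an assignment node whose left child is id(a) and whose right child is
                        the + node over id(a) and num(2)
Semantic analysis:      checks that a is declared and that the addition is type-correct
Intermediate code:      t1 = a + 2
                        a  = t1
Code optimization:      the temporary t1 can be eliminated
Code generation:        (pseudo-assembly)  LOAD R1, a    ADD R1, #2    STORE a, R1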


Program #01

1. Implement a program to count the characters of a given string, without
counting spaces and with counting spaces. (A handle of a string is a substring
that matches the right side of a production rule.)

Lexical analysis (scanning) turns the source text into a token stream. For the
statement  val = 10 * val + i  the scanner produces:

token number   token name   token value
1              ident        "val"
3              assign       -
2              number       10
4              times        -
1              ident        "val"
5              plus         -
1              ident        "i"

Syntax analysis (parsing) then builds a syntax tree (Statement, Expression, Term)
for the token sequence:
ident = number * ident + ident

Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters,
operators and special symbols
int a; a = a + 2;

int   reserved word
a     identifier
;     special symbol
a     identifier
=     operator
a     identifier
+     operator
2     integer constant
;     special symbol

Examples of Token
Token: A sequence of characters to be treated as a
single unit.
Examples of tokens.
Reserved words (e.g. begin, end, struct, if etc.)
Keywords (integer, true etc.)
Operators (+, &&, ++ etc)
Identifiers (variable names, procedure names, parameter names)
Literal constants (numeric, string, character constants etc.)
Punctuation marks (:, , etc.)
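A minimal C sketch (illustrative only, not one of the lab programs below) of how such tokens
are commonly represented as <token-name, attribute-value> pairs:

enum TokenKind { TK_RESERVED, TK_IDENT, TK_OPERATOR, TK_INT_CONST, TK_SPECIAL };

struct Token {
    enum TokenKind kind;   /* which class of lexeme this is                  */
    int value;             /* e.g. a symbol-table index or a numeric value   */
};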

A ./* string with using space.*/


Source Code:
#include <iostream>
using namespace std;

int main()
{
    char c[30];
    int n = 0;
    cout << "Enter the String" << "\n";
    cin >> c;                         // >> stops reading at the first whitespace
    for (int i = 0; c[i] != '\0'; i++)
    {
        n = n + 1;                    // count characters up to the null terminator
    }
    cout << "Length of the string is " << n << "\n";
    return 0;
}

Input:
Type any strings with combinations of letters

Output:
Total No. of letters/ characters of given string.


B ./* string without using space.*/


Source Code:
#include <iostream>
#include <iomanip>
#include <cstring>    // strlen
#include <cctype>     // isspace, isdigit
using namespace std;
const int SIZE = 100;
void input (char*);
void wordCount (char*);
void longWord (char*, int);
void numbers (char*);
void outputLetterCounts (int letterCount[]);
int main()
{
char sentence[SIZE] = {'\0'};   // '\0' (not '/0') is the null character
int x = 0;
int letterCount[26] = {0};
input(sentence);
wordCount(sentence);
longWord(sentence, x);
numbers(sentence);
outputLetterCounts(letterCount);
return 0;
}
void input (char *enter)
{
cout<<"Enter sentence(s) "<< std::endl;
std::cin.getline(enter, SIZE);
int len = strlen( enter );
}
void wordCount (char *word2)
{
int cnt = 0;
while(*word2 != '\0')
{
while(isspace(*word2))
{
++word2;
}


if(*word2 != '\0')
{
++cnt;
while(!isspace(*word2) && *word2 != '\0')
++word2;
}
}
std::cout<<"Number of words: "<<cnt<<endl;
}
void outputLetterCounts(int letterCount[])
{
for (int l = 0; l < 26; l++)
{
if (letterCount[l] > 0)
{
cout << letterCount[l] << " " << char('a' + l) << endl;
}
}
}
void numbers (char *word2)
{
int num = 0;
int amount = strlen(word2);
for(int i = 0; i < amount; i++)
{
if(isdigit(word2[i]))
num++;
}
std::cout<<"Digits: "<<num<<endl;
}
void longWord (char *temp, int x )
{
int counter = 0;
int max_word = -1;
int length = int(strlen(temp));
for(int i=0; i<length; i++)
{
if(temp[i] !=' ')
{
counter++;
}
else if(temp[i]==' ')


{
if(counter > max_word)
{
max_word = counter;
}
counter = 0;
}
}
std::cout <<"Longest word:" << max_word;
}

Input:
Type any strings with combinations of letters / characters of sentence..

Output:
Total No. of letters/ characters of given string without spaces (Excluding white spaces.).


Program #02
/* Create a file (Compiler.cc) and implement a program to read all the content
of Compiler.cc (how many lines, how many words and how many characters are in
the file). */

Source Code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int noc=0, now=0, nol=0;
FILE *fr;
char fname[20];
int ch;                               /* int, so that EOF can be detected */
printf("\n enter the source file name (Compiler.cc): ");
scanf("%19s", fname);
fr=fopen(fname,"r");
if(fr==NULL)
{
printf("\n error \n");
exit(0);
}
ch=fgetc(fr);
while(ch!=EOF)
{
noc++;
if(ch==' ')                           /* no stray semicolon here */
now++;
if(ch=='\n')
{
nol++;
now++;
}
ch=fgetc(fr);
}
fclose(fr);
printf("\n total no of characters=%d",noc);
printf("\n total no of words=%d",now);
printf("\n total no of lines=%d",nol);
return 0;
}

An alternative implementation in C++ (counting characters, words, sentences and lines):

#include <fstream>
#include <iostream>
#include <string>
//#include <cctype>
using namespace std;
int main()
{
string name;                         // name of the file to be opened
char ch;                             // character to be read into the loop
fstream fileName;                    // declare the fstream object

// list of integers to hold the count information


int characters = 0,
    words = 0,
    sentences = 0,
    lines = 0;
// list of constants used to define words, lines, and sentences
const char EOLN  = '\n';             // end of line character
const char SENT  = '.';              // end of sentence character
const char BLANK = ' ';              // end of word character
bool firstTry = true;                // becomes false after the first prompt
do
{
// Prompt for user input and open the specified file
if (firstTry)
{
cout << "Enter the name of a file: ";
}
else
{
cout << "File Not Found!\nEnter the name of a file: ";
}
cin >> name;
cin.ignore();                        // ignore the next character in the buffer
fileName.open(name.c_str());         // convert name to a c-style string
firstTry = false;
} while (!fileName);
// use a while loop to perform the required operations
char prevChar = BLANK;               // the last character analyzed
while ( !fileName.eof())
{
fileName.get(ch);                    // get each character from the file
cout << ch;                          // and print it to the screen

characters ++;                       // count the characters in the file

// count the words in the file
if ((ch == BLANK) && (prevChar != BLANK))
{
words ++;
}
if ( ch == SENT )
// count the sentences in the file
{
sentences ++;
}
// end of the sentence if
if ( ch == EOLN )
// count the lines in the file
{
lines ++;
words ++;
// count the next word here
}
// end of end-of-line if
prevChar = ch;
}                                    // end of while loop

fileName.clear();                    // clear the fail state
fileName.close();                    // close the file

// display a summary of the file analysis


cout << "\nThere are " << characters << " characters in this file.\n";
cout << "There are " << words << " words in this file.\n";
cout << "There are " << sentences << " sentences in this file.\n";
cout << "There are " << lines << " lines in this file.\n";
return 0;
}



Input:
Any text file (for example, Compiler.cc).

Output:
The total number of characters, words, sentences and lines in the file.


Program #03
/*Write a program for implementation of Deterministic Finite Automata
(DFA) for the strings accepted by (abbb, abb, ab,a).*/
Deterministic finite automata (DFA):
A deterministic finite automaton (DFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × Σ → S)
a start state (s ∈ S)
a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It
uses the transition function T to determine the next state using the current state and the
symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept
the string; otherwise it is said to reject the string. The set of strings it accepts forms a
language, which is the language the DFA recognizes.
Non-Deterministic Finite Automaton (NFA):
A non-deterministic finite automaton (NFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × (Σ ∪ {ε}) → P(S))
a start state (s ∈ S)
a set of accept states (A ⊆ S)
where P(S) is the power set of S and ε is the empty string. The machine starts in the start
state and reads in a string of symbols from its alphabet. It uses the transition relation T to
determine the next state(s) using the current state and the symbol just read or the empty
string. If, when it has finished reading, it is in an accepting state, it is said to accept the string;
otherwise it is said to reject the string. The set of strings it accepts forms a language, which is


the language the NFA recognizes.
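Before the general, interactive program below, here is a minimal hedged sketch of the same
idea with the transition table hard-coded for the language {a, ab, abb, abbb}. The state
numbering and names are illustrative only.

#include <stdio.h>

int main(void)
{
    /* states: 0 = start, 1..4 = after reading a, ab, abb, abbb, 5 = dead state */
    /* next_state[state][symbol], where symbol 0 = 'a' and 1 = 'b'              */
    int next_state[6][2] = {
        {1, 5},   /* state 0: 'a' -> 1, anything else dies    */
        {5, 2},   /* state 1 (a):    only 'b' may follow      */
        {5, 3},   /* state 2 (ab)                             */
        {5, 4},   /* state 3 (abb)                            */
        {5, 5},   /* state 4 (abbb): any further input dies   */
        {5, 5}    /* state 5: dead state                      */
    };
    int accepting[6] = {0, 1, 1, 1, 1, 0};

    char str[64];
    printf("enter a string over {a,b}: ");
    scanf("%63s", str);

    int state = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] != 'a' && str[i] != 'b') { state = 5; break; }
        state = next_state[state][str[i] - 'a'];
    }
    printf("%s\n", accepting[state] ? "accepted" : "rejected");
    return 0;
}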

Source Code:
#include <stdio.h>

int main()
{
    int n, m, start, nf, ps;
    char str[20];
    printf("enter no of states: ");
    scanf("%d", &n);
    printf("enter no of inputs: ");
    scanf("%d", &m);
    /* constructing the transition table */
    int **tran = new int*[n];
    for (int i = 0; i < n; i++)
        tran[i] = new int[m];
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < m; j++)
        {
            printf("enter next state for present state %d on input %d: ", i, j);
            scanf("%d", &tran[i][j]);
        }
    }
    printf("enter starting state: ");
    scanf("%d", &start);
    printf("enter no of final states: ");
    scanf("%d", &nf);
    int *final = new int[nf];
    for (int i = 0; i < nf; i++)
    {
        printf("enter the final state: ");
        scanf("%d", &final[i]);
    }
    printf("enter string (as input-symbol numbers, e.g. 0 for a and 1 for b): ");
    scanf("%s", str);
    ps = start;
    for (int i = 0; str[i] != '\0'; i++)
        ps = tran[ps][str[i] - '0'];          /* follow the transition table */
    int accepted = 0;
    for (int i = 0; i < nf; i++)              /* compare against the nf final states */
    {
        if (ps == final[i])
        {
            accepted = 1;
            break;
        }
    }
    printf("%s\n", accepted ? "accepted" : "rejected");
    /* deleting buffers */
    delete[] final;
    for (int i = 0; i < n; i++)
        delete[] tran[i];
    delete[] tran;
    return 0;
}


Program #04
/*Construction of minimization of Deterministic Finite Automata
for the given diagram & recognize the string (aa + b)*ab(bb)*. */
Minimizing Finite Automata
Consider the finite automaton shown in figure 1 which accepts the regular set denoted by the
regular expression (aa + b)*ab(bb)*. Accepting states are colored yellow while rejecting states
are blue.

Figure 1 - Recognizer for (aa + b)*ab(bb)*


Closer examination reveals that states s2 and s7 are really the same since they are both
accepting states and both go to s6 under the input b and both go to s3 under an a. So, why
not merge them and form a smaller machine? In the same manner, we could argue for
merging states s0 and s5. Merging states like this should produce a smaller automaton that
accomplishes exactly the same task as our original one.
From these observations, it seems that the key to making finite automata smaller is to
recognize and merge equivalent states. To do this, we must agree upon the definition of
equivalent states. Here is one formulation of what Moore defined as indistinguishable states.
Definition. Two states in a finite automaton M are equivalent if and only if for
every string x, if M is started in either state with x as input, it either accepts in
both cases or rejects in both cases.
Another way to say this is that the machine does the same thing when started in either state.
This is especially necessary when finite automata produce output.



Two questions remain. First, how does one find equivalent states, and then, exactly how
valuable is this information? We shall answer the second question first by providing a
corollary to a famous theorem proven long ago by Myhill [3] and Nerode [4].
Corollary. For a deterministic finite automaton M, the minimum number of
states in any equivalent deterministic finite automaton is the same as the
number of equivalence classes of M's states.
With one more observation, we shall be able to present an algorithm for transforming an
automaton into its smallest equivalent machine.
Fact. Equivalent states go to equivalent states under all inputs.
Now we know that if we can find the equivalence classes (or groups of equivalent states) for
an automaton, then we can use these as the states of the smallest equivalent machine. The
machine shown in figure 1 will be used as an example for the intuitive discussion that follows.
Let us first divide the machine's states into two groups: accepting and rejecting states. These
groups are: A = {s2, s7} and B = {s0, s1, s3, s4, s5, s6}. Note that these are equivalent under
the empty string as input.
Then, let us find out if the states in these groups go to the same group under inputs a and b.
As we noted at the beginning of this discussion, the states of group A both go to states in
group B under both inputs. Things are different for the states of group B. The following table
shows the result of applying the inputs to these states. (For example, the input a leads from
s1 to s5 in group B and input b leads to s2 in group A.)

[Table: for each of the states s0, s1, s3, s4, s5 and s6 of group B, the group (A or B)
reached under input a and under input b.]

Looking at the table we find that the input b helps us distinguish between two of the states (s1
and s6) and the rest of the states in the group since it leads to group A for these two instead
of group B. Thus the states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set
{s1, s6} and we must partition B into two groups. Now we have the groups:
A = {s2, s7}, B = { s0, s3, s4, s5}, C = { s1, s6}
and the next examination of where the inputs lead shows us that s3 is not equivalent to the
rest of group B. We must partition again.



Continuing this process until we cannot distinguish between the states in any group by
employing our input tests, we end up with the groups:
A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = { s6}.
In view of the above theoretical definitions and results, it is easy to argue that all of the states
in each group are equivalent because they all go to the same groups under the inputs a and
b. Thus in the sense of Moore the states in each group are truly indistinguishable. We also
can claim that due to the corollary to the Myhill-Nerode theorem, any automaton that accepts
(aa + b)*ab(bb)* must have at least five states.Building the minimum state finite automaton is
now rather straightforward. We merely use the equivalence classes (our groups) as states
and provide the proper transitions. This gives us the finite automaton pictured in figure 2.

Figure 2 - A Minimal Automaton


Here is the state minimization algorithm.
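The algorithm itself appears as a figure in the original manual and is not reproduced here. The
following is a hedged C sketch of the partition-refinement idea just described; the state count,
the tiny example automaton and all names are illustrative only and are not the lab program
given below.

#include <stdio.h>
#include <string.h>

#define N 3   /* number of states  (illustrative) */
#define M 2   /* number of input symbols          */

/* group[s] receives the equivalence class of state s; returns the class count */
int minimize(int trans[N][M], int accepting[N], int group[N])
{
    int ngroups = 0;
    for (int s = 0; s < N; s++)              /* initial split: accepting vs. rejecting */
        group[s] = accepting[s] ? 0 : 1;

    int changed = 1;
    while (changed) {
        int newgroup[N];
        int seen[N][M + 1];                  /* signatures of the groups found this round */
        int count = 0;
        for (int s = 0; s < N; s++) {
            /* signature: current group of s plus the group reached on each symbol */
            int sig[M + 1];
            sig[0] = group[s];
            for (int a = 0; a < M; a++)
                sig[a + 1] = group[trans[s][a]];
            int g;
            for (g = 0; g < count; g++)      /* find an existing group with this signature */
                if (memcmp(seen[g], sig, sizeof sig) == 0)
                    break;
            if (g == count)                  /* or open a new group */
                memcpy(seen[count++], sig, sizeof sig);
            newgroup[s] = g;
        }
        changed = (memcmp(group, newgroup, sizeof newgroup) != 0);
        memcpy(group, newgroup, sizeof newgroup);
        ngroups = count;
    }
    return ngroups;
}

int main(void)
{
    /* tiny made-up DFA: states 1 and 2 are both accepting and behave identically */
    int trans[N][M]  = { {1, 2}, {1, 2}, {1, 2} };
    int accepting[N] = { 0, 1, 1 };
    int group[N];
    printf("states in the minimal DFA: %d\n", minimize(trans, accepting, group));
    return 0;
}

On the automaton of figure 1 the same refinement terminates with the five groups derived above.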

The complexity of this algorithm is O(n²), since we check all of the states each time we
execute the repeat loop, and we might have to execute the loop n times, since it might take an
input of length n to distinguish between two states. A faster algorithm was later developed by
Hopcroft.

Source Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
int i;                               /* loop index, reused throughout */
int nstates,minputs,start,nf,ps;
char str[20];
printf("enter no of states");
scanf("%d",&nstates);
printf("enter no of inputs");
scanf("%d",&minputs);
//constructing buffers
int **tran=new int* [nstates];
for(int i=0;i<nstates;i++)
{
tran[i]=new int[minputs];
}
for(i=0;i<nstates;i++)
{
for(int j=0;j<minputs;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);
printf("enter no of final states");
scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}
int *stategroup=new int[nstates],**groupgroup=new int*[nstates];
memset(stategroup,-1,nstates*sizeof(int));
int **groupstate=new int*[nstates];
for(i=0;i<nstates;i++)
{
groupstate[i]=new int[nstates+1];
groupgroup[i]=new int[2];


memset(groupgroup[i],-1,2*sizeof(int));
memset(groupstate[i],-1,(nstates+1)*sizeof(int));
}
for(i=0;i<nf;i++)
{
stategroup[final[i]]=0;
groupstate[0][final[i]]=1;
}
groupstate[0][nstates]=10;//means a partition having final states
for(i=0;i<nstates;i++)
{
if(stategroup[i]!=0)
{
stategroup[i]=1;
groupstate[1][i]=1;
}
}
groupstate[1][nstates]=100;//means a partition having no final states
int groupcount=2;
int count=0,change;
///////////////////////////////minimization starts here/////////////////////////////////////////////////
do
{
change=0;
for(int j=0;j<minputs;j++)
{
int lastgroup,presentgroup,state,latestgroupcount,maxgroupcount=0,group;
for(int group=0;group<groupcount;group++)
{
count=0;
while(groupstate[group][count]!=1&&count<nstates)
{count++;
}
groupgroup[group][0]=stategroup[tran[ count ][j] ];
groupgroup[group][1]=groupstate[group][nstates];
}
for( group=0;group<groupcount;group++)
{
latestgroupcount=groupcount;
lastgroup=groupgroup[group][0];
count=0;
while(count<nstates)
{
if((state=groupstate[group][count])==1)
{
presentgroup=stategroup[tran[count][j]];
if(presentgroup!=lastgroup)


{
change=1;
//find for any group going to presentgroup
int flag=0, anygroup;
for(anygroup=0;anygroup<latestgroupcount;anygroup++)
{
if(groupgroup[anygroup]
[0]==presentgroup
&&groupgroup[anygroup]
[1]==groupgroup[group][1])
{flag=1;
break;
}
}
//change groupgroup
//change stategroup
//change groupstate and groupcount
anygroup=flag==1?
anygroup:latestgroupcount++;
groupgroup[anygroup][0]=presentgroup;
groupgroup[anygroup][1]=groupgroup[group]
[1];
stategroup[count]=anygroup;
groupstate[anygroup][count]=1;
groupstate[group][count]=-1;
groupstate[anygroup]
[nstates]=groupgroup[group][1];
}
}
count++;
}//end of while
if(maxgroupcount<latestgroupcount){maxgroupcount=latestgroupcount;}
}//checking all the groups for loop
groupcount=maxgroupcount;
}//checking all the inputs for loop
}while(change!=0);
/////////////////////////////end of minimization////////////////////////////////////////////////

printf("\n\nGroups\n\n");
for(i=0;i<groupcount;i++)
{
printf("%d ",i);
for(int j=0;j<nstates;j++)


{
if(groupstate[i][j]!=-1)
printf(" %d ",j);
}
printf("\n");
}
//deleting buffers
delete[] stategroup;
delete[] final;
for(i=0;i<nstates;i++)
{
delete[] tran[i];
delete[] groupstate[i];
delete[] groupgroup[i];
}
delete[] groupgroup;
delete[] tran;
delete[] groupstate;
return 0;
}


INPUT:
Recognizer for (aa + b)*ab(bb)*

OUTPUT:
A Minimal Automaton for (aa + b)*ab(bb)*


Program #05
/* Construct a program to compute the FIRST() and FOLLOW() symbols for an LL(1)
grammar, where the context-free grammar for the LL(1) construction (with @ used
for the empty string and / separating alternatives) is:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
Compute the FIRST() and FOLLOW() symbols for this LL(1) grammar. */
The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be
used as synchronizing tokens during panic-mode error recovery. Just suppose for a second that
you are an LL(1) parser and you have the supernatural power of seeing the future of the string
one step ahead.
FIRST(α)
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the
strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if, for
some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in
FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to
FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.
Now, we can compute FIRST for any string X1 X2 ... Xn as follows. Add to FIRST(X1 X2 ... Xn)
all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in
FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so
on. Finally, add ε to FIRST(X1 X2 ... Xn) if, for all i, FIRST(Xi) contains ε.


FOLLOW(A)
Define FOLLOW(A), for a nonterminal A, to be the set of terminals a that can appear
immediately to the right of A in some sentential form; that is, the set of terminals a such that
there exists a derivation of the form S ⇒* αAaβ for some α and β. Note that there may, at some
time during the derivation, have been symbols between A and a, but if so, they derived ε and
disappeared. If A can be the rightmost symbol in some sentential form, then $, representing
the input right endmarker, is in FOLLOW(A).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β), except for ε, is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e.,
β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
EXAMPLE:
Consider the expression grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

Algorithm:
FIRST:
1. If the first character of the production is a terminal, then it becomes the FIRST.
   e.g. FIRST(abAb) = {a}
2. If a production is of the type A -> BCD... (i.e. all symbols are non-terminals), then:
   if FIRST(B) does not contain null, then FIRST(A) = FIRST(B) and stop here;
   else also check the next non-terminal (C here) in the same way, so that
   FIRST(A) = FIRST(B) + FIRST(C); when null is no longer obtained, stop there.
FOLLOW:
Once you know FIRST, you can easily compute FOLLOW.
1. If a variable is the start symbol, then $ is in its FOLLOW.
2. If a production is of the form A -> (any string1) B (any string2), then
   FOLLOW(B) = FIRST(any string2) - {null}.
3. If a production is of the form A -> (any string) B, then FOLLOW(B) = FOLLOW(A).
It is very simple; please try to understand!


Source Code:
#include"stdio.h"
#include<conio.h>
char array[10][20],temp[10];
int c,n;void fun(int,int[]);
int fun2(int i,int j,int p[],int key)
{
int k;
if(!key)
{ /* non-terminal: locate its defining row and continue from there */
for(k=0;k<n;k++)if(array[i][j]==array[k][0])break;
p[0]=i;p[1]=j+1;fun(k,p);return 0;
}
else
{ /* terminal: return 1 only if it is not already in temp[] */
for(k=0;k<=c;k++)if(temp[k]==array[i][j])break;
if(k>c)return 1;
else return 0;
}
}
void fun(int i,int p[])
{
int j,k,key;
for(j=2;array[i][j]!='\0';j++)
{
if(array[i][j-1]=='/')
{
if(array[i][j]>='A'&&array[i][j]<='Z')
{
key=0;
fun2(i,j,p,key);
}
else
{
key=1;
if(fun2(i,j,p,key))
temp[++c]=array[i][j];
if(array[i][j]=='@'&&p[0]!=-1)
{ //taking ,@, as null symbol.
if(array[p[0]][p[1]]>='A'&&array[p[0]][p[1]]<='Z')
{
key=0;
fun2(p[0],p[1],p,key);
}
else
if(array[p[0]][p[1]]!='/'&&array[p[0]][p[1]]!='\0')
{
if(fun2(p[0],p[1],p,key))
temp[++c]=array[p[0]][p[1]];
}
}
}
}
}
}


void main()
{
int p[2],i,j;
clrscr();
printf("Enter the no. of productions :");
scanf("%d",&n);
printf("Enter the productions :\n");
for(i=0;i<n;i++)
scanf("%s",array[i]);
for(i=0;i<n;i++)
{
c=-1,p[0]=-1,p[1]=-1;
fun(i,p);
printf("First(%c) : [ ",array[i][0]);
for(j=0;j<=c;j++)
printf("%c,",temp[j]);
printf("\b ].\n");
getch();
}
}

INPUT:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@

OUTPUT:
Enter the no. of productions :6
Enter the productions :
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
First(S) : [ a ].
First(B) : [ c ].
First(C) : [ b,@ ].
First(D) : [ g,@,f ].
First(E) : [ g,@ ].
First(F) : [ f,@ ].
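The program prints only the FIRST sets. As a hand check (not produced by the program),
applying the FOLLOW rules above to the same grammar, reading @ as the empty string ε, gives:
FOLLOW(S) = { $ }
FOLLOW(B) = FIRST(Dh) - {ε} = { g, f, h }
FOLLOW(C) = FOLLOW(B) = { g, f, h }
FOLLOW(D) = { h }
FOLLOW(E) = FOLLOW(F) = FOLLOW(D) = { h }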


Program #06
/*Construct a Operator Precedence Parser for the following given
grammar and also compute Leading () and trailing () symbols of
the given grammar. */
Operator-Precedence Parser
Operator grammar:
a small but important class of grammars; we may have an efficient operator-precedence
parser (a shift-reduce parser) for an operator grammar.
In an operator grammar, no production rule can have:
- ε (the empty string) on the right side, or
- two adjacent non-terminals on the right side.

Precedence Relations

In operator-precedence parsing, we define three disjoint precedence relations between
certain pairs of terminals:
a <· b    b has higher precedence than a
a =· b    b has the same precedence as a
a ·> b    b has lower precedence than a
The determination of correct precedence relations between terminals is based on the
traditional notions of associativity and precedence of operators. (Unary minus causes a
problem.)
The intention of the precedence relations is to find the handle of a right-sentential form, with
<· marking the left end,
=· appearing in the interior of the handle, and
·> marking the right end.
In our input string $a1 a2 ... an$, we insert the precedence relation between each pair of
adjacent terminals (the precedence relation holds between the terminals in that pair).


Using Operator-Precedence Relations

E -> E+E | E-E | E*E | E/E | E^E | (E) | -E | id
Then the input string id+id*id with the precedence relations inserted will be:
$ <· id ·> + <· id ·> * <· id ·> $

Operator-Precedence Parsing Algorithm


The input string is w$, the initial stack is $ and a table holds precedence relations
between certain terminals
Algorithm:
set p to point to the first symbol of w$ ;
repeat forever
if ( $ is on top of the stack and p points to $ ) then return
else {
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
if ( a <· b or a =· b ) then {       /* SHIFT */
push b onto the stack;
advance p to the next input symbol;
}
else if ( a ·> b ) then {            /* REDUCE */
repeat pop the stack
until ( the top-of-stack terminal is related by <· to the terminal most recently popped );
}
else error();
}


Operator-Precedence Parsing Algorithm Example

stack      input        action
$          id+id*id$    $ <· id    shift
$id        +id*id$      id ·> +    reduce  E -> id
$          +id*id$      shift
$+         id*id$       shift
$+id       *id$         id ·> *    reduce  E -> id
$+         *id$         shift
$+*        id$          shift
$+*id      $            id ·> $    reduce  E -> id
$+*        $            * ·> $     reduce  E -> E*E
$+         $            + ·> $     reduce  E -> E+E
$          $            accept
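A hedged C sketch of the parsing loop above, with the precedence relations for id, + and *
hard-coded for the expression grammar; the table layout and all names are illustrative only
(the lab program below computes LEADING and TRAILING instead):

#include <stdio.h>

static int idx(char t)                     /* map a terminal to a table index */
{
    switch (t) {
    case 'i': return 0;                    /* 'i' stands for id               */
    case '+': return 1;
    case '*': return 2;
    default:  return 3;                    /* '$'                             */
    }
}

int main(void)
{
    /* rel[a][b]: '<' means a <. b, '>' means a .> b, 'A' means accept */
    char rel[4][4] = {
        /*        i    +    *    $   */
        /* i */ {' ', '>', '>', '>'},
        /* + */ {'<', '>', '<', '>'},
        /* * */ {'<', '>', '>', '>'},
        /* $ */ {'<', '<', '<', 'A'},
    };
    const char *input = "i+i*i$";          /* id+id*id followed by the endmarker */
    char stack[64] = "$";                  /* the stack holds terminals only     */
    int  top = 0, p = 0;

    for (;;) {
        char a = stack[top], b = input[p];
        char r = rel[idx(a)][idx(b)];
        if (r == 'A') { printf("accept\n"); break; }
        if (r == '<' || r == '=') {        /* SHIFT */
            stack[++top] = b;
            p++;
        } else if (r == '>') {             /* REDUCE: pop until top <. popped */
            char popped;
            do {
                popped = stack[top--];
            } while (rel[idx(stack[top])][idx(popped)] != '<');
            printf("reduce at %c\n", popped);
        } else {
            printf("error\n");
            break;
        }
    }
    return 0;
}

Running this sketch on i+i*i reproduces the shift/reduce sequence of the table above.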

Disadvantages of Operator Precedence Parsing


Disadvantages :
It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
Small class of grammars.
Difficult to decide which language is recognized by the grammar.
Advantages :
simple
powerful enough for expressions in programming languages



Source Code:
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
char **arr;// contains productions for different Non terminals
//having non terminals at arr[i][0] and rest contains productions
//************************IMPORTANT******************************
//remember symbol @ is just another terminal there
/****************************************************************/
//examples of productions used by program
//S->a/^/(T)
// '/' is used to define multiple productions for the same Non terminal
//T->T,S/S
//arr will contain the productions as follows
//
0 1 2 3 4 5 6 7 8 9 10
//
//arr[0] S a / ^ / ( T )
//arr[1] T T , S / S
int *flagstate;// to see whether leading has already been found for a Non terminal
char **foundlead;//contains the already found leading for a Non terminal
int *NtSymbols;//used to reduce time complexity by storing where the
productions for a non terminal are stored in arr
char **foundtrail;//contains the already found trailing for a Non terminal
int **trailgoesto;//to tell which Non Terminals trailing goes to whose trailing
int **leadgoesto;//to tell which Non Terminals leading goes to whose leading
int strmergeunique(char*dest,const char *source)
{
int strlength=strlen(source),change=0;
for(int i=0;i<strlength;i++)
{
if(!strchr(dest,source[i]))
{
dest[strlen(dest)+1]='\0';
dest[strlen(dest)]=source[i];
change=1;


}
}
return change;
}
void leading(int no_of_nonterminals)
{
int nonterminals=0;
char Gamma,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
for(int eachletter=1;arr[nonterminals][eachletter]!='\0';eachletter++)
{
Gamma=arr[nonterminals][eachletter];
if(isupper(Gamma))
{
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][
leadgoesto[NtSymbols[toascii(Gamma)-65]][0]+1 ]=nonterminals;
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][0]++;
continue;
}
else
{
if(Gamma=='\x0')
{break;}
if(Gamma=='/')
{continue;}
str[0]=Gamma;
str[1]='\0';
strmergeunique(foundlead[nonterminals],str);
}
while(arr[nonterminals][eachletter+1]!='\x0'&&
arr[nonterminals][eachletter+1]!='/')
{
eachletter++;
}
}
nonterminals++;
}
int change=0;


do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=leadgoesto[i][0];j++)
{
change|=strmergeunique(foundlead[leadgoesto[i][j]],foundlead[i]);
}
}
}
while(change);
}
void trailing(int no_of_nonterminals)
{
int nonterminals=0;
char Delta,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
int eachletter=strlen(arr[nonterminals])-1;
for(;eachletter>0;eachletter--)
{
Delta=arr[nonterminals][eachletter];
// *******alpha B
if(isupper(Delta))
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
if(arr[nonterminals][eachletter-1]!='/'&&
eachletter-1>0)
{Delta=arr[nonterminals][eachletter-1];
if(!isupper(Delta))
{
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
}
}


}
// B alpha
// ***** alpha
else
{
if(Delta=='/')
{continue;}
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
Delta=arr[nonterminals][eachletter-1];
if(isupper(Delta)&&eachletter-1>0)
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
}
}
while(eachletter-1>0&&
arr[nonterminals][eachletter-1]!='/')
{
eachletter--;
}
}
nonterminals++;
}
int change=0;
do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=trailgoesto[i][0];j++)
{
change|=strmergeunique(foundtrail[trailgoesto[i][j]],foundtrail[i]);
}
}
}
while(change);


}
void main()
{
int nt;
clrscr();
printf("Enter no.of nonterminals :");
scanf("%d",&nt);
arr=new char*[nt];
foundlead=new char*[nt];
foundtrail=new char*[nt];
flagstate=new int[nt];
leadgoesto=new int*[nt];
trailgoesto=new int*[nt];
NtSymbols=new int[26];
for (int i=0;i<nt;i++)
{
arr[i]=new char[100];
foundlead[i]=new char[10];
memset(foundlead[i],'\0',10);
foundtrail[i]=new char[10];
memset(foundtrail[i],'\0',10);
flagstate[i]=0;
leadgoesto[i]=new int[nt];
leadgoesto[i][0]=0;
trailgoesto[i]=new int[nt];
trailgoesto[i][0]=0;
printf("Enter non terminal ");
cin>>arr[i][0];
flushall();
printf("Enter Production for %c------>",arr[i][0]);
gets(arr[i]+1);
NtSymbols[toascii(arr[i][0])-65]=i;
}
char prod[50];
leading(nt);
trailing(nt);
cout<<endl<<endl;
for(i=0;i<nt;i++)
{
printf("leading (%c)--> { %s }\n",arr[i][0],foundlead[i]);


printf("trailing (%c)--> { %s }\n",arr[i][0],foundtrail[i]);
delete arr[i];
delete foundlead[i];
delete foundtrail[i];
delete leadgoesto[i];
delete trailgoesto[i];
}
delete arr;
delete flagstate;
delete foundlead;
delete NtSymbols;
delete foundtrail;
delete trailgoesto;
delete leadgoesto;
}
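A hand-checked sample (not actual program output): for the usual expression grammar
E -> E+T | T,  T -> T*F | F,  F -> (E) | i, the expected sets are
leading (E)  --> { +, *, (, i }
leading (T)  --> { *, (, i }
leading (F)  --> { (, i }
trailing (E) --> { +, *, ), i }
trailing (T) --> { *, ), i }
trailing (F) --> { ), i }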


Program #07
Program using LEX to count the number of characters, words, spaces and
Lines in a given input file.

Lexical Analyzer
The main task of the lexical analyzer is to read the input source program, scanning the
characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
The interface: the analyzer may be called by the parser to produce one token at a time; it
maintains internal state of reading the input program (with lines) and has a function
getNextToken that will read some characters at the current state of the input and return a
token to the parser.
Other tasks of the lexical analyzer include: skipping or hiding whitespace and comments;
keeping track of line numbers for error reporting (sometimes it can also produce the
annotated lines for error reports); producing the value of the token; and, optionally, inserting
identifiers into the symbol table.

Character Level Scanning


The lexical analyzer needs to have a well-defined valid character set: it should produce
invalid-character errors and delete invalid characters from the token stream so that they are
not used in the parser analysis (e.g. we don't want invisible characters in error messages).
For every end-of-line it keeps track of line numbers for error reporting. It skips over or hides
whitespace and comments; if comments are nested (not common), it must keep track of
nesting to find the end of comments. It may produce hidden tokens, for convenience of
scanner structure. It always produces an end-of-file token; it is important that quoted strings
and comments don't get stuck if an unexpected end of file occurs.


Source Code:
%{
int ch=0, bl=0, ln=0, wr=0;
%}
%%
[\n] {ln++;wr++;}
[\t] {bl++;wr++;}
[" "] {bl++;wr++;}
[^\n\t] {ch++;}
%%
int main()
{
FILE *fp;
char file[10];
printf("Enter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");      /* open the file before handing it to the scanner */
yyin=fp;
yylex();
printf("Character=%d\nBlank=%d\nLines=%d\nWords=%d", ch, bl, ln, wr);
return 0;
}


INPUT:
A input file (.doc or any format), counts number of characters, words, spaces and Lines in a
given input file.

OUTPUT:
$cat > input
Girish rao salanke
$lex p1a.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
Character=16
Blank=2
Lines=1
Word=3


Program #08
Program using LEX to count the number of comment lines in a given C/
C++/JAVA program. Also eliminate them and copy the resulting program
into a separate file.

Compiler-construction tools
Originally, compilers were written from scratch, but now the situation is quite different. A
number of tools are available to ease the burden.
We will study tools that generate scanners and parsers. This will involve us in some theory,
regular expressions for scanners and various grammars for parsers. These techniques are
fairly successful. One drawback can be that they do not execute as fast as hand-crafted
scanners and parsers.
We will also see tools for syntax-directed translation and automatic code generation. The
automation in these cases is not as complete.
Finally, there is the large area of optimization. This is not automated; however, a basic
component of optimization is data-flow analysis (how values are transmitted between parts
of a program) and there are tools to help with this task.

Lexical Analysis (or Scanning)


The character stream input is grouped into meaningful units called lexemes, which are then
mapped into tokens, the latter constituting the output of the lexical analyzer. For example,
any one of the following
x3 = y + 3;
x3 = y + 3 ;
x3 =y+ 3 ;

but not
x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.


A token is a <token-name,attribute-value> pair. For example
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for
identifier. The value 1 is the index of the entry for x3 in the symbol table produced by
the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a
pair, whose second component is ignored. The point is that there are many different
identifiers so we need the second component, but there is only one assignment symbol
=.
3. The lexeme y is mapped to the token <id,2>


4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters.
It is mapped to <number,something>, but what is the something. On the one hand
there is only one 3 so we could just use the token <number,3>. However, there can be
a difference between how this should be printed (e.g., in an error message produced
by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps
the token should point to the symbol table where an entry for this kind of 3 is stored.
Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are
non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation
without using recursion (compare with parsing below).


Source Code:
%{
int com=0;
%}
%%
"/*"[^\n]+"*/" {com++;fprintf(yyout, " ");}
%%
int main()
{
printf("Write a C program\n");
yyout=fopen("output", "w");
yylex();
printf("Comment=%d\n",com);
return 0;
}



OUTPUT:
$lex p1b.l
$cc lex.yy.c -ll
$./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b;
/*float c;*/
printf("Hai");
/*printf("Hello");*/
}
[Ctrl-d]
Comment=1
$cat output
#include<stdio.h>
int main()
{
int a, b;
printf("Hai");
}


Program #09
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.
Some Regular Expressions for Flex
\"[^"]*\"                  string
"\t"|"\n"|" "              whitespace (most common forms)
[a-zA-Z]                   a single letter
[a-zA-Z_][a-zA-Z0-9_]*     identifier: allows a, aX, a45__
[0-9]*"."[0-9]+            allows .5 but not 5.
[0-9]+"."[0-9]*            allows 5. but not .5
[0-9]*"."[0-9]*            allows . by itself !!

The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser. The lexical analyzer is an integer-valued function called
yylex. The function returns an integer, the token number, representing the kind of token read.
If there is a value associated with that token, it should be assigned to the external variable
yylval.
The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place. The numbers may be chosen by Yacc, or chosen
by the user. In either case, the ``# define'' mechanism of C is used to allow the lexical
analyzer to return these numbers symbolically. For example, suppose that the token name
DIGIT has been defined in the declarations section of the Yacc specification file. The relevant
portion of the lexical analyzer might look like:
yylex(){
extern int yylval;
int c;
...
c = getchar();
...
switch( c ) {


...
case '0':
case '1':
...
case '9':
yylval = c-'0';
return( DIGIT );
...
}
...
The intent is to return a token number of DIGIT, and a value equal to the numerical value of
the digit. Provided that the lexical analyzer code is placed in the programs section of the
specification file, the identifier DIGIT will be defined as the token number associated with the
token DIGIT.
This mechanism leads to clear, easily modified lexical analyzers; the only pitfall is the need to
avoid using any token names in the grammar that are reserved or significant in C or the
parser; for example, the use of token names if or while will almost certainly cause severe
difficulties when the lexical analyzer is compiled. The token name error is reserved for error
handling, and should not be used naively.
As mentioned above, the token numbers may be chosen by Yacc or by the user. In the default
situation, the numbers are chosen by Yacc. The default token number for a literal character is
the numerical value of the character in the local character set. Other names are assigned
token numbers starting at 257.
To assign a token number to a token (including literals), the first appearance of the token
name or literal in the declarations section can be immediately followed by a nonnegative
integer. This integer is taken to be the token number of the name or literal. Names and literals
not defined by this mechanism retain their default definition. It is important that all token
numbers be distinct.



For historical reasons, the endmarker must have token number 0 or negative. This token
number cannot be redefined by the user; thus, all lexical analyzers should be prepared to
return 0 or negative as a token number upon reaching the end of their input.
A very useful tool for constructing lexical analyzers is the Lex program developed by Mike
Lesk.[8] These lexical analyzers are designed to work in close harmony with Yacc parsers.
The specifications for these lexical analyzers use regular expressions instead of grammar
rules. Lex can be easily used to produce quite complicated lexical analyzers, but there remain
some languages (such as FORTRAN) which do not fit any theoretical framework, and whose
lexical analyzers must be crafted by hand.

Source Code:
%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{
printf("Invalid expression");
}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");


if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}

OUTPUT:
$lex p2a.l
$cc lex.yy.c -ll
$./a.out
Enter the expression
(a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*


Program #13
Program using LEX to recognize whether a given sentence is simple or
compound.
%{
int flag=0;
%}
%%
(" "[aA][nN][dD]" ")|(" "[oO][rR]" ")|(" "[bB][uU][tT]" ") {flag=1;}
%%
int main()
{
printf("Enter the sentence\n");
yylex();
if(flag==1)
printf("\nCompound sentence\n");
else
printf("\nSimple sentence\n");
return 0;
}

OUTPUT:
$lex p2b.l
$cc lex.yy.c -ll
$./a.out
Enter the sentence
I am Pooja
I am Pooja
[Ctrl-d]
Simple sentence
$./a.out
Enter the sentence
CSE or ISE
CSE or ISE
[Ctrl-d]
Compound sentence

Program #14
Program using LEX to recognize and count the number of identifiers in a
given input file.
Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table
is translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.

Source Code:
%{
#include<stdio.h>
int count=0;
%}
op [+\-*/]
letter [a-zA-Z]
digitt [0-9]
id {letter}+|({letter}{digitt})+
notid ({digitt}{letter})+
%%
[\t\n]+
("int")|("float")|("char")|("case")|("default")| ("if")|("for")|("printf")|("scanf") {printf("%s is a
keyword\n", yytext);}
{id} {printf("%s is an identifier\n", yytext); count++;}
{notid} {printf("%s is not an identifier\n", yytext);}
%%
int main()
{
FILE *fp;
char file[10];
printf("\nEnter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");
yyin=fp;
yylex();
printf("Total identifiers are: %d\n", count);
return 0;
}
OUTPUT:
$cat > input
int
float
78f
90gh
a
d
are case
default
printf
scanf
$lex p3.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
int is a keyword
float is a keyword
78f is not an identifier
90g is not an identifier
h is an identifier
a is an identifier
d is an identifier
are is an identifier
case is a keyword
default is a keyword
printf is a keyword
scanf is a keyword
Total identifiers are: 4

Program #15
YACC (Yet Another Compiler Compiler) program to recognize a valid arithmetic
expression that uses operators +, -, * and /.

Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be
declared as such. In addition, for reasons discussed in Section 3, it is often desirable to
include the lexical analyzer as part of the specification file; it may be useful to include other
programs as well. Thus, every specification file consists of three sections: the declarations,
(grammar) rules, and programs. The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
In other words, a full specification file looks like
declarations
%%
rules
%%
programs
The declaration section may be empty. Moreover, if the programs section is omitted, the
second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
%%
rules
Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-character reserved symbols. Comments may appear wherever a name is legal; they are
enclosed in /* . . . */, as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form:
A : BODY ;
A represents a nonterminal name, and BODY represents a sequence of zero or more names
and literals. The colon and the semicolon are Yacc punctuation.
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'',
and non-initial digits. Upper and lower case letters are distinct. The names used in the body of
a grammar rule may represent tokens or nonterminal symbols.

A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an
escape character within literals, and all the C escapes are recognized. Thus
'\n' newline
'\r' return
'\'' single quote ``'''
'\\' backslash ``\''
'\t' tab
'\b' backspace
'\f' form feed
'\xxx' ``xxx'' in octal
For a number of technical reasons, the NUL character ('\0' or 0) should never be used in
grammar rules.
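For instance, an illustrative rule (the nonterminals line and text are hypothetical, used here only to show the notation) may use an escaped literal directly:

line : text '\n' ;    /* '\n' is the literal newline token */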
If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be
used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can
be dropped before a vertical bar. Thus the grammar rules
A : B C D ;
A : E F ;
A : G ;

can be given to Yacc as

A : B C D
  | E F
  | G
  ;

It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to
change.
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
empty : ;
Names representing tokens must be declared; this is most simply done by writing
%token name1 name2 . . .
in the declarations section. (See Sections 3 , 5, and 6 for much more discussion). Every name
not defined in the declarations section is assumed to represent a nonterminal symbol. Every
nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, one, called the start symbol, has particular importance. The
parser is designed to recognize the start symbol; thus, this symbol represents the largest,
most general structure described by the grammar rules. By default, the start symbol is taken
to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact
desirable, to declare the start symbol explicitly in the declarations section using the %start
keyword:
%start symbol
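A small declarations section combining these pieces might look like the following sketch (the token names match Program #15 below, which itself relies on the default start symbol rather than %start):

%token NUMBER ID    /* terminals supplied by the lexical analyzer */
%start expr         /* parse the entire input as a single expr */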
The end of the input to the parser is signaled by a special token, called the endmarker. If the
tokens up to, but not including, the endmarker form a structure which matches the start
symbol, the parser function returns to its caller after the endmarker is seen; it accepts the
input. If the endmarker is seen in any other context, it is an error.
It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate;
see section 3, below. Usually the endmarker represents some reasonably obvious I/O status,
such as ``end-of-file'' or ``end-of-record''.
2: Actions
With each grammar rule, the user may associate actions to be
performed each time the rule is recognized in the input process. These actions may return
values, and may obtain the values returned by previous actions. Moreover, the lexical
analyzer can return values for tokens, if desired.
An action is an arbitrary C statement, and as such can do input and output, call subprograms,
and alter external vectors and variables. An action is specified by one or more statements,
enclosed in curly braces ``{'' and ``}''. For example,
A : '(' B ')'
        { hello( 1, "abc" ); }

and

XXX : YYY ZZZ
        { printf("a message\n");
          flag = 25; }

are grammar rules with actions.
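Values are passed through the parse with $$ (the value of the rule being reduced) and $1, $2, ... (the values of the items on its right-hand side). A brief sketch in the style of the arithmetic grammar used later (the manual's own Program #15 only checks validity and does not compute values):

expr : expr '+' expr    { $$ = $1 + $3; }    /* result = left operand + right operand */
     | NUMBER           { $$ = $1; }         /* value handed over by the lexical analyzer */
     ;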

Source Code:
LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return NUMBER;}
[a-zA-Z]+ {return ID;}
[\t]+ ;
\n {return 0;}
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%%
expr: expr '+' expr
|expr '-' expr
|expr '*' expr
|expr '/' expr
|'-'NUMBER
|'-'ID
|'('expr')'
|NUMBER
|ID
;
%%
main()
{
printf("Enter the expression\n");
yyparse();
printf("\nExpression is valid\n");
exit(0);
}
int yyerror(char *s)
{
printf("\nExpression is invalid");
exit(0);
}
OUTPUT:
$lex p4a.l
$yacc -d p4a.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the expression
(a*b+5)
Expression is valid
$./a.out
Enter the expression
(a+6-)
Expression is invalid

Program #16
YACC (Yet Another Compiler Compiler ) program to recognize a valid
variable, which starts with a letter, followed by any number of letters or
digits.
Yacc turns the specification file into a C program, which parses the input according to
the specification given. The algorithm used to go from the specification to the parser is
complex, and will not be discussed here (see the references for more information). The
parser itself, however, is relatively simple, and understanding how it works, while not
strictly necessary, will nevertheless make treatment of error recovery and ambiguities
much more comprehensible.
Source Code:
LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
[a-zA-Z]+ {return LETTER;}
[\t] ;
\n return 0;
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
variable: LETTER|LETTER rest
;
rest: LETTER rest
|DIGIT rest
|LETTER
|DIGIT
;
%%
main()
{
yyparse();
printf("The string is a valid variable\n");
}
int yyerror(char *s)
{
printf("this is not a valid variable\n");
exit(0);
}

OUTPUT:
$lex p4b.l
$yacc -d p4b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
input34
The string is a valid variable
$./a.out
89file
This is not a valid variable

Program #17
Implement a program of YACC (Yet Another Compiler Compiler) to recognize the
strings "aaab", "abbb", "ab" and "a" using the grammar (aⁿbⁿ, n >= 0).
Yacc: Yet Another Compiler-Compiler
Yacc provides a general tool for imposing structure on the input to a computer program. The
Yacc user prepares a specification of the input process; this includes rules describing the
input structure, code to be invoked when these rules are recognized, and a low-level routine
to do the basic input. Yacc then generates a function to control the input process. This
function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer)
to pick up the basic items (called tokens) from the input stream. These tokens are organized
according to the input structure rules, called grammar rules; when one of these rules has
been recognized, then user code supplied for this rule, an action, is invoked; actions have the
ability to return values and make use of the values of other actions.
Yacc is written in a portable dialect of C[1] and the actions, and output subroutine, are in C as
well. Moreover, many of the syntactic conventions of Yacc follow C.
The heart of the input specification is a collection of grammar rules. Each rule describes an
allowable structure and gives it a name. For example, one grammar rule might be
date : month_name day ',' year ;
Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed
in single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input
July 4, 1776
might be matched by the above rule.

An important part of the input process is carried out by the lexical analyzer. This user routine
reads the input stream, recognizing the lower level structures, and communicates these
tokens to the parser. For historical reasons, a structure recognized by the lexical analyzer is
called a terminal symbol, while the structure recognized by the parser is called a nonterminal
symbol. To avoid confusion, terminal symbols will usually be referred to as tokens.
There is considerable leeway in deciding whether to recognize structures using the lexical
analyzer or grammar rules. For example, the rules
month_name : 'J' 'a' 'n' ;
month_name : 'F' 'e' 'b' ;
...
month_name : 'D' 'e' 'c' ;
might be used in the above example. The lexical analyzer would only need to recognize
individual letters, and month_name would be a nonterminal symbol. Such low-level rules tend
to waste time and space, and may complicate the specification beyond Yacc's ability to deal
with it. Usually, the lexical analyzer would recognize the month names, and return an
indication that a month_name was seen; in this case, month_name would be a token.
Literal characters such as ``,'' must also be passed through the lexical analyzer, and are also
considered tokens.
Specification files are very flexible. It is relatively easy to add to the above example the rule
date : month '/' day '/' year ;
allowing
7 / 4 / 1776
as a synonym for
July 4, 1776

In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
and little danger of disrupting existing input.
The input being read may not conform to the specifications. These input errors are detected
as early as is theoretically possible with a left-to-right scan; thus, not only is the chance of
reading and computing with bad input data substantially reduced, but the bad data can usually
be quickly found. Error handling, provided as part of the input specifications, permits the
reentry of bad data, or the continuation of the input process after skipping over the bad data.
In some cases, Yacc fails to produce a parser when given a set of specifications. For
example, the specifications may be self contradictory, or they may require a more powerful
recognition mechanism than that available to Yacc. The former cases represent design errors;
the latter cases can often be corrected by making the lexical analyzer more powerful, or by
rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its
power compares favorably with similar systems; moreover, the constructions which are
difficult for Yacc to handle are also frequently difficult for human beings to handle. Some
users have reported that the discipline of formulating valid Yacc specifications for their input
revealed errors of conception or design early in the program development.

Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
S:A S B
|
;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("%s\n",s);
}
OUTPUT:
$lex p5b.l
$yacc -d p5b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aabb
[Ctrl-d]
Valid
$./a.out
Enter the string
aab
syntax error

Program #18
Program to recognize the Context free grammar (aⁿbⁿ, n >= 10), where a & b
are input symbols of the grammar.
A context-free grammar (CFG) is a set of recursive rewriting rules (or productions)
used to generate patterns of strings.

A CFG consists of the following components:

- a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar;
- a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols;
- a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production);
- a start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.

To generate a string of terminal symbols from a CFG, we:

1. Begin with a string consisting of the start symbol;
2. Apply one of the productions with the start symbol on the left hand side, replacing the start symbol with the right hand side of the production;
3. Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.
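As a quick illustration of this procedure (not part of the worked example that follows), the grammar used in the previous program, S -> a S b | ε, derives the string aabb in three steps:

S  =>  a S b        (apply S -> a S b)
   =>  a a S b b    (apply S -> a S b again)
   =>  a a b b      (apply S -> ε)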
Finding all the Strings Generated by a CFG


There are several ways to generate the (possibly infinite) set of strings generated by a grammar. We
will show a technique based on the number of productions used to generate the string.
Find the strings generated by the following CFG:
<S> --> w c d <S> | b <L> e | s
<L> --> <L> ; <S> | <S>

0. Applying at most zero productions, we cannot generate any strings.


1. Applying at most one production (starting with the start symbol) we can generate {wcd<S>, b<L>e,
s}. Only one of these strings consists entirely of terminal symbols, so the set of terminal strings we can
generate using at most one production is {s}.
2. Applying at most two productions, we can generate all the strings we can generate with one
production, plus any additional strings we can generate with an additional production.
{wcdwcd<S>, wcdb<L>e, wcds, b<S>e, b<L>;<S>e,s}

The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.
3. Applying at most three productions, we can generate:
{wcdwcdwcd<S>, wcdwcdb<L>e, wcdwcds, wcdb<L>;<S>e,
wcdb<S>e, bwcd<S>e, bb<L>ee, bse, b<L>;<S>Se,
b<S><S>e, b<L>wcd<S>e, b<L>b<L>ee, b<L>se }

The set of terminal strings we can generate with at most three productions is therefore {s, wcds, wcdwcds, bse}.

We can repeat this process for an arbitrary number of steps N, and find all the strings the grammar can
generate by applying N productions.

Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
stat:exp B
;
exp:A A A A A A A A A exp1
;
exp1:A exp2
|A
|A A exp2
|A A A exp2
|A A A A exp2
;
exp2:A
;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("error\n");
}

OUTPUT:
$lex p6.l
$yacc -d p6.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aaaaaaaaaaab
Valid
$./a.out
Enter the string
aab
error

Program #19
Write a C program to implement the syntax-directed definition of if E then
S1 and if E then S1 else S2.
/* Input to the program is assumed to be syntactically correct. The expression of if statement,
for true condition and statement for false condition are enclosed in parenthesis */
Some programming languages permit the user to use words like ``if'', which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal
use of these names in the programming language. This is extremely hard to do in the
framework of Yacc; it is difficult to pass information to the lexical analyzer telling it ``this
instance of `if' is a keyword, and that instance is a variable''. The user can make a stab at it,
using the mechanism described in the last subsection, but it is difficult.
A number of ways of making this easier are under advisement. Until then, it is better that the
keywords be reserved; that is, be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.
10: Advanced Topics
This section discusses a number of advanced features of Yacc.
Simulating Error and Accept in Actions
The parsing actions of error and accept can be simulated in an action by use of macros
YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR
causes the parser to behave as if the current input symbol had been a syntax error; yyerror is
called, and error recovery takes place. These mechanisms can be used to simulate parsers
with multiple endmarkers or context-sensitive syntax checking.
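As a sketch of the idea (the rule and the helper is_known() are hypothetical, not part of this manual's programs), YYERROR can be raised from inside an action to force the normal error-recovery machinery:

stat : ID '=' expr ';'
         { if ( !is_known( $1 ) )    /* hypothetical symbol-table check */
               YYERROR;              /* act as though a syntax error had just been read */
         }
     ;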
Accessing Values in Enclosing Rules.
An action may refer to values returned by actions to the left of the current rule. The
mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider
sent : adj noun verb adj noun
           { look at the sentence . . . }
     ;

adj  : THE            { $$ = THE; }
     | YOUNG          { $$ = YOUNG; }
     ...
     ;

noun : DOG            { $$ = DOG; }
     | CRONE          { if( $0 == YOUNG ){
                            printf( "what?\n" );
                        }
                        $$ = CRONE;
                      }
     ;
...
In the action following the word CRONE, a check is made that the preceding token shifted
was not YOUNG. Obviously, this is only possible when a great deal is known about what
might precede the symbol noun in the input. There is also a distinctly unstructured flavor
about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially
when a few combinations are to be excluded from an otherwise regular structure.

Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parsecondition(char[],int,char*,int);
void gen(char [],char [],char[],int);
int main()
{
int counter = 0,stlen =0,elseflag=0;
char stmt[60]; // contains the input statement
char strB[54]; // holds the expression for the 'if' condition
char strS1[50]; // holds the statement for the true condition
char strS2[45]; // holds the statement for the false condition
printf("Format of if statement \n Example...\n");
printf("if (a<b) then (s=a);\n");
printf("if (a<b) then (s=a) else (s=b);\n\n");
printf("Enter the statement \n");
fgets(stmt, sizeof(stmt), stdin);      // read the statement safely
stmt[strcspn(stmt, "\n")] = '\0';      // strip the trailing newline
stlen = strlen(stmt);
counter = counter + 2; // increment over 'if'
counter = parsecondition(stmt,counter,strB,stlen);
if(stmt[counter]==')')
counter++;
counter = counter + 3; // increment over 'then'
counter = parsecondition(stmt,counter,strS1,stlen);
if(stmt[counter+1]==';')
{ //reached end of statement, generate the output
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
if(stmt[counter]==')')
counter++; // increment over ')'
counter = counter + 3; // increment over 'else'
counter = parsecondition(stmt,counter,strS2,stlen);
counter = counter + 2; // move to the end of the statement
if(counter == stlen)
{ //generate the output
elseflag = 1;
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
return 0;
}
/* Function : parsecondition
Description : This function parses the statement
from the given index to get the statement enclosed
in ()
Input : Statement, index to begin search, string
to store the condition, total string length
Output : Returns 0 on failure, Non zero counter
value on success
*/
int parsecondition(char input[],int cntr,char
*dest,int totallen)
{
int index = 0,pos = 0;
while(input[cntr]!= '(' && cntr <= totallen)
cntr++;
if(cntr >= totallen)
return 0;
index = cntr;
while (input[cntr]!=')')
cntr++;
if(cntr >= totallen)
return 0;
while(index<=cntr)
dest[pos++] = input[index++];
dest[pos]='\0'; //null terminate the string
return cntr; //non zero value
}
/* Function : gen ()
Description : This function generates three
address code
Input : Expression, statement for true condition,
statement for false condition, flag to denote if
the 'else' part is present in the statement
output :Three address code
*/
void gen(char B[],char S1[],char S2[],int elsepart)
{
int Bt =101,Bf = 102,Sn =103;
printf("\n\tIf %s goto %d",B,Bt);
printf("\n\tgoto %d",Bf);
printf("\n%d: ",Bt);
printf("%s",S1);
if(!elsepart)
printf("\n%d: ",Bf);
else
{ printf("\n\tgoto %d",Sn);
printf("\n%d: %s",Bf,S2);
printf("\n%d:",Sn);
}
}

OUTPUT
Format of if statement
Example ...
if (a<b) then (s=a);
if (a<b) then (s=a) else (s=b);
Enter the statement
if (a<b) then (x=a) else (x=b);
Parsing the input statement....
If (a<b) goto 101
goto 102
101: (x=a)
goto 103
102: (x=b)
103:
