CDGI'S
CHAMELI DEVI SCHOOL OF ENGINEERING, INDORE
DEPARTMENT OF COMPUTER SCIENCE
COURSE FILE CONTENT

Year:          2013-14
Class/Sem:     Sem-VII, CS-A & B
Branch:        CSE
Subject:       Compiler Design
Faculty Name:  Mr. Ajay Jaiswal

Content
1. Scope of the course
2. Disciplines involved in it
3. Abstract view for a compiler
4. Front-end and back-end tasks
5. Modules
6. List of Practicals
7. LINUX O/S
8. C++ / JAVA program backup



CHAMELI DEVI GROUP OF INSTITUTES, INDORE.
Chameli Devi School Of Engineering.

Department of Computer Science & Information Technology

Compiler Design Laboratory

Compiler Design [CS-701] Practical


( Year

2013-2014)

Name: ________________________________________
Roll No.: _______________________________________
Branch: _______________________________________
Semester:_______________________________________
Section: ________________________________________
Subject: _______________________________________

Certified by:
Total Practical :
Practicals performed:
Faculty Name/Signature



CHAMELI DEVI GROUP OF INSTITUTES, INDORE
DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY
COMPILER DESIGN LABORATORY (CS-701)
PRACTICAL LIST
S.No.   Name of Practical

1. Implement a program to count the characters of a given string, both without
   counting spaces and with counting spaces. (A handle of a string is a substring
   that matches the right side of a production rule.)

2. Create a file (Compiler.cc) and implement a program to read all the content of
   Compiler.cc (how many lines, how many words and how many characters are in the
   file).

3. Write a program for implementation of a Deterministic Finite Automaton (DFA)
   for the strings accepted by (abbb, abb, ab, a).

4. Construction and minimization of a Deterministic Finite Automaton for the given
   diagram, and recognize the string (aa + b)*ab(bb)*.

5. Construct a program to compute the FIRST() and FOLLOW() symbols for an LL(1)
   grammar, given a context-free grammar for the LL(1) construction.

6. Construct an operator precedence parser for the given grammar and also compute
   the LEADING() and TRAILING() symbols of the grammar.

7. Program using LEX to count the number of characters, words, spaces and lines in
   a given input file.

8. Program using LEX to count the number of comment lines in a given C program.
   Also eliminate them and copy the resulting program into a separate file.



9.  Program using LEX to recognize a valid arithmetic expression and to recognize
    the identifiers and operators present. Print them separately.

10. Program using LEX to recognize whether a given sentence is simple or compound.

11. Program using LEX to recognize and count the number of identifiers in a given
    input file.

12. Implement a YACC (Yet Another Compiler Compiler) program to recognize a valid
    arithmetic expression that uses the operators +, -, * and /.

13. Implement a YACC (Yet Another Compiler Compiler) program to recognize a valid
    variable, which starts with a letter, followed by any number of letters or
    digits.

14. YACC (Yet Another Compiler Compiler) program to recognize the strings aaab,
    abbb, ab and a using the grammar (a^n b^n, n >= 0).

15. Program to recognize the context-free grammar (a^n b^n, n >= 10), where a and b
    are input symbols of the grammar.

16. Write a C program to implement the syntax-directed definition of "if E then S1"
    and "if E then S1 else S2".


CHAMELI DEVI GROUP OF INSTITUTES, INDORE


Chameli Devi School Of Engineering
Department of Computer Science & Information Technology
Compiler Design Laboratory

Practical List
S.No.   Practical       Date of Experiment   Date of Submission   Signature & Remarks

1.      Practical 1
2.      Practical 2
3.      Practical 3
4.      Practical 4
5.      Practical 5
6.      Practical 6
7.      Practical 7
8.      Practical 8
9.      Practical 9
10.     Practical 10
11.     Practical 11
12.     Practical 12
13.     Practical 13
14.     Practical 14
15.     Practical 15
16.     Practical 16
17.     Practical 17
18.     Practical 18
19.     Practical 19
20.     Practical 20
21.     Practical 21
22.     Practical 22
23.     Practical 23
24.     Practical 24
25.     Practical 25

Head of Department                                                  Faculty

LAB MANUAL

SUBJECT NAME: ________________          SUBJECT CODE: ________________
CLASS: _______________________          SEMESTER: ____________________

FACULTY NAME / SIGNATURE: ________________    FACULTY NAME / SIGNATURE: ________________


Course scope
Aim:
To learn techniques of a modern compiler

Main reference:
Compilers: Principles, Techniques and Tools, Second Edition, by Alfred V. Aho, Ravi Sethi and
Jeffrey D. Ullman

Supplementary references:
Modern Compiler Implementation in Java, 2nd edition
Advanced Compiler Design and Implementation by Muchnick

Subjects
Lexical analysis (Scanning)
Syntax Analysis (Parsing)
Syntax Directed Translation
Intermediate Code Generation
Run-time environments
Code Generation
Machine Independent Optimization

Compiler learning

Isn't it an old discipline?

Yes, it is a well-established discipline.
Algorithms, methods and techniques were researched and developed in the early stages of
computer science.
There are many compilers around and many tools to generate them automatically.
So, why do we need to learn it?
Although you may never write a full compiler, the techniques we learn are useful in many
tasks, like writing an interpreter for a scripting language, validation checking for forms, and
so on.


Terminology
Compiler:
A program that translates an executable program in one language into an executable program
in another language. We expect the program produced by the compiler to be better, in some
way, than the original.

Interpreter:
A program that reads an executable program and produces the results of running that
program. Usually, this involves executing the source program in some fashion. Our course is
mainly about compilers, but many of the same issues arise in interpreters.

Disciplines involved

Algorithms

Languages and machines

Operating systems

Computer architectures

Why Study Compilers?


General background information for a good software engineer.
Increases understanding of language semantics.
Seeing the machine code generated for language constructs helps understand performance
issues for languages.
Teaches good language design.
New devices may need device-specific languages.
New business fields may need domain-specific languages.

Applications of Compiler Technology & Tools

Processing XML/other to generate documents, code, etc.

Processing domain-specific and device-specific languages.

Implementing a server that uses a protocol such as http or imap

Natural language processing, for example, spam filter, search, document


comprehension, summary generation

Translating from a hardware description language to the schematic of a circuit


Automatic graph layout (graphviz, for example)

Extending an existing programming language

Program analysis and improvement tools

Abstract view
Compilers translate from a source language (typically a high-level language) to a functionally
equivalent target language (typically the machine code of a particular machine or a
machine-independent virtual machine).
Compilers for high-level programming languages are among the larger and more complex
pieces of software.

Original languages included Fortran and Cobol

Often multi-pass compilers (to facilitate memory reuse)

Compiler development helped in better programming language design

Early development focused on syntactic analysis and optimization

Commercially, compilers are developed by very large software groups

Current focus is on optimization and smart use of resources for modern RISC (reduced
instruction set computer) architectures.

Source code  -->  Compiler  -->  Machine code
                  (errors)

A compiler:
Recognizes legal (and illegal) programs
Generates correct code
Manages storage of all variables and code
Agrees on a format for object (or assembly) code


Principles of Compiler Design Syllabus


Introduction to Compiler:
Translator issues, why write a compiler, the compilation process in brief, front-end and
back-end model, compiler construction tools, interpreters and related issues, cross compilers,
incremental compilers, bootstrapping.

1.

Lexical Analysis
Review of lexical analysis: alphabet, token, lexical error, Block schematic of lexical
analyser, Automatic construction of lexical analyser (LEX), LEX specification details.

2. Syntax Analysis
Introduction: Role of parsers, Parsing technique: Top down-RD parser, Predictive LL
(k) parser, Bottom up-shift-Reduce, SLR, LR(k), LALR etc. using ambiguous grammars,
Error detection and recovery, Automatic construction of parser (YACC), YACC
specifications.
Semantic Analysis
Need of semantic analysis, type checking and type conversion.
3. Syntax directed translation
Syntax directed definitions, construction of syntax trees, bottom-up evaluation of S-attributed
definitions, L-attributed definitions, top-down translation, bottom-up evaluation of
inherited attributes.
Intermediate code Generation: Intermediate code generation for declaration,
assignment, iterative statements, case statements, arrays, structures, conditional
statements, Boolean expressions, procedure calls, Intermediate code Generation using
YACC
4. Run Time Storage Organisation
Storage allocation strategies, static, dynamic storage allocation, allocation strategies for
block structured and non-block structured languages; O.S. support required for IO
statements. (e.g. printf, scanf) and memory allocation deallocation related statement.
(e.g. new, malloc)

5. Code Generation: Introduction: Issues in code generation, target machine description,


Basic blocks and flow graphs, next use representation of basic blocks, Peephole optimisation,


DAG generating code from a DAG, Dynamic programming, Code generator-generator
concept.
6. Code Optimisation: Introduction, classification of optimisation, principal sources of
optimisation, machine-dependent optimisation, machine-independent optimisation, optimisation
of basic blocks, loops in flow graphs. Optimising transformations: compile-time evaluation,
common sub-expression elimination, variable propagation, code movement, strength
reduction, dead code elimination and loop optimisation, local optimisation, DAG-based
local optimisation. Global optimisation: control and data flow analysis, control flow
analysis concepts and definitions, data flow analysis, computing data flow information,
meet over paths, data flow equations. Iterative data flow analysis:
available expressions, live range identification.

Definition
A compiler is a computer program (or set of programs) that transforms source code written in
a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code).

The Analysis-Synthesis Model of Compilation


There are two parts to compilation:
Analysis determines the operations implied by the source program which are recorded in a
tree structure
Synthesis takes the tree structure and translates the operations therein into the target
program

Other Tools that Use the Analysis-Synthesis Model


Editors (syntax highlighting)
Pretty printers (e.g. Doxygen)
Static checkers (e.g. Lint and Splint)

Interpreters

Text formatters (e.g. TeX and LaTeX)

Silicon compilers (e.g. VHDL)

Query interpreters/compilers (Databases)


Grouping of phases

Incremental compiler
The term incremental compiler may refer to two different types of compiler.

Imperative programming
Interactive Programming
In imperative programming and software development, an incremental compiler is one that
when invoked, takes only the changes of a known set of source files and updates any
corresponding output files (in the compiler's target language, often bytecode) that may
already exist from previous compilations. By effectively building upon previously compiled
output files, the incremental compiler avoids the wasteful recompilation of entire source files,


where most of the code remains unchanged. For most incremental compilers, compiling a
program with small changes to its source code is usually near instantaneous. It can be said
that an incremental compiler reduces the granularity of a language's traditional compilation
units while maintaining the language's semantics, such that the compiler can append and
replace smaller parts.

Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than
the one on which the compiler is run.
Cross compiler tools are used to generate executables for embedded system or multiple
platforms.
It is used to compile for a platform upon which it is not feasible to do the compiling, like micro
controllers that don't support an operating system.
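For example (illustrative only): on an x86 Linux host with an ARM cross toolchain installed, a
command such as arm-linux-gnueabihf-gcc -o hello hello.c produces an executable for the ARM
target; the resulting binary will not run on the host, but it can be copied to the target board
or embedded device and executed there.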

Phases of a Compiler

Source Program
      |
Lexical Analyzer
      |
Syntax Analyzer
      |
Semantic Analyzer
      |
Intermediate Code Generator
      |
Code Optimizer
      |
Code Generator
      |
Target Program

(The Symbol-table Manager and the Error Handler interact with all of the phases above.)
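As an illustrative trace (not part of the original figure), the assignment a = a + 2 from the
lexical-analysis example later in this manual passes through the phases roughly as follows:

Lexical analysis:       id(a)  =  id(a)  +  num(2)  ;
Syntax analysis:        an assignment node whose left child is id(a) and whose right child is
                        the + node over id(a) and num(2)
Semantic analysis:      checks that a is declared and that the addition is type-correct
Intermediate code:      t1 = a + 2
                        a  = t1
Code optimization:      the temporary t1 can be eliminated
Code generation:        (pseudo-assembly)  LOAD R1, a    ADD R1, #2    STORE a, R1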


Program #01

1. Implement a program to count the characters of a given string, without
counting spaces and with counting spaces. (A handle of a string is a substring
that matches the right side of a production rule.)

Lexical analysis (scanning) turns the source text into a token stream. For the
statement  val = 10 * val + i  the scanner produces:

token number   token name   token value
1              ident        "val"
3              assign       -
2              number       10
4              times        -
1              ident        "val"
5              plus         -
1              ident        "i"

Syntax analysis (parsing) then builds a syntax tree (Statement, Expression, Term)
for the token sequence:
ident = number * ident + ident

Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters,
operators and special symbols
int a; a = a + 2;

int   reserved word
a     identifier
;     special symbol
a     identifier
=     operator
a     identifier
+     operator
2     integer constant
;     special symbol

Examples of Token
Token: A sequence of characters to be treated as a
single unit.
Examples of tokens.
Reserved words (e.g. begin, end, struct, if etc.)
Keywords (integer, true etc.)
Operators (+, &&, ++ etc)
Identifiers (variable names, procedure names, parameter names)
Literal constants (numeric, string, character constants etc.)
Punctuation marks (:, , etc.)
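A minimal C sketch (illustrative only, not one of the lab programs below) of how such tokens
are commonly represented as <token-name, attribute-value> pairs:

enum TokenKind { TK_RESERVED, TK_IDENT, TK_OPERATOR, TK_INT_CONST, TK_SPECIAL };

struct Token {
    enum TokenKind kind;   /* which class of lexeme this is                  */
    int value;             /* e.g. a symbol-table index or a numeric value   */
};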

A ./* string with using space.*/


Source Code:
#include <iostream>
using namespace std;

int main()
{
    char c[30];
    int n = 0;
    cout << "Enter the String" << "\n";
    cin >> c;                         // >> stops reading at the first whitespace
    for (int i = 0; c[i] != '\0'; i++)
    {
        n = n + 1;                    // count characters up to the null terminator
    }
    cout << "Length of the string is " << n << "\n";
    return 0;
}

Input:
Type any strings with combinations of letters

Output:
Total No. of letters/ characters of given string.


B ./* string without using space.*/


Source Code:
#include <iostream>
#include <iomanip>
#include <cstring>    // strlen
#include <cctype>     // isspace, isdigit
using namespace std;
const int SIZE = 100;
void input (char*);
void wordCount (char*);
void longWord (char*, int);
void numbers (char*);
void outputLetterCounts (int letterCount[]);
int main()
{
char sentence[SIZE] = {'\0'};   // '\0' (not '/0') is the null character
int x = 0;
int letterCount[26] = {0};
input(sentence);
wordCount(sentence);
longWord(sentence, x);
numbers(sentence);
outputLetterCounts(letterCount);
return 0;
}
void input (char *enter)
{
cout<<"Enter sentence(s) "<< std::endl;
std::cin.getline(enter, SIZE);
int len = strlen( enter );
}
void wordCount (char *word2)
{
int cnt = 0;
while(*word2 != '\0')
{
while(isspace(*word2))
{
++word2;
}


if(*word2 != '\0')
{
++cnt;
while(!isspace(*word2) && *word2 != '\0')
++word2;
}
}
std::cout<<"Number of words: "<<cnt<<endl;
}
void outputLetterCounts(int letterCount[])
{
for (int l = 0; l < 26; l++)
{
if (letterCount[l] > 0)
{
cout << letterCount[l] << " " << char('a' + l) << endl;
}
}
}
void numbers (char *word2)
{
int num = 0;
int amount = strlen(word2);
for(int i = 0; i < amount; i++)
{
if(isdigit(word2[i]))
num++;
}
std::cout<<"Digits: "<<num<<endl;
}
void longWord (char *temp, int x )
{
int counter = 0;
int max_word = -1;
int length = int(strlen(temp));
for(int i=0; i<length; i++)
{
if(temp[i] !=' ')
{
counter++;
}
else if(temp[i]==' ')


{
if(counter > max_word)
{
max_word = counter;
}
counter = 0;
}
}
std::cout <<"Longest word:" << max_word;
}

Input:
Type any strings with combinations of letters / characters of sentence..

Output:
Total No. of letters/ characters of given string without spaces (Excluding white spaces.).


Program #02
/* Create a file (Compiler.cc) and implement a program to read all the content
of Compiler.cc (how many lines, how many words and how many characters are in
the file). */

Source Code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int noc=0, now=0, nol=0;
FILE *fr;
char fname[20];
int ch;                               /* int, so that EOF can be detected */
printf("\n enter the source file name (Compiler.cc): ");
scanf("%19s", fname);
fr=fopen(fname,"r");
if(fr==NULL)
{
printf("\n error \n");
exit(0);
}
ch=fgetc(fr);
while(ch!=EOF)
{
noc++;
if(ch==' ')                           /* no stray semicolon here */
now++;
if(ch=='\n')
{
nol++;
now++;
}
ch=fgetc(fr);
}
fclose(fr);
printf("\n total no of characters=%d",noc);
printf("\n total no of words=%d",now);
printf("\n total no of lines=%d",nol);
return 0;
}

An alternative implementation in C++ (counting characters, words, sentences and lines):

#include <fstream>
#include <iostream>
#include <string>
//#include <cctype>
using namespace std;
int main()
{
string name;                         // name of the file to be opened
char ch;                             // character to be read into the loop
fstream fileName;                    // declare the fstream object

// list of integers to hold the count information


int characters = 0,
    words = 0,
    sentences = 0,
    lines = 0;
// list of constants used to define words, lines, and sentences
const char EOLN  = '\n';             // end of line character
const char SENT  = '.';              // end of sentence character
const char BLANK = ' ';              // end of word character
bool firstTry = true;                // becomes false after the first prompt
do
{
// Prompt for user input and open the specified file
if (firstTry)
{
cout << "Enter the name of a file: ";
}
else
{
cout << "File Not Found!\nEnter the name of a file: ";
}
cin >> name;
cin.ignore();                        // ignore the next character in the buffer
fileName.open(name.c_str());         // convert name to a c-style string
firstTry = false;
} while (!fileName);
// use a while loop to perform the required operations
char prevChar = BLANK;               // the last character analyzed
while ( !fileName.eof())
{
fileName.get(ch);                    // get each character from the file
cout << ch;                          // and print it to the screen

characters ++;                       // count the characters in the file

// count the words in the file
if ((ch == BLANK) && (prevChar != BLANK))
{
words ++;
}
if ( ch == SENT )
// count the sentences in the file
{
sentences ++;
}
// end of the sentence if
if ( ch == EOLN )
// count the lines in the file
{
lines ++;
words ++;
// count the next word here
}
// end of end-of-line if
prevChar = ch;
}                                    // end of while loop

fileName.clear();                    // clear the fail state
fileName.close();                    // close the file

// display a summary of the file analysis


cout << "\nThere are " << characters << " characters in this file.\n";
cout << "There are " << words << " words in this file.\n";
cout << "There are " << sentences << " sentences in this file.\n";
cout << "There are " << lines << " lines in this file.\n";
return 0;
}



Input:
Any text file (for example, Compiler.cc).

Output:
The total number of characters, words, sentences and lines in the file.


Program #03
/*Write a program for implementation of Deterministic Finite Automata
(DFA) for the strings accepted by (abbb, abb, ab,a).*/
Deterministic finite automata (DFA):
A deterministic finite automaton (DFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × Σ → S)
a start state (s ∈ S)
a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It
uses the transition function T to determine the next state using the current state and the
symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept
the string; otherwise it is said to reject the string. The set of strings it accepts forms a
language, which is the language the DFA recognizes.
Non-Deterministic Finite Automaton (NFA):
A non-deterministic finite automaton (NFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × (Σ ∪ {ε}) → P(S))
a start state (s ∈ S)
a set of accept states (A ⊆ S)
where P(S) is the power set of S and ε is the empty string. The machine starts in the start
state and reads in a string of symbols from its alphabet. It uses the transition relation T to
determine the next state(s) using the current state and the symbol just read or the empty
string. If, when it has finished reading, it is in an accepting state, it is said to accept the string;
otherwise it is said to reject the string. The set of strings it accepts forms a language, which is


the language the NFA recognizes.
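Before the general, interactive program below, here is a minimal hedged sketch of the same
idea with the transition table hard-coded for the language {a, ab, abb, abbb}. The state
numbering and names are illustrative only.

#include <stdio.h>

int main(void)
{
    /* states: 0 = start, 1..4 = after reading a, ab, abb, abbb, 5 = dead state */
    /* next_state[state][symbol], where symbol 0 = 'a' and 1 = 'b'              */
    int next_state[6][2] = {
        {1, 5},   /* state 0: 'a' -> 1, anything else dies    */
        {5, 2},   /* state 1 (a):    only 'b' may follow      */
        {5, 3},   /* state 2 (ab)                             */
        {5, 4},   /* state 3 (abb)                            */
        {5, 5},   /* state 4 (abbb): any further input dies   */
        {5, 5}    /* state 5: dead state                      */
    };
    int accepting[6] = {0, 1, 1, 1, 1, 0};

    char str[64];
    printf("enter a string over {a,b}: ");
    scanf("%63s", str);

    int state = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] != 'a' && str[i] != 'b') { state = 5; break; }
        state = next_state[state][str[i] - 'a'];
    }
    printf("%s\n", accepting[state] ? "accepted" : "rejected");
    return 0;
}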

Source Code:
#include <stdio.h>

int main()
{
    int n, m, start, nf, ps;
    char str[20];
    printf("enter no of states: ");
    scanf("%d", &n);
    printf("enter no of inputs: ");
    scanf("%d", &m);
    /* constructing the transition table */
    int **tran = new int*[n];
    for (int i = 0; i < n; i++)
        tran[i] = new int[m];
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < m; j++)
        {
            printf("enter next state for present state %d on input %d: ", i, j);
            scanf("%d", &tran[i][j]);
        }
    }
    printf("enter starting state: ");
    scanf("%d", &start);
    printf("enter no of final states: ");
    scanf("%d", &nf);
    int *final = new int[nf];
    for (int i = 0; i < nf; i++)
    {
        printf("enter the final state: ");
        scanf("%d", &final[i]);
    }
    printf("enter string (as input-symbol numbers, e.g. 0 for a and 1 for b): ");
    scanf("%s", str);
    ps = start;
    for (int i = 0; str[i] != '\0'; i++)
        ps = tran[ps][str[i] - '0'];          /* follow the transition table */
    int accepted = 0;
    for (int i = 0; i < nf; i++)              /* compare against the nf final states */
    {
        if (ps == final[i])
        {
            accepted = 1;
            break;
        }
    }
    printf("%s\n", accepted ? "accepted" : "rejected");
    /* deleting buffers */
    delete[] final;
    for (int i = 0; i < n; i++)
        delete[] tran[i];
    delete[] tran;
    return 0;
}


Program #04
/*Construction of minimization of Deterministic Finite Automata
for the given diagram & recognize the string (aa + b)*ab(bb)*. */
Minimizing Finite Automata
Consider the finite automaton shown in figure 1 which accepts the regular set denoted by the
regular expression (aa + b)*ab(bb)*. Accepting states are colored yellow while rejecting states
are blue.

Figure 1 - Recognizer for (aa + b)*ab(bb)*


Closer examination reveals that states s2 and s7 are really the same since they are both
accepting states and both go to s6 under the input b and both go to s3 under an a. So, why
not merge them and form a smaller machine? In the same manner, we could argue for
merging states s0 and s5. Merging states like this should produce a smaller automaton that
accomplishes exactly the same task as our original one.
From these observations, it seems that the key to making finite automata smaller is to
recognize and merge equivalent states. To do this, we must agree upon the definition of
equivalent states. Here is one formulation of what Moore defined as indistinguishable states.
Definition. Two states in a finite automaton M are equivalent if and only if for
every string x, if M is started in either state with x as input, it either accepts in
both cases or rejects in both cases.
Another way to say this is that the machine does the same thing when started in either state.
This is especially necessary when finite automata produce output.



Two questions remain. First, how does one find equivalent states, and then, exactly how
valuable is this information? We shall answer the second question first by providing a
corollary to a famous theorem proven long ago by Myhill [3] and Nerode [4].
Corollary. For a deterministic finite automaton M, the minimum number of
states in any equivalent deterministic finite automaton is the same as the
number of equivalence classes of M's states.
With one more observation, we shall be able to present an algorithm for transforming an
automaton into its smallest equivalent machine.
Fact. Equivalent states go to equivalent states under all inputs.
Now we know that if we can find the equivalence classes (or groups of equivalent states) for
an automaton, then we can use these as the states of the smallest equivalent machine. The
machine shown in figure 1 will be used as an example for the intuitive discussion that follows.
Let us first divide the machine's states into two groups: accepting and rejecting states. These
groups are: A = {s2, s7} and B = {s0, s1, s3, s4, s5, s6}. Note that these are equivalent under
the empty string as input.
Then, let us find out if the states in these groups go to the same group under inputs a and b.
As we noted at the beginning of this discussion, the states of group A both go to states in
group B under both inputs. Things are different for the states of group B. The following table
shows the result of applying the inputs to these states. (For example, the input a leads from
s1 to s5 in group B and input b leads to s2 in group A.)

[Table: for each of the states s0, s1, s3, s4, s5 and s6 of group B, the group (A or B)
reached under input a and under input b.]

Looking at the table we find that the input b helps us distinguish between two of the states (s1
and s6) and the rest of the states in the group since it leads to group A for these two instead
of group B. Thus the states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set
{s1, s6} and we must partition B into two groups. Now we have the groups:
A = {s2, s7}, B = { s0, s3, s4, s5}, C = { s1, s6}
and the next examination of where the inputs lead shows us that s3 is not equivalent to the
rest of group B. We must partition again.



Continuing this process until we cannot distinguish between the states in any group by
employing our input tests, we end up with the groups:
A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = { s6}.
In view of the above theoretical definitions and results, it is easy to argue that all of the states
in each group are equivalent because they all go to the same groups under the inputs a and
b. Thus in the sense of Moore the states in each group are truly indistinguishable. We also
can claim that due to the corollary to the Myhill-Nerode theorem, any automaton that accepts
(aa + b)*ab(bb)* must have at least five states.Building the minimum state finite automaton is
now rather straightforward. We merely use the equivalence classes (our groups) as states
and provide the proper transitions. This gives us the finite automaton pictured in figure 2.

Figure 2 - A Minimal Automaton


Here is the state minimization algorithm.
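The algorithm itself appears as a figure in the original manual and is not reproduced here. The
following is a hedged C sketch of the partition-refinement idea just described; the state count,
the tiny example automaton and all names are illustrative only and are not the lab program
given below.

#include <stdio.h>
#include <string.h>

#define N 3   /* number of states  (illustrative) */
#define M 2   /* number of input symbols          */

/* group[s] receives the equivalence class of state s; returns the class count */
int minimize(int trans[N][M], int accepting[N], int group[N])
{
    int ngroups = 0;
    for (int s = 0; s < N; s++)              /* initial split: accepting vs. rejecting */
        group[s] = accepting[s] ? 0 : 1;

    int changed = 1;
    while (changed) {
        int newgroup[N];
        int seen[N][M + 1];                  /* signatures of the groups found this round */
        int count = 0;
        for (int s = 0; s < N; s++) {
            /* signature: current group of s plus the group reached on each symbol */
            int sig[M + 1];
            sig[0] = group[s];
            for (int a = 0; a < M; a++)
                sig[a + 1] = group[trans[s][a]];
            int g;
            for (g = 0; g < count; g++)      /* find an existing group with this signature */
                if (memcmp(seen[g], sig, sizeof sig) == 0)
                    break;
            if (g == count)                  /* or open a new group */
                memcpy(seen[count++], sig, sizeof sig);
            newgroup[s] = g;
        }
        changed = (memcmp(group, newgroup, sizeof newgroup) != 0);
        memcpy(group, newgroup, sizeof newgroup);
        ngroups = count;
    }
    return ngroups;
}

int main(void)
{
    /* tiny made-up DFA: states 1 and 2 are both accepting and behave identically */
    int trans[N][M]  = { {1, 2}, {1, 2}, {1, 2} };
    int accepting[N] = { 0, 1, 1 };
    int group[N];
    printf("states in the minimal DFA: %d\n", minimize(trans, accepting, group));
    return 0;
}

On the automaton of figure 1 the same refinement terminates with the five groups derived above.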

The complexity of this algorithm is O(n²), since we check all of the states each time we
execute the repeat loop, and we might have to execute the loop n times, since it might take an
input of length n to distinguish between two states. A faster algorithm was later developed by
Hopcroft.

Source Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
int i;                               /* loop index, reused throughout */
int nstates,minputs,start,nf,ps;
char str[20];
printf("enter no of states");
scanf("%d",&nstates);
printf("enter no of inputs");
scanf("%d",&minputs);
//constructing buffers
int **tran=new int* [nstates];
for(int i=0;i<nstates;i++)
{
tran[i]=new int[minputs];
}
for(i=0;i<nstates;i++)
{
for(int j=0;j<minputs;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);
printf("enter no of final states");
scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}
int *stategroup=new int[nstates],**groupgroup=new int*[nstates];
memset(stategroup,-1,nstates*sizeof(int));
int **groupstate=new int*[nstates];
for(i=0;i<nstates;i++)
{
groupstate[i]=new int[nstates+1];
groupgroup[i]=new int[2];


memset(groupgroup[i],-1,2*sizeof(int));
memset(groupstate[i],-1,(nstates+1)*sizeof(int));
}
for(i=0;i<nf;i++)
{
stategroup[final[i]]=0;
groupstate[0][final[i]]=1;
}
groupstate[0][nstates]=10;//means a partition having final states
for(i=0;i<nstates;i++)
{
if(stategroup[i]!=0)
{
stategroup[i]=1;
groupstate[1][i]=1;
}
}
groupstate[1][nstates]=100;//means a partition having no final states
int groupcount=2;
int count=0,change;
///////////////////////////////minimization starts here/////////////////////////////////////////////////
do
{
change=0;
for(int j=0;j<minputs;j++)
{
int lastgroup,presentgroup,state,latestgroupcount,maxgroupcount=0,group;
for(int group=0;group<groupcount;group++)
{
count=0;
while(groupstate[group][count]!=1&&count<nstates)
{count++;
}
groupgroup[group][0]=stategroup[tran[ count ][j] ];
groupgroup[group][1]=groupstate[group][nstates];
}
for( group=0;group<groupcount;group++)
{
latestgroupcount=groupcount;
lastgroup=groupgroup[group][0];
count=0;
while(count<nstates)
{
if((state=groupstate[group][count])==1)
{
presentgroup=stategroup[tran[count][j]];
if(presentgroup!=lastgroup)


{
change=1;
//find for any group going to presentgroup
int flag=0, anygroup;
for(anygroup=0;anygroup<latestgroupcount;anygroup++)
{
if(groupgroup[anygroup]
[0]==presentgroup
&&groupgroup[anygroup]
[1]==groupgroup[group][1])
{flag=1;
break;
}
}
//change groupgroup
//change stategroup
//change groupstate and groupcount
anygroup=flag==1?
anygroup:latestgroupcount++;
groupgroup[anygroup][0]=presentgroup;
groupgroup[anygroup][1]=groupgroup[group]
[1];
stategroup[count]=anygroup;
groupstate[anygroup][count]=1;
groupstate[group][count]=-1;
groupstate[anygroup]
[nstates]=groupgroup[group][1];
}
}
count++;
}//end of while
if(maxgroupcount<latestgroupcount){maxgroupcount=latestgroupcount;}
}//checking all the groups for loop
groupcount=maxgroupcount;
}//checking all the inputs for loop
}while(change!=0);
/////////////////////////////end of minimization////////////////////////////////////////////////

printf("\n\nGroups\n\n");
for(i=0;i<groupcount;i++)
{
printf("%d ",i);
for(int j=0;j<nstates;j++)


{
if(groupstate[i][j]!=-1)
printf(" %d ",j);
}
printf("\n");
}
//deleting buffers
delete[] stategroup;
delete[] final;
for(i=0;i<nstates;i++)
{
delete[] tran[i];
delete[] groupstate[i];
delete[] groupgroup[i];
}
delete[] groupgroup;
delete[] tran;
delete[] groupstate;
return 0;
}


INPUT:
Recognizer for (aa + b)*ab(bb)*

OUTPUT:
A Minimal Automaton for (aa + b)*ab(bb)*


Program #05
/* Construct a program to compute the FIRST() and FOLLOW() symbols for an LL(1)
grammar, where the context-free grammar for the LL(1) construction (with @ used
for the empty string and / separating alternatives) is:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
Compute the FIRST() and FOLLOW() symbols for this LL(1) grammar. */
The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be
used as synchronizing tokens during panic-mode error recovery. Just suppose for a second that
you are an LL(1) parser and you have the supernatural power of seeing the future of the string
one step ahead.
FIRST(α)
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the
strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if, for
some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in
FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to
FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.
Now, we can compute FIRST for any string X1 X2 ... Xn as follows. Add to FIRST(X1 X2 ... Xn)
all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in
FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so
on. Finally, add ε to FIRST(X1 X2 ... Xn) if, for all i, FIRST(Xi) contains ε.


FOLLOW(A)
Define FOLLOW(A), for a nonterminal A, to be the set of terminals a that can appear
immediately to the right of A in some sentential form; that is, the set of terminals a such that
there exists a derivation of the form S ⇒* αAaβ for some α and β. Note that there may, at some
time during the derivation, have been symbols between A and a, but if so, they derived ε and
disappeared. If A can be the rightmost symbol in some sentential form, then $, representing
the input right endmarker, is in FOLLOW(A).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β), except for ε, is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e.,
β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
EXAMPLE:
Consider the expression grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

Algorithm:
FIRST:
1. If the first character of the production is a terminal, then it becomes the FIRST.
   e.g. FIRST(abAb) = {a}
2. If a production is of the type A -> BCD... (i.e. all symbols are non-terminals), then:
   if FIRST(B) does not contain null, then FIRST(A) = FIRST(B) and stop here;
   else also check the next non-terminal (C here) in the same way, so that
   FIRST(A) = FIRST(B) + FIRST(C); when null is no longer obtained, stop there.
FOLLOW:
Once you know FIRST, you can easily compute FOLLOW.
1. If a variable is the start symbol, then $ is in its FOLLOW.
2. If a production is of the form A -> (any string1) B (any string2), then
   FOLLOW(B) = FIRST(any string2) - {null}.
3. If a production is of the form A -> (any string) B, then FOLLOW(B) = FOLLOW(A).
It is very simple; please try to understand!


Source Code:
#include"stdio.h"
#include<conio.h>
char array[10][20],temp[10];
int c,n;void fun(int,int[]);
int fun2(int i,int j,int p[],int key)
{
int k;
if(!key)
{ /* non-terminal: locate its defining row and continue from there */
for(k=0;k<n;k++)if(array[i][j]==array[k][0])break;
p[0]=i;p[1]=j+1;fun(k,p);return 0;
}
else
{ /* terminal: return 1 only if it is not already in temp[] */
for(k=0;k<=c;k++)if(temp[k]==array[i][j])break;
if(k>c)return 1;
else return 0;
}
}
void fun(int i,int p[])
{
int j,k,key;
for(j=2;array[i][j]!='\0';j++)
{
if(array[i][j-1]=='/')
{
if(array[i][j]>='A'&&array[i][j]<='Z')
{
key=0;
fun2(i,j,p,key);
}
else
{
key=1;
if(fun2(i,j,p,key))
temp[++c]=array[i][j];
if(array[i][j]=='@'&&p[0]!=-1)
{ //taking ,@, as null symbol.
if(array[p[0]][p[1]]>='A'&&array[p[0]][p[1]]<='Z')
{
key=0;
fun2(p[0],p[1],p,key);
}
else
if(array[p[0]][p[1]]!='/'&&array[p[0]][p[1]]!='\0')
{
if(fun2(p[0],p[1],p,key))
temp[++c]=array[p[0]][p[1]];
}
}
}
}
}
}


void main()
{
int p[2],i,j;
clrscr();
printf("Enter the no. of productions :");
scanf("%d",&n);
printf("Enter the productions :\n");
for(i=0;i<n;i++)
scanf("%s",array[i]);
for(i=0;i<n;i++)
{
c=-1,p[0]=-1,p[1]=-1;
fun(i,p);
printf("First(%c) : [ ",array[i][0]);
for(j=0;j<=c;j++)
printf("%c,",temp[j]);
printf("\b ].\n");
getch();
}
}

INPUT:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@

OUTPUT:
Enter the no. of productions :6
Enter the productions :
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
First(S) : [ a ].
First(B) : [ c ].
First(C) : [ b,@ ].
First(D) : [ g,@,f ].
First(E) : [ g,@ ].
First(F) : [ f,@ ].
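The program prints only the FIRST sets. As a hand check (not produced by the program),
applying the FOLLOW rules above to the same grammar, reading @ as the empty string ε, gives:
FOLLOW(S) = { $ }
FOLLOW(B) = FIRST(Dh) - {ε} = { g, f, h }
FOLLOW(C) = FOLLOW(B) = { g, f, h }
FOLLOW(D) = { h }
FOLLOW(E) = FOLLOW(F) = FOLLOW(D) = { h }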


Program #06
/*Construct a Operator Precedence Parser for the following given
grammar and also compute Leading () and trailing () symbols of
the given grammar. */
Operator-Precedence Parser
Operator grammar:
a small but important class of grammars; we may have an efficient operator-precedence
parser (a shift-reduce parser) for an operator grammar.
In an operator grammar, no production rule can have:
- ε (the empty string) on the right side, or
- two adjacent non-terminals on the right side.

Precedence Relations

In operator-precedence parsing, we define three disjoint precedence relations between
certain pairs of terminals:
a <· b    b has higher precedence than a
a =· b    b has the same precedence as a
a ·> b    b has lower precedence than a
The determination of correct precedence relations between terminals is based on the
traditional notions of associativity and precedence of operators. (Unary minus causes a
problem.)
The intention of the precedence relations is to find the handle of a right-sentential form, with
<· marking the left end,
=· appearing in the interior of the handle, and
·> marking the right end.
In our input string $a1 a2 ... an$, we insert the precedence relation between each pair of
adjacent terminals (the precedence relation holds between the terminals in that pair).


Using Operator-Precedence Relations

E -> E+E | E-E | E*E | E/E | E^E | (E) | -E | id
Then the input string id+id*id with the precedence relations inserted will be:
$ <· id ·> + <· id ·> * <· id ·> $

Operator-Precedence Parsing Algorithm


The input string is w$, the initial stack is $ and a table holds precedence relations
between certain terminals
Algorithm:
set p to point to the first symbol of w$ ;
repeat forever
if ( $ is on top of the stack and p points to $ ) then return
else {
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
if ( a <· b or a =· b ) then {       /* SHIFT */
push b onto the stack;
advance p to the next input symbol;
}
else if ( a ·> b ) then {            /* REDUCE */
repeat pop the stack
until ( the top-of-stack terminal is related by <· to the terminal most recently popped );
}
else error();
}


Operator-Precedence Parsing Algorithm Example

stack      input        action
$          id+id*id$    $ <· id    shift
$id        +id*id$      id ·> +    reduce  E -> id
$          +id*id$      shift
$+         id*id$       shift
$+id       *id$         id ·> *    reduce  E -> id
$+         *id$         shift
$+*        id$          shift
$+*id      $            id ·> $    reduce  E -> id
$+*        $            * ·> $     reduce  E -> E*E
$+         $            + ·> $     reduce  E -> E+E
$          $            accept
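A hedged C sketch of the parsing loop above, with the precedence relations for id, + and *
hard-coded for the expression grammar; the table layout and all names are illustrative only
(the lab program below computes LEADING and TRAILING instead):

#include <stdio.h>

static int idx(char t)                     /* map a terminal to a table index */
{
    switch (t) {
    case 'i': return 0;                    /* 'i' stands for id               */
    case '+': return 1;
    case '*': return 2;
    default:  return 3;                    /* '$'                             */
    }
}

int main(void)
{
    /* rel[a][b]: '<' means a <. b, '>' means a .> b, 'A' means accept */
    char rel[4][4] = {
        /*        i    +    *    $   */
        /* i */ {' ', '>', '>', '>'},
        /* + */ {'<', '>', '<', '>'},
        /* * */ {'<', '>', '>', '>'},
        /* $ */ {'<', '<', '<', 'A'},
    };
    const char *input = "i+i*i$";          /* id+id*id followed by the endmarker */
    char stack[64] = "$";                  /* the stack holds terminals only     */
    int  top = 0, p = 0;

    for (;;) {
        char a = stack[top], b = input[p];
        char r = rel[idx(a)][idx(b)];
        if (r == 'A') { printf("accept\n"); break; }
        if (r == '<' || r == '=') {        /* SHIFT */
            stack[++top] = b;
            p++;
        } else if (r == '>') {             /* REDUCE: pop until top <. popped */
            char popped;
            do {
                popped = stack[top--];
            } while (rel[idx(stack[top])][idx(popped)] != '<');
            printf("reduce at %c\n", popped);
        } else {
            printf("error\n");
            break;
        }
    }
    return 0;
}

Running this sketch on i+i*i reproduces the shift/reduce sequence of the table above.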

Disadvantages of Operator Precedence Parsing


Disadvantages :
It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
Small class of grammars.
Difficult to decide which language is recognized by the grammar.
Advantages :
simple
powerful enough for expressions in programming languages



Source Code:
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
char **arr;// contains productions for different Non terminals
//having non terminals at arr[i][0] and rest contains productions
//************************IMPORTANT******************************
//remember symbol @ is just another terminal there
/****************************************************************/
//examples of productions used by program
//S->a/^/(T)
// '/' is used to define multiple productions for the same Non terminal
//T->T,S/S
//arr will contain the productions as follows
//
0 1 2 3 4 5 6 7 8 9 10
//
//arr[0] S a / ^ / ( T )
//arr[1] T T , S / S
int *flagstate;// to see whether leading has already been found for a Non terminal
char **foundlead;//contains the already found leading for a Non terminal
int *NtSymbols;//used to reduce time complexity by storing where the
productions for a non terminal are stored in arr
char **foundtrail;//contains the already found trailing for a Non terminal
int **trailgoesto;//to tell which Non Terminals trailing goes to whose trailing
int **leadgoesto;//to tell which Non Terminals leading goes to whose leading
int strmergeunique(char*dest,const char *source)
{
int strlength=strlen(source),change=0;
for(int i=0;i<strlength;i++)
{
if(!strchr(dest,source[i]))
{
dest[strlen(dest)+1]='\0';
dest[strlen(dest)]=source[i];
change=1;


}
}
return change;
}
void leading(int no_of_nonterminals)
{
int nonterminals=0;
char Gamma,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
for(int eachletter=1;arr[nonterminals][eachletter]!='\0';eachletter++)
{
Gamma=arr[nonterminals][eachletter];
if(isupper(Gamma))
{
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][
leadgoesto[NtSymbols[toascii(Gamma)-65]][0]+1 ]=nonterminals;
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][0]++;
continue;
}
else
{
if(Gamma=='\x0')
{break;}
if(Gamma=='/')
{continue;}
str[0]=Gamma;
str[1]='\0';
strmergeunique(foundlead[nonterminals],str);
}
while(arr[nonterminals][eachletter+1]!='\x0'&&
arr[nonterminals][eachletter+1]!='/')
{
eachletter++;
}
}
nonterminals++;
}
int change=0;


do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=leadgoesto[i][0];j++)
{
change|=strmergeunique(foundlead[leadgoesto[i][j]],foundlead[i]);
}
}
}
while(change);
}
void trailing(int no_of_nonterminals)
{
int nonterminals=0;
char Delta,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
int eachletter=strlen(arr[nonterminals])-1;
for(;eachletter>0;eachletter--)
{
Delta=arr[nonterminals][eachletter];
// *******alpha B
if(isupper(Delta))
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
if(arr[nonterminals][eachletter-1]!='/'&&
eachletter-1>0)
{Delta=arr[nonterminals][eachletter-1];
if(!isupper(Delta))
{
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
}
}


}
// B alpha
// ***** alpha
else
{
if(Delta=='/')
{continue;}
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
Delta=arr[nonterminals][eachletter-1];
if(isupper(Delta)&&eachletter-1>0)
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
}
}
while(eachletter-1>0&&
arr[nonterminals][eachletter-1]!='/')
{
eachletter--;
}
}
nonterminals++;
}
int change=0;
do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=trailgoesto[i][0];j++)
{
change|=strmergeunique(foundtrail[trailgoesto[i][j]],foundtrail[i]);
}
}
}
while(change);


}
void main()
{
int nt;
clrscr();
printf("Enter no.of nonterminals :");
scanf("%d",&nt);
arr=new char*[nt];
foundlead=new char*[nt];
foundtrail=new char*[nt];
flagstate=new int[nt];
leadgoesto=new int*[nt];
trailgoesto=new int*[nt];
NtSymbols=new int[26];
for (int i=0;i<nt;i++)
{
arr[i]=new char[100];
foundlead[i]=new char[10];
memset(foundlead[i],'\0',10);
foundtrail[i]=new char[10];
memset(foundtrail[i],'\0',10);
flagstate[i]=0;
leadgoesto[i]=new int[nt];
leadgoesto[i][0]=0;
trailgoesto[i]=new int[nt];
trailgoesto[i][0]=0;
printf("Enter non terminal ");
cin>>arr[i][0];
flushall();
printf("Enter Production for %c------>",arr[i][0]);
gets(arr[i]+1);
NtSymbols[toascii(arr[i][0])-65]=i;
}
char prod[50];
leading(nt);
trailing(nt);
cout<<endl<<endl;
for(i=0;i<nt;i++)
{
printf("leading (%c)--> { %s }\n",arr[i][0],foundlead[i]);


printf("trailing (%c)--> { %s }\n",arr[i][0],foundtrail[i]);
delete arr[i];
delete foundlead[i];
delete foundtrail[i];
delete leadgoesto[i];
delete trailgoesto[i];
}
delete arr;
delete flagstate;
delete foundlead;
delete NtSymbols;
delete foundtrail;
delete trailgoesto;
delete leadgoesto;
}
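A hand-checked sample (not actual program output): for the usual expression grammar
E -> E+T | T,  T -> T*F | F,  F -> (E) | i, the expected sets are
leading (E)  --> { +, *, (, i }
leading (T)  --> { *, (, i }
leading (F)  --> { (, i }
trailing (E) --> { +, *, ), i }
trailing (T) --> { *, ), i }
trailing (F) --> { ), i }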


Program #07
Program using LEX to count the number of characters, words, spaces and
Lines in a given input file.

Lexical Analyzer
The main task of the lexical analyzer is to read the input source program, scanning the
characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
The interface: the analyzer may be called by the parser to produce one token at a time; it
maintains internal state of reading the input program (with lines) and has a function
getNextToken that will read some characters at the current state of the input and return a
token to the parser.
Other tasks of the lexical analyzer include: skipping or hiding whitespace and comments;
keeping track of line numbers for error reporting (sometimes it can also produce the
annotated lines for error reports); producing the value of the token; and, optionally, inserting
identifiers into the symbol table.

Character Level Scanning


The lexical analyzer needs to have a well-defined valid character set: it should produce
invalid-character errors and delete invalid characters from the token stream so that they are
not used in the parser analysis (e.g. we don't want invisible characters in error messages).
For every end-of-line it keeps track of line numbers for error reporting. It skips over or hides
whitespace and comments; if comments are nested (not common), it must keep track of
nesting to find the end of comments. It may produce hidden tokens, for convenience of
scanner structure. It always produces an end-of-file token; it is important that quoted strings
and comments don't get stuck if an unexpected end of file occurs.


Source Code:
%{
int ch=0, bl=0, ln=0, wr=0;
%}
%%
[\n] {ln++;wr++;}
[\t] {bl++;wr++;}
[" "] {bl++;wr++;}
[^\n\t] {ch++;}
%%
int main()
{
FILE *fp;
char file[10];
printf("Enter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");      /* open the file before handing it to the scanner */
yyin=fp;
yylex();
printf("Character=%d\nBlank=%d\nLines=%d\nWords=%d", ch, bl, ln, wr);
return 0;
}


INPUT:
A input file (.doc or any format), counts number of characters, words, spaces and Lines in a
given input file.

OUTPUT:
$cat > input
Girish rao salanke
$lex p1a.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
Character=16
Blank=2
Lines=1
Word=3


Program #08
Program using LEX to count the number of comment lines in a given C/
C++/JAVA program. Also eliminate them and copy the resulting program
into a separate file.

Compiler-construction tools
Originally, compilers were written from scratch, but now the situation is quite different. A
number of tools are available to ease the burden.
We will study tools that generate scanners and parsers. This will involve us in some theory,
regular expressions for scanners and various grammars for parsers. These techniques are
fairly successful. One drawback can be that they do not execute as fast as hand-crafted
scanners and parsers.
We will also see tools for syntax-directed translation and automatic code generation. The
automation in these cases is not as complete.
Finally, there is the large area of optimization. This is not automated; however, a basic
component of optimization is data-flow analysis (how values are transmitted between parts
of a program) and there are tools to help with this task.

Lexical Analysis (or Scanning)


The character stream input is grouped into meaningful units called lexemes, which are then
mapped into tokens, the latter constituting the output of the lexical analyzer. For example,
any one of the following
x3 = y + 3;
x3 = y + 3 ;
x3 =y+ 3 ;

but not
x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.


A token is a <token-name,attribute-value> pair. For example
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for
identifier. The value 1 is the index of the entry for x3 in the symbol table produced by
the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a
pair, whose second component is ignored. The point is that there are many different
identifiers so we need the second component, but there is only one assignment symbol
=.
3. The lexeme y is mapped to the token <id,2>


4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters.
It is mapped to <number,something>, but what is the something. On the one hand
there is only one 3 so we could just use the token <number,3>. However, there can be
a difference between how this should be printed (e.g., in an error message produced
by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps
the token should point to the symbol table where an entry for this kind of 3 is stored.
Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are
non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation
without using recursion (compare with parsing below).


Source Code:
%{
int com=0;
%}
%%
"/*"[^\n]+"*/" {com++;fprintf(yyout, " ");}
%%
int main()
{
printf("Write a C program\n");
yyout=fopen("output", "w");
yylex();
printf("Comment=%d\n",com);
return 0;
}



OUTPUT:
$lex p1b.l
$cc lex.yy.c -ll
$./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b;
/*float c;*/
printf("Hai");
/*printf("Hello");*/
}
[Ctrl-d]
Comment=1
$cat output
#include<stdio.h>
int main()
{
int a, b;
printf("Hai");
}


Program #09
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.
Some Regular Expressions for Flex
\"[^"]*\"                  string
"\t"|"\n"|" "              whitespace (most common forms)
[a-zA-Z]                   a single letter
[a-zA-Z_][a-zA-Z0-9_]*     identifier: allows a, aX, a45__
[0-9]*"."[0-9]+            allows .5 but not 5.
[0-9]+"."[0-9]*            allows 5. but not .5
[0-9]*"."[0-9]*            allows . by itself !!

The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser. The lexical analyzer is an integer-valued function called
yylex. The function returns an integer, the token number, representing the kind of token read.
If there is a value associated with that token, it should be assigned to the external variable
yylval.
The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place. The numbers may be chosen by Yacc, or chosen
by the user. In either case, the ``# define'' mechanism of C is used to allow the lexical
analyzer to return these numbers symbolically. For example, suppose that the token name
DIGIT has been defined in the declarations section of the Yacc specification file. The relevant
portion of the lexical analyzer might look like:
yylex(){
extern int yylval;
int c;
...
c = getchar();
...
switch( c ) {


...
case '0':
case '1':
...
case '9':
yylval = c-'0';
return( DIGIT );
...
}
...
The intent is to return a token number of DIGIT, and a value equal to the numerical value of
the digit. Provided that the lexical analyzer code is placed in the programs section of the
specification file, the identifier DIGIT will be defined as the token number associated with the
token DIGIT.
This mechanism leads to clear, easily modified lexical analyzers; the only pitfall is the need to
avoid using any token names in the grammar that are reserved or significant in C or the
parser; for example, the use of token names if or while will almost certainly cause severe
difficulties when the lexical analyzer is compiled. The token name error is reserved for error
handling, and should not be used naively.
As mentioned above, the token numbers may be chosen by Yacc or by the user. In the default
situation, the numbers are chosen by Yacc. The default token number for a literal character is
the numerical value of the character in the local character set. Other names are assigned
token numbers starting at 257.
To assign a token number to a token (including literals), the first appearance of the token
name or literal in the declarations section can be immediately followed by a nonnegative
integer. This integer is taken to be the token number of the name or literal. Names and literals
not defined by this mechanism retain their default definition. It is important that all token
numbers be distinct.



For historical reasons, the endmarker must have token number 0 or negative. This token
number cannot be redefined by the user; thus, all lexical analyzers should be prepared to
return 0 or negative as a token number upon reaching the end of their input.
A very useful tool for constructing lexical analyzers is the Lex program developed by Mike
Lesk.[8] These lexical analyzers are designed to work in close harmony with Yacc parsers.
The specifications for these lexical analyzers use regular expressions instead of grammar
rules. Lex can be easily used to produce quite complicated lexical analyzers, but there remain
some languages (such as FORTRAN) which do not fit any theoretical framework, and whose
lexical analyzers must be crafted by hand.

Source Code:
%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{
printf("Invalid expression");
}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");


if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}

OUTPUT:
$lex p2a.l
$cc lex.yy.c -ll
$./a.out
Enter the expression
(a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*


Program #13
Program using LEX to recognize whether a given sentence is simple or
compound.
%{
int flag=0;
%}
%%
(" "[aA][nN][dD]" ")|(" "[oO][rR]" ")|(" "[bB][uU][tT]" ") {flag=1;}
%%
int main()
{
printf("Enter the sentence\n");
yylex();
if(flag==1)
printf("\nCompound sentence\n");
else
printf("\nSimple sentence\n");
return 0;
}

OUTPUT:
$lex p2b.l
$cc lex.yy.c -ll
$./a.out
Enter the sentence
I am Pooja
I am Pooja
[Ctrl-d]
Simple sentence
$./a.out
Enter the sentence
CSE or ISE
CSE or ISE
[Ctrl-d]
Compound sentence

Program #14
Program using LEX to recognize and count the number of identifiers in a
given input file.
Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table
is translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.

Source Code:
%{
#include<stdio.h>
int count=0;
%}
op [+\-*/]
letter [a-zA-Z]
digitt [0-9]
id {letter}+|({letter}{digitt})+
notid ({digitt}{letter})+
%%
[\t\n]+
("int")|("float")|("char")|("case")|("default")| ("if")|("for")|("printf")|("scanf") {printf("%s is a
keyword\n", yytext);}
{id} {printf("%s is an identifier\n", yytext); count++;}
{notid} {printf("%s is not an identifier\n", yytext);}
%%
int main()
{
FILE *fp;
char file[10];
printf("\nEnter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");
yyin=fp;
yylex();
printf("Total identifiers are: %d\n", count);
return 0;
}
OUTPUT:
$cat > input
int
float
78f
90gh
a
d
are case
default
printf
scanf
$lex p3.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
int is a keyword
float is a keyword
78f is not an identifier
90g is not an identifier
h is an identifier
a is an identifier
d is an identifier
are is an identifier
case is a keyword
default is a keyword
printf is a keyword
scanf is a keyword
Total identifiers are: 4

Program #15
YACC (Yet Another Compiler Compiler) program to recognize a valid arithmetic
expression that uses operators +, -, * and /.

Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be
declared as such. In addition, for reasons discussed in Section 3, it is often desirable to
include the lexical analyzer as part of the specification file; it may be useful to include other
programs as well. Thus, every specification file consists of three sections: the declarations,
(grammar) rules, and programs. The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
In other words, a full specification file looks like
declarations
%%
rules
%%
programs
The declaration section may be empty. Moreover, if the programs section is omitted, the
second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
%%
rules
Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-character reserved symbols. Comments may appear wherever a name is legal; they are
enclosed in /* . . . */, as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form:
A : BODY ;
A represents a nonterminal name, and BODY represents a sequence of zero or more names
and literals. The colon and the semicolon are Yacc punctuation.
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'',
and non-initial digits. Upper and lower case letters are distinct. The names used in the body of
a grammar rule may represent tokens or nonterminal symbols.

A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an
escape character within literals, and all the C escapes are recognized. Thus
'\n' newline
'\r' return
'\'' single quote ``'''
'\\' backslash ``\''
'\t' tab
'\b' backspace
'\f' form feed
'\xxx' ``xxx'' in octal
For a number of technical reasons, the NUL character ('\0' or 0) should never be used in
grammar rules.
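For instance, an illustrative rule (the nonterminals line and text are hypothetical, used here only to show the notation) may use an escaped literal directly:

line : text '\n' ;    /* '\n' is the literal newline token */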
If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be
used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can
be dropped before a vertical bar. Thus the grammar rules
A : B C D ;
A : E F ;
A : G ;

can be given to Yacc as

A : B C D
  | E F
  | G
  ;

It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to
change.
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
empty : ;
Names representing tokens must be declared; this is most simply done by writing
%token name1 name2 . . .
in the declarations section. (See Sections 3 , 5, and 6 for much more discussion). Every name
not defined in the declarations section is assumed to represent a nonterminal symbol. Every
nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, one, called the start symbol, has particular importance. The
parser is designed to recognize the start symbol; thus, this symbol represents the largest,
most general structure described by the grammar rules. By default, the start symbol is taken
to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact
desirable, to declare the start symbol explicitly in the declarations section using the %start
keyword:
%start symbol
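A small declarations section combining these pieces might look like the following sketch (the token names match Program #15 below, which itself relies on the default start symbol rather than %start):

%token NUMBER ID    /* terminals supplied by the lexical analyzer */
%start expr         /* parse the entire input as a single expr */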
The end of the input to the parser is signaled by a special token, called the endmarker. If the
tokens up to, but not including, the endmarker form a structure which matches the start
symbol, the parser function returns to its caller after the endmarker is seen; it accepts the
input. If the endmarker is seen in any other context, it is an error.
It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate;
see section 3, below. Usually the endmarker represents some reasonably obvious I/O status,
such as ``end-of-file'' or ``end-of-record''.
2: Actions
With each grammar rule, the user may associate actions to be
performed each time the rule is recognized in the input process. These actions may return
values, and may obtain the values returned by previous actions. Moreover, the lexical
analyzer can return values for tokens, if desired.
An action is an arbitrary C statement, and as such can do input and output, call subprograms,
and alter external vectors and variables. An action is specified by one or more statements,
enclosed in curly braces ``{'' and ``}''. For example,
A : '(' B ')'
        { hello( 1, "abc" ); }

and

XXX : YYY ZZZ
        { printf("a message\n");
          flag = 25; }

are grammar rules with actions.
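Values are passed through the parse with $$ (the value of the rule being reduced) and $1, $2, ... (the values of the items on its right-hand side). A brief sketch in the style of the arithmetic grammar used later (the manual's own Program #15 only checks validity and does not compute values):

expr : expr '+' expr    { $$ = $1 + $3; }    /* result = left operand + right operand */
     | NUMBER           { $$ = $1; }         /* value handed over by the lexical analyzer */
     ;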

Source Code:
LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return NUMBER;}
[a-zA-Z]+ {return ID;}
[\t]+ ;
\n {return 0;}
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%%
expr: expr '+' expr
|expr '-' expr
|expr '*' expr
|expr '/' expr
|'-'NUMBER
|'-'ID
|'('expr')'
|NUMBER
|ID
;
%%
main()
{
printf("Enter the expression\n");
yyparse();
printf("\nExpression is valid\n");
exit(0);
}
int yyerror(char *s)
{
printf("\nExpression is invalid");
exit(0);
}
OUTPUT:
$lex p4a.l
$yacc -d p4a.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the expression
(a*b+5)
Expression is valid
$./a.out
Enter the expression
(a+6-)
Expression is invalid

Program #16
YACC (Yet Another Compiler Compiler ) program to recognize a valid
variable, which starts with a letter, followed by any number of letters or
digits.
Yacc turns the specification file into a C program, which parses the input according to
the specification given. The algorithm used to go from the specification to the parser is
complex, and will not be discussed here (see the references for more information). The
parser itself, however, is relatively simple, and understanding how it works, while not
strictly necessary, will nevertheless make treatment of error recovery and ambiguities
much more comprehensible.
Source Code:
LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
[a-zA-Z]+ {return LETTER;}
[\t] ;
\n return 0;
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
variable: LETTER|LETTER rest
;
rest: LETTER rest
|DIGIT rest
|LETTER
|DIGIT
;
%%
main()
{
yyparse();
printf("The string is a valid variable\n");
}
int yyerror(char *s)
{
printf("this is not a valid variable\n");
exit(0);
}

OUTPUT:
$lex p4b.l
$yacc -d p4b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
input34
The string is a valid variable
$./a.out
89file
This is not a valid variable

Program #17
Implement a program of YACC (Yet Another Compiler Compiler) to recognize the
strings "aaab", "abbb", "ab" and "a" using the grammar (aⁿbⁿ, n >= 0).
Yacc: Yet Another Compiler-Compiler
Yacc provides a general tool for imposing structure on the input to a computer program. The
Yacc user prepares a specification of the input process; this includes rules describing the
input structure, code to be invoked when these rules are recognized, and a low-level routine
to do the basic input. Yacc then generates a function to control the input process. This
function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer)
to pick up the basic items (called tokens) from the input stream. These tokens are organized
according to the input structure rules, called grammar rules; when one of these rules has
been recognized, then user code supplied for this rule, an action, is invoked; actions have the
ability to return values and make use of the values of other actions.
Yacc is written in a portable dialect of C[1] and the actions, and output subroutine, are in C as
well. Moreover, many of the syntactic conventions of Yacc follow C.
The heart of the input specification is a collection of grammar rules. Each rule describes an
allowable structure and gives it a name. For example, one grammar rule might be
date : month_name day ',' year ;
Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed
in single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input
July 4, 1776
might be matched by the above rule.

An important part of the input process is carried out by the lexical analyzer. This user routine
reads the input stream, recognizing the lower level structures, and communicates these
tokens to the parser. For historical reasons, a structure recognized by the lexical analyzer is
called a terminal symbol, while the structure recognized by the parser is called a nonterminal
symbol. To avoid confusion, terminal symbols will usually be referred to as tokens.
There is considerable leeway in deciding whether to recognize structures using the lexical
analyzer or grammar rules. For example, the rules
month_name : 'J' 'a' 'n' ;
month_name : 'F' 'e' 'b' ;
...
month_name : 'D' 'e' 'c' ;
might be used in the above example. The lexical analyzer would only need to recognize
individual letters, and month_name would be a nonterminal symbol. Such low-level rules tend
to waste time and space, and may complicate the specification beyond Yacc's ability to deal
with it. Usually, the lexical analyzer would recognize the month names, and return an
indication that a month_name was seen; in this case, month_name would be a token.
Literal characters such as ``,'' must also be passed through the lexical analyzer, and are also
considered tokens.
Specification files are very flexible. It is relatively easy to add to the above example the rule
date : month '/' day '/' year ;
allowing
7 / 4 / 1776
as a synonym for
July 4, 1776

In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
and little danger of disrupting existing input.
The input being read may not conform to the specifications. These input errors are detected
as early as is theoretically possible with a left-to-right scan; thus, not only is the chance of
reading and computing with bad input data substantially reduced, but the bad data can usually
be quickly found. Error handling, provided as part of the input specifications, permits the
reentry of bad data, or the continuation of the input process after skipping over the bad data.
In some cases, Yacc fails to produce a parser when given a set of specifications. For
example, the specifications may be self contradictory, or they may require a more powerful
recognition mechanism than that available to Yacc. The former cases represent design errors;
the latter cases can often be corrected by making the lexical analyzer more powerful, or by
rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its
power compares favorably with similar systems; moreover, the constructions which are
difficult for Yacc to handle are also frequently difficult for human beings to handle. Some
users have reported that the discipline of formulating valid Yacc specifications for their input
revealed errors of conception or design early in the program development.

Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
S:A S B
|
;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("%s\n",s);
}
OUTPUT:
$lex p5b.l
$yacc -d p5b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aabb
[Ctrl-d]
Valid
$./a.out
Enter the string
aab
syntax error

Program #18
Program to recognize the Context free grammar (aⁿbⁿ, n >= 10), where a & b
are input symbols of the grammar.
A context-free grammar (CFG) is a set of recursive rewriting rules (or productions)
used to generate patterns of strings.

A CFG consists of the following components:

- a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar;
- a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols;
- a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production);
- a start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.

To generate a string of terminal symbols from a CFG, we:

1. Begin with a string consisting of the start symbol;
2. Apply one of the productions with the start symbol on the left hand side, replacing the start symbol with the right hand side of the production;
3. Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.
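As a quick illustration of this procedure (not part of the worked example that follows), the grammar used in the previous program, S -> a S b | ε, derives the string aabb in three steps:

S  =>  a S b        (apply S -> a S b)
   =>  a a S b b    (apply S -> a S b again)
   =>  a a b b      (apply S -> ε)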
Finding all the Strings Generated by a CFG


There are several ways to generate the (possibly infinite) set of strings generated by a grammar. We
will show a technique based on the number of productions used to generate the string.
Find the strings generated by the following CFG:
<S> --> w c d <S> | b <L> e | s
<L> --> <L> ; <S> | <S>

0. Applying at most zero productions, we cannot generate any strings.


1. Applying at most one production (starting with the start symbol) we can generate {wcd<S>, b<L>e,
s}. Only one of these strings consists entirely of terminal symbols, so the set of terminal strings we can
generate using at most one production is {s}.
2. Applying at most two productions, we can generate all the strings we can generate with one
production, plus any additional strings we can generate with an additional production.
{wcdwcd<S>, wcdb<L>e, wcds, b<S>e, b<L>;<S>e,s}

The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.
3. Applying at most three productions, we can generate:
{wcdwcdwcd<S>, wcdwcdb<L>e, wcdwcds, wcdb<L>;<S>e,
wcdb<S>e, bwcd<S>e, bb<L>ee, bse, b<L>;<S>Se,
b<S><S>e, b<L>wcd<S>e, b<L>b<L>ee, b<L>se }

The set of terminal strings we can generate with at most three productions is therefore {s, wcds, wcdwcds, bse}.

We can repeat this process for an arbitrary number of steps N, and find all the strings the grammar can
generate by applying N productions.

Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
stat:exp B
;
exp:A A A A A A A A A exp1
;
exp1:A exp2
|A
|A A exp2
|A A A exp2
|A A A A exp2
;
exp2:A
;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("error\n");
}

OUTPUT:
$lex p6.l
$yacc -d p6.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aaaaaaaaaaab
Valid
$./a.out
Enter the string
aab
error

Program #19
Write a C program to implement the syntax-directed definition of if E then
S1 and if E then S1 else S2.
/* Input to the program is assumed to be syntactically correct. The expression of if statement,
for true condition and statement for false condition are enclosed in parenthesis */
Some programming languages permit the user to use words like ``if'', which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal
use of these names in the programming language. This is extremely hard to do in the
framework of Yacc; it is difficult to pass information to the lexical analyzer telling it ``this
instance of `if' is a keyword, and that instance is a variable''. The user can make a stab at it,
using the mechanism described in the last subsection, but it is difficult.
A number of ways of making this easier are under advisement. Until then, it is better that the
keywords be reserved; that is, be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.
10: Advanced Topics
This section discusses a number of advanced features of Yacc.
Simulating Error and Accept in Actions
The parsing actions of error and accept can be simulated in an action by use of macros
YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR
causes the parser to behave as if the current input symbol had been a syntax error; yyerror is
called, and error recovery takes place. These mechanisms can be used to simulate parsers
with multiple endmarkers or context-sensitive syntax checking.
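As a sketch of the idea (the rule and the helper is_known() are hypothetical, not part of this manual's programs), YYERROR can be raised from inside an action to force the normal error-recovery machinery:

stat : ID '=' expr ';'
         { if ( !is_known( $1 ) )    /* hypothetical symbol-table check */
               YYERROR;              /* act as though a syntax error had just been read */
         }
     ;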
Accessing Values in Enclosing Rules.
An action may refer to values returned by actions to the left of the current rule. The
mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider
sent : adj noun verb adj noun
           { look at the sentence . . . }
     ;

adj  : THE            { $$ = THE; }
     | YOUNG          { $$ = YOUNG; }
     ...
     ;

noun : DOG            { $$ = DOG; }
     | CRONE          { if( $0 == YOUNG ){
                            printf( "what?\n" );
                        }
                        $$ = CRONE;
                      }
     ;
...
In the action following the word CRONE, a check is made that the preceding token shifted
was not YOUNG. Obviously, this is only possible when a great deal is known about what
might precede the symbol noun in the input. There is also a distinctly unstructured flavor
about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially
when a few combinations are to be excluded from an otherwise regular structure.

Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parsecondition(char[],int,char*,int);
void gen(char [],char [],char[],int);
int main()
{
int counter = 0,stlen =0,elseflag=0;
char stmt[60]; // contains the input statement
char strB[54]; // holds the expression for the 'if' condition
char strS1[50]; // holds the statement for the true condition
char strS2[45]; // holds the statement for the false condition
printf("Format of if statement \n Example...\n");
printf("if (a<b) then (s=a);\n");
printf("if (a<b) then (s=a) else (s=b);\n\n");
printf("Enter the statement \n");
fgets(stmt, sizeof(stmt), stdin);      // read the statement safely
stmt[strcspn(stmt, "\n")] = '\0';      // strip the trailing newline
stlen = strlen(stmt);
counter = counter + 2; // increment over 'if'
counter = parsecondition(stmt,counter,strB,stlen);
if(stmt[counter]==')')
counter++;
counter = counter + 3; // increment over 'then'
counter = parsecondition(stmt,counter,strS1,stlen);
if(stmt[counter+1]==';')
{ //reached end of statement, generate the output
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
if(stmt[counter]==')')
counter++; // increment over ')'
counter = counter + 3; // increment over 'else'
counter = parsecondition(stmt,counter,strS2,stlen);
counter = counter + 2; // move to the end of the statement
if(counter == stlen)
{ //generate the output
elseflag = 1;
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
return 0;
}
/* Function : parsecondition
Description : This function parses the statement
from the given index to get the statement enclosed
in ()
Input : Statement, index to begin search, string
to store the condition, total string length
Output : Returns 0 on failure, Non zero counter
value on success
*/
int parsecondition(char input[],int cntr,char
*dest,int totallen)
{
int index = 0,pos = 0;
while(input[cntr]!= '(' && cntr <= totallen)
cntr++;
if(cntr >= totallen)
return 0;
index = cntr;
while (input[cntr]!=')')
cntr++;
if(cntr >= totallen)
return 0;
while(index<=cntr)
dest[pos++] = input[index++];
dest[pos]='\0'; //null terminate the string
return cntr; //non zero value
}
/* Function : gen ()
Description : This function generates three
address code
Input : Expression, statement for true condition,
statement for false condition, flag to denote if
the 'else' part is present in the statement
output :Three address code
*/
void gen(char B[],char S1[],char S2[],int elsepart)
{
int Bt =101,Bf = 102,Sn =103;
printf("\n\tIf %s goto %d",B,Bt);
printf("\n\tgoto %d",Bf);
printf("\n%d: ",Bt);
printf("%s",S1);
if(!elsepart)
printf("\n%d: ",Bf);
else
{ printf("\n\tgoto %d",Sn);
printf("\n%d: %s",Bf,S2);
printf("\n%d:",Sn);
}
}

OUTPUT
Format of if statement
Example ...
if (a<b) then (s=a);
if (a<b) then (s=a) else (s=b);
Enter the statement
if (a<b) then (x=a) else (x=b);
Parsing the input statement....
If (a<b) goto 101
goto 102
101: (x=a)
goto 103
102: (x=b)
103:
