BHIMKAYA

LEVEL-0-DFD (DATA FLOW DIAGRAM)
[Figure: the INPUT (source code) passes through the NFA, whose generation step produces the OUTPUT (tokens)]
SRS (SOFTWARE REQUIREMENT SPECIFICATION)
1. INTRODUCTION
COMPILER
Simply stated, a compiler is a program that reads a program
written in one language (the source language) and translates it
into an equivalent program in another language (the target
language). As an important part of this translation process, the
compiler reports to its user the presence of errors in the source
program.
[Figure: source program -> Compiler -> target program; the compiler also emits error messages]
Compilers are sometimes classified as single-pass, multi-pass,
load-and-go, debugging, or optimizing, depending on how they
have been constructed or on what function they are supposed to
perform. Despite this apparent complexity, the basic tasks that
any compiler must perform are essentially the same.
THE PHASES OF A COMPILER
Conceptually, a compiler operates in phases, each of which
transforms the source program from one representation to
another.
The first three phases form the bulk of the analysis portion of a
compiler. Symbol table management and error handling interact
with all six phases.
Symbol table management

An essential function of a compiler is to record the identifiers
used in the source program and collect information about various
attributes of each identifier. A symbol table is a data structure
containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find
the record for each identifier quickly and to store or retrieve data
from that record quickly. When the lexical analyzer detects an
identifier in the source program, the identifier is entered into
the symbol table.
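A minimal sketch of such a table in C is given here. The names sym_lookup and sym_insert, the fixed capacity and the single first_pos attribute are assumptions made for illustration; they are not taken from the project code.

```c
#include <string.h>

#define SYM_MAX      64   /* hypothetical capacity of the table */
#define SYM_NAME_LEN 31   /* hypothetical maximum identifier length */

struct symbol {
    char name[SYM_NAME_LEN + 1]; /* the lexeme of the identifier */
    int  first_pos;              /* example attribute: position of first occurrence */
};

static struct symbol sym_table[SYM_MAX];
static int sym_count = 0;

/* Return the index of name in the table, or -1 if it is absent. */
int sym_lookup(const char *name)
{
    int i;
    for (i = 0; i < sym_count; i++)
        if (strcmp(sym_table[i].name, name) == 0)
            return i;
    return -1;
}

/* Enter name if it is not already present and return its index.
   Returns -1 if the table is full or the name is too long. */
int sym_insert(const char *name, int pos)
{
    int i = sym_lookup(name);
    if (i >= 0)
        return i;                      /* already recorded */
    if (sym_count == SYM_MAX || strlen(name) > SYM_NAME_LEN)
        return -1;
    strcpy(sym_table[sym_count].name, name);
    sym_table[sym_count].first_pos = pos;
    return sym_count++;
}
```

The linear search keeps the sketch short. A table meant to find records quickly would more likely hash the name.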
Error Detection and Reporting

Each phase can encounter errors. A compiler that stops when it
finds the first error is not as helpful as it could be.
The syntax and semantic analysis phases usually handle a large
fraction of the errors detectable by the compiler. The lexical
phase can detect errors where the characters remaining in the
input do not form any token of the language. Errors where the
token stream violates the syntax of the language are detected
by the syntax analysis phase. During semantic analysis the
compiler tries to detect constructs that have the right syntactic
structure but no meaning for the operation involved.
THE ANALYSIS PHASES:-
As translation progresses, the compiler’s internal representation
of the source program changes. Consider the statement
position := initial + rate * 10
The lexical analysis phase reads the characters in the source
program and groups them into a stream of tokens in which each
token represents a logically cohesive sequence of characters,
such as an identifier or a keyword. The character sequence
forming a token is called the lexeme for the token. Certain tokens
are augmented by a ‘lexical value’. For example, for any
identifier the lexical analyzer not only generates the token id but
also enters the lexeme into the symbol table, if it is not already
present there. The lexical value associated with this occurrence
of id points to the symbol table entry for this lexeme. The
representation of the statement given above after lexical
analysis would be:
id1 := id2 + id3 * 10
Syntax analysis imposes a hierarchical structure on the token
stream, which is shown by syntax trees.
THE SYNTHESIS PHASES:-

Intermediate Code Generation
After syntax and semantic analysis, some compilers generate an
explicit intermediate representation of the source program. This
intermediate representation can have a variety of forms.
In three-address code, the source program might look like this:
temp1 := inttoreal(10)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimisation
The code optimization phase attempts to improve the
intermediate code so that faster-running machine code
results. Some optimizations are trivial. There is great variation in
the amount of code optimization different compilers perform. In
those that do the most, called ‘optimising compilers’, a significant
fraction of the compiler's time is spent on this phase.
Code Generation
The final phase of the compiler is the generation of target code,
consisting normally of relocatable machine code or assembly
code. Memory locations are selected for each of the variables
used by the program. Then, intermediate instructions are each
translated into a sequence of machine instructions that perform
the same task. A crucial aspect is the assignment of variables to
registers.
LEXICAL ANALYSIS
Lexical analysis is the process of converting a sequence of
characters into a sequence of tokens. Programs performing lexical
analysis are called lexical analyzers or lexers. A lexer is often
organized as separate scanner and tokenizer functions, though
the boundaries may not be clearly defined.
The purpose of the lexical analyzer is to partition the input text,
delivering a sequence of comments and basic symbols.
Comments are character sequences to be ignored, while basic
symbols are character sequences that correspond to terminal
symbols of the grammar defining the phrase structure of the
input.
A simple way to build a lexical analyzer is to construct a diagram
that illustrates the structure of the tokens of the source language,
and then to hand-translate the diagram into a program for finding
tokens. Efficient lexical analyzers can be produced in this manner.
Role of a Lexical Analyzer

The lexical analyzer is the first phase of a compiler. Its main task is
to read the input characters and produce as output a sequence of
tokens that the parser uses for syntax analysis. As in the figure,
upon receiving a “get next token” command from the parser, the
lexical analyzer reads input characters until it can identify the
next token.
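[Figure: the source program feeds the lexical analyser; the parser sends "get next token" to the lexical analyser, which returns a token; both are connected to the symbol table]
Fig. Interaction of lexical analyzer with parser.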
Since the lexical analyzer is the part of the compiler that reads
the source text, it may also perform certain secondary tasks at
the user interface. One such task is stripping out from the source
program comments and white space in the form of blank, tab,
and newline characters. Another is correlating error messages
from the compiler with the source program.
Issues in Lexical Analysis

There are several reasons for separating the analysis phase of
compiling into lexical analysis and parsing.
1) Simpler design is the most important consideration. The
separation of lexical analysis from syntax analysis often allows us
to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.

Tokens, Patterns and Lexemes
There is a set of strings in the input for which the same token is
produced as output. This set of strings is described by a rule
called a pattern associated with the token. The pattern is said to
match each string in the set. A lexeme is a sequence of
characters in the source program that is matched by the pattern
for the token. For example, in the Pascal statement const pi =
3.1416; the substring pi is a lexeme for the token identifier.
In most programming languages, the following constructs are
treated as tokens: keywords, operators, identifiers, constants,
literal strings, and punctuation symbols such as parentheses,
commas, and semicolons.
TOKEN      SAMPLE LEXEMES         INFORMAL DESCRIPTION OF PATTERN
const      const                  const
if         if                     if
relation   <, <=, =, <>, >, >=    < or <= or = or <> or >= or >
id         pi, count, D2          letter followed by letters and digits
num        3.1416, 0, 6.02E23     any numeric constant
literal    "core dumped"          any characters between " and " except "
In the example, when the character sequence pi appears in the
source program, the token representing an identifier is returned
to the parser. The returning of a token is often implemented by
passing an integer corresponding to the token. It is this integer
that is referred to as boldface id in the above table.
A pattern is a rule describing a set of lexemes that can represent
a particular token in the source program. The pattern for the token
const in the above table is just the single string const that spells
out the keyword.
Certain language conventions impact the difficulty of lexical
analysis. Languages such as FORTRAN require certain
constructs in fixed positions on the input line. Thus the alignment
of a lexeme may be important in determining the correctness of a
source program.
Attributes of Token
The lexical analyzer returns to the parser a representation for the
token it has found. The representation is an integer code if the
token is a simple construct such as a left parenthesis, comma, or
colon. The representation is a pair consisting of an integer code
and a pointer to a table if the token is a more complex element
such as an identifier or constant. The integer code gives the
token type; the pointer points to the value of that token. Pairs are
also returned whenever we wish to distinguish between instances
of a token.
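The two representations described above could be sketched in C as follows. The enum members and the helper names make_simple and make_valued are hypothetical; the pointer is simply left NULL for tokens that carry no value.

```c
#include <stddef.h>

/* Integer codes for a few token types (illustrative selection). */
enum token_code { TOK_LPAREN, TOK_COMMA, TOK_COLON, TOK_ID, TOK_NUM };

struct token {
    enum token_code code;  /* integer code giving the token type */
    const char *value;     /* points to the token's table entry, NULL for simple tokens */
};

/* A simple construct such as a comma needs only its integer code. */
struct token make_simple(enum token_code code)
{
    struct token t;
    t.code = code;
    t.value = NULL;
    return t;
}

/* An identifier or constant carries a pointer to its value as well. */
struct token make_valued(enum token_code code, const char *table_entry)
{
    struct token t;
    t.code = code;
    t.value = table_entry;
    return t;
}
```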
Regular Expressions
In Pascal, an identifier is a letter followed by zero or more letters
or digits. Regular expressions allow us to define precisely sets
such as this. With this notation, Pascal identifiers may be defined
as
letter (letter | digit)*
The vertical bar here means “or”, the parentheses are used to
group subexpressions, the star means “zero or more instances
of” the parenthesized expression, and the juxtaposition of letter
with the remainder of the expression means concatenation.
A regular expression is built up out of simpler regular expressions
using a set of defining rules. Each regular expression r denotes a
language L(r). The defining rules specify how L(r) is formed by
combining in various ways the languages denoted by the
subexpressions of r.
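As an illustration, the expression letter (letter | digit)* can be hand-translated into a short C function. The name match_identifier is an assumption for this sketch, and letters and digits are taken to be whatever isalpha and isdigit accept.

```c
#include <ctype.h>

/* Return the length of the longest prefix of s that matches
   letter (letter | digit)*, or 0 if s does not start with a letter. */
int match_identifier(const char *s)
{
    int n;
    if (!isalpha((unsigned char)s[0]))   /* the leading letter is mandatory */
        return 0;
    n = 1;
    while (isalnum((unsigned char)s[n])) /* the star: zero or more letters or digits */
        n++;
    return n;
}
```

For the input "count1 = 0" the function returns 6, the length of the lexeme count1.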
Recognition of Tokens
The question of how to recognize the tokens is handled in this
section. The language generated by the following grammar is
used as an example.
Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if, then, else, relop, id and num generate
sets of strings given by the following regular definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter (letter | digit)*
num → digit+ (. digit+)? (E (+ | -)? digit+)?
For this language fragment the lexical analyzer will
recognize the keywords if, then, else, as well as the lexemes
denoted by relop, id, and num. To simplify matters, we assume
keywords are reserved; that is, they cannot be used as
identifiers. Unsigned integer and real numbers of Pascal are
represented by num.
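The regular definition for num, digit+ (. digit+)? (E (+ | -)? digit+)?, can be sketched as a hand-written matcher in C. The function names are hypothetical. An optional part is consumed only when it is complete, so "12." matches just "12".

```c
#include <ctype.h>

/* Count the digits of s starting at index i. */
static int count_digits(const char *s, int i)
{
    int n = 0;
    while (isdigit((unsigned char)s[i + n]))
        n++;
    return n;
}

/* Return the length of the longest prefix of s that matches
   digit+ (. digit+)? (E (+|-)? digit+)?, or 0 if there is none. */
int match_num(const char *s)
{
    int i, j, n;
    n = count_digits(s, 0);
    if (n == 0)
        return 0;                      /* digit+ is mandatory */
    i = n;
    if (s[i] == '.') {                 /* optional fraction: . digit+ */
        n = count_digits(s, i + 1);
        if (n > 0)
            i += 1 + n;
    }
    if (s[i] == 'E') {                 /* optional exponent: E (+|-)? digit+ */
        j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        n = count_digits(s, j);
        if (n > 0)
            i = j + n;
    }
    return i;
}
```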
In addition, we assume lexemes are separated by white space,
consisting of nonnull sequences of blanks, tabs and newlines. Our
lexical analyzer will strip out white space. It will do so by
comparing a string against the regular definition ws, below.
delim → blank | tab | newline
ws → delim+
If a match for ws is found, the lexical analyzer does not return a
token to the parser. Rather, it proceeds to find a token following
the white space and returns that to the parser. Our goal is to
construct a lexical analyzer that will isolate the lexeme for the
next token in the input buffer and produce as output a pair
consisting of the appropriate token and attribute value, using the
translation table given in the figure. The attribute values for the
relational operators are given by the symbolic constants
LT, LE, EQ, NE, GT, GE.
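The white-space rule can be sketched in C as follows. The names is_delim, match_ws and skip_ws are assumptions chosen to mirror the regular definitions delim and ws.

```c
/* delim: blank, tab or newline */
static int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* ws is delim+; return how many leading characters of s belong to it. */
int match_ws(const char *s)
{
    int n = 0;
    while (is_delim(s[n]))
        n++;
    return n;
}

/* No token is returned for white space: just move past it. */
const char *skip_ws(const char *s)
{
    return s + match_ws(s);
}
```

A scanner would call skip_ws before trying any other pattern, so that the next token always starts at a non-blank character.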
Transition diagram
A transition diagram is a stylized flowchart. A transition diagram is
used to keep track of information about characters that are seen
as the forward pointer scans the input. We do so by moving from
position to position in the diagram as characters are read.
Positions in a transition diagram are drawn as circles and are
called states. The states are connected by arrows, called edges.
Edges leaving state s have labels indicating the input characters
that can next appear after the transition diagram has reached
state s. The label other refers to any character that is not
indicated by any of the other edges leaving s.
One state is labeled as the start state; it is the initial state of the
transition diagram where control resides when we begin to
recognize a token. Certain states may have actions that are
executed when the flow of control reaches that state. On entering
a state we read the next input character. If there is an edge from
the current state whose label matches this input character, we
then go to the state pointed to by the edge. Otherwise we
indicate failure. A transition diagram for >= is shown in the
figure.
[Figure: states 0, 6 and 7, with a start arrow into state 0 and an edge labeled other]
Fig 5. Transition diagram for >=
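A transition diagram of this kind can be hand-translated into a switch on the current state. The sketch below covers all six relational operators of the grammar fragment, not only >=. The state numbers 1 and 6, the RELOP_NONE value and the function name match_relop are assumptions for illustration. Returning a shorter length on an "other" edge plays the role of retracting the forward pointer.

```c
enum relop { RELOP_NONE, LT, LE, EQ, NE, GT, GE };

/* Recognize a relational operator at the start of s.
   The length of the lexeme is stored in *len (0 if there is no match). */
enum relop match_relop(const char *s, int *len)
{
    int state = 0;   /* start state */
    int i = 0;       /* forward pointer */
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:
            if (c == '<')      { state = 1; i++; }
            else if (c == '=') { *len = 1; return EQ; }
            else if (c == '>') { state = 6; i++; }
            else               { *len = 0; return RELOP_NONE; }
            break;
        case 1:      /* '<' has been seen */
            if (c == '=') { *len = 2; return LE; }
            if (c == '>') { *len = 2; return NE; }
            *len = 1;          /* other: the extra character is not consumed */
            return LT;
        case 6:      /* '>' has been seen */
            if (c == '=') { *len = 2; return GE; }
            *len = 1;          /* other: the extra character is not consumed */
            return GT;
        }
    }
}
```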
A recognizer for a language is a program that takes as input a
string x and answers ‘yes’ if x is a sentence of the language and ‘no’
otherwise. We compile a regular expression into a recognizer by
constructing a transition diagram called a finite automaton. A finite
automaton can be deterministic or nondeterministic, where
nondeterministic means that more than one transition out of a
state may be possible on the same input symbol.
DFAs are faster recognizers than NFAs but can be much bigger
than equivalent NFAs.
Non deterministic finite automata

A nondeterministic finite automaton (NFA) is a mathematical model
consisting of:
1) a set of states S
2) an input alphabet
3) a transition function
4) an initial state
5) a set of final states
Lexical grammar
The specification of a programming language will include a set of
rules, often expressed syntactically, specifying the set of possible
character sequences that can form a token or lexeme. The
whitespace characters are often ignored during lexical analysis.
Token
A token is a categorized block of text. The block of text
corresponding to the token is known as a lexeme. A lexical
analyzer processes lexemes to categorize them according to
function, giving them meaning. This assignment of meaning is
known as tokenization. A token can look like anything; it just
needs to be a useful part of the structured text.
Consider this expression in the C programming language:
sum=3+2;
Tokenized in the following table:
lexeme    token type
sum       IDENTIFIER
=         OPERATOR
3         CONSTANT
+         OPERATOR
2         CONSTANT
;         SPECIAL CHARACTER
Tokens are frequently defined by regular expressions,
which are understood by a lexical analyzer generator
such as lex. The lexical analyzer (either generated
automatically by a tool like lex, or hand-crafted) reads in
a stream of characters, identifies the lexemes in the
stream, and categorizes them into tokens. This is called
"tokenizing." If the lexer finds an invalid token, it will
report an error.
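A hand-crafted tokenizer for expressions such as sum=3+2; might look like the sketch below. The category names follow the table above, while the function names, the INVALID and END_OF_INPUT values and the exact operator and special-character sets are assumptions for illustration.

```c
#include <ctype.h>
#include <string.h>

enum kind { IDENTIFIER, OPERATOR, CONSTANT, SPECIAL_CHARACTER, INVALID, END_OF_INPUT };

struct lexeme { enum kind kind; int start; int len; };

/* Scan the next lexeme of src, beginning at offset pos. */
struct lexeme next_lexeme(const char *src, int pos)
{
    struct lexeme t;
    while (src[pos] == ' ' || src[pos] == '\t')   /* skip blanks */
        pos++;
    t.start = pos;
    t.len = 1;
    if (src[pos] == '\0') {
        t.kind = END_OF_INPUT;
        t.len = 0;
    } else if (isalpha((unsigned char)src[pos]) || src[pos] == '_') {
        while (isalnum((unsigned char)src[pos + t.len]) || src[pos + t.len] == '_')
            t.len++;
        t.kind = IDENTIFIER;
    } else if (isdigit((unsigned char)src[pos])) {
        while (isdigit((unsigned char)src[pos + t.len]))
            t.len++;
        t.kind = CONSTANT;
    } else if (strchr("=+-*/<>", src[pos]) != NULL) {
        t.kind = OPERATOR;
    } else if (strchr(";(){},", src[pos]) != NULL) {
        t.kind = SPECIAL_CHARACTER;
    } else {
        t.kind = INVALID;              /* a real lexer would report an error here */
    }
    return t;
}

/* Store the token types of src in kinds[] (at most max) and return the count. */
int tokenize(const char *src, enum kind *kinds, int max)
{
    int pos = 0, n = 0;
    struct lexeme t;
    while (n < max) {
        t = next_lexeme(src, pos);
        if (t.kind == END_OF_INPUT)
            break;
        kinds[n++] = t.kind;
        pos = t.start + t.len;
    }
    return n;
}
```

Calling tokenize on "sum=3+2;" yields six token types in the same order as the rows of the table.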
Following tokenizing is parsing. From there, the
interpreted data may be loaded into data structures, for
general use, interpretation, or compiling.
2. PURPOSE
The first phase of the construction of a compiler is the
generation of a LEXICAL ANALYZER. The project aims at
building such an analyzer.
The program has been made on the concept of NFA (Non
Deterministic Finite Automata) for recognizing the
characters in the input and classifying them as tokens.
NFA (Non Deterministic Finite Automata) is a
mathematical model consisting of:
1) a set of states (S)
2) input alphabet (∑)
3) transition function (δ)
4) initial state (q0)
5) set of final states (F)
A lexical analyzer generator creates a lexical analyser using a set
of specifications, usually in the format
p1 {action 1}
p2 {action 2}
............
pn {action n}
where each pi is a regular expression and each action i is a
program fragment that is to be executed whenever a lexeme
matched by pi is found in the input. If more than one pattern
matches, then the longest lexeme matched is chosen. If there are
two or more patterns that match the longest lexeme, the first
listed matching pattern is chosen.
This is usually implemented using a finite automaton. There is an
input buffer with two pointers to it, a lexeme-beginning and a
forward pointer. The lexical analyser generator constructs a
transition table for a finite automaton from the regular expression
patterns in the lexical analyser generator specification. The lexical
analyser itself consists of a finite automaton simulator that uses
this transition table to look for the regular expression patterns in
the input buffer.
This can be implemented using an NFA or a DFA. The transition
table for an NFA is considerably smaller than that for a DFA, but
the DFA recognises patterns faster than the NFA.
Using NFA
The transition table for the NFA N is constructed for the
composite pattern p1|p2|. . .|pn. The NFA recognises the longest
prefix of the input that is matched by a pattern. In the final NFA,
there is an accepting state for each pattern pi. The sequence of
sets of states the final NFA can be in after seeing each input
character is constructed. The NFA is simulated until it reaches
termination, that is, a set of states from which there is no
transition defined for the current input symbol. The specification
for the lexical analyser generator is such that a valid source
program cannot entirely fill the input buffer without having the
NFA reach termination. To find a correct match two things are
done. Firstly, whenever an accepting state is added to the current
set of states, the current input position and the pattern pi
corresponding to this accepting state are recorded. If the current
set of states already contains an accepting state, then only the
pattern that appears first in the specification is recorded. Secondly,
the transitions are recorded until termination is reached. Upon
termination, the forward pointer is retracted to the position at
which the last match occurred. The pattern making this match
identifies the token found, and the lexeme matched is the string
between the lexeme-beginning and forward pointers. If no
pattern matches, the lexical analyser should transfer control to
some default recovery routine.
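The simulation described above can be sketched in C for a small composite pattern. Everything specific here is an assumption made for illustration: the two patterns (p1 = if, p2 = letter (letter | digit)*), the five hand-numbered states, the use of a bitmask to hold the current set of states, and the folding of the initial empty moves into the start set.

```c
#include <ctype.h>

#define NSTATES 5

/* Pattern index accepted by each state, or -1 for non-accepting states.
   State 2 accepts p1 (index 0), state 4 accepts p2 (index 1). */
static const int accept_pattern[NSTATES] = { -1, -1, 0, -1, 1 };

/* Transition function: the set of states reachable from state on c. */
static unsigned step(int state, char c)
{
    switch (state) {
    case 0: return c == 'i' ? 1u << 1 : 0;                      /* p1: i */
    case 1: return c == 'f' ? 1u << 2 : 0;                      /* p1: f */
    case 3: return isalpha((unsigned char)c) ? 1u << 4 : 0;     /* p2: letter */
    case 4: return isalnum((unsigned char)c) ? 1u << 4 : 0;     /* p2: letter or digit */
    default: return 0;
    }
}

/* First-listed pattern that has an accepting state in set, or -1. */
static int first_accepting(unsigned set)
{
    int s, best = -1;
    for (s = 0; s < NSTATES; s++)
        if (((set >> s) & 1u) && accept_pattern[s] >= 0 &&
            (best < 0 || accept_pattern[s] < best))
            best = accept_pattern[s];
    return best;
}

/* Simulate the NFA on input. Returns the index of the pattern that made
   the last (longest) match, or -1 if none; *len receives the lexeme length,
   i.e. the position to which the forward pointer is retracted. */
int longest_match(const char *input, int *len)
{
    unsigned set = (1u << 0) | (1u << 3);   /* start states of p1 and p2 */
    int forward = 0, last_pattern = -1, s, p;
    *len = 0;
    while (set != 0 && input[forward] != '\0') {
        unsigned next = 0;
        for (s = 0; s < NSTATES; s++)
            if ((set >> s) & 1u)
                next |= step(s, input[forward]);
        set = next;
        forward++;
        p = first_accepting(set);
        if (p >= 0) {                       /* record position and pattern */
            last_pattern = p;
            *len = forward;
        }
    }
    return last_pattern;
}
```

On "if(" both patterns match the two-character prefix and the first-listed one, p1, is reported. On "iffy" the identifier pattern wins because it matches a longer lexeme.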
3. REQUIREMENTS
HARDWARE SPECIFICATIONS
• Minimum 128 MB RAM
• Pentium processor
SOFTWARE SPECIFICATIONS
• Operating system: Windows XP / Vista (64-bit) / ME / 2000 / 98
• Turbo C/C++ IDE

CODING
/* Program on lexical analysis */
#include<stdio.h>
#include<conio.h>
#include<graphics.h>
#include<ctype.h>
#include<string.h>
#define MAX 30
void first()
{
int gd=DETECT,gm;
initgraph(&gd,&gm,"c:\\tc\\bgi");
setcolor(GREEN);
settextstyle(10,0,7);
outtextxy(130,50,"LEXICAL");
setcolor(YELLOW);
outtextxy(90,190,"ANALYSIS");
getch();
restorecrtmode();
}
void second ()
{
char str[MAX];
int gdriver=DETECT, gmod;
initgraph(&gdriver,&gmod,"c:\\tc\\bgi");
setcolor(RED);
rectangle(20,85,615,435);
rectangle(25,90,610,430);
setcolor(GREEN);
outtextxy(30,30,"SUBMITTED BY:-");
setcolor(MAGENTA);
outtextxy(40,130,"NAME :- KHUSHBOO SHARMA");
outtextxy(40,160,"BRANCH :- CSE(B) - V SEM");
outtextxy(40,190,"ROLL NO. : - 0609210046");
outtextxy(40,220,"COLLEGE :- PCCS");
setcolor(YELLOW);
settextstyle(1,0,2.5);
outtextxy(300,250,"&");
setcolor(BLUE);
outtextxy(350,280,"NAME :- NIHARIKA SETH");
outtextxy(350,310,"BRANCH :- CSE(B) - V SEM");
outtextxy(350,340,"ROLL NO. : - 0609210067");
outtextxy(350,370,"COLLEGE :- PCCS");
getch();
restorecrtmode();
}
void next()
{
char str[MAX];
int gdriver=DETECT, gmod;
initgraph(&gdriver,&gmod,"c:\\tc\\bgi");
setcolor(RED);
rectangle(20,85,615,435);
rectangle(25,90,610,430);
setcolor(GREEN);
setcolor(GREEN+BLINK);
outtextxy(110,110,"ENTER THE CODE TO BE ANALYSED.");
outtextxy(110,160,"THE PROGRAM WILL FIND THE");
outtextxy(110,210,"VARIOUS TOKENS PRESENT IN THE");
outtextxy(110,260,"INPUT AND PROVIDE YOU WITH THE SAME");
getch();
restorecrtmode();
}
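void main()
{
char str[MAX];
int state=0;
int i=0, j, len, startid=0, endid, startcon, endcon;
clrscr();
first();
for(j=0; j<MAX; j++) //Initialise the buffer to NULL
str[j]=NULL;
second();
next();
printf("\nEnter the string to be analysed:\n\n");
fgets(str, MAX-1, stdin); //Accept input string, leaving room for the sentinel
len=strlen(str);
if(len>0 && str[len-1]=='\n') len--; //Drop the newline kept by fgets
str[len]=' '; //Append a blank as end-of-input sentinel
str[len+1]=NULL; //Keep the string terminated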
gotoxy(400,110);
printf("Analysis:\n\n");
while(str[i]!=NULL)
{
while(str[i]==' ') //To eliminate spaces
i++;
switch(state)
{
case 0: if(str[i]=='i') state=1; //if
else if(str[i]=='w') state=3; //while
else if(str[i]=='d') state=8; //do
else if(str[i]=='e') state=10; //else
else if(str[i]=='f') state=14; //for
else if(isalpha(str[i]) || str[i]=='_')
{
state=17;
startid=i;
} //identifiers
else if(str[i]=='<') state=19; //relational '<' or '<='
else if(str[i]=='>') state=21; //relational '>' or '>='
else if(str[i]=='=') state=23; //relational '==' or assignment '='
else if(isdigit(str[i]))
{
state=25; startcon=i;
} //constant
else if(str[i]=='(') state=26; //special character '('
else if(str[i]==')') state=27; //special character ')'
else if(str[i]==';') state=28; //special character ';'
else if(str[i]=='+') state=29; //operator '+'
else if(str[i]=='-') state=30; //operator '-'
break;
//States for 'if'

case 1: if(str[i]=='f') state=2;
else { state=17; startid=i-1; i--; }
break;
case 2: if(str[i]=='(' || str[i]==NULL)
{
printf("if : Keyword\n\n");
state=0;
i--;
}
break;
//States for 'while'

case 3: if(str[i]=='h') state=4;
break;
case 4: if(str[i]=='i') state=5;
break;
case 5: if(str[i]=='l') state=6;
break;
case 6: if(str[i]=='e') state=7;
break;
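case 7: if(str[i]=='(' || str[i]==NULL) /* case label and test reconstructed by analogy with case 2 */
{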
printf("while : Keyword\n\n");
state=0;
i--;
}
break;
//States for 'do'

case 8: if(str[i]=='o') state=9;
break;
case 9: if(str[i]=='{' || str[i]==' ' || str[i]==NULL ||
str[i]=='(')
{
printf("do : Keyword\n\n");
state=0;
i--;
}
break;
//States for 'else'
case 10: if(str[i]=='l') state=11;
break;
case 11: if(str[i]=='s') state=12;
break;
case 12: if(str[i]=='e') state=13;
break;
case 13: if(str[i]=='{' || str[i]==NULL)
{
printf("else : Keyword\n\n");
state=0;
i--;
}
break;
//States for 'for'

case 14: if(str[i]=='o') state=15;
break;
case 15: if(str[i]=='r') state=16;
break;
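case 16: if(str[i]=='(' || str[i]==NULL) /* case label and test reconstructed by analogy with case 2 */
{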
printf("for : Keyword\n\n");
state=0;
i--;
}
break;
//States for identifiers

case 17: if(isalnum(str[i]) || str[i]=='_')
{
state=18; i++;
}
else if(str[i]==NULL||str[i]=='<'||str[i]=='>'||
str[i]=='('||str[i]==')'||str[i]==';'||str[i]=='='||str[i]=='+'||
str[i]=='-') state=18;
i--;
break;
case 18:
if(str[i]==NULL || str[i]=='<' || str[i]=='>' || str[i]=='(' ||

str[i]==')' || str[i]==';' || str[i]=='=' || str[i]=='+' ||str[i]=='-')
{
endid=i-1;
printf("");
for(j=startid; j<=endid; j++)
printf("%c", str[j]);
printf(" : Identifier\n\n");
state=0;
i--;
}
break;
//States for relational operator '<' & '<='

case 19: if(str[i]=='=') state=20;
else if(isalnum(str[i]) || str[i]=='_')
{
printf("< : Relational operator\n\n");
i--;
state=0;
}
break;
case 20: if(isalnum(str[i])) /* case label and test reconstructed by analogy with case 24 */
{
printf("<= : Relational operator\n\n");
i--;
state=0;
}
break;
//States for relational operator '>' & '>='
case 21: if(str[i]=='=') state=22; /* line reconstructed by analogy with case 19 */
else if(isalnum(str[i]) || str[i]=='_')
{
printf("> : Relational operator\n\n");
i--;
state=0;
}
break;
case 22: if(isalnum(str[i])) /* case label and test reconstructed by analogy with case 24 */
{
printf(">= : Relational operator\n\n");
i--;
state=0;
}
break;
//States for relational operator '==' & assignment operator '='
case 23: if(str[i]=='=') state=24; /* line reconstructed by analogy with case 19 */
else
{
printf("= : Assignment operator\n\n");
i--;
state=0;
}
break;
case 24: if(isalnum(str[i]))
{
printf("== : Relational operator\n\n");
state=0;
i--;
}
break;
//States for constants

case 25: if(isalpha(str[i]))
{
printf("*** ERROR ***\n\n");
puts(str);
for(j=0; j<i; j++)
printf(" ");
printf("^");
printf("Error at position %d : Alphabet cannot follow digit\n", i);
state=99;
}
else if(str[i]=='(' || str[i]==')' || str[i]=='<' || str[i]=='>' ||
str[i]==NULL || str[i]==';' || str[i]=='=')
{
endcon=i-1;
printf("");
for(j=startcon; j<=endcon; j++)
printf("%c", str[j]);
printf(" : Constant\n\n");
state=0;
i--;
}
break;
//State for special character '('

case 26:
printf("( : Special character\n\n");
startid=i;
state=0;
i--;
break;
//State for special character ')'
case 27:
printf(") : Special character\n\n");
state=0;
i--;
break;
//State for special character ';'
case 28:
printf("; : Special character\n\n");
state=0;
i--;
break;
//State for operator '+'
case 29:
printf("+ : Operator\n\n");
state=0;
i--;
break;
//State for operator '-'
case 30:
printf("- : Operator\n\n");
state=0;
i--;
break;
//Error State
case 99: goto END;
}
i++;
}
printf("\n\nEnd of program\n\n");
END:
getch();
}
OUTPUT (example)
Correct input
Enter the string to be analysed : for(x1=0; x1<=10; x1++);
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator
10 : Constant
x1 : Identifier
+ : Operator
+ : Operator
) : Special character
End of program
Wrong input
Enter the string to be analyzed: for(x1=0; x1<=19x; x++);
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
x1 : Identifier
<= : Relational operator
***ERROR***
for(x1=0; x1<=19x; x++);

^error at position 12: alphabet cannot follow digit
LIMITATIONS
• The input text has to be entered in a single line without any
indentation. This is contrary to the presently followed style
for writing programs.
• The whitespace characters such as blanks, newline
characters and tabs are not considered for the analysis
(generation of tokens).
• If, during the analysis of the input, an invalid sequence of
characters is encountered, an error message is displayed and
the characters following this sequence are not analyzed.
FUTURE PROSPECTS OF THE PROJECT
• This program implements a lexical analyzer, which is the first
step in the construction of a compiler.
• If the further phases of the construction process are
performed correctly then an efficient compiler can be
developed.
ACKNOWLEDGEMENT
This project is the outcome of the efforts of several people, apart
from the team members, and it is important that their help be
acknowledged here.
First of all I want to present my sincere gratitude and deep
appreciation to Mr. ROHIT SACHAN and Mr. ABHINAV YADAV
(PROJECT INCHARGE) for their valuable support and guidance.
Without motivation, a person is literally unable to make her best
effort. I am highly grateful to them for their guidance as they
played an important role in making this project a success.
I would also like to give my special thanks to Mrs. LALITA
VERMA (H.O.D-C.S.E) for making the various required
resources available to us.
KHUSHBOO SHARMA
0609210046
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
NIHARIKA SETH
0609210067
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
MINI PROJECT
ON
SUBMITTED TO: -
Mr. ROHIT SACHAN & Mr. ABHINAV YADAV
SUBMITTED BY: -
KHUSHBOO SHARMA (0609210046) & NIHARIKA SETH (0609210067)
CERTIFICATE OF APPROVAL
This is to certify that KHUSHBOO SHARMA, student
of B. Tech. - Computer Science & Engineering (5th sem.)
of PRIYADARSHINI COLLEGE OF COMPUTER
SCIENCES, GREATER NOIDA (roll no. 0609210046),
has successfully completed her mini
project under my guidance.
Mrs.Lalita Verma
H.O.D. (C.S.E. department)
Priyadarshini College of Computer Sciences
CERTIFICATE OF APPROVAL
This is to certify that NIHARIKA SETH, student of B.
Tech. - Computer Science & Engineering (5th sem.) of
PRIYADARSHINI COLLEGE OF COMPUTER
SCIENCES, GREATER NOIDA (roll no. 0609210067),
has successfully completed her mini
project under my guidance.
Mrs.Lalita Verma
H.O.D. (C.S.E. department)
Priyadarshini College of Computer Sciences
