[Figure: token generation using an NFA: INPUT (source code) -> NFA -> OUTPUT (tokens)]
SRS (SOFTWARE REQUIREMENT SPECIFICATION)
1. INTRODUCTION
COMPILER
Simply stated, a compiler is a program that reads a program
written in one language (the source language) and translates it
into an equivalent program in another language (the target
language). As an important part of this translation process, the
compiler reports any errors in the source program to its user.
[Figure: source program -> Compiler -> target program; the compiler also produces error messages]
Code Optimisation
The code optimization phase attempts to improve the
intermediate code so that faster-running machine code will
result. Some optimizations are trivial, such as replacing an
expression whose operands are all constants by its value at
compile time. There is great variation in the amount of code
optimization different compilers perform. In those that do the
most, called ‘optimising compilers’, a significant fraction of the
compiler's running time is spent on this phase.
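As a minimal sketch of one such trivial optimization, constant folding, the C code below walks a small expression tree and replaces every operator node whose operands are both constants by a single constant node. The node layout (struct node with kinds 'c', 'v', '+', '*') and the function names make_const, make_var, make_op and fold are hypothetical and not part of this project's program.

```c
#include <stdlib.h>

/* Hypothetical expression-tree node, for illustration only */
struct node {
    char kind;                  /* 'c' constant, 'v' variable, '+' or '*' */
    int value;                  /* constant value when kind == 'c' */
    char name;                  /* variable name when kind == 'v' */
    struct node *left, *right;  /* operands when kind is '+' or '*' */
};

static struct node *new_node(char kind) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();                /* out of memory */
    n->kind = kind;
    n->value = 0;
    n->name = 0;
    n->left = n->right = NULL;
    return n;
}

struct node *make_const(int v) {
    struct node *n = new_node('c');
    n->value = v;
    return n;
}

struct node *make_var(char name) {
    struct node *n = new_node('v');
    n->name = name;
    return n;
}

struct node *make_op(char op, struct node *l, struct node *r) {
    struct node *n = new_node(op);
    n->left = l;
    n->right = r;
    return n;
}

/* Constant folding: fold the operands first; if both are now constants,
   compute the result at compile time and turn this node into a constant. */
struct node *fold(struct node *n) {
    if (n->kind == 'c' || n->kind == 'v')
        return n;
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == 'c' && n->right->kind == 'c') {
        int v = (n->kind == '+') ? n->left->value + n->right->value
                                 : n->left->value * n->right->value;
        free(n->left);
        free(n->right);
        n->kind = 'c';
        n->value = v;
        n->left = n->right = NULL;
    }
    return n;
}
```

For instance, folding x * (2 + 3) leaves the multiplication in place, because x is not known at compile time, but replaces 2 + 3 by the constant 5.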
Code Generation
The final phase of the compiler is the generation of target code,
normally consisting of relocatable machine code or assembly
code. Memory locations are selected for each of the variables
used by the program. Then each intermediate instruction is
translated into a sequence of machine instructions that perform
the same task. A crucial aspect is the assignment of variables to
registers.
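To make this step concrete, the sketch below translates one three-address instruction of the form x = y op z into instructions for an invented two-address machine with a single register R0 (source operand first, destination second). The instruction set, the register name and the function gen_three_address are assumptions for illustration, not this project's code; a real code generator would also choose among several registers.

```c
#include <stdio.h>
#include <string.h>

/* Translate the three-address instruction  dst = src1 op src2  into a
   hypothetical two-address assembly sequence that uses register R0.
   The text is written into out (of size outsz). Returns 0 on success,
   or -1 if op is not one of + - *. */
int gen_three_address(const char *dst, const char *src1, char op,
                      const char *src2, char *out, size_t outsz) {
    const char *mnemonic;
    switch (op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    default:  return -1;
    }
    /* load the first operand into R0, combine it with the second operand,
       then store R0 back into the destination variable */
    snprintf(out, outsz, "MOV %s, R0\n%s %s, R0\nMOV R0, %s\n",
             src1, mnemonic, src2, dst);
    return 0;
}
```

For x = y + z this produces the three lines MOV y, R0 / ADD z, R0 / MOV R0, x.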
LEXICAL ANALYSIS
Lexical analysis is the process of converting a sequence of
characters into a sequence of tokens. Programs performing lexical
analysis are called lexical analyzers or lexers. A lexer is often
organized as separate scanner and tokenizer functions, though
the boundaries may not be clearly defined.
[Figure: lexical analyzer and symbol table]
Since the lexical analyzer is the part of the compiler that reads
the source text, it may also perform certain secondary tasks at
the user interface. One such task is stripping comments and
white space (blanks, tabs, and newline characters) out of the
source program. Another is correlating error messages from the
compiler with the source program, for example by keeping track
of line numbers so that each error can be reported together with
the line on which it occurs.
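A minimal sketch of these two secondary tasks in C is given below: the function skips blanks, tabs, newlines and C-style block comments, and counts the newlines it passes so that a later error message can quote the line number. The function name skip_ws_and_comments and the choice of C-style comments are assumptions; the project's own program only skips blanks.

```c
#include <stddef.h>

/* Advance past blanks, tabs, newlines and C-style block comments,
   starting at src[i]. *line is incremented for every newline skipped,
   so the caller can attach a line number to any error it reports.
   Returns the index of the first character of the next token
   (or of the terminating '\0'). */
size_t skip_ws_and_comments(const char *src, size_t i, int *line) {
    for (;;) {
        if (src[i] == ' ' || src[i] == '\t') {
            i++;
        } else if (src[i] == '\n') {
            (*line)++;
            i++;
        } else if (src[i] == '/' && src[i + 1] == '*') {
            i += 2;
            /* skip to the closing delimiter, still counting newlines */
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/')) {
                if (src[i] == '\n')
                    (*line)++;
                i++;
            }
            if (src[i] != '\0')
                i += 2;            /* step over the closing star and slash */
        } else {
            return i;              /* start of a token, or end of input */
        }
    }
}
```

An unterminated comment simply runs to the end of the input; a fuller lexer would report it as an error at that point.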
Attributes of Token
The lexical analyzer returns to the parser a representation of the
token it has found. The representation is an integer code if the
token is a simple construct such as a left parenthesis, comma, or
colon. The representation is a pair consisting of an integer code
and a pointer to a table if the token is a more complex element
such as an identifier or constant. The integer code gives the
token type; the pointer points to the value of that token. Pairs are
also returned whenever we wish to distinguish between instances
of a token.
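The integer-code-plus-pointer representation can be sketched in C as follows. Here the "pointer" is an index into a small symbol table, so two occurrences of the same identifier share one entry while different identifiers get different attributes. The token codes (TOK_LPAREN, TOK_ID, ...) and the functions install_id and make_id_token are hypothetical names chosen for illustration.

```c
#include <string.h>

/* Hypothetical integer codes for token types */
enum token_type { TOK_LPAREN, TOK_COMMA, TOK_COLON, TOK_ID, TOK_NUM };

/* A token: its type plus an attribute identifying this instance.
   For simple tokens such as '(' the attribute is unused; for
   identifiers it is the index of the lexeme in the symbol table. */
struct token {
    enum token_type type;
    int attr;
};

#define SYMTAB_SIZE 100
#define LEXEME_MAX 32

static char symtab[SYMTAB_SIZE][LEXEME_MAX];
static int symtab_count = 0;

/* Return the symbol-table index of lexeme, inserting it if it is new,
   or -1 if the table is full. Lexemes are assumed to be shorter than
   LEXEME_MAX characters; longer ones are truncated when stored. */
int install_id(const char *lexeme) {
    int k;
    for (k = 0; k < symtab_count; k++)
        if (strcmp(symtab[k], lexeme) == 0)
            return k;                     /* already present: same entry */
    if (symtab_count == SYMTAB_SIZE)
        return -1;
    strncpy(symtab[symtab_count], lexeme, LEXEME_MAX - 1);
    symtab[symtab_count][LEXEME_MAX - 1] = '\0';
    return symtab_count++;
}

/* Build the (type, attribute) pair for an identifier */
struct token make_id_token(const char *lexeme) {
    struct token t;
    t.type = TOK_ID;
    t.attr = install_id(lexeme);
    return t;
}
```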
Regular Expressions
In the regular expression letter(letter|digit)* for identifiers, the
vertical bar means “or”, the parentheses are used to group
subexpressions, the star means “zero or more instances of” the
parenthesized expression, and the juxtaposition of letter with the
remainder of the expression means concatenation.
A regular expression is built up out of simpler regular expressions
using a set of defining rules. Each regular expression r denotes a
language L(r). The defining rules specify how L(r) is formed by
combining, in various ways, the languages denoted by the
subexpressions of r.
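To show how such an expression turns into code, the C function below hand-codes the identifier expression letter(letter|digit)*: it returns the length of the longest prefix of the input that matches, or 0 if there is none. The function name match_identifier is hypothetical; note that the project's program additionally accepts '_' as a first character, which this sketch does not.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching letter(letter|digit)*,
   or 0 if s does not start with a letter. */
size_t match_identifier(const char *s) {
    size_t n;
    if (!isalpha((unsigned char)s[0]))
        return 0;                           /* must start with a letter */
    n = 1;
    while (isalnum((unsigned char)s[n]))    /* (letter|digit)* */
        n++;
    return n;
}
```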
Recognition of Tokens
This section considers how to recognize tokens, using the
language generated by the following grammar as an example.
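stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter (letter | digit)*
num → digit+ (. digit+)? (E (+ | -)? digit+)?
delim → blank | tab | newline
ws → delim+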
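The num pattern digit+ (. digit+)? (E (+ | -)? digit+)? can likewise be hand-coded. In this sketch (the function name match_num is hypothetical), each optional part is accepted only if it is complete, so for input such as 12.x the match stops after 12 and the '.' is left for the next token.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching
   digit+ (. digit+)? (E (+|-)? digit+)?, or 0 if there is none. */
size_t match_num(const char *s) {
    size_t n = 0, k;
    while (isdigit((unsigned char)s[n]))       /* digit+ */
        n++;
    if (n == 0)
        return 0;
    /* optional fraction: '.' followed by at least one digit */
    if (s[n] == '.' && isdigit((unsigned char)s[n + 1])) {
        n++;
        while (isdigit((unsigned char)s[n]))
            n++;
    }
    /* optional exponent: 'E', an optional sign, at least one digit */
    if (s[n] == 'E') {
        k = n + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        if (isdigit((unsigned char)s[k])) {
            while (isdigit((unsigned char)s[k]))
                k++;
            n = k;                             /* accept the whole exponent */
        }
    }
    return n;
}
```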
Transition diagram
One state is labeled as the start state; it is the initial state of the
transition diagram, where control resides when we begin to
recognize a token. Certain states may have actions that are
executed when the flow of control reaches that state. On entering
a state, we read the next input character. If there is an edge from
the current state whose label matches this input character, we go
to the state pointed to by that edge; otherwise, we indicate
failure. A transition diagram for >= is shown in the figure.
[Figure: transition diagram for >=, with start state 0, states 6 and 7, and an edge labeled "other"]
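The transition diagram for >= can be coded directly as a loop over states. The sketch below keeps the state numbers 0, 6 and 7; the result codes RELOP_GT and RELOP_GE, the function name match_ge, and the choice to report a plain > when the "other" edge is taken (without consuming that character, i.e. retracting it) are assumptions based on the usual form of this diagram.

```c
#include <stddef.h>

enum relop_result { RELOP_NONE, RELOP_GT, RELOP_GE };

/* Simulate the transition diagram for >= starting at s[*pos].
   On success *pos is advanced past the recognized lexeme;
   on failure it is left unchanged. */
enum relop_result match_ge(const char *s, size_t *pos) {
    int state = 0;
    size_t i = *pos;
    for (;;) {
        switch (state) {
        case 0:                          /* start state */
            if (s[i] == '>') { i++; state = 6; }
            else return RELOP_NONE;      /* no matching edge: failure */
            break;
        case 6:                          /* '>' has been read */
            if (s[i] == '=') { i++; state = 7; }
            else {                       /* edge "other": do not consume */
                *pos = i;
                return RELOP_GT;
            }
            break;
        case 7:                          /* accepting state for >= */
            *pos = i;
            return RELOP_GE;
        }
    }
}
```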
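A finite automaton is specified by five components:
1) a finite set of states S
2) an input alphabet
3) a transition function (for an NFA, it maps a state and an input symbol, or ε, to a set of states)
4) an initial state
5) a set of final (accepting) states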
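The five components listed above map directly onto code. The sketch below encodes an NFA for the regular expression (a|b)*abb, a small textbook example chosen only for illustration (it is not one of the project's token patterns). Sets of states are stored as bit masks, and the NFA is simulated by tracking every state it could be in after each input character. This example has no ε-transitions; a general simulation would also have to follow those.

```c
#include <stdbool.h>

/* 1) states S = {0,1,2,3}      2) input alphabet = {a, b}
   3) transition function: delta[state][symbol] is a SET of states
   4) initial state 0           5) final states = {3}            */
#define NUM_STATES 4
#define START 0
#define FINAL_MASK (1u << 3)

static const unsigned delta[NUM_STATES][2] = {
    /*        on 'a'                  on 'b'   */
    /* 0 */ { (1u << 0) | (1u << 1), 1u << 0 },
    /* 1 */ { 0,                      1u << 2 },
    /* 2 */ { 0,                      1u << 3 },
    /* 3 */ { 0,                      0       },
};

/* Returns true if the NFA accepts the whole string s. */
bool nfa_accepts(const char *s) {
    unsigned current = 1u << START;       /* set of states we may be in */
    for (; *s != '\0'; s++) {
        int sym, q;
        unsigned next = 0;
        if (*s == 'a') sym = 0;
        else if (*s == 'b') sym = 1;
        else return false;                /* symbol outside the alphabet */
        for (q = 0; q < NUM_STATES; q++)
            if (current & (1u << q))
                next |= delta[q][sym];    /* union of all possible moves */
        current = next;
    }
    return (current & FINAL_MASK) != 0;   /* accept if any final state reached */
}
```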
Lexical grammar
The specification of a programming language will include a set of
rules, often expressed syntactically, specifying the set of possible
character sequences that can form a token or lexeme. The
whitespace characters are often ignored during lexical analysis.
Token
A token is a categorized block of text. The block of text
corresponding to the token is known as a lexeme. A lexical
analyzer processes lexemes to categorize them according to
function, giving them meaning. This assignment of meaning is
known as tokenization. A token can look like anything; it just
needs to be a useful part of the structured text.
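For example, the statement sum=3+2; is split into the following
lexemes and tokens (using the categories printed by this project's
program):
sum : Identifier
= : Assignment operator
3 : Constant
+ : Operator
2 : Constant
; : Special character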
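A short C sketch of this tokenization is shown below. It copies the next lexeme into a buffer and returns its category, using the category names that appear in the project's sample output; the keyword list (if, while, do, else, for) is taken from the comments in the project's state machine. The function name next_token and the simple character classes are assumptions, not the project's actual implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keywords handled by the project's state machine */
static const char *const keywords[] = { "if", "while", "do", "else", "for" };

/* Copy the next lexeme starting at s[*pos] into lexeme (lexsz >= 1) and
   return its category, or NULL at the end of the input. Blanks and tabs
   are skipped; *pos is advanced past the lexeme. */
const char *next_token(const char *s, size_t *pos, char *lexeme, size_t lexsz) {
    size_t i = *pos, start, len, k;
    const char *category;

    while (s[i] == ' ' || s[i] == '\t')
        i++;
    if (s[i] == '\0')
        return NULL;
    start = i;
    if (isalpha((unsigned char)s[i]) || s[i] == '_') {
        while (isalnum((unsigned char)s[i]) || s[i] == '_')
            i++;
        category = "Identifier";
    } else if (isdigit((unsigned char)s[i])) {
        while (isdigit((unsigned char)s[i]))
            i++;
        category = "Constant";
    } else if (s[i] == '<' || s[i] == '>') {
        i++;
        if (s[i] == '=')
            i++;                        /* <= or >= */
        category = "Relational operator";
    } else if (s[i] == '=') {
        i++;
        category = "Assignment operator";
    } else if (strchr("+-*/", s[i]) != NULL) {
        i++;
        category = "Operator";
    } else {
        i++;
        category = "Special character";
    }
    len = i - start;
    if (len >= lexsz)
        len = lexsz - 1;                /* truncate an over-long lexeme */
    memcpy(lexeme, s + start, len);
    lexeme[len] = '\0';
    if (strcmp(category, "Identifier") == 0)
        for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
            if (strcmp(lexeme, keywords[k]) == 0)
                category = "Keyword";   /* reserved word, not an identifier */
    *pos = i;
    return category;
}
```

Calling it in a loop such as while ((cat = next_token(src, &p, lex, sizeof lex)) != NULL) printf("%s : %s\n", lex, cat); prints one line per token in the same "lexeme : category" format as the project's output.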
Token patterns can be specified as a list of translation rules of
the form
p1 {action1}
p2 {action2}
...
pn {actionn}
where each pi is a regular expression and each actioni is a
program fragment that is executed whenever a lexeme matched
by pi is found in the input. If more than one pattern matches, the
longest matching lexeme is chosen. If two or more patterns match
the longest lexeme, the pattern listed first is chosen.
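The two disambiguation rules above (longest match first, then the earliest-listed pattern on a tie) can be sketched in C as follows. Each pattern is represented by a matcher function that returns the length of the longest prefix it matches (0 for no match); the two patterns used here, the keyword for and identifiers letter(letter|digit)*, and the names match_for, match_id and select_pattern are illustrative assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*matcher)(const char *s);

/* p1: the keyword "for" */
static size_t match_for(const char *s) {
    return strncmp(s, "for", 3) == 0 ? 3 : 0;
}

/* p2: letter(letter|digit)* */
static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Patterns in the order they are listed in the specification */
static const matcher patterns[] = { match_for, match_id };

/* Return the index of the winning pattern at s, or -1 if none matches;
   *len receives the length of the chosen lexeme. */
int select_pattern(const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0, k;
    for (k = 0; k < sizeof patterns / sizeof patterns[0]; k++) {
        size_t n = patterns[k](s);
        /* only a strictly longer match replaces the current winner,
           so on a tie the pattern listed first is kept */
        if (n > best_len) {
            best = (int)k;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With this ordering, for( selects the keyword pattern (both match three characters, and the keyword is listed first), while format selects the identifier pattern because its six-character match is longer.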
Using NFA
HARDWARE SPECIFICATIONS
• Minimum 128 MB RAM
• Pentium processor
SOFTWARE SPECIFICATIONS
• Operating system: Windows XP/Vista (64-bit)/ME/2000/98
#include<stdio.h>
#include<conio.h>
#include<graphics.h>
#include<ctype.h>
#include<string.h>
#define MAX 30
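// Title screen: draws "LEXICAL ANALYSIS" in graphics mode (Turbo C BGI)
void first()
{
    int gd = DETECT, gm;
    initgraph(&gd, &gm, "c:\\tc\\bgi");   // path to the BGI driver files
    if (graphresult() != grOk)            // driver not found: skip this screen
        return;
    setcolor(GREEN);
    settextstyle(10, 0, 7);
    outtextxy(130, 50, "LEXICAL");
    setcolor(YELLOW);
    settextstyle(10, 0, 7);
    outtextxy(90, 190, "ANALYSIS");
    getch();                              // wait for a key press
    closegraph();                         // free graphics memory, return to text mode
}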
// Credits screen: names of the students who submitted the project
void second()
{
    int gdriver = DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    if (graphresult() != grOk)            // driver not found: skip this screen
        return;
    setcolor(RED);
    rectangle(20, 85, 615, 435);          // double border
    rectangle(25, 90, 610, 430);
    setcolor(GREEN);
    settextstyle(1, 0, 4);
    outtextxy(30, 30, "SUBMITTED BY:-");
    settextstyle(6, 0, 1);
    setcolor(MAGENTA);
    outtextxy(40, 130, "NAME :- KHUSHBOO SHARMA");
    outtextxy(40, 160, "BRANCH :- CSE(B) - V SEM");
    outtextxy(40, 190, "ROLL NO. : - 0609210046");
    outtextxy(40, 220, "COLLEGE :- PCCS");
    setcolor(YELLOW);
    settextstyle(1, 0, 2);                // character size must be an integer
    outtextxy(300, 250, "&");
    settextstyle(6, 0, 1);
    setcolor(BLUE);
    outtextxy(350, 280, "NAME :- NIHARIKA SETH");
    outtextxy(350, 310, "BRANCH :- CSE(B) - V SEM");
    outtextxy(350, 340, "ROLL NO. : - 0609210067");
    outtextxy(350, 370, "COLLEGE :- PCCS");
    getch();
    closegraph();
}
// Instructions screen shown before the input prompt
void next()
{
    int gdriver = DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    if (graphresult() != grOk)            // driver not found: skip this screen
        return;
    setcolor(RED);
    rectangle(20, 85, 615, 435);
    rectangle(25, 90, 610, 430);
    settextstyle(7, 0, 1);
    setcolor(GREEN);                      // BLINK is a conio text attribute, not a graphics colour
    outtextxy(110, 110, "ENTER THE CODE TO BE ANALYSED.");
    outtextxy(110, 160, "THE PROGRAM WILL FIND THE");
    outtextxy(110, 210, "VARIOUS TOKENS PRESENT IN THE");
    outtextxy(110, 260, "INPUT AND PROVIDE YOU WITH THE SAME"); // a string literal cannot span lines
    getch();
    closegraph();
}
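int main(void)
{
    char str[MAX];
    int state = 0;
    int i = 0, j, startid = 0, endid, startcon, endcon;
    size_t len;

    clrscr();
    first();                              // title screen
    for (j = 0; j < MAX; j++)             // initialise the buffer to '\0'
        str[j] = '\0';
    second();                             // credits screen
    next();                               // instructions screen
    printf("\nEnter the string to be analysed:\n\n");
    // Accept the input string. Unlike gets, fgets cannot overflow str;
    // MAX - 1 leaves room for the trailing blank added below.
    if (fgets(str, MAX - 1, stdin) == NULL)
        return 1;
    len = strlen(str);
    if (len > 0 && str[len - 1] == '\n')
        str[--len] = '\0';                // drop the newline kept by fgets
    str[len] = ' ';                       // trailing blank ends the last token
    gotoxy(400, 110);
    printf("Analysis:\n\n");
    while (str[i] != '\0')
    {
        while (str[i] == ' ')             // eliminate spaces
            i++;
        switch (state)
        {
        case 0:
            if (str[i] == 'i') state = 1;              // if
            else if (str[i] == 'w') state = 3;         // while
            else if (str[i] == 'd') state = 8;         // do
            else if (str[i] == 'e') state = 10;        // else
            else if (str[i] == 'f') state = 14;        // for
            else if (isalpha((unsigned char)str[i]) || str[i] == '_')
            {
                state = 17;                            // identifiers
                startid = i;
            }
            else if (isdigit((unsigned char)str[i]))
            {
                state = 25;                            // constants
                startcon = i;
            }
            break;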
case 18:
OUTPUT (example)
Correct input
Enter the string to be analysed: for(x1=0; x1<=10; x1++);
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator
10 : Constant
; : Special character
x1 : Identifier
+ : Operator
+ : Operator
) : Special character
; : Special character
End of program
Wrong input
Enter the string to be analysed: for(x1=0; x1<=19x; x++);
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator
***ERROR***
LIMITATIONS
ACKNOWLEDGEMENT
KHUSHBOO SHARMA
0609210046
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
NIHARIKA SETH
0609210067
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
MINI PROJECT
ON
SUBMITTED TO: -
SUBMITTED BY: -
CERTIFICATE OF APPROVAL
Mrs. Lalita Verma
H.O.D. (C.S.E. Department)
Priyadarshini College of Computer Sciences