2015-2016 Semester II
Assignment
A parser is a piece of software that takes input data (frequently text) and builds a data structure,
often some kind of parse tree or other hierarchical structure, giving a structural representation
of the input and checking for correct syntax in the process. In the case of programming languages, a
parser is a component of a compiler or interpreter, which parses source code to create some form
of internal representation.
This process happens every time you try to execute one of your Python scripts. You must all
fondly remember needing to overcome the generation of frequent syntax errors when first
learning the Python programming language. For this assignment we will create a rudimentary
parser for a rudimentary language that we will create and have it perform the first 2 stages of
parsing: lexical analysis and syntactic analysis.
Our language has a few keywords, operators, data types and delimiters. With these one shall be
able to write procedures, make selections and loop. However, as we will only be implementing
lexical and syntactic analysis, we will not be able to execute the statements and expressions.
The keyword proc indicates the start of a procedure and the keyword end marks its
completion. To display results (we are not returning anything and therefore there is no return
keyword), we will use disp, if-then for selection and do-until for repetition.
It is a strongly typed language and therefore variables and their types must be declared before
they are used. Data types that are allowed are: int, float, char, string, bool.
There are only 3 delimiters: the space, the colon, the line break. Colons are used immediately
after a variable name followed by a data type. For example, num:int indicates a variable
named num of type integer. Line breaks are used at the end of each line to indicate the end of a
statement or expression. Spaces will be used to separate other tokens, for example the end of the
keyword proc and the start of the name of a procedure. Names can only be written using
lowercase letters.
keywords = ('proc', 'disp', 'if', 'then', 'do', 'until', 'end')
operators = ('>', '=', '<', '>', '!=', '+', '-', '/', '*', '^')
types = ('int', 'float', 'char', 'string', 'bool')
delimiters = ('\n', ':', ' ')
An example of a procedure that is syntactically correct in our language would be,
'proc add x:int y:int\nsum:int\nx + y > sum\ndisp sum\nend\n'
This defines a procedure called add, that would accept two parameters, x and y, both integers,
adds them and places the result of the addition into the variable sum and then displays the value
of sum. Write the code that is required and described below.
Function     Description
isKeyword    Accepts 1 argument and returns True/False if it is in the tuple of keywords
isOperator   Accepts 1 argument and returns True/False if it is in the tuple of operators
isType       Accepts 1 argument and returns True/False if it is in the tuple of data types
isDelim      Accepts 1 argument and returns True/False if it is in the tuple of delimiters
isLwrCase    Accepts 1 argument and returns True/False if it is a lowercase character
isColon      Accepts 1 argument and returns True/False if it is the colon character i.e. ':'
isSpace      Accepts 1 argument and returns True/False if it is the space character i.e. ' '
isLineBrk    Accepts 1 argument and returns True/False if it is the line break character i.e. '\n'
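
The helper functions in the table could take roughly the following shape. This is only a sketch: the parameter names are our own, the '-' entry in operators is assumed to be the subtraction operator, and isLwrCase is assumed to accept exactly one character.

```python
keywords = ('proc', 'disp', 'if', 'then', 'do', 'until', 'end')
operators = ('>', '=', '<', '!=', '+', '-', '/', '*', '^')  # '-' is an assumption
types = ('int', 'float', 'char', 'string', 'bool')
delimiters = ('\n', ':', ' ')


def isKeyword(token):
    return token in keywords


def isOperator(token):
    return token in operators


def isType(token):
    return token in types


def isDelim(token):
    return token in delimiters


def isLwrCase(ch):
    # Assumption: only a single character between 'a' and 'z' counts.
    return len(ch) == 1 and 'a' <= ch <= 'z'


def isColon(ch):
    return ch == ':'


def isSpace(ch):
    return ch == ' '


def isLineBrk(ch):
    return ch == '\n'
```

Each function is a one-line membership or equality test, which keeps the later, larger functions easy to read.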
2. Write a recursive function called isValidName which accepts one (1) string argument.
If the string consists of lowercase letters of the alphabet only, it returns True, otherwise it
returns False. For example,
>>> isValidName('proc')
True
>>> isValidName('Proc')
False
>>>
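
One way the recursion for isValidName might be arranged is sketched here. It checks the first character and recurses on the rest of the string. Treating the empty string as an invalid name is our assumption, since the handout does not say either way; the isLwrCase helper is repeated so the snippet runs on its own.

```python
def isLwrCase(ch):
    # Helper repeated here so the sketch is self-contained.
    return len(ch) == 1 and 'a' <= ch <= 'z'


def isValidName(name):
    if name == '':
        return False  # assumption: an empty string is not a valid name
    if len(name) == 1:
        return isLwrCase(name)  # base case: a single character
    # Recursive case: first character must be lowercase, and so must the rest.
    return isLwrCase(name[0]) and isValidName(name[1:])
```

Because `and` short-circuits, the recursion stops at the first character that is not a lowercase letter.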
>>> getToken('proc add x:int y:int\nsum:int\nx + y > sum\ndisp sum\nend\n')
('proc', ' add x:int y:int\nsum:int\nx + y > sum\ndisp sum\nend\n')
>>>
The first token encountered is the keyword proc which is delimited by a space, therefore
getToken returns proc as the token, and the remainder of the string beginning with
the space and ending with the line break. Write getToken using a locally defined,
recursive function called extract which performs the service just described.
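
A possible arrangement of getToken with its local extract function is sketched below. The two parameters of extract (the token built so far and the text still to scan) are our own choice. We also assume that when the string begins with a delimiter, that delimiter is itself returned as the token, which matches the delimiter tokens shown in the tokenize example.

```python
delimiters = ('\n', ':', ' ')


def getToken(text):
    def extract(token, rest):
        # Stop when the input is exhausted or the next character is a delimiter.
        if rest == '' or rest[0] in delimiters:
            return (token, rest)
        # Otherwise move one character from the remainder onto the token.
        return extract(token + rest[0], rest[1:])

    # Assumption: a leading delimiter is a token in its own right.
    if text != '' and text[0] in delimiters:
        return (text[0], text[1:])
    return extract('', text)
```

For instance, getToken('end\n') gives ('end', '\n') and getToken(' add') gives (' ', 'add').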
5. Write a recursive function called tokenize that accepts a string as its argument. It
performs the service of lexical analysis i.e. it separates the string argument into its
constituent tokens. The tokens are returned as a list. It must use getToken. For
example,
>>> tokenize('proc add x:int y:int\nsum:int\nx + y > sum\ndisp sum\nend\n')
['proc', ' ', 'add', ' ', 'x', ':', 'int', ' ', 'y', ':', 'int', '\n', 'sum',
':', 'int', '\n', 'x', ' ', '+', ' ', 'y', ' ', '>', ' ', 'sum', '\n', 'disp',
' ', 'sum', '\n', 'end', '\n']
>>>
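
The recursive structure of tokenize can be sketched as taking one token off the front with getToken and then tokenizing what remains. The getToken shown is one assumed implementation, repeated so the snippet runs on its own; in particular it assumes a leading delimiter is returned as its own token.

```python
delimiters = ('\n', ':', ' ')


def getToken(text):
    # Assumed implementation: returns (first token, remainder of the string).
    def extract(token, rest):
        if rest == '' or rest[0] in delimiters:
            return (token, rest)
        return extract(token + rest[0], rest[1:])

    if text != '' and text[0] in delimiters:
        return (text[0], text[1:])
    return extract('', text)


def tokenize(text):
    if text == '':
        return []  # base case: nothing left to split
    token, rest = getToken(text)
    return [token] + tokenize(rest)  # one token now, the rest recursively
```

Each call consumes at least one character, so the recursion always terminates, although very long inputs could reach Python's recursion limit.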
6. Write a function called canFollow which accepts two strings as its arguments. One
string is the token being analysed, the second is the token that succeeds (comes
immediately after) the token being analysed. canFollow returns True or False based
on whether the successor token can follow the token being analysed, based on the
language rules described earlier. For example,
>>> canFollow('proc', ' ')
True
>>> canFollow('proc', ':')
False
>>>
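
To make the idea concrete, here is one hedged sketch of canFollow. The specific rules in it are our assumptions, inferred only from the sample procedure and the delimiter description; your own reading of the language rules may reasonably differ. The isName helper is a hypothetical iterative stand-in for isValidName, and '-' is assumed to be among the operators.

```python
keywords = ('proc', 'disp', 'if', 'then', 'do', 'until', 'end')
operators = ('>', '=', '<', '!=', '+', '-', '/', '*', '^')  # '-' is an assumption
types = ('int', 'float', 'char', 'string', 'bool')
delimiters = ('\n', ':', ' ')


def isName(token):
    # Hypothetical stand-in for isValidName: lowercase letters only.
    return token != '' and all('a' <= ch <= 'z' for ch in token)


def canFollow(token, successor):
    # All rules below are ASSUMED, not taken verbatim from the handout.
    if token in keywords or token in types:
        return successor in (' ', '\n')  # then a space or end of statement
    if token in operators:
        return successor == ' '  # operators are separated by spaces
    if token == ':':
        return successor in types  # a colon introduces a data type
    if token in (' ', '\n'):
        return successor not in delimiters  # no two delimiters in a row
    if isName(token):
        return successor in delimiters  # names end at any delimiter
    return False
```

Keywords and types are tested before names because they are also written in lowercase letters and would otherwise be treated as ordinary names.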
7. Write an iterative function called analyseSyntax which accepts a list as its argument.
It performs the service of syntactic analysis i.e. it checks that the tokens in the argument list
form an allowable statement or expression. This function must use the function canFollow that
you wrote earlier. If there are no syntax errors, the function returns the string No syntax
errors found; otherwise it returns the string Syntax error found: along with the token and
its successor that generated the error. For example,
>>> analyseSyntax(['proc', ' ', 'add', ' ', 'x', ':', 'int', ' ', 'y', ':',
'int', '\n', 'sum', ':', 'int', '\n', 'x', ' ', '+', ' ', 'y', ' ', '>', ' ',
'sum', '\n', 'disp', ' ', 'sum', '\n', 'end', '\n'])
'No syntax errors found'
>>> analyseSyntax(['proc', ':', 'add', ' ', 'x', ':', 'int', ' ', 'y', ':',
'int', '\n', 'sum', ':', 'int', '\n', 'x', ' ', '+', ' ', 'y', ' ', '>', ' ',
'sum', '\n', 'disp', ' ', 'sum', '\n', 'end', '\n'])
('Syntax error found:', 'proc', ':')
>>>
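
The iterative check might look like the sketch below: walk the list once and compare each token with the one after it. The canFollow included here is only an assumed stand-in so the snippet runs on its own; its rules are our guesses from the sample procedure, and '-' is assumed to be an operator.

```python
keywords = ('proc', 'disp', 'if', 'then', 'do', 'until', 'end')
operators = ('>', '=', '<', '!=', '+', '-', '/', '*', '^')  # '-' is an assumption
types = ('int', 'float', 'char', 'string', 'bool')
delimiters = ('\n', ':', ' ')


def canFollow(token, successor):
    # ASSUMED rules, for illustration only.
    if token in keywords or token in types:
        return successor in (' ', '\n')
    if token in operators:
        return successor == ' '
    if token == ':':
        return successor in types
    if token in (' ', '\n'):
        return successor not in delimiters
    # Anything else is treated as a name, which must end at a delimiter.
    return successor in delimiters


def analyseSyntax(tokens):
    # Compare every token with its immediate successor.
    for i in range(len(tokens) - 1):
        if not canFollow(tokens[i], tokens[i + 1]):
            return ('Syntax error found:', tokens[i], tokens[i + 1])
    return 'No syntax errors found'
```

The loop stops at the first offending pair, so only the earliest error is reported.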
8. Finally, write the function parse. It accepts a string as its sole argument and performs
the first two stages of parsing, lexical analysis and syntactic analysis. It must first print
the input string in its proper format, print the message Checking syntax, then call
upon the services of analyseSyntax and tokenize that you wrote earlier. For
example,
>>> parse('proc add x:int y:int\nsum:int\nx + y > sum\ndisp sum\nend\n')
proc add x:int y:int
sum:int
x + y > sum
disp sum
end
Checking syntax...
'No syntax errors found'
>>>
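
As a rough illustration of how parse ties the pieces together, consider the sketch below. To keep it short and runnable by itself, tokenize is replaced by a non-recursive stand-in built on re.findall, and canFollow is a deliberately tiny stand-in with two assumed rules; neither is meant as the required solution.

```python
import re

keywords = ('proc', 'disp', 'if', 'then', 'do', 'until', 'end')
delimiters = ('\n', ':', ' ')


def tokenize(text):
    # Stand-in (not recursive): single delimiters or runs of non-delimiters.
    return re.findall(r'[ :\n]|[^ :\n]+', text)


def canFollow(token, successor):
    # Tiny stand-in with two ASSUMED rules, for illustration only.
    if token in keywords:
        return successor in (' ', '\n')
    return not (token in delimiters and successor in delimiters)


def analyseSyntax(tokens):
    for current, successor in zip(tokens, tokens[1:]):
        if not canFollow(current, successor):
            return ('Syntax error found:', current, successor)
    return 'No syntax errors found'


def parse(text):
    # Printing the string renders each \n as an actual line break.
    print(text.rstrip('\n'))
    print('Checking syntax...')
    return analyseSyntax(tokenize(text))
```

Returning the result, rather than printing it, is what makes the interactive prompt echo it in quotes as in the example.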
And that's it. We broke our problem into smaller problems, solved those and then solved our
original one by putting the smaller solutions together. I hope you had fun!
This assignment is worth 15% of your coursework mark. It is due on April 15th 2016, at
11pm. Post your submission via OurVLE. Look for the container named Assignment.
You are required to work in pairs, therefore ensure that your code has BOTH members'
ID numbers included as a comment. Only one member of the pair is to submit. Each
person may submit as a member of one (1) programming pair only. No late submissions
will be accepted.
Name your file according to the following convention,
IF the ID numbers of the students are 620000001 and 620000002
THEN the file name should be 620000001_620000002.py