\n { printf("1"); }
... { printf("2"); }
[a-c]+ { printf("3"); }
[A-C]+ { printf("4"); }
[1-3]+ { printf("5"); }
[acB13]+ { printf("6"); }
[^a-c]+[^A-C]+[1-9] { printf("7"); }
What will be the output on the following input? Assume there is nothing to the right of
the last visible character on each line except a newline character and that the following
input is to be scanned all at once:
abcABC123
321CBAcba
BaB1cAb
AAAAacac1231
• Integer constants are decimal strings with as many leading zeroes as you want.
There are no negative numbers in our subset. Legitimate integer constants
include 15, 0117, 00000000040, and 0.
• String constants are double quote-delimited strings of zero or more characters.
The characters themselves can be anything whatsoever, except for ". Schemers
just need to do without.
• Symbols are text strings without double quotes. They always start with a letter,
but can otherwise contain letters, numbers, and a few other characters: '_', '-',
'>', '?', and '!'.
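The three lexical categories above can be restated as a small hand-written classifier. This is only a sketch: the `Kind` enum and the `classify` function are hypothetical names introduced for illustration and are not part of the handout's files, which rely on flex instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names: Kind and classify are illustrative only.
enum class Kind { Integer, String, Symbol, Invalid };

Kind classify(const std::string& lexeme) {
    if (lexeme.empty()) return Kind::Invalid;
    const unsigned char first = static_cast<unsigned char>(lexeme[0]);

    // Integer constants: one or more decimal digits, leading zeroes allowed.
    if (std::isdigit(first)) {
        for (unsigned char c : lexeme)
            if (!std::isdigit(c)) return Kind::Invalid;
        return Kind::Integer;
    }

    // String constants: opening and closing quote with no quote in between.
    if (first == '"') {
        if (lexeme.size() < 2 || lexeme.back() != '"') return Kind::Invalid;
        if (lexeme.find('"', 1) != lexeme.size() - 1) return Kind::Invalid;
        return Kind::String;
    }

    // Symbols: a letter, then letters, digits, or one of _ - > ? !
    if (std::isalpha(first)) {
        const std::string extras = "_->?!";
        for (std::size_t i = 1; i < lexeme.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(lexeme[i]);
            if (!std::isalnum(c) && extras.find(lexeme[i]) == std::string::npos)
                return Kind::Invalid;
        }
        return Kind::Symbol;
    }
    return Kind::Invalid;
}
```

Under these rules, `classify("0117")` yields `Kind::Integer` and `classify("list->vector?")` yields `Kind::Symbol`, while `classify("-5")` is rejected because the subset has no negative numbers.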
• (1 2 3 4 5)
• ("hi" there "hey" (there "ho" ((there))))
• (())
Your job is to build on the abbreviated scheme-lexer.h/.l files and some C++ classes
representing the various Scheme components (you’ll hear the buzzword
S-Expression if you overhear Scheme programmers talking), and to figure out how to
build a parse tree for an arbitrary Scheme S-Expression. This isn’t so much about flex as
it is about ad hoc parsing. The challenge here is to wire up a tree representation of a
Scheme expression. Ultimately you’ll have bison do the same thing for you, and after
you do this by hand you’ll appreciate the wiring capabilities of bison quite a bit. You’ll also be
prompted to start thinking about C++ inheritance and virtual methods a bit.
typedef union {
  int intValue;
  char *textValue;
} YYSTYPE;

typedef enum {
  T_IntegerConstant = 256, T_StringConstant, T_Symbol
} TokenType;

int yylex();
scheme-scanner.l
%}
Whitespace ([ \t\n\r]+)
IntegerConstant ([0-9]+)
StringConstant (\"[^\"]*\")
Symbol ([a-zA-Z][a-zA-Z0-9_\-\>?!]*)
%%
{Whitespace} { ; }
{IntegerConstant} { yylval.intValue = strtol(yytext, NULL, 10);
return T_IntegerConstant; }
{StringConstant} { yylval.textValue = strdup(yytext);
return T_StringConstant; }
{Symbol} { yylval.textValue = strdup(yytext);
return T_Symbol; }
Here are the C++ class definitions that correspond to text, integers, and lists. Note that
symbols and string constants can be handled by the same class.
class SExpression {
 public:
  virtual ~SExpression() {}
 protected:
  SExpression() {}
};

class Integer : public SExpression {
 public:
  Integer(int n) : n(n) {}
  virtual ~Integer() {}
 private:
  int n;
};

class String : public SExpression {
 public:
  String(const char *text) : text(text) {}
  virtual ~String() {}
 private:
  string text; // std::string
};

class SExpressionList : public SExpression {
 public:
  SExpressionList();
  virtual ~SExpressionList();
  virtual void append(SExpression *expr);
 private:
  vector<SExpression *> elements;
};
Write a function called readList which repeatedly calls yylex to read in all of the tokens
making up a Scheme list (possibly empty, possibly containing just primitives, or maybe
containing sublists and subsublists). You can assume that stdin feeds in exactly one
perfectly formatted Scheme list. Assume the following prototype (where lookahead
refers to a master TokenType that’s been initialized to the '(' returned by the first call to
yylex).
First, we’ll draw the data structure representation of an arbitrary Scheme list, just to
illustrate what you’re working toward. You’ll then all work together to write the code.
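The recursive wiring described above might be sketched as follows. This is an illustration under stated assumptions, not the handout's solution: the `Token` and `TokenStream` types are hypothetical stand-ins for yylex and yylval so the block runs without flex, the `readList(TokenStream&, Token&)` signature is a guess rather than the handout's prototype, and `show()` is added only so the resulting tree can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

enum TokenType { T_IntegerConstant = 256, T_StringConstant, T_Symbol };

// Hypothetical stand-in for one yylex() result plus its yylval text.
struct Token { int type; std::string text; };

// Hypothetical stand-in for the scanner: hands out canned tokens, then type 0.
class TokenStream {
public:
    explicit TokenStream(std::vector<Token> toks) : toks(std::move(toks)), pos(0) {}
    Token next() { return pos < toks.size() ? toks[pos++] : Token{0, ""}; }
private:
    std::vector<Token> toks;
    std::size_t pos;
};

class SExpression {
public:
    virtual ~SExpression() {}
    virtual std::string show() const = 0;  // added for inspection only
};

class Integer : public SExpression {
public:
    explicit Integer(int n) : n(n) {}
    std::string show() const override { return std::to_string(n); }
private:
    int n;
};

class String : public SExpression {  // used for symbols and string constants
public:
    explicit String(const std::string& text) : text(text) {}
    std::string show() const override { return text; }
private:
    std::string text;
};

class SExpressionList : public SExpression {
public:
    ~SExpressionList() override { for (SExpression* e : elements) delete e; }
    void append(SExpression* expr) { elements.push_back(expr); }
    std::string show() const override {
        std::string s = "(";
        for (std::size_t i = 0; i < elements.size(); ++i) {
            if (i > 0) s += ' ';
            s += elements[i]->show();
        }
        return s + ")";
    }
private:
    std::vector<SExpression*> elements;
};

// Precondition: lookahead holds '('. On return, lookahead holds the token
// that follows the matching ')'. Assumes well-formed input, as the text does.
SExpressionList* readList(TokenStream& in, Token& lookahead) {
    SExpressionList* list = new SExpressionList;
    lookahead = in.next();  // consume '('
    while (lookahead.type != ')' && lookahead.type != 0) {
        if (lookahead.type == '(') {
            list->append(readList(in, lookahead));  // recursion advances lookahead
            continue;
        }
        if (lookahead.type == T_IntegerConstant)
            list->append(new Integer(std::atoi(lookahead.text.c_str())));
        else
            list->append(new String(lookahead.text));
        lookahead = in.next();
    }
    lookahead = in.next();  // consume ')'
    return list;
}
```

The design point is that each recursive call owns exactly one pair of parentheses and leaves lookahead on the token after its closing parenthesis, so the caller's loop can continue without any bookkeeping about nesting depth.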
[Figure: DFA over the alphabet {a, b} with start state A, states A through F, and dead state Z.]
b.) Present a context-free grammar that generates the language accepted by your DFA
from part a. [Make sure you understand how to do this for any DFA, not just this
one.]
The most obvious approach is to set each production to imitate some transition in
the DFA. Here’s what I was thinking:
A → aA | bB | ε
B → aC | bB | ε
C → aA | bD | ε
D → aE | bB | ε
E → aA | bF | ε
F → aE | ε
Since Z is a dead state, we don’t need to include any mention of it in the CFG
(though it’s not a mistake to do so, just unnecessary).
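The same recipe works for any DFA: one nonterminal per state, one alternative per transition, and an empty alternative for each accepting state. Here is a minimal sketch of that conversion. The `Dfa` struct, `toGrammar`, and `exampleDfa` are hypothetical names. The transition table is read back from the productions above, and the F-on-b move into Z is an assumption, inferred from F having no b alternative. The empty string is printed as `eps`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: single-character state and symbol names.
struct Dfa {
    std::map<char, std::map<char, char>> delta;  // state -> (symbol -> next state)
    std::set<char> accepting;
    std::set<char> dead;
};

// Emits one right-linear production per live state, e.g. "A -> aA | bB | eps".
std::vector<std::string> toGrammar(const Dfa& dfa) {
    std::vector<std::string> productions;
    for (const auto& entry : dfa.delta) {
        const char state = entry.first;
        if (dfa.dead.count(state)) continue;  // dead states derive nothing
        std::vector<std::string> alts;
        for (const auto& move : entry.second) {
            if (dfa.dead.count(move.second)) continue;  // skip moves into dead states
            alts.push_back(std::string(1, move.first) + move.second);
        }
        if (dfa.accepting.count(state)) alts.push_back("eps");
        std::string line = std::string(1, state) + " ->";
        for (std::size_t i = 0; i < alts.size(); ++i)
            line += (i ? " | " : " ") + alts[i];
        productions.push_back(line);
    }
    return productions;
}

// Transition table assumed from the grammar in the text.
Dfa exampleDfa() {
    Dfa d;
    d.delta['A'] = {{'a', 'A'}, {'b', 'B'}};
    d.delta['B'] = {{'a', 'C'}, {'b', 'B'}};
    d.delta['C'] = {{'a', 'A'}, {'b', 'D'}};
    d.delta['D'] = {{'a', 'E'}, {'b', 'B'}};
    d.delta['E'] = {{'a', 'A'}, {'b', 'F'}};
    d.delta['F'] = {{'a', 'E'}, {'b', 'Z'}};
    d.delta['Z'] = {{'a', 'Z'}, {'b', 'Z'}};
    d.accepting = {'A', 'B', 'C', 'D', 'E', 'F'};
    d.dead = {'Z'};
    return d;
}
```

Calling `toGrammar(exampleDfa())` produces six productions, one for each of A through F, with Z omitted.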
\n { printf("1"); }
... { printf("2"); }
[a-c]+ { printf("3"); }
[A-C]+ { printf("4"); }
[1-3]+ { printf("5"); }
[acB13]+ { printf("6"); }
[^a-c]+[^A-C]+[1-9] { printf("7"); }
What will be the output on the following input? Assume there is nothing to the right of
the last visible character on each line except a newline character and that the following
input is to be scanned all at once:
abcABC123
321CBAcba
BaB1cAb
AAAAacac1231
The output is 2722164371. The scanner matches the input as follows:
2 matches abc
7 matches ABC123\n321
2 matches CBA
2 matches cba
1 matches \n
6 matches BaB1c
4 matches A
3 matches b
7 matches \nAAAAacac1231
1 matches \n (we didn’t require this one)
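The trace above follows from two flex conventions: the scanner takes the longest match available at the current position, and when two rules tie on length the one listed first wins. A small simulation can be used to check the answer. It is a sketch with hand-coded matchers for these seven rules only, not a general regular-expression engine, and the names `simulate`, `run`, `in`, and `rule7` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Pred = std::function<bool(char)>;
using Rule = std::function<std::size_t(const std::string&, std::size_t)>;

static bool in(char c, char lo, char hi) { return lo <= c && c <= hi; }

// Length of the longest run of characters starting at pos that satisfy ok.
static std::size_t run(const std::string& s, std::size_t pos, const Pred& ok) {
    std::size_t n = 0;
    while (pos + n < s.size() && ok(s[pos + n])) ++n;
    return n;
}

// Longest match of [^a-c]+[^A-C]+[1-9] at pos, trying every split (0 if none).
static std::size_t rule7(const std::string& s, std::size_t pos) {
    std::size_t best = 0;
    const std::size_t maxFirst = run(s, pos, [](char c) { return !in(c, 'a', 'c'); });
    for (std::size_t i = 1; i <= maxFirst; ++i) {
        const std::size_t maxSecond =
            run(s, pos + i, [](char c) { return !in(c, 'A', 'C'); });
        for (std::size_t j = 1; j <= maxSecond; ++j) {
            const std::size_t end = pos + i + j;  // index of the final [1-9]
            if (end < s.size() && in(s[end], '1', '9') && end + 1 - pos > best)
                best = end + 1 - pos;
        }
    }
    return best;
}

// Returns what the scanner would print for the given input.
std::string simulate(const std::string& input) {
    const std::vector<Rule> rules = {
        [](const std::string& s, std::size_t p) { return std::size_t(s[p] == '\n' ? 1 : 0); },
        [](const std::string& s, std::size_t p) {  // "..." is three non-newline characters
            return std::size_t(run(s, p, [](char c) { return c != '\n'; }) >= 3 ? 3 : 0);
        },
        [](const std::string& s, std::size_t p) { return run(s, p, [](char c) { return in(c, 'a', 'c'); }); },
        [](const std::string& s, std::size_t p) { return run(s, p, [](char c) { return in(c, 'A', 'C'); }); },
        [](const std::string& s, std::size_t p) { return run(s, p, [](char c) { return in(c, '1', '3'); }); },
        [](const std::string& s, std::size_t p) {
            return run(s, p, [](char c) { return std::string("acB13").find(c) != std::string::npos; });
        },
        rule7
    };
    std::string out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t bestLen = 0, bestRule = 0;
        for (std::size_t r = 0; r < rules.size(); ++r) {
            const std::size_t len = rules[r](input, pos);
            if (len > bestLen) { bestLen = len; bestRule = r + 1; }  // strict >: earlier rule wins ties
        }
        if (bestLen == 0) { out += input[pos++]; continue; }  // flex's default rule echoes the character
        out += static_cast<char>('0' + bestRule);
        pos += bestLen;
    }
    return out;
}
```

Feeding the four input lines, each terminated by a newline, to `simulate` reproduces the ten-digit output listed above, including the final 1 for the trailing newline.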