Parsing is performed by a finite state machine. The parsing algorithm is language-independent: the FSM is driven by table(s) generated automatically from the grammar.
[Diagram: the language grammar is fed to a table generator, which produces the parsing tables; the parser reads the input, driven by the tables and a stack.]
Pushdown Automata
A context-free grammar can be recognized by a finite state machine with a stack: a PDA (pushdown automaton). The PDA is defined by a set of internal states and a transition table. The PDA can read the input and read/write on the stack. The actions of the PDA are determined by its current state, the current top of the stack, and the current input symbol. There are three distinguished states:
- start state: nothing seen yet
- accept state: sentence complete
- error state: current symbol doesn't belong
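As a concrete illustration, here is a minimal sketch of such a stack-driven recognizer in Python for balanced parentheses (a hypothetical example, not from the notes): its action at each step depends only on the current input symbol and the top of the stack.

```python
def pda_accepts(s):
    """Recognize balanced parentheses with an explicit stack."""
    stack = ["$"]              # "$" marks the bottom of the stack (start state)
    for ch in s:
        if ch == "(":          # push an expectation of a matching ")"
            stack.append("(")
        elif ch == ")":
            if stack[-1] != "(":
                return False   # error state: current symbol doesn't belong
            stack.pop()
        else:
            return False       # error state: not in the alphabet
    return stack == ["$"]      # accept state: sentence complete

print(pda_accepts("(()())"))  # True
print(pda_accepts("(()"))     # False
```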
Top-down parsing
The parse tree is synthesized from the root (the sentence symbol). The stack contains the symbols of the right-hand side of the current production, plus pending non-terminals. The automaton is trivial (no need for explicit states). The transition table T is indexed by a grammar symbol S and an input symbol a. Entries in the table are terminals or productions P → A B C.
Top-down parsing
Actions:
Initially, the stack contains the sentence symbol. At each step, let S be the symbol on top of the stack, and a the next token on the input:
- if T(S, a) is the terminal a, read the token and pop the symbol from the stack
- if T(S, a) is a production P → A B C, remove S from the stack and push the symbols A, B, C on the stack (A on top)
- if S is the sentence symbol and a is the end of file, accept
- if T(S, a) is undefined, signal an error
Semantic action: when starting a production, build a tree node for the non-terminal and attach it to its parent.
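The steps above can be sketched as a small table-driven parser. This is hypothetical illustration code: the grammar S → a S b | ε and its table are chosen for brevity, and the semantic actions (tree building) are omitted.

```python
# LL(1) table for the illustrative grammar S -> a S b | epsilon.
TABLE = {
    ("S", "a"): ["a", "S", "b"],  # production S -> a S b
    ("S", "b"): [],               # production S -> epsilon
    ("S", "$"): [],               # production S -> epsilon
}

def parse(tokens):
    tokens = list(tokens) + ["$"]   # "$" is the end-of-file marker
    stack = ["S"]                   # initially, stack holds the sentence symbol
    pos = 0
    while stack:
        top, a = stack.pop(), tokens[pos]
        if top == a:                # T(S, a) is the terminal a: read token
            pos += 1
        elif (top, a) in TABLE:     # production: push the rhs, leftmost on top
            stack.extend(reversed(TABLE[(top, a)]))
        else:
            return False            # T(S, a) undefined: signal error
    return tokens[pos] == "$"       # accept: sentence complete at end of file

print(parse("aabb"))  # True
print(parse("aab"))   # False
```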
FOLLOW(N) is the set of terminals that can appear after a string derived from N. For example:
Follow(Expr) = { +, ), $ }
Computing First and Follow (ε denotes the empty string):
- First(X1 X2) includes First(X1), and also First(X2) if X1 ⇒* ε
- if A → α N β, Follow(N) includes First(β) \ {ε}
- if A → α N, Follow(N) includes Follow(A)
- if A → α N β and β ⇒* ε, Follow(N) includes Follow(A)
For the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id:
Follow(E) = { ), $ }
Follow(E') = Follow(E) = { ), $ }
Follow(T) = First(E') ∪ Follow(E) = { +, ), $ }
Follow(T') = Follow(T) = { +, ), $ }
Follow(F) = First(T') ∪ Follow(T) = { *, +, ), $ }
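A fixed-point computation of these sets for the expression grammar can be sketched as follows (illustrative code, not from the notes; the primed non-terminals are spelled "E'" and "T'", and None stands for ε).

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],    # [] is the epsilon production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first(seq, FIRST):
    """FIRST of a symbol string; None in the result stands for epsilon."""
    out = set()
    for x in seq:
        f = FIRST[x] if x in NONTERMS else {x}
        out |= f - {None}
        if None not in f:            # x cannot derive epsilon: stop here
            return out
    out.add(None)                    # every symbol could vanish
    return out

def compute_follow():
    FIRST = {n: set() for n in NONTERMS}
    changed = True
    while changed:                   # fixed point for FIRST
        changed = False
        for n, prods in GRAMMAR.items():
            for p in prods:
                f = first(p, FIRST)
                if not f <= FIRST[n]:
                    FIRST[n] |= f
                    changed = True
    FOLLOW = {n: set() for n in NONTERMS}
    FOLLOW["E"].add("$")             # $ follows the sentence symbol
    changed = True
    while changed:                   # fixed point for FOLLOW
        changed = False
        for n, prods in GRAMMAR.items():
            for p in prods:
                for i, x in enumerate(p):
                    if x not in NONTERMS:
                        continue
                    rest = first(p[i + 1:], FIRST)
                    new = rest - {None}
                    if None in rest:         # the tail can derive epsilon
                        new |= FOLLOW[n]     # so Follow(x) includes Follow(n)
                    if not new <= FOLLOW[x]:
                        FOLLOW[x] |= new
                        changed = True
    return FOLLOW

FOLLOW = compute_follow()
print(sorted(FOLLOW["T"]))  # ['$', ')', '+']
print(sorted(FOLLOW["F"]))  # ['$', ')', '*', '+']
```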
LL(1) grammars
If table construction is successful, the grammar is LL(1): left-to-right scan, leftmost derivation, with one-token lookahead. If construction fails, one can conceive of LL(2), etc.
- Ambiguous grammars are never LL(k).
- If a terminal is in First for two different productions of A, the grammar cannot be LL(1).
- Grammars with left recursion are never LL(k).
- Some useful constructs are not LL(k).
Bottom-up parsing
Synthesize the tree from fragments. The automaton performs two actions:
- shift: push the next symbol on the stack
- reduce: replace symbols on top of the stack with a non-terminal

The automaton synthesizes (reduces) when the end of a production is recognized. The states of the automaton encode the synthesis so far, and the expectation of pending non-terminals. The automaton has a potentially large set of states. The technique is more general than LL(k).
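A naive shift-reduce recognizer for the left-recursive grammar E → E + n | n illustrates the two actions (a hypothetical sketch: a real LR parser chooses actions from a state table rather than greedily matching right-hand sides, but note that left recursion, fatal for LL(k), is unproblematic here).

```python
# Rules ordered longest first so the greedy matcher prefers E -> E + n.
RULES = [("E", ["E", "+", "n"]), ("E", ["n"])]

def shift_reduce(tokens, trace=False):
    stack, rest = [], list(tokens)
    while True:
        for lhs, rhs in RULES:          # reduce: replace symbols on stack
            if stack[-len(rhs):] == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                if trace:
                    print("reduce", lhs, "->", " ".join(rhs), "| stack:", stack)
                break
        else:
            if not rest:                # nothing to reduce, nothing to shift
                break
            stack.append(rest.pop(0))   # shift: push next symbol on stack
            if trace:
                print("shift | stack:", stack)
    return stack == ["E"] and not rest  # accept: one sentence symbol left

print(shift_reduce(["n", "+", "n"]))       # True
print(shift_reduce(["n", "+", "+", "n"]))  # False
```

Greedy reduction is only safe because this grammar has no shift/reduce ambiguity; encoding "synthesis so far" in automaton states is what lets real LR parsers decide correctly in general.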
LR(k) parsing
Left-to-right scan, rightmost derivation (in reverse), with k-token lookahead. The most general parsing technique for deterministic grammars. In general not practical: the tables are too large (on the order of 10^6 states for C++ or Ada). Common subsets: SLR, LALR(1).
A state is a set of items. Transitions between states are determined by terminals and non-terminals. Parsing tables are built from the automaton:
- action: shift or reduce, depending on the next symbol
- goto: change state, depending on the synthesized non-terminal
Adding states
If a state has an item A → α·aβ and the next symbol in the input is a, we shift a onto the stack and enter a state that contains the item A → αa·β (as well as all other items brought in by closure). If a state has an item A → α·, this indicates the end of a production: a reduce action. If a state has an item A → α·Nβ, then after a reduction that finds an N, go to a state containing A → αN·β.
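The closure and goto operations on items can be sketched as follows for the tiny augmented grammar S' → E, E → E + n | n (illustrative code using LR(0) items without lookaheads; an item (lhs, rhs, k) has the dot before position k).

```python
GRAMMAR = {"S'": [("E",)], "E": [("E", "+", "n"), ("n",)]}

def closure(items):
    """Add an item N -> .gamma for every non-terminal N right after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    """Move the dot over `symbol`: the shift (terminal) or goto transition."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

start = closure({("S'", ("E",), 0)})
# Initial state: S' -> .E, plus closure items E -> .E + n and E -> .n
print(len(start))  # 3
after_E = goto(start, "E")
# After synthesizing an E: S' -> E. (reduce/accept) and E -> E. + n (shift +)
print(len(after_E))  # 2
```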
LR(k) parsing
Canonical LR(1): annotate each item with its own lookahead set: (A → α·β, f). f is a subset of the Follow set of A, because it is derived from a single specific production for A. A state that includes (A → α·, f) is a reduce state only if the next symbol is in f: fewer reduce actions, fewer conflicts; the technique is more powerful than SLR(1). Generalization: use sequences of k symbols in f. Disadvantage: state explosion makes it impractical in general, even for LR(1).
LALR(1)
Compute lookahead sets for a smaller set of items, merging LR(1) states that differ only in their lookaheads. Tables are no bigger than SLR(1). Almost the same power as LR(1), with slightly worse error diagnostics. Incorporated into yacc, bison, etc.