Top-Down Parsing: Programming Language Application

Top-down parsing
In computer science, top-down parsing is a parsing strategy where one first looks at the
highest level of the parse tree and works down the parse tree by using the rewriting rules of a
formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.
Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing
general parse tree structures and then considering whether the known fundamental structures
are compatible with the hypothesis. It occurs in the analysis of both natural languages and
computer languages.
Top-down parsing can be viewed as an attempt to find left-most derivations of an inputstream by searching for parse-trees using a top-down expansion of the given formal grammar
rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate
ambiguity by expanding all alternative right-hand-sides of grammar rules.[1]
Simple implementations of top-down parsing do not terminate for left-recursive grammars,
and top-down parsing with backtracking may have exponential time complexity with respect
to the length of the input for ambiguous CFGs.[2] However, more sophisticated top-down
parsers have been created by Frost, Hafiz, and Callaghan [3][4] which do accommodate
ambiguity and left recursion in polynomial time and which generate polynomial-sized
representations of the potentially exponential number of parse trees.
Programming language application

A compiler parses input from a programming language to assembly language or an internal
representation by matching the incoming symbols to production rules. Production rules are
commonly defined using Backus-Naur form. An LL parser is a type of parser that does topdown parsing by applying each production rule to the incoming symbols, working from the
left-most symbol yielded on a production rule and then proceeding to the next production rule
for each non-terminal symbol encountered. In this way the parsing starts on the Left of the
result side (right side) of the production rule and evaluates non-terminals from the Left first
and, thus, proceeds down the parse tree for each new non-terminal before continuing to the
next symbol for a production rule.
For example:
would match and attempt to match next. Then would be tried. As one may expect, some
languages are more ambiguous than others. For a non-ambiguous language in which all
productions for a non-terminal produce distinct strings: the string produced by one production
will not start with the same symbol as the string produced by another production. A nonambiguous language may be parsed by an LL(1) grammar where the (1) signifies the parser
reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the
parser must lookahead more than 1 symbol, e.g. LL(3).
The common solution to this problem is to use an LR parser, which is a type of shift-reduce
parser, and does bottom-up parsing.
Accommodating left recursion in top-down parsing
A formal grammar that contains left recursion cannot be parsed by a naive recursive descent
parser unless they are converted to a weakly equivalent right-recursive form. However, recent
research demonstrates that it is possible to accommodate left-recursive grammars (along with
all other forms of general CFGs) in a more sophisticated top-down parser by use of
curtailment. A recognition algorithm which accommodates ambiguous grammars and curtails
an ever-growing direct left-recursive parse by imposing depth restrictions with respect to
input length and current input position, is described by Frost and Hafiz in 2006. [5] That
algorithm was extended to a complete parsing algorithm to accommodate indirect (by
comparing previously computed context with current context) as well as direct left-recursion
in polynomial time, and to generate compact polynomial-size representations of the
potentially exponential number of parse trees for highly ambiguous grammars by Frost, Hafiz
and Callaghan in 2007.[3] The algorithm has since been implemented as a set of parser
combinators written in the Haskell programming language. The implementation details of
these new set of combinators can be found in a paper [4] by the authors, which was presented
in PADL'08. The X-SAIGA site has more about the algorithms and implementation details.
Time and space complexity of top-down parsing

When top-down parser tries to parse an ambiguous input with respect to an ambiguous CFG,
it may need exponential number of steps (with respect to the length of the input) to try all
alternatives of the CFG in order to produce all possible parse trees, which eventually would
require exponential memory space. The problem of exponential time complexity in top-down
parsers constructed as sets of mutually recursive functions has been solved by Norvig in
1991.[6] His technique is similar to the use of dynamic programming and state-sets in Earley's
algorithm (1970), and tables in the CYK algorithm of Cocke, Younger and Kasami.
The key idea is to store results of applying a parser p at position j in a memotable and to
reuse results whenever the same situation arises. Frost, Hafiz and Callaghan [3][4] also use
memoization for refraining redundant computations to accommodate any form of CFG in
polynomial time ((n4) for left-recursive grammars and (n3) for non left-recursive
grammars). Their top-down parsing algorithm also requires polynomial space for potentially
exponential ambiguous parse trees by 'compact representation' and 'local ambiguities
grouping'. Their compact representation is comparable with Tomitas compact representation
of bottom-up parsing.[7]
Using PEG's, another representation of grammars, packrat parsers provide an elegant and
powerful parsing algorithm. See Parsing expression grammar.
Bottom-up parsing
In computer science, parsing reveals the grammatical structure of linear input text, as a first
step in working out its meaning. Bottom-up parsing identifies and processes the text's
lowest-level small details first, before its mid-level structures, and leaving the highest-level
overall structure to last.
Bottom-up Versus Top-down
The bottom-up name comes from the concept of a parse tree, in which the most detailed parts
are at the bottom of the (upside-down) tree, and larger structures composed from them are in
successively higher layers, until at the top or "root" of the tree a single unit describes the
entire input stream. A bottom-up parse discovers and processes that tree starting from the
bottom left end, and incrementally works its way upwards and rightwards.[1] A parser may act
on the structure hierarchy's low, mid, and highest levels without ever creating an actual data
tree; the tree is then merely implicit in the parser's actions. Bottom-up parsing lazily waits
until it has scanned and parsed all parts of some construct before committing to what the
combined construct is.
Typical parse tree for

A = B + C*2; D = 1
Bottom-up parse steps
Top-down parse steps

The opposite of this are top-down parsing methods, in which the input's overall structure is
decided (or guessed at) first, before dealing with mid-level parts, leaving the lowest-level
small details to last. A top-down parser discovers and processes the hierarchical tree starting
from the top, and incrementally works its way downwards and rightwards. Top-down parsing
eagerly decides what a construct is much earlier, when it has only scanned the leftmost
symbol of that construct and has not yet parsed any of its parts. Left corner parsing is a
hybrid method which works bottom-up along the left edges of each subtree, and top-down on
the rest of the parse tree.
If a language grammar has multiple rules that may start with the same leftmost symbols but
have different endings, then that grammar can be efficiently handled by a deterministic
bottom-up parse but cannot be handled top-down without guesswork and backtracking. So
bottom-up parsers handle a somewhat larger range of computer language grammars than do
deterministic top-down parsers.
Bottom-up parsing is sometimes done by backtracking. But much more commonly, bottomup parsing is done by a shift-reduce parser such as a LALR parser.
One of the earlier documentations of a bottom-up parser is "A Syntax-Oriented Translator" by
Peter Zilahy Ingerman, published in 1966 by Academic Press, NY.
Examples
Precedence parser
o Operator-precedence parser
o Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
o Simple LR (SLR) parser
o LALR parser
o Canonical LR (LR(1)) parser
o GLR parser
CYK parser
Recursive ascent parser
Shift-Reduce Parser

Top-Down Parsing: Programming Language Application

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Top-Down Parsing: Programming Language Application

Hochgeladen von

Copyright:

Verfügbare Formate

Top-down parsing

Programming language application

Accommodating left recursion in top-down parsing

Time and space complexity of top-down parsing

Bottom-up Versus Top-down

Typical parse tree for

Bottom-up parse steps

Top-down parse steps

Das könnte Ihnen auch gefallen