Scannerless parsing
From Wikipedia, the free encyclopedia
Scannerless parsing (also called lexerless parsing) refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.
This parsing strategy is suitable when a clear lexer-parser distinction is unneeded. Examples of when this is appropriate include TeX, most wiki grammars, makefiles, and simple per-application control languages.
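As a minimal sketch of the idea, the following hand-written recognizer evaluates sums of integers while reading characters directly, so whitespace and digits are handled by the same rules as the `+` operator and no separate tokenizer exists. The function name `parse_sum` and the toy grammar are assumptions made for illustration, not taken from any of the tools named in this article.

```python
DIGITS = "0123456789"


def parse_sum(text):
    """Evaluate input such as ' 12 + 30+ 4 ' by reading characters directly."""
    pos = 0

    def skip_layout():
        # Whitespace is an ordinary grammar rule here, not a scanner concern.
        nonlocal pos
        while pos < len(text) and text[pos] in " \t\n":
            pos += 1

    def number():
        # number ::= digit+   (defined on characters, not on tokens)
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected digit at offset {pos}")
        return int(text[start:pos])

    # sum ::= layout number (layout '+' layout number)* layout
    skip_layout()
    total = number()
    skip_layout()
    while pos < len(text) and text[pos] == "+":
        pos += 1
        skip_layout()
        total += number()
        skip_layout()
    if pos != len(text):
        raise SyntaxError(f"unexpected character at offset {pos}")
    return total
```

Calling `parse_sum(" 12 + 30+ 4 ")` walks the string once; lexical and syntactic decisions are made in the same pass.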
Advantages
Only one metasyntax is needed
Non-regular lexical structure is handled easily
"Token classification" is unneeded, which removes the need for design accommodations such as "the lexer hack" and reserved keywords (such as "while" in C)
Grammars can be compositional (can be merged without human intervention) [1]
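Nested comments are a common example of non-regular lexical structure: a regular-expression-based scanner cannot match arbitrarily deep nesting, whereas a character-level context-free rule can. The sketch below assumes a hypothetical `(* ... *)` comment syntax and a made-up helper name, `skip_nested_comment`; it illustrates the point rather than reproducing any particular tool.

```python
def skip_nested_comment(text, pos=0):
    """Return the offset just past a '(*' ... '*)' comment starting at pos.

    comment ::= '(*' (comment | any-char)* '*)'
    """
    if not text.startswith("(*", pos):
        raise SyntaxError(f"expected '(*' at offset {pos}")
    pos += 2
    while pos < len(text):
        if text.startswith("*)", pos):
            return pos + 2
        if text.startswith("(*", pos):
            # The rule refers to itself, which is what handles nesting.
            pos = skip_nested_comment(text, pos)
        else:
            pos += 1
    raise SyntaxError("unterminated comment")
```

Because the comment rule is just another recursive production, it can sit in the same grammar as the rest of the language.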
Disadvantages
Since lexical scanning and syntactic parsing are combined, the resulting parser tends to be harder to understand and debug for more complex languages
Most parsers of character-level grammars are nondeterministic
There is no guarantee that the language being parsed is unambiguous
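The ambiguity risk can be seen with a deliberately naive character-level grammar, assumed here purely for illustration: `ids ::= id+` and `id ::= letter+`, with no longest-match rule. A run of letters can then be split into identifiers in many ways, and the hypothetical helper `count_parses` below counts those splits.

```python
from functools import lru_cache


def count_parses(text):
    """Count the ways text splits into identifiers under
    ids ::= id+  and  id ::= letter+  (no longest-match rule)."""
    if not text.isalpha():
        raise ValueError("expected a non-empty run of letters")

    @lru_cache(maxsize=None)
    def ways(start):
        if start == len(text):
            return 1
        # An identifier may end after any letter, so try every end offset.
        return sum(ways(end) for end in range(start + 1, len(text) + 1))

    return ways(0)
```

A run of n letters has 2^(n-1) such splits, so the number of candidate parses grows quickly unless the grammar is disambiguated.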
Required extensions
Unfortunately, when parsed at the character level, most popular programming languages are no longer strictly context-free. Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:
Follow restrictions, a limited form of "longest match"
Reject productions, a limited form of negative matching (as found in boolean grammars)
Preference attributes to handle the dangling else construct in C-like languages
Per-production transitions rather than per-nonterminal transitions in order to facilitate:
    Associativity attributes, which prevent a self-reference in a particular production of a nonterminal from producing that same production
    Precedence/priority rules, which prevent self-references in higher-precedence productions from producing lower-precedence productions
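The first two extensions in the list can be sketched as filters over candidate parses. The example below is an illustration under stated assumptions, not SGLR's actual mechanism: it enumerates every place an identifier (`id ::= letter+`) could end, then discards candidates that are followed by another letter (a follow restriction) or that spell a reserved word (a reject production). The `KEYWORDS` set and all function names are hypothetical.

```python
KEYWORDS = {"while", "if", "else"}  # hypothetical reserved words


def identifier_ends(text, pos):
    """All offsets where  id ::= letter+  could end when starting at pos."""
    ends = []
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
        ends.append(end)
    return ends


def follow_ok(text, end):
    # Follow restriction: an identifier must not be followed by a letter.
    return end == len(text) or not text[end].isalpha()


def identifiers(text, pos=0):
    """Candidates that survive both filters, as (name, end) pairs."""
    return [
        (text[pos:end], end)
        for end in identifier_ends(text, pos)
        if follow_ok(text, end)            # keeps only the longest match
        and text[pos:end] not in KEYWORDS  # reject production for keywords
    ]
```

With these filters, `identifiers("whilex = 1")` keeps only the full name `whilex`, while `identifiers("while (x)")` keeps nothing, so a different production has to account for the keyword.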
Implementations
SGLR is a parser for the modular Syntax Definition Formalism SDF, and is part of the ASF+SDF Meta-Environment and the Stratego/XT program transformation system.
JSGLR, a pure Java implementation of SGLR, also based on SDF.
TXL supports character-level parsing.
dparser generates ANSI C code for scannerless GLR parsers.
Spirit allows for both scannerless and scanner-based parsing.
Notes
1. ^ This is because parsing at the character level makes the language recognized by the parser a single context-free language defined on characters, as opposed to a context-free language of sequences of strings in regular languages. Some lexerless parsers handle the entire class of context-free languages, which is closed under composition.
Further reading
Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam.