
PARSING WITH CONTEXT-FREE GRAMMARS

- by FaaDoOEngineers.com

PARSING

Parsing is the process of recognizing and assigning STRUCTURE

Parsing a string with a CFG:

Finding a derivation of the string consistent with the grammar

The derivation gives us a PARSE TREE
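
To make the link between grammar, derivation and parse tree concrete, here is a minimal Python sketch. The toy grammar, the nested-tuple tree encoding and the helper names (`licensed`, `tree_yield`) are assumptions made for illustration; they are not taken from the slides.

```python
# Toy CFG (assumed for illustration): nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["Verb", "NP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "Verb": [["book"]],
    "Det": [["that"]],
    "Noun": [["flight"]],
}

# A parse tree as nested tuples: (label, child, child, ...); leaves are words.
TREE = ("S",
        ("VP",
         ("Verb", "book"),
         ("NP", ("Det", "that"), ("Nominal", ("Noun", "flight")))))


def label(node):
    """Return the category of an internal node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]


def licensed(node, grammar):
    """True if every parent-plus-children step in the tree is a grammar rule."""
    if isinstance(node, str):
        return True
    head, children = node[0], node[1:]
    rhs = [label(child) for child in children]
    return rhs in grammar.get(head, []) and all(
        licensed(child, grammar) for child in children)


def tree_yield(node):
    """Read the words off the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in tree_yield(child)]


print(licensed(TREE, GRAMMAR), tree_yield(TREE))
```

A tree is a valid parse of a string when it is rooted in S, every local step is licensed by a rule, and its yield is exactly the input.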


EXAMPLE (CF. LAST WEEK)


PARSING AS SEARCH

Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS

There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered


TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES

The search has to be guided by the INPUT and the GRAMMAR

TOP-DOWN search: the parse tree has to be rooted in the start symbol S

EXPECTATION-DRIVEN parsing

BOTTOM-UP search: the parse tree must be an analysis of the input

DATA-DRIVEN parsing

AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)


AN EXAMPLE OF BOTTOM-UP SEARCH


NON-PARALLEL SEARCH

If it's not possible to examine all alternatives in parallel, it's necessary to make further decisions:

Which node in the current search space to expand first (breadth-first or depth-first)

Which of the applicable grammar rules to expand first

Which leaf node in a parse tree to expand next (e.g., leftmost)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT


TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)


TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)


TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)


A T-D, D-F, L-R PARSER
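
As a rough sketch of what such a parser can look like, the Python recognizer below expands the leftmost symbol first, tries rules in the order they are listed, and backtracks through generators. The toy grammar and the function names `expand` and `recognize` are assumptions for illustration, and the code only answers yes or no rather than building a tree. It assumes the grammar has no left-recursive rules; with one, the recursion would not terminate.

```python
# Toy grammar (assumed for illustration); words appear directly as terminals.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["that"], ["the"]],
    "Noun": [["flight"], ["book"]],
    "Verb": [["book"]],
}


def expand(symbols, words, pos, grammar):
    """Yield every input position reachable after matching `symbols` from `pos`.

    Depth-first: the leftmost symbol is expanded first and rules are tried in
    listed order; exhausting a generator corresponds to backtracking.
    """
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:                            # nonterminal: try each rule
        for rhs in grammar[first]:
            yield from expand(rhs + rest, words, pos, grammar)
    elif pos < len(words) and words[pos] == first:  # terminal: match next word
        yield from expand(rest, words, pos + 1, grammar)


def recognize(words, grammar, start="S"):
    """True if some top-down derivation from `start` covers the whole input."""
    return any(end == len(words)
               for end in expand([start], words, 0, grammar))


print(recognize("book that flight".split(), GRAMMAR))
```

On "book that flight" the first hypothesis (S expands to NP VP) fails at the first word, so the search falls back to the second S rule, which is the kind of wasted work the following slides discuss.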


TOP-DOWN VS BOTTOM-UP

TOP-DOWN:

Only searches among grammatical answers

BUT: suggests hypotheses that may not be consistent with data

Problem: left-recursion

BOTTOM-UP:

Only forms hypotheses consistent with data

BUT: may suggest hypotheses that make no sense globally

LEFT-RECURSION

A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return

Examples of left-recursive rules:

NP → NP PP
S → S and S

But also:

NP → Det Nom
Det → NP 's
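
Direct left recursion is visible in a single rule, but the indirect kind (NP via Det back to NP) only shows up when rules are chained. The sketch below finds both by computing, for each nonterminal, which nonterminals can appear leftmost in its expansions. The grammar encoding and the function name `left_recursive_symbols` are assumptions for illustration, and the code assumes there are no empty rules.

```python
# Toy grammar (assumed): symbols that are not keys are treated as terminals.
GRAMMAR = {
    "S": [["S", "and", "S"], ["NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nom"], ["ProperNoun"]],
    "Det": [["NP", "'s"], ["the"]],
    "Nom": [["Noun"]],
    "PP": [["Prep", "NP"]],
    "VP": [["Verb"]],
}


def left_recursive_symbols(grammar):
    """Return the nonterminals A that can derive a string starting with A."""
    # leftmost[A]: nonterminals that can be the first symbol of an expansion of A
    leftmost = {a: {rhs[0] for rhs in rules if rhs and rhs[0] in grammar}
                for a, rules in grammar.items()}
    changed = True
    while changed:                      # transitive closure by iteration
        changed = False
        for a in leftmost:
            reachable = set()
            for b in leftmost[a]:
                reachable |= leftmost[b]
            if not reachable <= leftmost[a]:
                leftmost[a] |= reachable
                changed = True
    return {a for a in leftmost if a in leftmost[a]}


print(sorted(left_recursive_symbols(GRAMMAR)))
```

Here S and NP are directly left-recursive, while Det is caught only through the chain Det, NP, Det.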


THE PROBLEM WITH LEFT-RECURSION


LEFT-RECURSION: POOR SOLUTIONS

Rewrite the grammar to a weakly equivalent one

Problem: may not get correct parse tree

Limit the depth during search

Problem: limit is arbitrary
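
The slides do not say which rewrite they have in mind; a common textbook choice is to replace A → A β | α with A → α A' and A' → β A' | ε. The sketch below applies that transformation to immediate left recursion only. The function name and the primed nonterminal name are hypothetical, and the code assumes no rule of the form A → A.

```python
def remove_immediate_left_recursion(grammar):
    """Rewrite A -> A beta | alpha as A -> alpha A', A' -> beta A' | (empty).

    Only immediate left recursion is handled; indirect cases are left alone.
    """
    new_grammar = {}
    for a, rules in grammar.items():
        betas = [rhs[1:] for rhs in rules if rhs and rhs[0] == a]
        alphas = [rhs for rhs in rules if not rhs or rhs[0] != a]
        if not betas:                       # nothing to rewrite for this symbol
            new_grammar[a] = [list(rhs) for rhs in rules]
            continue
        tail = a + "'"                      # hypothetical fresh nonterminal name
        new_grammar[a] = [alpha + [tail] for alpha in alphas]
        new_grammar[tail] = [beta + [tail] for beta in betas] + [[]]
    return new_grammar


GRAMMAR = {"NP": [["NP", "PP"], ["Det", "Nominal"], ["ProperNoun"]]}
for lhs, rules in remove_immediate_left_recursion(GRAMMAR).items():
    print(lhs, "->", rules)
```

The new grammar accepts the same strings, but the trees it assigns are right-branching through NP', so they no longer show the PP attached to an NP constituent. That is the sense in which the rewrite is only weakly equivalent.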


LEFT-CORNER PARSING

A hybrid of top-down and bottom-up parsing

Strategy: don't consider any expansion unless the current input can serve as the LEFT-CORNER of that expansion
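
One way to apply this filter is to precompute, for every nonterminal, the set of categories that can appear as its left corner, and to consult that table before expanding a rule. The sketch below is an illustration under assumptions: the toy grammar, the treatment of parts of speech as terminals, and the names `left_corner_table` and `viable` are not from the slides.

```python
# Toy grammar (assumed); parts of speech such as Det or Verb act as terminals.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corner_table(grammar):
    """Map each nonterminal to every symbol that can start one of its expansions."""
    table = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    changed = True
    while changed:                           # close under "left corner of"
        changed = False
        for a in table:
            for b in list(table[a]):
                extra = table.get(b, set()) - table[a]
                if extra:
                    table[a] |= extra
                    changed = True
    return table


def viable(rhs, next_category, table):
    """Filter: can a rule with this right-hand side begin with `next_category`?"""
    first = rhs[0]
    return first == next_category or next_category in table.get(first, set())


TABLE = left_corner_table(GRAMMAR)
print(sorted(TABLE["S"]))
print(viable(["NP", "VP"], "Verb", TABLE), viable(["VP"], "Verb", TABLE))
```

With a verb as the next word, the rule expanding S to NP VP is discarded without being tried, because Verb is not a left corner of NP.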


FURTHER PROBLEMS IN PARSING

Ambiguity

Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers

C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
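
The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1), so the growth rate is easy to check directly. The short Python sketch below uses that formula; the function name `catalan` is just an illustrative choice.

```python
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(2, 9):
    print(n, catalan(n))
```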

Avoiding reparsing


COMMON STRUCTURAL AMBIGUITIES

COORDINATION ambiguity

OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN

ATTACHMENT ambiguity:

Gerundive VP attachment ambiguity

I saw the Eiffel Tower flying to Paris

PP attachment ambiguity

I shot an elephant in my pajamas


PP ATTACHMENT AMBIGUITY


AMBIGUITY: SOLUTIONS

Use a PROBABILISTIC GRAMMAR (not covered in this module)

Use semantics


AVOID RECOMPUTING INVARIANTS

Consider parsing the following NP with a top-down parser:

A flight from Indianapolis to Houston on TWA

With the grammar rules:

NP → Det Nominal
NP → NP PP
NP → ProperNoun

INVARIANTS AND TOP-DOWN PARSING


THE EARLEY ALGORITHM


DYNAMIC PROGRAMMING

A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way

A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work

The Earley algorithm also

Does not suffer from the left-recursion problem

Solves an exponential problem in O(n^3)

THE CHART

The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input

Table entries sit in the 'gaps' between words

Each entry in the chart is a list of

Completed constituents

In-progress constituents

Predicted constituents

All three types of objects are represented in the same way as STATES

THE CHART: GRAPHICAL REPRESENTATION


STATES

A state encodes two types of information:

How much of a certain rule has been encountered in the input

Which positions are covered

A → α • β, [X,Y]

DOTTED RULES

VP → V NP •
NP → Det • Nominal
S → • VP
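
A state can be represented as a small immutable record: the rule, the dot position, and the span covered so far. The Python sketch below is one possible encoding; the class name `State`, its field names, and the spans used in the example are assumptions for illustration rather than anything fixed by the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str                 # left-hand side of the rule
    rhs: Tuple[str, ...]     # right-hand side symbols
    dot: int                 # how many rhs symbols have been recognized so far
    start: int               # X: input position where the constituent begins
    end: int                 # Y: input position reached so far

    def is_complete(self) -> bool:
        """A completed constituent has its dot at the far right."""
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """The symbol just after the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs[:self.dot]) + ["•"] + list(self.rhs[self.dot:])
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


# Illustrative spans, assuming a three-word input such as "book that flight".
print(State("S", ("VP",), 0, 0, 0))               # predicted
print(State("NP", ("Det", "Nominal"), 1, 1, 2))   # in progress
print(State("VP", ("V", "NP"), 2, 0, 3))          # completed
```

Because the record is frozen, states are hashable and can be compared for equality, which makes it easy to avoid adding the same state to a chart entry twice.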

EXAMPLES


SUCCESS

The parser has succeeded if entry N+1 of the chart contains the state

S → α •, [0,N]


THE ALGORITHM

The algorithm loops through the input without backtracking, at each step performing three operations:

PREDICTOR: add predictions to the chart

COMPLETER: move the dot to the right when looked-for constituent is found

SCANNER: read in the next input word
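
The three operations can be put together in a compact recognizer. The Python sketch below is an illustration under assumptions, not the slides' own pseudocode: the toy grammar and lexicon, the tuple encoding of states as (lhs, rhs, dot, origin), the dummy GAMMA start rule and the name `earley_recognize` are all choices made here. It assumes the grammar has no empty rules and only reports whether the input is grammatical.

```python
# Toy grammar and lexicon (assumed for illustration).
GRAMMAR = {
    "S": [("VP",), ("NP", "VP")],
    "NP": [("Det", "Nominal"), ("NP", "PP")],   # NP -> NP PP is left-recursive
    "Nominal": [("Noun",)],
    "VP": [("Verb", "NP"), ("Verb",)],
    "PP": [("Prep", "NP")],
}
LEXICON = {                                      # word -> possible parts of speech
    "book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"},
    "from": {"Prep"}, "the": {"Det"}, "airport": {"Noun"},
}


def earley_recognize(words, grammar, lexicon, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]           # N+1 entries, one per gap

    def add(state, i):
        if state not in chart[i]:                # a state is stored only once
            chart[i].append(state)

    add(("GAMMA", (start,), 0, 0), 0)            # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                   # chart[i] grows as we iterate
            lhs, rhs, dot, origin = state
            if dot == len(rhs):
                # COMPLETER: advance every state that was waiting for `lhs`
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), i)
            elif rhs[dot] in grammar:
                # PREDICTOR: add every expansion of the expected nonterminal
                for expansion in grammar[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i), i)
            elif i < n and rhs[dot] in lexicon.get(words[i], ()):
                # SCANNER: the next word can have the expected part of speech
                add((rhs[dot], (words[i],), 1, i), i + 1)
    return ("GAMMA", (start,), 1, 0) in chart[n]


print(earley_recognize("book that flight".split(), GRAMMAR, LEXICON))
```

The duplicate check in `add` is what keeps the left-recursive NP rule from looping: predicting NP at a position where it has already been predicted adds nothing new to the chart.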


THE ALGORITHM: CENTRAL LOOP


EARLEY ALGORITHM: THE THREE OPERATORS


EXAMPLE, AGAIN


EXAMPLE: BOOK THAT FLIGHT


EXAMPLE: BOOK THAT FLIGHT (II)


EXAMPLE: BOOK THAT FLIGHT (III)


EXAMPLE: BOOK THAT FLIGHT (IV)


READINGS

Jurafsky and Martin, chapter 10.1-10.4

