
CS 598 JH: Advanced NLP (Spring ʼ09)

CFG parsing:
The basics

Julia Hockenmaier
juliahmr@illinois.edu
3324 Siebel Center
Office Hours: Fri, 2:00-3:00pm

http://www.cs.uiuc.edu/~juliahmr/cs598
Todayʼs topics
CFGs and PCFGs:
CFGs as AND/OR graphs
Shared parse forests
PCFGs

CFGs and
AND-OR graphs
Context-free grammars
A CFG is a 4-tuple〈N,Σ,R,S〉
A set of nonterminals N
(e.g. N = {S, NP, VP, PP, Noun, Verb, ....})

A set of terminals Σ
(e.g. Σ = {I, you, he, eat, drink, sushi, ball, ...})

A set of rules R
R ⊆ {A → β with left-hand-side (LHS) A ∈ N
and right-hand-side (RHS) β ∈ (N ∪ Σ)* }
A start symbol S (sentence)
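
To make the 4-tuple concrete, here is a minimal sketch (not part of the original slides; the names and the small rule fragment are illustrative) of how such a grammar could be written down in Python:

# A CFG as a 4-tuple (N, Sigma, R, S) -- illustrative fragment only
N = {"S", "NP", "VP", "PP", "Noun", "Verb", "P"}
Sigma = {"I", "you", "he", "eat", "drink", "sushi", "tuna", "with"}
R = {
    ("S",    ("NP", "VP")),
    ("VP",   ("Verb", "NP")),
    ("NP",   ("Noun",)),
    ("NP",   ("NP", "PP")),
    ("PP",   ("P", "NP")),
    ("Noun", ("sushi",)),
    ("Noun", ("tuna",)),
    ("Verb", ("eat",)),
    ("P",    ("with",)),
}
S = "S"

# Every rule has a single nonterminal LHS and an RHS over (N ∪ Sigma):
assert all(lhs in N and all(x in N or x in Sigma for x in rhs) for lhs, rhs in R)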

Some terminology

What is:
- the language defined by a CFG?
- a CFG derivation?
- a CFG parse tree?
- the yield of a CFG parse tree?

AND/OR graphs
Formal Definition (Mahanti/Bagchi, 1985)

An AND/OR graph G is a directed graph with a special node s (the start/root node) and a nonempty set {t1, …, tn} of terminal nodes.
The nonterminal nodes {n1, …, nm} are of two types: AND and OR.
(A third type, NONTERMINAL LEAF, is irrelevant for our purposes.)

Semantics (Mahanti/Bagchi, 1985)

Start node = the problem to be solved
AND nonterminal node ni = all of its immediate descendants have to be solved
OR nonterminal node nj = one of its immediate descendants has to be solved
Terminal nodes = their solutions are known

CFGs as AND-OR graphs

Terminals of grammar G = terminal nodes of graph G
Start symbol of G = start node of G
Rules of G = AND nodes of G
Nonterminals of G = OR nodes of G
(B.Lang 1989/1991)
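
This correspondence can be spelled out in a few lines of Python (a sketch building on the CFG encoding above; the node naming is an assumption, not from Lang's papers). Each nonterminal becomes an OR node over the AND nodes of its rules, and each rule's AND node points to the nodes for its right-hand-side symbols:

# Build an AND/OR graph (adjacency lists) from a CFG (N, Sigma, R, S)
def cfg_to_and_or_graph(N, Sigma, R, S):
    graph = {("TERMINAL", t): [] for t in Sigma}   # terminal nodes: solutions are known
    for X in N:
        graph[("OR", X)] = []                      # OR node: pick one rule for X
    for lhs, rhs in R:
        and_node = ("AND", lhs, rhs)               # AND node: one per rule
        graph[("OR", lhs)].append(and_node)
        # all RHS symbols of the rule have to be solved
        graph[and_node] = [("OR", x) if x in N else ("TERMINAL", x) for x in rhs]
    return ("OR", S), graph                        # start node = start symbol

start, graph = cfg_to_and_or_graph(N, Sigma, R, S)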
Ambiguity

Each individual parse tree is an AND graph


(B.Lang 1989/1991)

Context and subtree

We can split the parse tree τ at each nonterminal node ni
into a context and a subtree.

If yield(τ) = uvw and yield(ni) = v, then
the subtree rooted at ni has yield v, and the context (the remainder of the tree, with ni as a leaf) has yield u ni w.
(B.Lang 1989/1991)

Two kinds of ambiguity

(B.Lang 1989/1991)

Question:
Are there any other kinds of
ambiguity?
Shared parse forests

A compact representation of sentential ambiguity

Aside: Parse forests and
grammars
We can view the parse forest (AND/OR graph) of a sentence S
as a grammar G_S, with L(G_S) = {S}, except that now multiple
AND nodes may be labeled with the same rule:

The size of a parse forest
For a grammar with maximal branching factor p,
the size of a shared parse forest for a sentence S
with n words is O(n^(p+1))
(B. Lang, 1991)

Hence, the space complexity of CKY (with a binary CFG) is O(n^3)

Parsing as a deductive
process
(Shieber, Schabes, Pereira, 1993)
Parsing as deduction: what?
“Parsing can be viewed as a deductive process that seeks
to prove claims about the grammatical status of a string
from assumptions describing the grammatical properties of
the stringʼs elements and the linear order between them”

(Shieber, Schabes, Pereira ʼ93)

Cf. categorial grammar (Lambek, Ajdukiewicz, Bar-Hillel)

Parsing as deduction: why?
This allows a separation of parsing into:
- a logic of grammaticality claims (= the grammar)
- a proof search procedure (= the parsing algorithm)
(Shieber, Schabes, Pereira ʼ93)

Cf. categorial grammar (Lambek, Ajdukiewicz, Bar-Hillel)

This also provides the formal basis and useful terminology for understanding parsing algorithms.

Deduction
An inference rule

    A1 … Ak
    ───────
       B

consists of
- antecedents A1 … Ak
- a consequent B
NB: there may be side conditions on A1 … Ak and B.
Usually rules are given as schemata, where A1 … Ak and B are (or contain) variables that are instantiated when the rule is used.

Derivation
The derivation of a formula B from assumptions A1 … Am
is a sequence of formulas S1 … Sn such that
- B = Sn
- for each i ≤ n: Si = Aj (for some j), or there is an inference rule that allows Si to be derived from S1 … Si−1

If a derivation of B from A1 … Am exists, we say that
A1 … Am derives B:

A1 … Am ⊢ B

Parsing as deduction
- Goal formula: the input string w=w1...wn is grammatical
according to the given grammar.
- Parsing = finding a derivation for a goal formula.
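
As a concrete instance (the standard textbook formulation, sketched here rather than taken from the slides), CKY for a CFG in Chomsky normal form can be written as a deductive system:

Items:      [A, i, j]   meaning  A ⇒* wi+1 … wj
Axioms:     [A, i−1, i]   for every rule A → wi
Inference:  from [B, i, k] and [C, k, j], conclude [A, i, j]   for every rule A → B C
Goal:       [S, 0, n]

Parsing the string then amounts to finding a derivation of the goal item [S, 0, n].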

PCFG parsing
Computing P(τ | S)
Using Bayesʼ Rule:
argmax_τ P(τ | S) = argmax_τ P(τ, S) / P(S)
                  = argmax_τ P(τ, S)
                  = argmax_τ P(τ)          if S = yield(τ)

The yield of a tree is the string of terminal symbols that can be read off the leaf nodes.

[Figure: the two parse trees for the VP "eat sushi with tuna" (PP attached to the NP, labeled the correct analysis, vs. PP attached to the VP); both analyses have the same yield, "eat sushi with tuna".]
Computing P(τ)
T is the (infinite) set of all trees in the language:
L = {s ∈ Σ* | ∃τ ∈ T : yield(τ) = s}

We need to define P(τ) such that:
- ∀τ ∈ T : 0 ≤ P(τ) ≤ 1
- ∑τ∈T P(τ) = 1
The set T is generated by a context-free grammar
S → NP VP VP → Verb NP NP → Det Noun
S → S conj S VP → VP PP NP → NP PP
S → ..... VP → ..... NP → .....

Probabilistic Context-Free Grammars
For every nonterminal X, define a probability distribution
P(X → α | X) over all rules with the same LHS symbol X:
S → NP VP 0.8
S → S conj S 0.2
NP → Noun 0.2
NP → Det Noun 0.4
NP → NP PP 0.2
NP → NP conj NP 0.2
VP → Verb 0.4
VP → Verb NP 0.3
VP → Verb NP NP 0.1
VP → VP PP 0.2
PP → P NP 1.0
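
A sketch of how this can be encoded (the nested-dict layout is an assumption, not from the slides): store one distribution over right-hand sides per LHS, and check that each distribution sums to one:

# The PCFG from the slide: pcfg[X][alpha] = P(X -> alpha | X)
pcfg = {
    "S":  {("NP", "VP"): 0.8, ("S", "conj", "S"): 0.2},
    "NP": {("Noun",): 0.2, ("Det", "Noun"): 0.4,
           ("NP", "PP"): 0.2, ("NP", "conj", "NP"): 0.2},
    "VP": {("Verb",): 0.4, ("Verb", "NP"): 0.3,
           ("Verb", "NP", "NP"): 0.1, ("VP", "PP"): 0.2},
    "PP": {("P", "NP"): 1.0},
}

# P( . | X) must be a probability distribution for every LHS X
for lhs, dist in pcfg.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, lhs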

Computing P(τ) with a PCFG
The probability of a tree τ is the product of the probabilities
of all its rules:
[Figure: the parse tree for "John eats pie with cream" (S → NP VP; NP → Noun over "John"; VP → VP PP; the inner VP → Verb NP over "eats pie"; PP → P NP over "with cream"; NP → Noun over "pie" and "cream"), shown next to the rule probabilities from the previous slide.]

P(τ) = 0.8 × 0.3 × 0.2 × 1.0 × 0.2³
     = 0.000384
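
A small sketch of the same computation (the tuple tree encoding is hypothetical; pcfg is the dict from the sketch above) walks the tree and multiplies the probability of every rule it uses:

# Tree encoding: (label, child, ...); leaves are plain strings
tree = ("S",
        ("NP", ("Noun", "John")),
        ("VP",
         ("VP", ("Verb", "eats"), ("NP", ("Noun", "pie"))),
         ("PP", ("P", "with"), ("NP", ("Noun", "cream")))))

def tree_prob(t, pcfg):
    # Multiply P(X -> alpha | X) over all rules used in t;
    # lexical rules like Noun -> pie are given probability 1 here.
    if isinstance(t, str):
        return 1.0
    label, children = t[0], t[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg.get(label, {}).get(rhs, 1.0)
    for c in children:
        p *= tree_prob(c, pcfg)
    return p

print(tree_prob(tree, pcfg))   # 0.8 * 0.3 * 0.2 * 1.0 * 0.2**3 ≈ 0.000384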
PCFG parsing
Probabilistic CKY
Like standard CKY, but with probabilities.
Terminals have probability p=1
Non-terminals:
Associate P(X→ YZ | X) with every pair of backpointers
from X in cell[i][j] to Y in cell[i][k] and Z in cell[k+1][j]

Finding the most likely parse
Local greedy (Viterbi) search is guaranteed to be optimal:
For every non-terminal X in cell[i][j],
keep only the highest-scoring combination of Y in cell[i][k] and Z in cell[k+1][j]:

argmax_{Y,Z,k} P(Y) × P(Z) × P(X → Y Z | X)
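
A compact sketch of the resulting algorithm (assuming a binarized PCFG; the dict-based rule encodings and the function name are illustrative, not the lecture's reference code):

from collections import defaultdict

def viterbi_cky(tags, binary_rules, unary_rules):
    # tags: POS tags of the input words (terminals have probability 1)
    # binary_rules: {(Y, Z): [(X, P(X -> Y Z | X)), ...]}
    # unary_rules:  {tag: [(X, P(X -> tag | X)), ...]}, e.g. NP -> Noun
    # returns chart[(i, j)] = {X: best probability of X over words i..j-1}
    n = len(tags)
    chart = defaultdict(dict)
    for i, tag in enumerate(tags):
        chart[(i, i + 1)][tag] = 1.0
        for X, p in unary_rules.get(tag, []):
            chart[(i, i + 1)][X] = max(chart[(i, i + 1)].get(X, 0.0), p)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for Y, pY in chart[(i, k)].items():
                    for Z, pZ in chart[(k, j)].items():
                        for X, p_rule in binary_rules.get((Y, Z), []):
                            p = pY * pZ * p_rule           # Viterbi: keep only the max
                            if p > chart[(i, j)].get(X, 0.0):
                                chart[(i, j)][X] = p
    return chart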

Probabilistic CKY
Input: POS-tagged sentence
John_N eats_V pie_N with_P cream_N

[Figure: the filled probabilistic CKY chart for "John eats pie with cream", with the grammar rules and probabilities alongside. Each cell holds the non-terminals covering that span and their Viterbi probabilities, e.g. NP = 0.2 over "John", VP = 0.4 over "eats", S = 0.8 × 0.2 × 0.4 over "John eats", NP = 0.2 × 0.2 × 0.2 over "pie with cream"; the VP over "eats pie with cream" takes the max of its two analyses (VP → Verb NP vs. VP → VP PP), and the S over the whole sentence has probability 0.8 × 0.2 × 0.0024.]
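
Running the viterbi_cky sketch from the previous slide on this input (with an illustrative mapping from the POS tags to the grammar's preterminals, and only the binary rules of the grammar) reproduces the probability in the top cell:

binary_rules = {("NP", "VP"): [("S", 0.8)], ("Verb", "NP"): [("VP", 0.3)],
                ("VP", "PP"): [("VP", 0.2)], ("NP", "PP"): [("NP", 0.2)],
                ("P", "NP"): [("PP", 1.0)]}
unary_rules = {"N": [("NP", 0.2)], "V": [("Verb", 1.0), ("VP", 0.4)]}
chart = viterbi_cky(["N", "V", "N", "P", "N"], binary_rules, unary_rules)
print(chart[(0, 5)]["S"])   # 0.8 * 0.2 * 0.0024 ≈ 0.000384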

Inside/outside probabilities
[Figure: a parse tree over w1 … wn containing a constituent XP that spans wi … wj; the inside probability covers the span wi … wj, the outside probability covers the surrounding context w1 … wi−1 and wj+1 … wn.]

Outside probability of XP over i..j: α_ij(XP)
α_ij(XP) = P(S ⇒* w1 … wi−1 XP wj+1 … wn)

Inside probability of XP over i..j: β_ij(XP)
β_ij(XP) = P(XP ⇒* wi … wj)
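
A standard consequence of these definitions (not stated on the slide): the product of the two gives the total probability of all parses of w1 … wn in which an XP spans wi … wj,

α_ij(XP) × β_ij(XP) = P(S ⇒* w1 … wn, with XP spanning wi … wj)

so the inside probability plays the role of the subtree, and the outside probability the role of the context, in the context/subtree split from earlier in the lecture.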

