
CS 598 JH: Advanced NLP (Spring ʼ09)

CFG parsing:
The basics

Julia Hockenmaier
juliahmr@illinois.edu
3324 Siebel Center
Office Hours: Fri, 2:00-3:00pm

http://www.cs.uiuc.edu/~juliahmr/cs598
Todayʼs topics
CFGs and PCFGs:
CFGs as AND/OR graphs
Shared parse forests
PCFGs

CFGs and
AND-OR graphs
Context-free grammars
A CFG is a 4-tuple〈N,Σ,R,S〉
A set of nonterminals N
(e.g. N = {S, NP, VP, PP, Noun, Verb, ....})

A set of terminals Σ
(e.g. Σ = {I, you, he, eat, drink, sushi, ball, ...})

A set of rules R
R ⊆ {A → β with left-hand-side (LHS) A ∈ N
and right-hand-side (RHS) β ∈ (N ∪ Σ)* }
A start symbol S (sentence)
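
To make the 4-tuple concrete, here is a minimal sketch (not part of the original slides; the names and the small rule fragment are illustrative) of how such a grammar could be written down in Python:

# A CFG as a 4-tuple (N, Sigma, R, S) -- illustrative fragment only
N = {"S", "NP", "VP", "PP", "Noun", "Verb", "P"}
Sigma = {"I", "you", "he", "eat", "drink", "sushi", "tuna", "with"}
R = {
    ("S",    ("NP", "VP")),
    ("VP",   ("Verb", "NP")),
    ("NP",   ("Noun",)),
    ("NP",   ("NP", "PP")),
    ("PP",   ("P", "NP")),
    ("Noun", ("sushi",)),
    ("Noun", ("tuna",)),
    ("Verb", ("eat",)),
    ("P",    ("with",)),
}
S = "S"

# Every rule has a single nonterminal LHS and an RHS over (N ∪ Sigma):
assert all(lhs in N and all(x in N or x in Sigma for x in rhs) for lhs, rhs in R)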

Some terminology

What is:
- the language defined by a CFG?
- a CFG derivation?
- a CFG parse tree?
- the yield of a CFG parse tree?

AND/OR graphs
Formal Definition (Mahanti/Bagchi, 1985)

An AND/OR graph G is a directed graph with a special node s (the start/root node) and a nonempty set {t1, …, tn} of terminal nodes.
The nonterminal nodes {n1, …, nm} are of two types: AND and OR.
(A third type, NONTERMINAL LEAF, is irrelevant for our purposes.)

Semantics (Mahanti/Bagchi, 1985)

Start node = the problem to be solved
AND nonterminal node ni = all of its immediate descendants have to be solved
OR nonterminal node nj = one of its immediate descendants has to be solved
Terminal nodes = their solutions are known

CFGs as AND-OR graphs

Terminals of grammar G = terminal nodes of graph G
Start symbol of G = start node of G
Rules of G = AND nodes of G
Nonterminals of G = OR nodes of G
(B.Lang 1989/1991)
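
This correspondence can be spelled out in a few lines of Python (a sketch building on the CFG encoding above; the node naming is an assumption, not from Lang's papers). Each nonterminal becomes an OR node over the AND nodes of its rules, and each rule's AND node points to the nodes for its right-hand-side symbols:

# Build an AND/OR graph (adjacency lists) from a CFG (N, Sigma, R, S)
def cfg_to_and_or_graph(N, Sigma, R, S):
    graph = {("TERMINAL", t): [] for t in Sigma}   # terminal nodes: solutions are known
    for X in N:
        graph[("OR", X)] = []                      # OR node: pick one rule for X
    for lhs, rhs in R:
        and_node = ("AND", lhs, rhs)               # AND node: one per rule
        graph[("OR", lhs)].append(and_node)
        # all RHS symbols of the rule have to be solved
        graph[and_node] = [("OR", x) if x in N else ("TERMINAL", x) for x in rhs]
    return ("OR", S), graph                        # start node = start symbol

start, graph = cfg_to_and_or_graph(N, Sigma, R, S)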
Ambiguity

Each individual parse tree is an AND graph


(B.Lang 1989/1991)

Context and subtree

We can split the parse tree τ at each nonterminal node ni
into a context and a subtree.

If yield(τ) = uvw and yield(ni) = v, then
the subtree rooted at ni has yield v, and the context (the remainder of the tree, with ni as a leaf) has yield u ni w.
(B.Lang 1989/1991)

Two kinds of ambiguity

(B.Lang 1989/1991)

Question:
Are there any other kinds of
ambiguity?
Shared parse forests

A compact representation of sentential ambiguity

Aside: Parse forests and
grammars
We can view the parse forest (AND/OR graph) of a sentence S
as a grammar G_S, with L(G_S) = {S}, except that now multiple
AND nodes may be labeled with the same rule:

The size of a parse forest
For a grammar with maximal branching factor p,
the size of a shared parse forest for a sentence S
with n words is O(n^(p+1))
(B. Lang, 1991)

Hence, the space complexity of CKY (with a binary CFG) is O(n^3)

Parsing as a deductive
process
(Shieber, Schabes, Pereira, 1993)
Parsing as deduction: what?
“Parsing can be viewed as a deductive process that seeks
to prove claims about the grammatical status of a string
from assumptions describing the grammatical properties of
the stringʼs elements and the linear order between them”

(Shieber, Schabes, Pereira ʼ93)

Cf. categorial grammar (Lambek, Ajdukiewicz, Bar-Hillel)

Parsing as deduction: why?
This allows a separation of parsing into:
- a logic of grammaticality claims (= the grammar)
- a proof search procedure (= the parsing algorithm)
(Shieber, Schabes, Pereira ʼ93)

Cf. categorial grammar (Lambek, Ajdukiewicz, Bar-Hillel)

This also provides the formal basis and useful terminology for understanding parsing algorithms.

Deduction
An inference rule

    A1 … Ak
    ───────
       B

consists of
- antecedents A1 … Ak
- a consequent B
NB: there may be side conditions on A1 … Ak and B.
Usually rules are given as schemata, where A1 … Ak and B are (or contain) variables that are instantiated when the rule is used.

Derivation
The derivation of a formula B from assumptions A1 … Am
is a sequence of formulas S1 … Sn such that
- B = Sn
- for each i ≤ n: Si = Aj (for some j), or there is an inference rule that allows Si to be derived from S1 … Si−1

If a derivation of B from A1 … Am exists, we say that
A1 … Am derives B:

A1 … Am ⊢ B

Parsing as deduction
- Goal formula: the input string w=w1...wn is grammatical
according to the given grammar.
- Parsing = finding a derivation for a goal formula.
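
As a concrete instance (the standard textbook formulation, sketched here rather than taken from the slides), CKY for a CFG in Chomsky normal form can be written as a deductive system:

Items:      [A, i, j]   meaning  A ⇒* wi+1 … wj
Axioms:     [A, i−1, i]   for every rule A → wi
Inference:  from [B, i, k] and [C, k, j], conclude [A, i, j]   for every rule A → B C
Goal:       [S, 0, n]

Parsing the string then amounts to finding a derivation of the goal item [S, 0, n].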

PCFG parsing
Computing P(τ | S)
Using Bayesʼ Rule:
argmax_τ P(τ | S) = argmax_τ P(τ, S) / P(S)
                  = argmax_τ P(τ, S)
                  = argmax_τ P(τ)          if S = yield(τ)

The yield of a tree is the string of terminal symbols that can be read off the leaf nodes.

[Figure: the two parse trees for the VP "eat sushi with tuna" (PP attached to the NP, labeled the correct analysis, vs. PP attached to the VP); both analyses have the same yield, "eat sushi with tuna".]
Computing P(τ)
T is the (infinite) set of all trees in the language:
L = {s ∈ Σ* | ∃τ ∈ T : yield(τ) = s}

We need to define P(τ) such that:
- ∀τ ∈ T : 0 ≤ P(τ) ≤ 1
- ∑τ∈T P(τ) = 1
The set T is generated by a context-free grammar
S → NP VP VP → Verb NP NP → Det Noun
S → S conj S VP → VP PP NP → NP PP
S → ..... VP → ..... NP → .....

Probabilistic Context-Free Grammars
For every nonterminal X, define a probability distribution
P(X → α | X) over all rules with the same LHS symbol X:
S → NP VP 0.8
S → S conj S 0.2
NP → Noun 0.2
NP → Det Noun 0.4
NP → NP PP 0.2
NP → NP conj NP 0.2
VP → Verb 0.4
VP → Verb NP 0.3
VP → Verb NP NP 0.1
VP → VP PP 0.2
PP → P NP 1.0
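
A sketch of how this can be encoded (the nested-dict layout is an assumption, not from the slides): store one distribution over right-hand sides per LHS, and check that each distribution sums to one:

# The PCFG from the slide: pcfg[X][alpha] = P(X -> alpha | X)
pcfg = {
    "S":  {("NP", "VP"): 0.8, ("S", "conj", "S"): 0.2},
    "NP": {("Noun",): 0.2, ("Det", "Noun"): 0.4,
           ("NP", "PP"): 0.2, ("NP", "conj", "NP"): 0.2},
    "VP": {("Verb",): 0.4, ("Verb", "NP"): 0.3,
           ("Verb", "NP", "NP"): 0.1, ("VP", "PP"): 0.2},
    "PP": {("P", "NP"): 1.0},
}

# P( . | X) must be a probability distribution for every LHS X
for lhs, dist in pcfg.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, lhs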

Computing P(τ) with a PCFG
The probability of a tree τ is the product of the probabilities
of all its rules:
[Figure: the parse tree for "John eats pie with cream" (S → NP VP; NP → Noun over "John"; VP → VP PP; the inner VP → Verb NP over "eats pie"; PP → P NP over "with cream"; NP → Noun over "pie" and "cream"), shown next to the rule probabilities from the previous slide.]

P(τ) = 0.8 × 0.3 × 0.2 × 1.0 × 0.2³
     = 0.000384
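
A small sketch of the same computation (the tuple tree encoding is hypothetical; pcfg is the dict from the sketch above) walks the tree and multiplies the probability of every rule it uses:

# Tree encoding: (label, child, ...); leaves are plain strings
tree = ("S",
        ("NP", ("Noun", "John")),
        ("VP",
         ("VP", ("Verb", "eats"), ("NP", ("Noun", "pie"))),
         ("PP", ("P", "with"), ("NP", ("Noun", "cream")))))

def tree_prob(t, pcfg):
    # Multiply P(X -> alpha | X) over all rules used in t;
    # lexical rules like Noun -> pie are given probability 1 here.
    if isinstance(t, str):
        return 1.0
    label, children = t[0], t[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg.get(label, {}).get(rhs, 1.0)
    for c in children:
        p *= tree_prob(c, pcfg)
    return p

print(tree_prob(tree, pcfg))   # 0.8 * 0.3 * 0.2 * 1.0 * 0.2**3 ≈ 0.000384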
PCFG parsing
Probabilistic CKY
Like standard CKY, but with probabilities.
Terminals have probability p=1
Non-terminals:
Associate P(X→ YZ | X) with every pair of backpointers
from X in cell[i][j] to Y in cell[i][k] and Z in cell[k+1][j]

Finding the most likely parse
Local greedy (Viterbi) search is guaranteed to be optimal:
For every non-terminal X in cell[i][j],
keep only the highest-scoring combination of Y in cell[i][k] and Z in cell[k+1][j]:

argmax_{Y,Z,k} P(Y) × P(Z) × P(X → Y Z | X)
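
A compact sketch of the resulting algorithm (assuming a binarized PCFG; the dict-based rule encodings and the function name are illustrative, not the lecture's reference code):

from collections import defaultdict

def viterbi_cky(tags, binary_rules, unary_rules):
    # tags: POS tags of the input words (terminals have probability 1)
    # binary_rules: {(Y, Z): [(X, P(X -> Y Z | X)), ...]}
    # unary_rules:  {tag: [(X, P(X -> tag | X)), ...]}, e.g. NP -> Noun
    # returns chart[(i, j)] = {X: best probability of X over words i..j-1}
    n = len(tags)
    chart = defaultdict(dict)
    for i, tag in enumerate(tags):
        chart[(i, i + 1)][tag] = 1.0
        for X, p in unary_rules.get(tag, []):
            chart[(i, i + 1)][X] = max(chart[(i, i + 1)].get(X, 0.0), p)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for Y, pY in chart[(i, k)].items():
                    for Z, pZ in chart[(k, j)].items():
                        for X, p_rule in binary_rules.get((Y, Z), []):
                            p = pY * pZ * p_rule           # Viterbi: keep only the max
                            if p > chart[(i, j)].get(X, 0.0):
                                chart[(i, j)][X] = p
    return chart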

Probabilistic CKY
Input: POS-tagged sentence
John_N eats_V pie_N with_P cream_N

[Figure: the filled probabilistic CKY chart for "John eats pie with cream", with the grammar rules and probabilities alongside. Each cell holds the non-terminals covering that span and their Viterbi probabilities, e.g. NP = 0.2 over "John", VP = 0.4 over "eats", S = 0.8 × 0.2 × 0.4 over "John eats", NP = 0.2 × 0.2 × 0.2 over "pie with cream"; the VP over "eats pie with cream" takes the max of its two analyses (VP → Verb NP vs. VP → VP PP), and the S over the whole sentence has probability 0.8 × 0.2 × 0.0024.]
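
Running the viterbi_cky sketch from the previous slide on this input (with an illustrative mapping from the POS tags to the grammar's preterminals, and only the binary rules of the grammar) reproduces the probability in the top cell:

binary_rules = {("NP", "VP"): [("S", 0.8)], ("Verb", "NP"): [("VP", 0.3)],
                ("VP", "PP"): [("VP", 0.2)], ("NP", "PP"): [("NP", 0.2)],
                ("P", "NP"): [("PP", 1.0)]}
unary_rules = {"N": [("NP", 0.2)], "V": [("Verb", 1.0), ("VP", 0.4)]}
chart = viterbi_cky(["N", "V", "N", "P", "N"], binary_rules, unary_rules)
print(chart[(0, 5)]["S"])   # 0.8 * 0.2 * 0.0024 ≈ 0.000384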

Inside/outside probabilities
[Figure: a parse tree over w1 … wn containing a constituent XP that spans wi … wj; the inside probability covers the span wi … wj, the outside probability covers the surrounding context w1 … wi−1 and wj+1 … wn.]

Outside probability of XP over i..j: α_ij(XP)
α_ij(XP) = P(S ⇒* w1 … wi−1 XP wj+1 … wn)

Inside probability of XP over i..j: β_ij(XP)
β_ij(XP) = P(XP ⇒* wi … wj)
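
A standard consequence of these definitions (not stated on the slide): the product of the two gives the total probability of all parses of w1 … wn in which an XP spans wi … wj,

α_ij(XP) × β_ij(XP) = P(S ⇒* w1 … wn, with XP spanning wi … wj)

so the inside probability plays the role of the subtree, and the outside probability the role of the context, in the context/subtree split from earlier in the lecture.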

