
SYNTACTIC ANALYSIS – I

(FORMAL GRAMMARS)

Jasmeet Singh
Thapar University
INTRODUCTION
The word “syntax” in natural language refers to the
grammatical arrangement of words in a sentence and
their relationship with each other.
The objective of syntactic analysis is to find the syntactic
structure of the sentence.
The syntactic structure is usually represented in the
form of a tree whose nodes are phrases and whose leaves
correspond to the words of the sentence.
The process of identifying the syntactic structure of a
sentence is called syntactic parsing, or simply parsing.
Syntactic parsing can also be defined as the process of
assigning ‘phrase markers’ to a sentence.
Syntactic analysis, or parsing, is useful in determining
the meaning of a sentence.
CONSTITUENCY
Constituency is an important aspect of natural language
that is useful for syntactic analysis.
Certain words go together with each other
more than with others.
In a language, words that usually group together to act as a
single unit are called constituents or phrases.
For instance, The bird, The beautiful garden and The Wimbledon
court are all noun phrases, as they can all appear in the same
syntactic context (subject or object of a verb).
Constituents combine with other constituents to form
sentences.
For instance, the noun phrase ‘The bird’ combines with the
verb phrase ‘flies’ to form the sentence ‘The bird flies’.
Both the ordering of words within a constituent and the
ordering of constituents are important.
CONTEXT-FREE GRAMMAR (CFG)
Context-free grammar is a widely used mathematical
system for modeling constituent structure in natural
language.
Context-free grammars are also called phrase
structure grammars and were first defined for natural
language by Chomsky (1957).
CFGs were first used for the Algol programming language
by Backus (1959) and Naur (1960), so the notation is also
referred to as Backus-Naur Form (BNF).
A context-free grammar consists of:
• a set of rules or productions, each of which expresses the ways
that symbols of the language can be grouped and ordered
together, and
• a lexicon of words.
For example, the following productions express that an NP
(or noun phrase) can be composed of either a ProperNoun
or a determiner (Det) followed by a Nominal:
NP → Det Nominal
NP → ProperNoun
Context-free rules can be hierarchically embedded, so we
can combine the previous rules with others, like the
following, which express facts about the lexicon:
Det → a
Det → the
Noun → flight
The symbols that are used in a CFG are divided into two
classes:
1) The symbols that correspond to words in the language
(“the”, “bird”) are called terminal symbols; the lexicon is
the set of rules that introduce these terminal symbols.
2) The symbols that express clusters or generalizations of
these are called non-terminals.
In each context-free rule, the item to the right of the arrow
(→) is an ordered list of one or more terminals and non-terminals,
while to the left of the arrow is a single non-terminal
symbol expressing some cluster or generalization.
Each grammar must have one designated start symbol,
which is often called S.
Since context-free grammars are often used to define
sentences, S is usually interpreted as the “sentence” node.
EXAMPLE OF PRODUCTION RULES AND LEXICON
Lexicon:
Noun → flights | breeze | trip | morning | ...
Verb → is | prefer | like | need | want | fly
Adjective → cheapest | non-stop | first | latest | other | direct | ...
Pronoun → me | I | you | it | ...
Proper-Noun → Alaska | Baltimore | Los Angeles | Chicago | United | American | ...
Determiner → the | a | an | this | these | that | ...
Preposition → from | to | on | near | ...
Conjunction → and | or | but | ...

Production rules (with examples):
S → NP VP (I + want a morning flight)
NP → Pronoun (I)
   | Proper-Noun (Los Angeles)
   | Det Nominal (a + flight)
Nominal → Nominal Noun (morning flight)
   | Noun (flights)
VP → Verb (do)
   | Verb NP (want + a flight)
   | Verb NP PP (leave + Boston + in the morning)
   | Verb PP (leaving + on Thursday)
PP → Preposition NP (from + Los Angeles)
EXAMPLE OF PARSE TREE USING CFG
The parse tree for “I prefer a morning flight” according to the grammar
defined above can also be represented in a more compact, bracketed form:

[S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [Nom [N morning]] [N flight]]]]]
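A minimal Python sketch of the bracketed notation, assuming a parse tree is stored as nested tuples of the form (label, child, ...) with words as plain strings. This representation and the helper names `bracket` and `leaves` are illustrative, not taken from any library. The tree below follows the rules Nominal → Nominal Noun and Nominal → Noun of the example grammar.

```python
def bracket(tree):
    """Render a tuple-encoded parse tree in labelled-bracket notation."""
    if isinstance(tree, str):          # a leaf: the word itself
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def leaves(tree):
    """Return the words at the leaves of the tree, from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


# "I prefer a morning flight", using Nominal -> Nominal Noun and Nominal -> Noun
tree = ("S",
        ("NP", ("Pro", "I")),
        ("VP", ("V", "prefer"),
               ("NP", ("Det", "a"),
                      ("Nom", ("Nom", ("N", "morning")), ("N", "flight")))))

print(bracket(tree))
# [S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [Nom [N morning]] [N flight]]]]]
print(" ".join(leaves(tree)))
# I prefer a morning flight
```

Reading the leaves from left to right always gives back the sentence, which is a quick sanity check on any hand-built tree.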
FORMAL DEFINITION OF CFG
A context-free grammar G is defined by four parameters N,
Σ, P, S (technically, G “is a 4-tuple”):
1) N: a set of non-terminal symbols (or variables)
2) Σ: a set of terminal symbols (disjoint from N)
3) P: a set of rules or productions, each of the form A → β,
where A is a non-terminal and β is a string of symbols from
the infinite set of strings (Σ∪N)*
4) S: a designated start symbol, S ∈ N
Notational conventions:
Capital letters like A, B, …: non-terminals
S: the start symbol
Lower-case Greek letters like α, β, and γ: strings drawn
from (Σ∪N)*
Lower-case Roman letters like u, v, and w: strings of
terminals
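The 4-tuple can be written down directly as a data structure. The sketch below is a hypothetical Python encoding (the class `CFG` and its `validate` method are illustrative names) that checks the conditions of the definition: Σ and N are disjoint, S ∈ N, every left side is a single non-terminal, and every right side is a string over Σ∪N. The fragment uses a few rules from the example grammar, with NP as the start symbol purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (N, Σ, P, S)."""
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    productions: tuple        # P: pairs (A, beta), beta a tuple of symbols
    start: str                # S

    def validate(self):
        """Raise ValueError if the 4-tuple violates the definition; else return True."""
        if self.nonterminals & self.terminals:
            raise ValueError("N and Σ must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must belong to N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:        # A must be a single non-terminal
                raise ValueError(f"left side {lhs!r} is not in N")
            for sym in rhs:                          # beta must be a string over Σ ∪ N
                if sym not in symbols:
                    raise ValueError(f"symbol {sym!r} in a rule for {lhs} is not in Σ ∪ N")
        return True


# A fragment of the example grammar, with NP as start symbol for illustration
g = CFG(
    nonterminals=frozenset({"NP", "Det", "Nominal", "Noun", "ProperNoun"}),
    terminals=frozenset({"a", "the", "flight", "Chicago"}),
    productions=(
        ("NP", ("Det", "Nominal")),
        ("NP", ("ProperNoun",)),
        ("Nominal", ("Noun",)),
        ("Det", ("a",)),
        ("Det", ("the",)),
        ("Noun", ("flight",)),
        ("ProperNoun", ("Chicago",)),
    ),
    start="NP",
)
print(g.validate())  # True
```

An empty tuple is accepted as a right side, since the empty string is also a member of (Σ∪N)*.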
FORMAL LANGUAGE
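Derivation:
If $A \to \beta$ is a production in P and $\alpha, \gamma$ are any strings in $(\Sigma \cup N)^*$, then $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, written $\alpha A \gamma \Rightarrow \alpha \beta \gamma$.
Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m \ge 1$, such that
$$\alpha_1 \Rightarrow \alpha_2,\quad \alpha_2 \Rightarrow \alpha_3,\quad \ldots,\quad \alpha_{m-1} \Rightarrow \alpha_m$$
We say that $\alpha_1$ derives $\alpha_m$, written $\alpha_1 \Rightarrow^* \alpha_m$.
The language $L_G$ generated by a grammar G is the set of strings
composed of terminal symbols that can be derived from the
designated start symbol S:
$$L_G = \{\, w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \,\}$$
Sentences (strings of words) that can be derived by a grammar are
in the formal language defined by that grammar and are called
grammatical sentences.
Sentences that cannot be derived by a given formal grammar are
not in the language defined by that grammar and are referred to
as ungrammatical.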
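Whether a sentence is grammatical can be tested by searching for a derivation from S. The following is a naive sketch, not an efficient parser (practical systems use algorithms such as CKY or Earley): it explores leftmost derivations breadth-first for a fragment of the example grammar. It assumes the grammar has no ε-rules, so sentential forms never shrink and any form longer than the input can be discarded. The names `RULES` and `derives` are hypothetical.

```python
from collections import deque

# A fragment of the example grammar; keys are non-terminals, other symbols are words.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Nominal", "Noun"], ["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]],
    "PP": [["Preposition", "NP"]],
    "Det": [["a"], ["the"]],
    "Noun": [["morning"], ["flight"], ["flights"]],
    "Verb": [["prefer"], ["want"]],
    "Pronoun": [["I"], ["me"]],
    "Proper-Noun": [["Chicago"]],
    "Preposition": [["from"], ["to"]],
}


def derives(words, start="S"):
    """Return True if start =>* words, searching leftmost derivations breadth-first.
    Assumes no epsilon rules, so forms longer than `words` are pruned."""
    target = tuple(words)
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # position of the leftmost non-terminal, if any
        i = next((k for k, sym in enumerate(form) if sym in RULES), None)
        if i is None:
            continue                      # only words left, but not the target
        if form[:i] != target[:i]:
            continue                      # fixed words on the left do not match
        for rhs in RULES[form[i]]:        # rewrite the leftmost non-terminal
            new = form[:i] + tuple(rhs) + form[i + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False


print(derives("I prefer a morning flight".split()))  # True
print(derives("prefer I flight".split()))            # False
```

The `seen` set and the length bound guarantee termination even with the left-recursive rule Nominal → Nominal Noun.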
In linguistics, the use of formal languages to model natural
languages is called generative grammar, since the language is
defined by the set of possible sentences “generated” by the
grammar.
ENGLISH SENTENCE-LEVEL CONSTRUCTIONS
A sentence in a language can have varying structure.
In English, the four commonly known structures of
sentences are:
1) Declarative Structure

2) Imperative Structure

3) Yes-No Question Structure

4) Wh-Question Structure
ENGLISH DECLARATIVE SENTENCE CONSTRUCTIONS

Sentences with a declarative structure have a subject
followed by a predicate.
The subject of a declarative sentence is a noun phrase
and the predicate is a verb phrase.
Examples of declarative sentences include:
I like horse riding.
The flight should leave at around seven p.m.
I prefer a morning flight.
The phrase structure rule for declarative
sentences is:
S → NP VP
ENGLISH IMPERATIVE SENTENCE CONSTRUCTIONS

Sentences with an imperative structure usually begin with a
verb phrase and have no explicit subject.
The subject of these sentences is implicit and is
understood to be ‘you’.
These sentences are used for commands and
suggestions; hence they are called imperative sentences.
Examples of imperative sentences include:
Show me the notebook.
Stop talking.
Move to the classroom.
The phrase structure rule for imperative sentences
is:
S → VP
ENGLISH YES-NO QUESTION SENTENCE CONSTRUCTIONS

Sentences with a yes-no question structure are often
(though not always) used to ask questions (hence the
name).
These questions can perform different pragmatic
functions, such as asking, requesting, or suggesting.
Some examples of this type of sentence are:
Do you have an NLP book?
Is the cricket match over?
Can you show me your photograph?
These sentences begin with an auxiliary verb,
followed by a subject NP, followed by a VP.
S → Aux NP VP
ENGLISH WH-QUESTION SENTENCE CONSTRUCTIONS

Sentences with a wh-question structure are more complex.
These are so named because one of their constituents is a
wh-phrase, that is, one that includes a wh-word (who, whose, when,
where, what, which, how, why).
These may be broadly grouped into two classes of sentence-level
structures: wh-subject questions and wh-non-subject questions.
The wh-subject-question structure is identical to the
declarative structure, except that the first noun phrase contains
some wh-word.
Examples of wh-subject questions include:
Which team won the match?
Which flights serve breakfast?
The phrase structure rule for wh-subject questions is:
S → Wh-NP VP

In the wh-non-subject question structure, the wh-phrase is not
the subject of the sentence, and so the sentence includes another
subject.
In these sentences the auxiliary appears before the
subject NP, just as in the yes-no question structure.
Examples of these sentences include:
Which cameras can you show me in your shop?
What flights do you have from India to Washington?
The phrase structure rule for wh-non-subject questions is:
S → Wh-NP Aux NP VP
Constructions like the wh-non-subject question contain what
are called long-distance dependencies, because the Wh-NP
‘what flights’ is far away from the predicate it is semantically
related to: the main verb ‘have’ in the VP.
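The five sentence-level rules can be compared side by side with a small recognizer. This is a sketch under several assumptions: the input is a sequence of part-of-speech tags supplied by hand (no tagger is involved); the NP, Wh-NP, Nominal and VP rules form a hypothetical toy grammar that uses a right-recursive Nominal instead of Nominal → Nominal Noun, because this simple top-down approach would recurse forever on a left-recursive rule; and `structures` is an illustrative name. For each symbol and start position, it memoizes the set of positions where that symbol's yield can end.

```python
from functools import lru_cache

# Hypothetical toy grammar over part-of-speech tags (no epsilon rules, no left recursion).
GRAMMAR = {
    "NP": [("Pronoun",), ("Det", "Nominal"), ("Nominal",)],
    "Wh-NP": [("Wh-Det", "Nominal")],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP"), ("Verb", "NP", "NP")],
}

# The sentence-level rules from this section.
S_RULES = {
    "declarative": ("NP", "VP"),
    "imperative": ("VP",),
    "yes-no question": ("Aux", "NP", "VP"),
    "wh-subject question": ("Wh-NP", "VP"),
    "wh-non-subject question": ("Wh-NP", "Aux", "NP", "VP"),
}


def structures(tags):
    """Return the names of the S rules that can derive the whole tag sequence."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        """All positions j such that symbol =>* tags[i:j]."""
        if symbol not in GRAMMAR:  # a part-of-speech tag (pre-terminal)
            return frozenset({i + 1}) if i < len(tags) and tags[i] == symbol else frozenset()
        result = set()
        for rhs in GRAMMAR[symbol]:
            result |= seq_ends(rhs, i)
        return frozenset(result)

    def seq_ends(rhs, i):
        """All positions reachable after matching every symbol of rhs from position i."""
        positions = {i}
        for sym in rhs:
            positions = {j for p in positions for j in ends(sym, p)}
        return positions

    return [name for name, rhs in S_RULES.items() if len(tags) in seq_ends(rhs, 0)]


print(structures("Pronoun Verb Det Noun Noun".split()))     # ['declarative']
print(structures("Verb Pronoun Det Noun".split()))          # ['imperative']
print(structures("Aux Pronoun Verb Det Noun".split()))      # ['yes-no question']
print(structures("Wh-Det Noun Verb Noun".split()))          # ['wh-subject question']
print(structures("Wh-Det Noun Aux Pronoun Verb".split()))   # ['wh-non-subject question']
```

The tag sequences correspond to "I prefer a morning flight", "Show me the notebook", "Do you have the book", "Which flights serve breakfast" and "What flights do you have". The last one shows the long-distance dependency: the Wh-NP is matched at the start, far from the verb in the final VP.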
PHRASE-LEVEL CONSTRUCTIONS
A fundamental notion in natural language is that certain
groups of words behave as constituents or phrases.
A phrase is named after its head, which is the lexical
category that determines the properties of the phrase.
For instance, if the head (central word) is a noun, the phrase is
called a noun phrase.
One of the simplest ways to test whether a group of words
is a phrase is to see whether it can be substituted
with another group of words while the sentence remains grammatical
(substitution test).
Elements that can substitute for each other in a certain
syntactic position are said to be members of the same
paradigm.
CONSTITUENT SUBSTITUTION TEST
THE NOUN PHRASE
Three of the most common NP rules:
• NP → Pronoun
• NP → Proper Noun
• NP → Det Nominal, with Nominal → Nominal Noun | Noun
In general, a noun phrase is a head noun with optional modifiers before and after it:
NP → Modifiers Head-Noun Modifiers
• The determiner: this role can be filled by simple lexical determiners or by more
complex expressions (e.g. possessive expressions).
Example 1: a flight | this flight | any flights | those flights | some flights
Example 2: United’s flight | United’s pilot’s union | Denver’s mayor’s
mother’s canceled flight
Possessive expressions are defined by: Det → NP’s
• The nominal: this can be either a simple noun (Nominal → Noun) or a construction
in which a head noun is at the center, with pre- and/or post-head modifiers.
BEFORE THE HEAD NOUN
Several word classes may appear before the head: cardinal numbers,
ordinal numbers and quantifiers.
• Cardinal numbers: two friends | one stop
• Ordinal numbers: the first friend | the next stop | the other flight
• Quantifiers: many friends | several stops | few flights | much noise
Adjectives occur after quantifiers but before nouns.
Adjectives can also be grouped into an adjective phrase (AP).
• An AP can have an adverb before the adjective. Example: the least
expensive fare
• A rule which defines all prenominal modifiers (parentheses mark optional constituents):
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
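A rule with optional constituents in parentheses is shorthand for a set of ordinary CFG rules in which each optional symbol is either present or absent. A small Python sketch (the helper `expand_optional` is hypothetical) expands the prenominal-modifier rule; with five optional constituents it yields 2^5 = 32 plain rules. As a design choice, it skips the case where every symbol is dropped, so it never emits an empty right side.

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Expand a rule whose optional symbols are written '(X)' into plain CFG rules."""
    choices = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([sym[1:-1], None])   # optional: present or absent
        else:
            choices.append([sym])               # obligatory
    rules = []
    for combo in product(*choices):
        body = tuple(s for s in combo if s is not None)
        if body:                                # never emit an empty right side
            rules.append((lhs, body))
    return rules


rules = expand_optional("NP", "(Det) (Card) (Ord) (Quant) (AP) Nominal".split())
print(len(rules))   # 32
print(rules[0])     # ('NP', ('Det', 'Card', 'Ord', 'Quant', 'AP', 'Nominal'))
print(rules[-1])    # ('NP', ('Nominal',))
```

The same helper turns PP → Prep (NP) into PP → Prep NP and PP → Prep.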
AFTER THE HEAD NOUN 1/2
A head noun can be followed by three kinds of postmodifiers:
1. prepositional phrases – Example: all flights from Dubai
2. non-finite clauses – Example: any flights arriving after eleven a.m.
3. relative clauses – Example: a flight that serves breakfast
A nominal rule that accounts for PPs: Nominal → Nominal PP
The three most common kinds of non-finite postmodifiers are:
a) gerundive (-ing) forms
b) -ed forms
c) infinitive forms
• Gerundives consist of a VP that begins with the gerundive form of the
verb. Example: any of those leaving on Thursday
• A nominal rule that accounts for gerundive modifiers is Nominal →
GerundVP, and the rules for gerundive VPs are: GerundVP → GerundV NP |
GerundV PP | GerundV | GerundV NP PP
AFTER THE HEAD NOUN 2/2
Postmodifiers can also be based on -ed forms and infinitives. Examples:
I need to have dinner served.
Which is the aircraft used by this flight?
The last flight to arrive in Boston.
Postnominal relative clauses (a.k.a. restrictive relative clauses) begin
with a relative pronoun (that or who). In the examples below, the relative
pronoun functions as the subject of the embedded verb.
• Examples:
A flight that serves breakfast
Flights that leave in the morning
The one that leaves at ten thirty five
To deal with relative clauses, we add the rules:
Nominal → Nominal RelClause
RelClause → (who | that) VP
PARSE TREE FOR “ALL THE MORNING FLIGHTS FROM DENVER TO
TAMPA LEAVING BEFORE 10”
VERB PHRASE
• VPs consist of a verb and a number of other constituents. These
constituents include NPs and PPs:
VP → Verb Example: disappear
VP → Verb NP Example: prefer a morning flight
VP → Verb NP PP Example: leave Boston in the morning
VP → Verb PP Example: leaving on Thursday
An entire embedded sentence may follow a verb. Such embedded sentences are called
sentential complements. Examples:
You [VP [V said] [S there were two flights that were the cheapest]]
[VP [V Tell] [NP me] [S how to get from the airport in Philadelphia to downtown]]
To deal with sentential complements we add the rule:
VP → Verb S
Another potential constituent of a VP is another VP. This happens for
some verbs, e.g. want, try, would like, intend, need. Examples:
I want [VP to fly from Dallas to Orlando ]
Hello, I’m trying [VP to find a flight from Dallas to Orlando ]
PREPOSITIONAL, ADJECTIVE & ADVERB PHRASES
A prepositional phrase (PP) consists of a preposition, usually followed by
a noun phrase.
Examples: We played volleyball on the beach; John went outside
The phrase structure rule is PP → Prep (NP)
An adjective phrase (AP) consists of an adjective, optionally preceded by an
adverb and optionally followed by a PP.
Examples: Ashish is clever; The train is very late; My
sister is fond of animals
The phrase structure rule is AP → (Adv) Adj (PP)
An adverb phrase (AdvP) consists of an adverb, optionally preceded by a
degree adverb (intensifier).
Example: Time passes very quickly.
The phrase structure rule is AdvP → (Intens) Adv
COORDINATION
• Phrases can be conjoined with conjunctions (e.g. and, or, but)
Forms of coordination:
• For noun phrases: NP → NP and NP
Please repeat [NP [NP the flights] and [NP the costs]]
• For nominals: Nominal → Nominal and Nominal
Please repeat the [Nom [Nom flights] and [Nom costs]]
• For verb phrases: VP → VP and VP
What flights do you have [VP [VP leaving Denver] and [VP arriving in San
Francisco]]
• For sentences: S → S and S
[S [S I’m interested in a flight from Denver to Dallas] and [S I’m also interested
in going to San Francisco]]
• Coordination can be captured by a single metarule, X → X and X, where X is any syntactic category.
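The metarule can be instantiated mechanically for each category. A minimal sketch (the function `coordination_rules` is hypothetical) generates the four coordination rules listed above. Passing further conjunctions such as or and but, which the example lexicon lists under Conjunction, is an assumed extension of the metarule.

```python
def coordination_rules(categories, conjunctions=("and",)):
    """Instantiate the metarule X -> X and X for each category X.
    Passing e.g. ("and", "or", "but") is an assumed extension of the metarule."""
    return [(x, (x, conj, x)) for x in categories for conj in conjunctions]


for lhs, rhs in coordination_rules(["NP", "Nominal", "VP", "S"]):
    print(lhs, "→", " ".join(rhs))
# NP → NP and NP
# Nominal → Nominal and Nominal
# VP → VP and VP
# S → S and S
```

Rules of this shape make a grammar ambiguous. For example, "the flights and the costs and the fares" has two bracketings under NP → NP and NP.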
TREEBANKS
• A treebank is a corpus in which each sentence is syntactically annotated with a
parse tree.
• The Penn Treebank includes English material from the Brown,
Switchboard, ATIS and Wall Street Journal corpora. There are treebanks for
Chinese and Arabic as well.
• Other Treebanks: Prague Dependency Treebank for Czech, Negra
Treebank for German, the Susanne Treebank for English.
[Figure panels: Brown Corpus | ATIS Corpus]