
Natural Language Processing

(Lecture 1: Introduction)

Perspectivising NLP: Areas of AI and their inter-dependencies

(Diagram of inter-dependent areas: Search, Logic, Machine Learning, NLP, Vision, Knowledge Representation, Planning, Robotics, Expert Systems)
Books etc.
Main Text(s):
Foundations of Statistical NLP: Manning and Schütze
Natural Language Understanding: James Allen
Speech and Language Processing: Jurafsky and Martin

Journals
Computational Linguistics, Natural Language Engineering, AI, AI
Magazine, IEEE SMC

Conferences
ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT,
ICON, SIGIR, WWW, ICML, ECML

Allied Disciplines
Philosophy: Semantics, Meaning of meaning, Logic (syllogism)
Linguistics: Study of Syntax, Lexicon, Lexical Semantics etc.
Probability and Statistics: Corpus Linguistics, Testing of Hypotheses, System Evaluation
Cognitive Science: Computational Models of Language Processing, Language Acquisition
Psychology: Behavioristic insights into Language Processing, Psychological Models
Brain Science: Language Processing Areas in the Brain
Physics: Information Theory, Entropy, Random Fields
Computer Sc. & Engg.: Systems for NLP

Topics proposed to be covered

Shallow Processing
  Part of Speech Tagging and Chunking using HMM, MEMM, CRF, and Rule-Based Systems
  EM Algorithm

Language Modeling
  N-grams
  Probabilistic CFGs

Basic Speech Processing
  Phonology and Phonetics
  Statistical Approach
  Automatic Speech Recognition and Speech Synthesis

Deep Parsing
  Classical Approaches: Top-Down, Bottom-Up and Hybrid Methods
  Chart Parsing, Earley Parsing
  Statistical Approach: Probabilistic Parsing, Tree Bank Corpora

Topics proposed to be covered (contd.)

Knowledge Representation and NLP
  Predicate Calculus, Semantic Nets, Frames, Conceptual Dependency, Universal Networking Language (UNL)
Lexical Semantics
  Lexicons, Lexical Networks and Ontology
  Word Sense Disambiguation
Applications
  Machine Translation
  IR
  Summarization
  Question Answering

What is NLP
Branch of AI
2 Goals
  Science Goal: Understand the way language operates
  Engineering Goal: Build systems that analyse and generate language; reduce the man-machine gap

The famous Turing Test: Language-Based Interaction

(Diagram: a test conductor converses, in language, with a machine and a human.)

Can the test conductor find out which is the machine and which the human?

Inspired ELIZA
http://www.manifestation.com/neurotoys/eliza.php3
(Sample interactions were shown on the slides.)

Two Views of NLP and the Associated Challenges
1. Classical View
2. Statistical/Machine Learning View

Ambiguity is one of the most challenging problems.
Stages of language processing:
Stages of language processing
Phonetics and phonology
Morphology
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Pragmatics
Discourse

Phonetics
Concerned with the processing of speech.

Challenges
Homophones: bank (finance) vs. bank (river bank)
Near Homophones: maatraa vs. maatra (Hindi)
Word Boundary
  aajaayenge: aa jaayenge (will come) or aaj aayenge (will come today)
  I got [ua] plate
Phrase boundary
  mtech1 students are especially urged to attend, as such seminars are integral to one's post-graduate education.
Disfluency: ah, um, ahem etc. (used by the speaker to gain time to organize thoughts)

Morphology
Deals with word formation rules from root words.
Nouns: Plural (boy-boys); Gender marking (czar-czarina)
Verbs: Tense (stretch-stretched); Aspect (e.g. perfective: sit - had sat); Modality (e.g. request: khaanaa - khaaiie)
Crucial first step in NLP
Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish
Languages poor in morphology: Chinese, English
Languages with rich morphology have the advantage of easier processing at higher stages of processing
A task of interest to computer science: Finite State Machines for Word Morphology (a sketch follows)
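A minimal finite-state-style sketch of plural analysis in Python; the toy lexicon and suffix rules are illustrative assumptions, while real analyzers use full finite-state transducers:

```python
# Toy finite-state-style suffix stripping for English plurals; the
# lexicon and rules are illustrative assumptions, not a real analyzer.
LEXICON = {"boy", "box", "dog", "czar"}

def analyze_plural(word):
    """Return (root, 'PL') if word is a recognized plural, else None."""
    if word.endswith("es") and word[:-2] in LEXICON:   # box -> boxes
        return (word[:-2], "PL")
    if word.endswith("s") and word[:-1] in LEXICON:    # boy -> boys
        return (word[:-1], "PL")
    return None

print(analyze_plural("boys"))   # ('boy', 'PL')
print(analyze_plural("boxes"))  # ('box', 'PL')
```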

Lexical Analysis
Essentially refers to dictionary access and obtaining the properties of the word
e.g. dog
  noun (lexical property)
  takes -s in plural (morph property)
  animate (semantic property)
  4-legged (-do-)
  carnivore (-do-)
Challenge: Lexical or word sense disambiguation
Question: why do we need to store this information in the dictionary?
When we produce the dictionary entries in a lexicon, one of our main concerns is how to embed richness in this data structure. The properties of any word are available in the dictionary.
Example: How many years does a dog live?
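A minimal sketch of how such a rich dictionary entry might be stored; the field names are illustrative assumptions, not a standard lexicon format:

```python
# Toy lexicon entry as a nested dict; field names are illustrative.
LEXICON = {
    "dog": {
        "pos": "noun",                      # lexical property
        "plural": "dogs",                   # morphological property
        "semantic": ["animate", "4-legged", "carnivore"],
        "senses": ["canine animal", "detestable person"],
    }
}

# Answering "how many years does a dog live?" would additionally need
# world knowledge (e.g. a 'lifespan' field) attached to the entry.
print(LEXICON["dog"]["senses"])
```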

Lexical Disambiguation
First step: Part of Speech Disambiguation
  Dog as a noun (animal)
  Dog as a verb (to pursue)
Ex: Wherever Ram went, misfortune dogged him.

Sense Disambiguation
  Dog (as animal)
  Dog (as a very detestable person)
Many meanings of the word come into play depending on the context. A word typically follows one sense per discourse.
Ex: when we see "dog" in text, we generally take it to mean the animal; very rarely does "dog" mean a detestable person.

Needs word relationships in a context
  The chair emphasised the need for adult education
Very common in day-to-day communications
  Satellite Channel Ad: Watch what you want, when you want (two senses of watch)
  e.g., Ground-breaking ceremony/research

Technological developments bring in new terms, additional meanings/nuances for existing terms
  Justify as in justify the right margin (word processing context)
  Xeroxed: a new verb
  Digital Trace: a new expression
  Communifaking: pretending to talk on a mobile when you are actually not
  Discomgooglation: anxiety/discomfort at not being able to access the internet
  Helicopter Parenting: over-parenting
  Typo: typing mistake
  Texto: texting mistake
  Speako: mistake while speaking

Syntax Processing Stage

Structure Detection: The whole problem of converting a sentence into a tree-like structure is called the problem of parsing or syntactic analysis.

Parse tree for "I like mangoes":
(S (NP I) (VP (V like) (NP mangoes)))

Parsing Strategy
Driven by grammar (a runnable sketch follows):

S -> NP VP
NP -> N | PRON
VP -> V NP | V PP
N -> mangoes
PRON -> I
V -> like
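As a runnable illustration (assuming the NLTK library is installed), the toy grammar above can drive a chart parser; the V PP alternative is dropped here since the slide defines no PP rule:

```python
import nltk

# The slide's grammar, minus the unused PP alternative.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N | PRON
VP -> V NP
N -> 'mangoes'
PRON -> 'I'
V -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like mangoes".split()):
    tree.pretty_print()  # prints the (S (NP I) (VP (V like) (NP mangoes))) tree
```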

Challenges in Syntactic Processing: Structural Ambiguity

Scope ambiguity
1. The old men and women were taken to safe locations
   (old men and women) vs. ((old men) and women)
2. No smoking areas will allow hookas inside
Some, No, All are quantifiers. Processing of quantifiers is important in NLP; otherwise we are led into ambiguity.
  (No (smoking areas)) will (allow hookas inside): hookas allowed in smoking areas.
  ((No smoking areas)) will (allow hookas inside): hookas allowed in no-smoking areas.

Preposition Phrase Attachment

I saw the boy with a telescope
  (who has the telescope?)
I saw the mountain with a telescope
  (world knowledge: a mountain cannot be an instrument of seeing)
I saw the boy with the pony-tail
  (world knowledge: a pony-tail cannot be an instrument of seeing)
Very ubiquitous: newspaper headline "20 years later, BMC pays father 20 lakhs for causing son's death"

Structural Ambiguity
Overheard:
  I did not know my PDA had a phone for 3 months

An actual sentence in the newspaper:
  The cameraman shot the man with the gun when he was near Tendulkar.
Here "shot" has two meanings: shot with the camera or shot with the gun.
"With the gun" also has two attachments: shot with the gun or the man with the gun.
"When he was near Tendulkar" also has two readings: "he" can refer to the man or to the cameraman.

So in total this sentence has 2 x 2 x 2 = 8 different meanings.

(P.G. Wodehouse, Ring for Jeeves) "Jill had rubbed ointment on Mike the Irish Terrier, taken a look at the goldfish belonging to the cook, which had caused anxiety in the kitchen by refusing its ants' eggs"
(Times of India, 26/2/08) "Aid for kins of cops killed in terrorist attacks"

Semantic Analysis
Representation in terms of Predicate Calculus/Semantic Nets/Frames/Conceptual Dependencies and Scripts
John gave a book to Mary
  Give action: Agent: John, Object: Book, Recipient: Mary (a frame sketch follows)
Challenge: ambiguity in semantic role labeling
  (Eng) Visiting aunts can be a nuisance
  (Hin) aapko mujhe mithaai khilaanii padegii
  (ambiguous in Marathi and Bengali too; not in Dravidian languages)
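A minimal sketch of the frame-style representation above; the field names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    action: str
    agent: str
    obj: str        # 'obj' rather than the Python builtin name 'object'
    recipient: str

# "John gave a book to Mary"
event = Frame(action="give", agent="John", obj="book", recipient="Mary")
print(event)
```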

Pragmatics
Very hard problem
Model user intention
  Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.
  Boy (running upstairs and coming back panting): yes sir, they are there.

Discourse
Processing of a sequence of sentences
Mother to John:
  John, go to school. It is open today. Should you bunk? Father will be very angry.
Ambiguity of "open"
Bunk what?
Why will the father be angry?
  Complex chain of reasoning and application of world knowledge
Ambiguity of "father"
  father as parent or father as headmaster

Complexity of Connected Text

John was returning from school dejected; today was the math test
He couldn't control the class
Teacher shouldn't have made him responsible
After all, he is just a gatekeeper

A look at Textual Humour

1. Teacher (angrily): did you miss the class yesterday?
   Student: not much
2. A man coming back to his parked car sees the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill.
3. Ram: I got a Jaguar car for my unemployed youngest son.
   Shyam: That's a great exchange!
4. Shane Warne should bowl maiden overs, instead of bowling maidens over

Giving a flavour of what is done in NLP: Structure Disambiguation

Structure Disambiguation is as critical as Sense Disambiguation
Scope (portion of text in the scope of a modifier)
  Old men and women will be taken to safe locations
  No smoking areas allow hookas inside
Clause
  I told the child that I liked that he came to the game on time
Preposition
  I saw the boy with a telescope

Structure Disambiguation is as critical as Sense Disambiguation (contd.)
Semantic role
  Visiting aunts can be a nuisance
  Mujhe aapko mithaai khilaani padegii (I have to give you sweets or You have to give me sweets)
Postposition
  unhone teji se bhaagte hue chor ko pakad liyaa (he caught the thief that was running fast or he ran fast and caught the thief)

All these ambiguities lead to the construction of multiple parse trees for each sentence and need semantic, pragmatic and discourse cues for disambiguation

Higher level knowledge needed for disambiguation
Semantics
  I saw the boy with a pony tail (a pony tail cannot be an instrument of seeing)
Pragmatics
  ((old men) and women) as opposed to (old (men and women)) in "Old men and women were taken to safe locations", since women, both young and old, were very likely taken to safe locations
Discourse:
  No smoking areas allow hookas inside, except the one in Hotel Grand.
  No smoking areas allow hookas inside, but not cigars.

Preposition Attachment Disambiguation

Problem definition
  4-tuples of the form V N1 P N2
  saw (V) boys (N1) with (P) telescopes (N2)
Attachment choice is between the matrix verb V and the object noun N1

Lexical Association Table (Hindle and Rooth, 1991 and 1993)
From a large corpus of parsed text:
  first find all noun phrase heads
  then record the verb (if any) that precedes the head
  and the preposition (if any) that follows it
  as well as some other syntactic information about the sentence.
Extract attachment information from this table of co-occurrences.

Example: lexical association
A table entry is considered a definite instance of the prepositional phrase attaching to the verb if the verb definitely licenses the prepositional phrase.
E.g. from PropBank, frames for "absolve":
  absolve.XX: NP-ARG0 NP-ARG2-of obj-ARG1
"On Friday, the firms filed a suit *ICH*-1 against West Virginia in New York state court asking for [ARG0 a declaratory judgment] [rel absolving] [ARG1 them] of [ARG2-of liability]."

Core steps
Seven different procedures for deciding whether a table entry is an instance of no attachment, sure noun attach, sure verb attach, or ambiguous attach
Able to extract frequency information, counting the number of times a particular verb or noun attaches with a particular preposition

Core steps (contd.)

These frequencies serve as the training data for the statistical model used to predict correct attachment.
To disambiguate a sentence, compute the likelihood of the particular preposition given the particular verb and contrast it with the likelihood of the preposition given the particular noun,
i.e., compare P(with|saw) with P(with|telescope) as in "I saw the boy with a telescope" (a counting sketch follows).
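A minimal counting sketch of this comparison; the co-occurrence counts are made up for illustration and would come from the parsed-corpus table described above:

```python
from collections import Counter

# Toy (head word, preposition) co-occurrence counts; None marks
# occurrences of the head word with no following preposition.
verb_prep = Counter({("saw", "with"): 150, ("saw", None): 850})
noun_prep = Counter({("telescope", "with"): 5, ("telescope", None): 195})

def p_prep_given(word, prep, table):
    """Estimate P(prep | word) from co-occurrence counts."""
    total = sum(c for (w, _), c in table.items() if w == word)
    return table[(word, prep)] / total if total else 0.0

p_v = p_prep_given("saw", "with", verb_prep)        # P(with | saw)
p_n = p_prep_given("telescope", "with", noun_prep)  # P(with | telescope)
print("verb attach" if p_v > p_n else "noun attach", p_v, p_n)
```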

Critique
  Limited by the number of relationships in the training corpora
  Too large a parameter space
  The model acquired during training is represented in a huge table of probabilities, precluding any straightforward analysis of its workings

Approach based on Transformation-Based Error-Driven Learning (Brill and Resnik, COLING 1994)

Example Transformations

Initial attachments, by default, are predominantly to N1.
Transformation rules with word classes: WordNet synsets and semantic classes are used.

Accuracy values of the transformation-based approach: 12000 training and 500 test examples

Method                            Accuracy        # of transformation rules
Hindle and Rooth (baseline)       70.4 to 75.8%   NA
Transformations                   79.2%           418
Transformations (word classes)    81.8%           266

Maximum Entropy Based Approach (Ratnaparkhi, Reynar, Roukos, 1994)
Use more features than the (V N1) bigram and the (N1 P) bigram
Apply the Maximum Entropy Principle

Core formulation
We denote
  the partially parsed verb phrase, i.e., the verb phrase without the attachment decision, as a history h, and
  the conditional probability of an attachment as P(d|h),
where d corresponds to a noun or verb attachment (1 or 0, respectively).

Maximize the training data log likelihood

$L(p) = \sum_{h,d} \tilde{p}(h,d) \log p(d|h)$  --(1)

where the model has the exponential (maximum entropy) form

$p(d|h) = \frac{1}{Z(h)} \exp\Big(\sum_i \lambda_i f_i(h,d)\Big), \qquad Z(h) = \sum_{d'} \exp\Big(\sum_i \lambda_i f_i(h,d')\Big)$  --(2)

Equating the model expected parameters and training data parameters

$E_p[f_i] = \sum_{h,d} \tilde{p}(h)\, p(d|h)\, f_i(h,d)$  --(3)

$E_{\tilde{p}}[f_i] = \sum_{h,d} \tilde{p}(h,d)\, f_i(h,d)$  --(4)

with the constraint $E_p[f_i] = E_{\tilde{p}}[f_i]$ for every feature $f_i$.
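A minimal sketch of evaluating P(d|h) in this exponential form; the feature names and weights are made-up assumptions, not trained values:

```python
import math

def p_attach(features_on, weights):
    """P(d=1|h) for a two-class exponential model.

    features_on: dict d -> list of active feature names for that class
    weights: dict feature name -> learned weight lambda_i
    """
    scores = {d: math.exp(sum(weights.get(f, 0.0) for f in feats))
              for d, feats in features_on.items()}
    z = sum(scores.values())            # normalizer Z(h)
    return scores[1] / z

weights = {"V=saw&P=with": 0.9, "N1=boy&P=with": 0.4}   # made-up weights
features = {1: ["N1=boy&P=with"], 0: ["V=saw&P=with"]}  # d=1 noun, d=0 verb
print(p_attach(features, weights))  # probability of noun attachment
```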

Features
Two types of binary-valued questions:
  Questions about the presence of any n-gram of the four head words, e.g., a bigram may be V == "is", P == "of"
Features comprised solely of questions on words are denoted as word features

Features (contd.)
Questions that involve the class membership of a head word
Binary hierarchy of classes derived by mutual information

Features (contd.)
Given a binary class hierarchy, we can associate a bit string with every word in the vocabulary.
Then, by querying the value of certain bit positions, we can construct binary questions.

Features comprised solely of questions about class bits are denoted as class features, and features containing questions about both class bits and words are denoted as mixed features.
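A small sketch of class-bit questions; the bit strings here are hypothetical, while real ones come from the mutual-information clustering of Brown et al.:

```python
# Hypothetical class bit strings for a tiny vocabulary.
BITSTRING = {
    "telescope": "0110",
    "binoculars": "0111",   # shares a 3-bit prefix with "telescope"
    "boy": "1010",
}

def class_bit_question(word, position, value):
    """Binary feature: does bit 'position' of the word's class equal 'value'?"""
    return BITSTRING[word][position] == value

# A class feature: first two class bits are "01" (a made-up cluster).
print(class_bit_question("telescope", 0, "0") and
      class_bit_question("telescope", 1, "1"))   # True
print(class_bit_question("boy", 0, "0"))         # False
```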

Word classes (Brown et al. 1992)

Experimental data size

Performance of ME Model on Test Events

Examples of Features Chosen for Wall St. Journal Data

Average Performance of Human & ME Model on 300 Events of WSJ Data

Human and ME model performance on consensus set for WSJ

Average Performance of Human & ME Model on 200 Events of Computer Manuals Data

Back-off model based approach (Collins and Brooks, 1995)
NP-attach: (joined ((the board) (as a non executive director)))
VP-attach: ((joined (the board)) (as a non executive director))
Correspondingly,
  NP-attach: 1 joined board as director
  VP-attach: 0 joined board as director
Quintuple (attachment A: 0/1, V, N1, P, N2): 5 random variables

Probabilistic formulation
The attachment probability is $p(A = 1 \mid V, N1, P, N2)$, or briefly $p(1 \mid v, n1, p, n2)$.

If

$p(1 \mid v, n1, p, n2) \geq 0.5$

then the attachment is to the noun, else to the verb.

Maximum Likelihood estimate

$\hat{p}(1 \mid v, n1, p, n2) = \frac{f(1, v, n1, p, n2)}{f(v, n1, p, n2)}$

where $f(\cdot)$ denotes a frequency count in the training data.
The Back-off estimate
  Inspired by speech recognition: prediction of the Nth word from the previous (N-1) words
  Data sparsity problem: $f(w_1, w_2, \ldots, w_n)$ will frequently be 0 for large values of n

Back-off estimate (contd.)

The cut-off frequencies ($c_1$, $c_2$, ...) are thresholds determining whether to back off or not at each level: counts lower than $c_i$ at stage i are deemed too low to give an accurate estimate, so in this case backing off continues.

Back-off for PP attachment

Note: the back-off tuples always retain the preposition.

The back-off algorithm
If the full quadruple (v, n1, p, n2) is seen in training, use its MLE; otherwise back off to the triples containing the preposition, then to the pairs, then to the preposition alone, and finally default to noun attachment (a sketch follows).
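A minimal sketch of this back-off decision under the assumptions above (counts always retain the preposition; noun attachment as the final default):

```python
from collections import Counter

f, f1 = Counter(), Counter()     # f: all counts, f1: noun-attach counts

def add_example(attach, v, n1, p, n2):
    """Record one training quadruple; attach is 1 for noun, 0 for verb."""
    for t in [(v, n1, p, n2),
              (v, n1, p), (v, p, n2), (n1, p, n2),
              (v, p), (n1, p), (p, n2),
              (p,)]:
        f[t] += 1
        f1[t] += attach

def p_noun_attach(v, n1, p, n2):
    stages = [[(v, n1, p, n2)],                       # quadruple
              [(v, n1, p), (v, p, n2), (n1, p, n2)],  # triples
              [(v, p), (n1, p), (p, n2)],             # pairs
              [(p,)]]                                 # preposition alone
    for tuples in stages:
        denom = sum(f[t] for t in tuples)
        if denom > 0:            # back off only when counts are absent
            return sum(f1[t] for t in tuples) / denom
    return 1.0                   # default: noun attachment

add_example(1, "joined", "board", "as", "director")
print(p_noun_attach("joined", "board", "as", "director"))  # 1.0
```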

Lower and upper bounds on performance

Lower bound: always choose the most frequent attachment.
Upper bound: human experts looking at the 4 head words only.
Results

Comparison with other systems
  Maxent, Ratnaparkhi et al.
  Transformation Learning, Brill et al.

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction (Medimi Srinivas and Pushpak Bhattacharyya, IJCAI 2007)
Unsupervised approach (somewhat similar to Ratnaparkhi 1998): the training data is extracted from raw text
The unambiguous training data of the form V-P-N and N1-P-N2 teach the system how to resolve PP-attachment in ambiguous test data V-N1-P-N2
Refinement of extracted training data, and use of N2 in the PP-attachment resolution process

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction (contd.)
PP-attachment is determined by the semantic property of lexical items in the context of the preposition, using WordNet
An iterative graph-based unsupervised approach is used for word sense disambiguation (similar to Mihalcea 2005)
Use of a Data Sparsity Reduction (DSR) process which uses lemmatization, synset replacement and a form of inferencing; DSRP uses WordNet
Flexible use of the WSD and DSR processes for PP-attachment

Graph-based disambiguation: PageRank-based algorithm (Mihalcea 2005)

Experimental setup
Training Data:
  Brown corpus (raw text). Corpus size is 6 MB: 51,763 sentences, nearly 1,027,000 words.
  Most frequent prepositions in the syntactic context N1-P-N2: of, in, for, to, with, on, at, from, by
  Most frequent prepositions in the syntactic context V-P-N: in, to, by, with, on, for, from, at, of
  Extracted unambiguous tuples: 54,030 N1-P-N2 and 22,362 V-P-N

Test Data:
  Penn Treebank Wall Street Journal (WSJ) data extracted by Ratnaparkhi
  It consists of V-N1-P-N2 tuples: 20,801 (training), 4,039 (development) and 3,097 (test)

Experimental setup (contd.)

Baseline:
  The unsupervised approach by Ratnaparkhi, 1998 (Base-RP)
Preprocessing:
  Upper case to lower case
  Any four-digit number less than 2100 treated as a year
  Any other number or % sign converted to "num"
Experiments are performed using DSRP, with different stages of DSRP
Experiments are performed using GuWSD and DSRP, with different senses

The process of extracting training data: Data Sparsity Reduction

Pipeline: Raw Text -> POS Tagger -> Chunker -> Extraction Heuristics -> Output

Raw text:
  The professional conduct of the doctors is guided by Indian Medical Association.
POS-tagged:
  The_DT professional_JJ conduct_NN of_IN the_DT doctors_NNS is_VBZ guided_VBN by_IN Indian_NNP Medical_NNP Association_NNP ._.
Chunked:
  [The_DT professional_JJ conduct_NN] of_IN [the_DT doctors_NNS] (is_VBZ guided_VBN) by_IN [Indian_NNP Medical_NNP Association_NNP].
After replacing each chunk by its head word:
  conduct_NN of_IN doctors_NNS guided_VBN by_IN Association_NNP
N1-P-N2: conduct of doctors; V-P-N: guided by Association

Morphing
  N1-P-N2: conduct of doctor; V-P-N: guide by association

DSRP (Synset Replacement)
  N1-P-N2: {conduct, behavior} of {doctor, physician} can result in 4 combinations with the same sense; similarly V-P-N: {guide, direct} by {association} can result in 2.

Data Sparsity Reduction: Inferencing
If V1-P-N1 and V2-P-N1 exist, as also do V1-P-N2 and V2-P-N2, then if V3-P-Ni exists (i = 1, 2), we can infer the existence of V3-P-Nj (j != i) with a frequency count equal to that of V3-P-Ni, which can be added to the corpus.

Example of DSR by inferencing
  V1-P-N1: play in garden and V2-P-N1: sit in garden
  V1-P-N2: play in house and V2-P-N2: sit in house
  V3-P-N2: jump in house exists
  Infer the existence of V3-P-N1: jump in garden (a sketch follows)
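A minimal sketch of this inference rule; the counts dictionary and helper function are illustrative assumptions, with toy data mirroring the play/sit/jump example:

```python
from collections import defaultdict

counts = defaultdict(int)
for v, p, n in [("play", "in", "garden"), ("sit", "in", "garden"),
                ("play", "in", "house"), ("sit", "in", "house"),
                ("jump", "in", "house")]:
    counts[(v, p, n)] += 1

def infer(counts, v1, v2, p, n1, n2, v3):
    """If v1 and v2 occur with both n1 and n2 under p, transfer v3's count."""
    shared = all(counts[(v, p, n)] > 0 for v in (v1, v2) for n in (n1, n2))
    if shared and counts[(v3, p, n2)] > 0 and counts[(v3, p, n1)] == 0:
        counts[(v3, p, n1)] += counts[(v3, p, n2)]   # inferred tuple

infer(counts, "play", "sit", "in", "garden", "house", "jump")
print(counts[("jump", "in", "garden")])  # 1, inferred
```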

Results

Effect of various processes on the FlexPPAttach algorithm

Precision vs. various processes
