
Computational Lexicography

Feb., 2012

Anca Dinu
University of Bucharest

Introduction

Computational lexicology is the use of computers in the study of the lexicon (theoretical).
Computational lexicography is the creation of machine-readable dictionaries (MRDs) (practical).
The two terms are sometimes used as synonyms.

Introduction

MRDs are essential resources for NLP (summarization, question answering, inference, etc.).
They matter even more for languages with rich morphology. They have a generative component that builds the inflected forms from lemmas and formation rules.
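
The generative component can be pictured as a small rule engine. The sketch below is only an illustration under stated assumptions: it uses simplified English noun-plural rules rather than the morphology of any richly inflected language, and the names PLURAL_RULES, pluralize and paradigm are hypothetical.

```python
import re

# Illustrative rules only (not from the slides): each rule pairs a pattern
# on the lemma with a replacement; the first matching rule applies.
PLURAL_RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # bus -> buses, box -> boxes
    (r"([^aeiou])y$", r"\1ies"),   # city -> cities
    (r"$", "s"),                   # default: cat -> cats
]


def pluralize(lemma: str) -> str:
    """Generate the plural form of a lemma from the ordered rule list."""
    for pattern, replacement in PLURAL_RULES:
        if re.search(pattern, lemma):
            return re.sub(pattern, replacement, lemma, count=1)
    return lemma


def paradigm(lemma: str) -> dict:
    """Build a (very small) inflectional paradigm for a noun lemma."""
    return {"singular": lemma, "plural": pluralize(lemma)}


if __name__ == "__main__":
    for noun in ["cat", "box", "city", "day"]:
        print(paradigm(noun))
```

Only the lemmas and the rules are stored; the inflected forms are computed on demand, which is what keeps the dictionary compact when each lemma has many forms.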

Directions in CL

Corpus annotation (usually in XML):
Markup languages make it possible to create corpora with standardized annotation, from which data for building lexicons (argument structure, thematic roles, etc.) can then be extracted automatically or semi-automatically (Corpus Pattern Analysis).
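
A minimal sketch of automatic extraction from an XML-annotated corpus follows. The tag and attribute names (s, pred, arg, lemma, role) form a hypothetical annotation scheme invented for this illustration, not a standard; the parsing itself uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme: <s> marks a sentence, <pred> a predicate,
# <arg role="..."> an argument with its thematic role.
CORPUS = """
<corpus>
  <s><arg role="agent">The man</arg> <pred lemma="paint">painted</pred> <arg role="patient">the wall</arg>.</s>
  <s><arg role="agent">Mary</arg> <pred lemma="paint">painted</pred> <arg role="patient">a portrait</arg>.</s>
  <s><arg role="theme">The vase</arg> <pred lemma="break">broke</pred>.</s>
</corpus>
"""


def extract_frames(xml_text: str) -> dict:
    """Map each predicate lemma to the set of role tuples observed with it."""
    frames = {}
    root = ET.fromstring(xml_text)
    for sentence in root.iter("s"):
        pred = sentence.find("pred")
        if pred is None:
            continue  # sentence without an annotated predicate
        roles = tuple(arg.get("role") for arg in sentence.findall("arg"))
        frames.setdefault(pred.get("lemma"), set()).add(roles)
    return frames


if __name__ == "__main__":
    for lemma, role_sets in extract_frames(CORPUS).items():
        print(lemma, sorted(role_sets))
```

The resulting mapping is a crude argument-structure lexicon: one entry per predicate lemma, listing the role patterns attested in the corpus.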

Directions in CL

Building Lexical Knowledge Bases (LKBs).
They contain the same information as a printed dictionary, plus syntactic, semantic, and relational information (ontologies).
E.g. WordNet, FrameNet

WordNet

Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms
(synsets), each expressing a distinct
concept. Synsets are interlinked by means of
conceptual-semantic and lexical relations.
The resulting network of meaningfully related
words and concepts can be navigated with
the browser.
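
To make the synset idea concrete, here is a toy network in plain Python. It is a sketch only: the synset identifiers, lemmas and hypernym links are hand-written for illustration and are not copied from the actual WordNet database, and the helper names are hypothetical.

```python
# Toy synset network: each synset groups synonymous lemmas and points to
# its hypernym (a more general concept). Data is invented for illustration.
SYNSETS = {
    "dog.n": {"lemmas": {"dog", "domestic dog"}, "hypernym": "canine.n"},
    "canine.n": {"lemmas": {"canine", "canid"}, "hypernym": "mammal.n"},
    "mammal.n": {"lemmas": {"mammal"}, "hypernym": "animal.n"},
    "animal.n": {"lemmas": {"animal", "creature"}, "hypernym": None},
}


def synsets_for(word: str) -> list:
    """Return the identifiers of all synsets that contain the word."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["lemmas"])


def hypernym_path(synset_id: str) -> list:
    """Follow hypernym links from a synset up to the root concept."""
    path = []
    current = synset_id
    while current is not None:
        path.append(current)
        current = SYNSETS[current]["hypernym"]
    return path


if __name__ == "__main__":
    print(synsets_for("creature"))
    print(" -> ".join(hypernym_path("dog.n")))
```

Navigating the network amounts to following such links from one synset to another.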

Annotation: general principles

Annotation schemata should focus on a single coherent theme:
Different linguistic phenomena should be annotated separately over the same corpus.
Annotations must be consistent with each other:
Unification and merging of multiple annotations is necessary (standard XML).

Examples of Semantic Annotation

Predicators and their named arguments:
[The man]agent painted [the wall]patient.
Anaphors and their antecedents:
[The protein] inhibits growth in yeast. [It] blocks production . . .
Acronyms and their long forms:
[Platelet-derived growth factor] (known as [pdgf]) impacts . . .
Semantic Typing of entities:
[The man]human fired [the gun]firearm.
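
The bracket notation in these examples can be read mechanically. The sketch below assumes that a label, when present, is a single word written directly after the closing bracket; the function name parse_annotations is hypothetical.

```python
import re

# A bracketed span, optionally followed immediately by a one-word label.
ANNOTATION = re.compile(r"\[([^\]]+)\](\w+)?")


def parse_annotations(text: str) -> list:
    """Return (span, label) pairs; label is None for unlabeled spans."""
    return [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(text)]


if __name__ == "__main__":
    print(parse_annotations("[The man]agent painted [the wall]patient."))
    print(parse_annotations("[The protein] inhibits growth in yeast. [It] blocks production"))
```

Labeled spans correspond to role or type annotations; unlabeled spans, as in the anaphora example, only mark the expressions that a separate annotation layer would link together.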

Problems with existing LKBs

The traditional organization of lexicons is static, i.e. it assumes that the meaning of a word can be defined exhaustively by enumerating its senses (as a list).
Consequently, when a natural language interpretation task runs into lexical ambiguity, the system tries to select the closest definition from the list offered by the lexicon.
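
A minimal sketch of this select-from-a-list strategy, using a simplified word-overlap heuristic in the spirit of Lesk-style disambiguation. The lexicon entry, its glosses, the stopword list and the function names are all invented for illustration.

```python
# Toy sense-enumeration lexicon: each word has a fixed list of glosses.
LEXICON = {
    "bank": [
        "financial institution that accepts deposits and lends money",
        "sloping land beside a river or lake",
    ],
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "by", "at"}


def content_words(text: str) -> set:
    """Lowercase the text, strip basic punctuation, drop stopwords."""
    return {w.strip(".,;").lower() for w in text.split()} - STOPWORDS


def select_sense(word: str, context: str):
    """Return the index of the listed sense whose gloss shares the most
    words with the context, or None if the word is not in the lexicon."""
    senses = LEXICON.get(word)
    if not senses:
        return None
    context_words = content_words(context)
    overlaps = [len(content_words(gloss) & context_words) for gloss in senses]
    # Ties (including zero overlap everywhere) fall back to the first sense.
    return overlaps.index(max(overlaps))


if __name__ == "__main__":
    print(select_sense("bank", "She deposits money at the bank"))
    print(select_sense("bank", "They fished from the river bank"))
    print(select_sense("bank", "The plane had to bank sharply"))
```

In the last call no listed sense fits, yet the procedure still returns one of them: a static list can only choose among the senses it already enumerates.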

Problems with existing LKBs

Two drawbacks:

All possible contexts in which a word can occur must be specified a priori; otherwise coverage is incomplete.
The creative use of words in new contexts cannot be explained or predicted.

Solution: Generative Lexicon (GL)

James Pustejovsky: 1995 (the book The Generative Lexicon), 2001, 2005
Reading for next time: the article Type Theory and Lexical Decomposition

Assumptions for GL

Language meaning is compositional.
Compositionality is a desirable property of a semantic model.
Many linguistic phenomena appear non-compositional.
GL exploits richer representations and fixes the holes in the compositionality model.

Examples of linguistic phenomena that appear non-compositional

intensionality (think),
binding (she),
quantification (most),
interrogatives (who),
focus (only), and
presuppositions (the king of France).

Compositionality

The meaning of a complex expression is determined by its structure and the meanings of its constituents.
Questions . . .
1. What is the nature of the structure?
2. What is the meaning of a constituent?
3. What counts as a constituent?

Challenges to Simple Compositionality

(1) a. Mary began [to eat her breakfast].
b. Mary began [eating her breakfast].
c. Mary began [her breakfast].
(2) a. Mary enjoyed her beer.
b. John enjoys his coffee in the morning.
c. Bill enjoyed the movie.

Challenges to Simple Compositionality

(3) a. The woman baked a potato in the oven.
b. The woman baked a cake in the oven.

(4) a. John swept.
b. John swept the floor.
c. John swept the dirt into the corner.
d. John swept the dirt off the sidewalk.
e. John swept the floor clean.
f. John swept the dirt into a pile.

Challenges to Simple Compositionality

Similar verbs: shovel, rake, shave, weed.
(5) a. John whistled.
b. John whistled at the dog.
c. John whistled a tune.
d. John whistled a warning.
e. John whistled his appreciation.
f. John whistled to the dog to come.
Similar verbs: yell, snap, whisper.

Challenges to Simple Compositionality

(6) Externally Caused Events: break, etc.
a. The vase broke.
b. Mary broke the vase.
c. The storm broke the window.
(7) Internally Caused Events (unaccusatives): decay, bloom, etc.
a. The flowers bloomed early.
b. *The gardener bloomed the flowers.

Composition = function application

1. What is the nature of the function?
2. What does it apply to; i.e., what can be an argument?

Example:
1. John loves Mary.
2. love(Arg1,Arg2)
3. Apply love(Arg1,Arg2) to Mary
4. love(Arg1,Mary)
5. Apply love(Arg1,Mary) to John
6. love(John,Mary)
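
The six steps can be sketched with a curried Python function. This is an illustration only: the predicate is represented as a string, and the choice to take the object first simply mirrors the order of application in the steps.

```python
# love(Arg1, Arg2) as a curried function: it takes the object (Arg2) first
# and returns a function that is still waiting for the subject (Arg1).
def love(arg2: str):
    def love_with_object(arg1: str) -> str:
        return f"love({arg1},{arg2})"
    return love_with_object


if __name__ == "__main__":
    loves_mary = love("Mary")      # corresponds to love(Arg1,Mary)
    sentence = loves_mary("John")  # corresponds to love(John,Mary)
    print(sentence)
```

Each application consumes one argument, so a two-place predicate is built up in two steps, following the constituent structure [John [loves Mary]].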

Lambda Calculus

(a) e is a type.
(b) t is a type.
(c) If a and b are types, then a -> b is a type.
A simple type tree:
      t
     / \
    e   e->t
Function Application: If α is of type a, and β is of type a -> b, then β(α) is of type b.
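
The type definitions and the function application rule can be checked mechanically. The following is a minimal sketch, assuming types are represented as small immutable Python objects; the class and function names (Basic, Arrow, apply_type) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    """A basic type: e (entities) or t (truth values)."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Arrow:
    """A functional type a -> b."""
    arg: "Type"
    result: "Type"

    def __str__(self) -> str:
        return f"({self.arg} -> {self.result})"


Type = Union[Basic, Arrow]
E, T = Basic("e"), Basic("t")


def apply_type(function_type: Type, argument_type: Type) -> Type:
    """Function application: applying a -> b to an argument of type a gives b."""
    if isinstance(function_type, Arrow) and function_type.arg == argument_type:
        return function_type.result
    raise TypeError(f"cannot apply {function_type} to {argument_type}")


INTRANSITIVE = Arrow(E, T)          # e -> t
TRANSITIVE = Arrow(E, Arrow(E, T))  # e -> (e -> t)

if __name__ == "__main__":
    verb_phrase = apply_type(TRANSITIVE, E)  # verb + object
    sentence = apply_type(verb_phrase, E)    # verb phrase + subject
    print(verb_phrase, sentence)
```

The tree above corresponds to the last step: a constituent of type e combines with one of type e -> t to give a sentence of type t.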

Lambda Calculus

Next time
