Nicola Carmignani
carmigna@di.unipi.it
July 5, 2006
Outline
1 Introduction
2 Word Prediction
The Origin of Word Prediction
n-gram Models
3 Sentence Prediction
Statistical Approach
Information Retrieval Approach
4 Conclusions
Introduction
Statistical NLP
Prediction::An Overview
Prediction::Why?
Word Prediction::An Overview
Please, don’t get confused!
Word Prediction::The Origins
The word prediction task can be viewed as the statistical
formulation of the speech recognition problem, where A is the
acoustic evidence and W a word sequence:

Ŵ = arg max_W  P(A | W) P(W) / P(A)
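Since the acoustic evidence A is fixed during the maximization, the denominator P(A) is a constant and can be dropped; the remaining factor P(W) is exactly the language-model term that word prediction estimates:

```latex
\hat{W} = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
```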
n-gram Models::Introduction
n-gram Models
Markov Assumption: only the prior local context (the last few
words) affects the next word.
n-gram Models
- n = 2 (bigram):  P(wi | w1, ..., wi−1) ≈ P(wi | wi−1)
- n = 3 (trigram): P(wi | w1, ..., wi−1) ≈ P(wi | wi−2, wi−1)
n-gram Models::Example
Example:
W = Last night I went to the concert
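As a sketch of how a bigram model scores this example, the chain of conditional probabilities is multiplied together; the probability values below are hypothetical toy numbers, not estimates from a real corpus:

```python
# Bigram approximation of P(W) for W = "Last night I went to the concert".
# Every probability below is a hypothetical toy value, for illustration only.
bigram_p = {
    ("<s>", "Last"): 0.01, ("Last", "night"): 0.2, ("night", "I"): 0.05,
    ("I", "went"): 0.1, ("went", "to"): 0.4, ("to", "the"): 0.3,
    ("the", "concert"): 0.02,
}
words = ["<s>", "Last", "night", "I", "went", "to", "the", "concert"]

p = 1.0
for prev, cur in zip(words, words[1:]):
    p *= bigram_p[(prev, cur)]   # P(w_i | w_{i-1})
print(p)  # product of the seven conditional probabilities
```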
How to Estimate Probabilities
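A minimal sketch, assuming the standard relative-frequency (maximum-likelihood) estimator: divide the bigram count by the count of the conditioning word. The tiny corpus is invented for illustration.

```python
from collections import Counter

# Relative-frequency (maximum-likelihood) estimate:
#   P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})
corpus = "the cat sat on the mat the cat ate".split()
unigram_c = Counter(corpus)
bigram_c = Counter(zip(corpus, corpus[1:]))

def p_mle(prev, cur):
    return bigram_c[(prev, cur)] / unigram_c[prev]

print(p_mle("the", "cat"))  # count(the cat) = 2, count(the) = 3 -> 2/3
```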
Problems with n-grams
The drawback of these methods is the amount of text needed to
train the model. The training corpus has to be large enough to
ensure that each valid word sequence appears a significant number
of times.
A great amount of computational resources is needed, especially
when the lexicon is large.
For a vocabulary V of 20,000 words:
- |V|² = 400 million bigrams;
- |V|³ = 8 trillion trigrams;
- |V|⁴ = 1.6 × 10¹⁷ four-grams.
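These counts follow directly from |V|ⁿ; a quick sanity check of the blow-up for a 20,000-word vocabulary:

```python
# Number of distinct n-grams over a vocabulary of |V| = 20,000 words.
V = 20_000
print(V ** 2)           # 400_000_000        (400 million bigrams)
print(V ** 3)           # 8_000_000_000_000  (8 trillion trigrams)
print(f"{V ** 4:.1e}")  # 1.6e+17            (four-grams)
```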
n-gram Models for Inflected Languages
Hybrid Approach to Prediction
Two Markov models can be included: one for word classes (POS-tag
unigrams, bigrams and trigrams) and one for words (word unigrams
and bigrams). A linear-combination algorithm may then merge the
predictions of these two models.
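A minimal sketch of such a linear combination; the weight lam and the component probabilities are hypothetical values (real systems tune the weights, e.g. on held-out data):

```python
# Linear interpolation of a word-based model with a POS-class-based model.
# lam and the component probabilities below are hypothetical values.
def p_combined(p_word, p_class, lam=0.7):
    """lam * P_word(w | word history) + (1 - lam) * P_class(w | tag history)."""
    return lam * p_word + (1 - lam) * p_class

print(p_combined(p_word=0.02, p_class=0.1))  # 0.7*0.02 + 0.3*0.1 = 0.044
```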
Sentence Prediction::An Overview
It is now easy to guess what Sentence Prediction is: we would
like to predict how a user will continue a given initial
fragment of natural language text.
Some applications:
- Korvemaker and Greiner have developed a system which predicts
  whole command lines (OS shells).
- Search engines suggest completions for partially typed queries.
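Korvemaker and Greiner's actual system is more elaborate; as a hedged illustration of the idea only, a history-based predictor could suggest the command that most often followed the previously entered one (the command history below is invented):

```python
from collections import Counter, defaultdict

# Hypothetical sketch of command-line prediction (not Korvemaker and
# Greiner's actual method): suggest the command that most often
# followed the previously entered one in the user's history.
history = ["ls", "cd src", "ls", "vim main.c", "make", "ls", "cd src", "ls"]

followers = defaultdict(Counter)
for prev, cur in zip(history, history[1:]):
    followers[prev][cur] += 1

def predict(prev):
    return followers[prev].most_common(1)[0][0]

print(predict("cd src"))  # "ls" followed "cd src" every time in the history
```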
Sentence Prediction::Statistical Approach
[Eng et al., 2004]
[Eng et al., 2004]
- This algorithm starts with the most recently entered word (wt)
  and moves iteratively, looking for the highest-scored periods.
Sentence Prediction::IR Approach
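The retrieval idea can be sketched as follows. This is not the exact method of Grabski et al., only a bag-of-words illustration: represent stored sentences and the typed fragment as word-count vectors, and return the most cosine-similar sentence as a completion candidate (the corpus below is invented):

```python
import math
from collections import Counter

# Illustrative retrieval-style sentence completion, NOT the exact method
# of Grabski et al. [2004]: return the stored sentence most cosine-similar
# to the typed fragment; its remainder could be offered as the completion.
corpus = [
    "please send me the report by friday",
    "please send me the invoice for last month",
    "the meeting is scheduled for monday",
]

def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def complete(fragment):
    q = bow(fragment)
    return max(corpus, key=lambda s: cosine(q, bow(s)))

print(complete("please send me the"))
```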
[Grabski et al., 2004]
Conclusions
It’s Worth Another Try!!!
THE END
References