[Figure: an example corpus split into two clusters, "Business documents" and "Culture documents".]
Notation
There are d words in the dictionary; d is large (think d ≈ 20,000). There are k topics (think k ≈ 100). Each topic is a d-vector with non-negative components summing to 1; its i-th component is the probability that a random word in a document (purely) on that topic is word i. We let M be the d × k matrix with one column for each topic vector.
0.2 The Model
A document on topic l consists of m words drawn independently from the topic distribution M_{·,l}. We represent a document by its frequency vector: its i-th component is the fraction of the document's words equal to word i, for i = 1, 2, . . . , d.
Question What can you say about the dot product of two document vectors if they are on different topics? First think of the ε = 0 case, then of small ε.
Question Is the above a give-away? I.e., can you solve the inference problem just based on this?
Hint What can you say about the dot product of two document vectors on the same topic (even when ε = 0)? Think of the case when the components of the topic vector are smaller than 1/m, so that any given word is unlikely to occur in a document.
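The questions above can be checked empirically. The following sketch (all sizes are illustrative, not from the text) samples document frequency vectors from two pure topics with disjoint primary-word sets, i.e. the ε = 0 case:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2000, 50                      # dictionary size and words per document (illustrative)

# Two pure topics with disjoint primary-word sets (the eps = 0 case);
# each topic is uniform on its 1000 primary words, so components are < 1/m.
topic1 = np.zeros(d); topic1[:1000] = 1 / 1000
topic2 = np.zeros(d); topic2[1000:] = 1 / 1000

def sample_doc(topic):
    # Draw m words i.i.d. from the topic; return the word-frequency vector.
    counts = rng.multinomial(m, topic)
    return counts / m

a, b = sample_doc(topic1), sample_doc(topic1)   # two documents on the same topic
c = sample_doc(topic2)                          # a document on a different topic

print("same-topic dot product :", a @ b)   # small: words rarely repeat across documents
print("cross-topic dot product:", a @ c)   # exactly 0: the supports are disjoint
```

Cross-topic dot products vanish identically here, while same-topic dot products are also tiny, which is exactly why the hint says the dot product alone is not a give-away.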
0.3 The Solution
A is a block matrix:
$$A = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_k \end{pmatrix}$$
Theorem Making the Primary Words and Pure Topics Assumptions, the top k singular vectors of A are close to the indicator vectors of the k clusters of documents, provided m is large enough.
[The clusters are: cluster l consists of the documents on topic l. The indicator vector of a cluster is the vector with 1's on the cluster and 0's elsewhere, normalized to length 1.]
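As a sanity check on the theorem (not part of the text; all parameter values are invented), one can generate such a block matrix A and verify that each cluster's indicator vector essentially lies in the span of the top k right singular vectors. Checking the span, rather than matching vectors one by one, sidesteps ties among singular values when the topic norms happen to be equal:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m, nl = 3000, 3, 1000, 50      # dictionary, topics, words/doc, docs/topic (illustrative)

# k pure topics with disjoint primary-word blocks, uniform on 1000 words each:
topics = np.zeros((k, d))
for l in range(k):
    topics[l, l * 1000:(l + 1) * 1000] = 1 / 1000

docs, labels = [], []
for l in range(k):
    for _ in range(nl):
        docs.append(rng.multinomial(m, topics[l]) / m)  # document frequency vector
        labels.append(l)
A = np.array(docs).T                  # d x n matrix, one column per document
labels = np.array(labels)

# The cluster indicator vectors live in document space, so we look at the
# top-k RIGHT singular vectors of A:
_, _, Vt = np.linalg.svd(A, full_matrices=False)
for l in range(k):
    ind = (labels == l).astype(float)
    ind /= np.linalg.norm(ind)        # normalized indicator vector of cluster l
    overlap = np.linalg.norm(Vt[:k] @ ind)   # fraction captured by the top-k span
    print(f"cluster {l}: overlap with top-{k} singular span = {overlap:.3f}")
```

The overlaps come out close to 1, matching the theorem's conclusion for large m.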
Idea of the Proof First consider the case ε = 0. Notation: n_l is the number of documents on topic l and d_l is the number of primary words of topic l. Every column of B_l has the same expectation M_{·,l}, so
$$E(B_l) = M_{\cdot,l}\,\mathbf{1}^T.$$
Since this matrix has rank one,
$$\sigma_1(E(B_l)) = |M_{\cdot,l}|\,\sqrt{n_l} \approx \sqrt{n_l\,p}. \qquad (1)$$
On the other hand,
$$\sigma_1(B_l - E(B_l)) \le (\text{max length of any column}) + \sqrt{n_l}\,(\text{max S.D. of any entry}) \le 1 + \sqrt{\frac{n_l\,p}{m}}.$$
We see that this quantity is much smaller than σ_1(E(B_l)) for m large enough.
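A quick numerical illustration of the two bounds (the parameter values are invented for the sketch, and p is taken as 1/d_l, i.e., a topic uniform on its primary words):

```python
import numpy as np

rng = np.random.default_rng(2)
dl, nl, m = 1000, 50, 1000           # primary words, documents on topic l, words/doc
p = 1 / dl                            # probability of each primary word (uniform topic)
topic = np.full(dl, p)

# B_l: one frequency-vector column per document on topic l.
B = np.array([rng.multinomial(m, topic) / m for _ in range(nl)]).T   # dl x nl
EB = np.outer(topic, np.ones(nl))     # E(B_l) = M_{.,l} 1^T, a rank-one matrix

sig_signal = np.linalg.norm(EB, 2)    # sigma_1(E(B_l)) = |M_{.,l}| sqrt(nl) = sqrt(nl p)
sig_noise = np.linalg.norm(B - EB, 2) # sigma_1(B_l - E(B_l))

print("sigma_1(E(B_l))        =", sig_signal, "~ sqrt(nl p) =", np.sqrt(nl * p))
print("sigma_1(B_l - E(B_l))  =", sig_noise)
```

For these values the noise term comes out well below the signal term, consistent with the claim that the perturbation is negligible for large m.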
Now, assume that |M_{·,1}|, |M_{·,2}|, . . . , |M_{·,k}| are all distinct, so that the σ_1(B_l) are all distinct; in fact, assume |M_{·,1}| > |M_{·,2}| > · · · > |M_{·,k}|.
We claim that the top singular vector of A will then be close to the indicator vector of the first cluster. First, prove that it has no component on clusters other than the first. Then, suppose it has a component perpendicular to the indicator vector of the first cluster. The contribution of this component is at most on the order of σ_1(B_l − E(B_l)) ≪ σ_1(B_l) . . .
Ref: C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, Latent Semantic Indexing: A Probabilistic Analysis.