USING PLSA
By
K.HARITHA -316126510149
P.ANUHYA - 316126510166
P.HARI TEJA - 316126510167
P.MADHU KUMAR - 316126510169
ABSTRACT
PROBLEM STATEMENT
INTRODUCTION
SAMPLE INPUT
SAMPLE OUTPUT
REFERENCES
ABSTRACT
The number of text documents is growing rapidly with the advent of the
internet and the development of the World Wide Web. This volume of text is
far too large to classify manually. In general, statistical approaches to
text classification have been applied within a single domain. These
approaches are based on word occurrence, i.e. the frequency of one or more
words in a given document. Such approaches do not work well across multiple
domains, so one of the most important challenges is learning topics in text
documents that belong to different domains.
INTRODUCTION
PLSA does not need labelled information and thus does not consider
available prior knowledge of the domain. PLSA was derived from the well-known
latent semantic analysis (LSA) technique for text analysis. In this model
each document is treated as a mixture (a weighted combination) of several
topics, where the topics are obtained using the maximum likelihood principle.
The model can therefore assign multiple topics to a single document: each
document is assumed to be generated from multiple topics.
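The mixture model described above can be sketched in a few lines of Python.
This is a minimal illustration, not the project's implementation: it assumes
the documents are already given as a document-term count matrix, and it uses
expectation-maximization (EM), the usual way of applying the maximum
likelihood principle to PLSA. The function name plsa and the parameters
n_topics, n_iter and seed are chosen here for illustration only.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(z|d) for each document and P(w|z) for each topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Scale a row so it sums to 1; fall back to uniform if it is all zero.
        total = sum(row)
        if total > 0:
            return [x / total for x in row]
        return [1.0 / len(row)] * len(row)

    # Random initialisation; every row is a probability distribution.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n_dw * post[z]
                    new_w_z[z][w] += n_dw * post[z]
        # M-step: renormalise the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy count matrix (made up for illustration): 4 documents, 4 vocabulary words.
counts = [[2, 1, 0, 0], [3, 2, 0, 0], [0, 0, 2, 3], [0, 0, 1, 2]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

Each row of p_z_d is the topic mixture of one document, so a document can
carry weight on several topics at once, as the model assumes.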
CHALLENGES
Traditional statistical approaches to text classification have been applied
within a single domain. These approaches are based on word occurrences. They
require labelled data in order to construct a reliable and accurate
classification model, but labelled data are rarely available and are too
expensive to obtain. Another challenge for machine learning approaches is a
learning task for which no training data is available. The most important
problem is learning topics in text documents that belong to different
domains.
SAMPLE INPUT AND SAMPLE OUTPUT
SAMPLE INPUT
A dataset containing the list of documents to be classified. This set of
documents is denoted by D.
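One plausible way to prepare D for a word-occurrence model is to convert it
into a document-term count matrix. The sketch below assumes whitespace
tokenization and lowercasing; the helper build_count_matrix and the example
sentences are made up for illustration and are not taken from the project.

```python
from collections import Counter


def build_count_matrix(documents):
    """Turn raw text documents into (vocabulary, count matrix)."""
    # Assumption: lowercase the text and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        freq = Counter(toks)  # missing words count as 0
        matrix.append([freq[word] for word in vocabulary])
    return vocabulary, matrix


# Illustrative document set D (invented sentences).
D = [
    "stock market prices rise",
    "team wins football match",
    "market prices fall",
]
vocabulary, counts = build_count_matrix(D)
```

Row i of counts holds the word frequencies of the i-th document in D, with
columns ordered as in vocabulary.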
SAMPLE OUTPUT
Documents are categorized.
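As a rough illustration of what the categorized output could look like, the
sketch below labels each document with its most probable topic. It assumes a
fitted model has already produced the per-document topic probabilities
P(z|d); the values shown are hypothetical, and the function name categorize
is chosen here for illustration. Taking only the top topic is a
simplification, since the model itself allows several topics per document.

```python
def categorize(p_z_d):
    """Assign each document the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


# Hypothetical P(z|d) values for three documents and two topics.
p_z_d = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = categorize(p_z_d)
print(labels)  # [0, 1, 0]
```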
REFERENCES