represent the domain-specific (V_DS) and domain-independent (V_DI) words. Each edge in E connects a vertex in V_DS to a vertex in V_DI and is associated with a weight m_ij, whose score measures the relationship between words in the source and target domains.
Besides using the co-occurrence frequency of words within documents, they also adopt a more meaningful method to estimate m_ij. They define a reasonable window size; if a domain-specific word and a domain-independent word co-occur within that window, an edge connects them. Furthermore, they use the distance between w_i and w_j to adjust the score of m_ij: the smaller the distance between the two words, the larger the weight assigned to the corresponding edge.
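To make this construction concrete, the following minimal Python sketch builds such weighted edges. The window size default and the inverse-distance weighting rule are our assumptions, consistent with the description above but not necessarily the authors' exact scoring function:

    from collections import defaultdict

    def cooccurrence_weights(docs, ds_words, di_words, window=5):
        """Sketch of the edge-weight construction described above.

        docs     : list of tokenized documents.
        ds_words : set of domain-specific words.
        di_words : set of domain-independent words.
        window   : co-occurrence window size (an assumed default).
        Returns a dict mapping (ds_word, di_word) -> weight m_ij.
        """
        m = defaultdict(float)
        for tokens in docs:
            for i, wi in enumerate(tokens):
                if wi not in ds_words:
                    continue
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i and tokens[j] in di_words:
                        # Inverse-distance weighting: the closer the
                        # pair, the larger its contribution to m_ij.
                        m[(wi, tokens[j])] += 1.0 / abs(i - j)
        return m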
C. Spectral Feature Clustering
Based on graph spectral theory [9], they assume that if two domain-specific features are connected to many common domain-independent features, they tend to be closely related and will be aligned to the same cluster with high probability; likewise, if two domain-independent features are connected to many common domain-specific features, they tend to be closely related and will be aligned to the same cluster with high probability.
Given the feature bipartite graph G, the goal is to learn a feature alignment mapping function φ: R^(m−l) → R^k, where m is the number of all features, l is the number of domain-independent features, and m − l is the number of domain-specific features.
1. Form a weight matrix M ∈ R^((m−l)×l), where M_ij corresponds to the co-occurrence relationship between a domain-specific word w_i ∈ W_DS and a domain-independent word w_j ∈ W_DI.
2. Form an affinity matrix A of the bipartite graph, where the first m − l rows and columns correspond to the m − l domain-specific features, and the last l rows and columns correspond to the l domain-independent features.
3. Form a diagonal matrix D, where D_ii = Σ_j A_ij, and construct the matrix L = D^(−1/2) A D^(−1/2).
4. Find the k largest eigenvectors of L, u_1, u_2, ..., u_k, and form the matrix U = [u_1 u_2 ... u_k] ∈ R^(m×k).
5. Define the feature alignment mapping function (a code sketch follows below).
Though the goal is only to cluster the domain-specific features, it has been shown that clustering two related sets of points simultaneously often yields better results than clustering a single set of points alone [19].
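The listed steps map almost line-by-line onto code. Below is a minimal NumPy sketch under our own variable names; the dense symmetric eigendecomposition is one reasonable choice, not necessarily the authors' implementation:

    import numpy as np

    def sfa_embedding(M, k):
        """Sketch of steps 1-5 above.

        M : (m - l) x l weight matrix between domain-specific words
            (rows) and domain-independent words (columns).
        k : number of eigenvectors / feature clusters.
        Returns U in R^(m x k), embedding all m features.
        """
        ds, di = M.shape                 # m - l and l
        m = ds + di

        # Step 2: affinity matrix of the bipartite graph.
        A = np.zeros((m, m))
        A[:ds, ds:] = M
        A[ds:, :ds] = M.T

        # Step 3: D_ii = sum_j A_ij and L = D^(-1/2) A D^(-1/2).
        d = A.sum(axis=1)
        inv_sqrt = np.zeros_like(d)
        nz = d > 0
        inv_sqrt[nz] = d[nz] ** -0.5
        L = inv_sqrt[:, None] * A * inv_sqrt[None, :]

        # Step 4: eigh returns eigenvalues in ascending order, so the
        # last k columns are the k largest eigenvectors.
        vals, vecs = np.linalg.eigh(L)
        U = vecs[:, -k:]

        # Step 5: the mapping for a domain-specific feature vector x
        # uses only the first m - l rows of U: phi(x) = x @ U[:ds].
        return U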
The overall procedure, ending with a trained classifier, is summarized as follows:
2. By using [...] and [...], calculate the (DI-word)-(DS-word) co-occurrence matrix.
3. Construct the matrix L = D^(−1/2) A D^(−1/2), where D is the diagonal matrix with D_ii = Σ_j A_ij.
4. Find the K largest eigenvectors of L, u_1, u_2, ..., u_K, and form the matrix U = [u_1 u_2 ... u_K] ∈ R^(m×K).
Return a classifier f, trained on [...].
D. Feature Augmentation
A tradeoff parameter is used in this feature augmentation to balance the effect of the original features and the new features, as sketched below.
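A minimal sketch of this augmentation follows; gamma is our hypothetical name for the tradeoff parameter (the survey does not fix the notation), and phi_x stands for the new aligned features produced by the mapping function:

    import numpy as np

    def augment(x, phi_x, gamma=1.0):
        # Concatenate the original feature vector x with the new
        # aligned features phi_x, scaled by the tradeoff parameter
        # gamma (assumed name): gamma = 0 recovers the original
        # representation, larger gamma emphasizes the new features.
        return np.concatenate([x, gamma * phi_x])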
IV. STRUCTURAL CORRESPONDENCE LEARNING
SCL-MI. This is the structural correspondence learning (SCL) method proposed by Blitzer et al. [3]. The method utilizes both labeled and unlabeled data in the benchmark data set. It selects pivots using the mutual information between a feature (unigrams or bigrams) and the domain label. Next, linear classifiers are learned to predict the existence of those pivots. The learned weight vectors are arranged as rows of a matrix, and singular value decomposition (SVD) is performed to reduce the dimensionality of this matrix. Finally, this lower-dimensional matrix is used to project features to train a binary sentiment classifier.
A. Algorithm
SCL uses labeled data from the source domain and unlabeled data from both the source and target domains. It first chooses m pivot features which occur frequently in both domains. It then finds correlations between the pivot features and the remaining features by training linear predictors that predict the occurrence of each pivot in the unlabeled data from the source and target domains. Each predictor has a weight vector w_l, in which positive entries indicate that a non-pivot feature is highly correlated with the corresponding pivot. These weight vectors are arranged into a matrix W. At training and test time, given a feature vector x, the projection Xx (where X is the low-dimensional projection matrix obtained from the SVD of W) is applied to obtain k new real-valued features, and a predictor is learned for the augmented instance (x, Xx). If X captures meaningful correspondences, a predictor that uses X will perform well in both domains.
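To illustrate, here is a rough Python sketch of the pivot-prediction and SVD steps. All names are ours, and the pivot predictors are fit with plain least squares as a stand-in for the linear classifiers used by Blitzer et al. [3]:

    import numpy as np

    def scl_projection(X_unlabeled, pivot_idx, k):
        """Sketch of the SCL steps described above.

        X_unlabeled : (n, m) feature matrix from the pooled source
                      and target unlabeled data.
        pivot_idx   : indices of pivot features (chosen, e.g., by
                      mutual information with the domain label).
        k           : dimensionality of the learned projection.
        """
        n, m = X_unlabeled.shape
        non_pivot = np.setdiff1d(np.arange(m), pivot_idx)
        W = np.zeros((len(pivot_idx), len(non_pivot)))

        # One linear predictor per pivot: predict the pivot's
        # occurrence from the non-pivot features.
        for row, p in enumerate(pivot_idx):
            y = X_unlabeled[:, p]
            W[row], *_ = np.linalg.lstsq(
                X_unlabeled[:, non_pivot], y, rcond=None)

        # SVD of the weight matrix W; the top-k right singular
        # vectors give the low-dimensional projection.
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        theta = Vt[:k]

        def project(x):
            # k new real-valued features for an instance x.
            return theta @ x[non_pivot]

        return project

The rows of theta play the role of the projection matrix X in the text above; the final sentiment classifier is then trained on the augmented instances (x, Xx).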
V. CONCLUSION
In this paper, we present a short survey of various techniques used for cross-domain sentiment analysis. We have mainly discussed three techniques, viz. the sentiment-sensitive thesaurus, spectral feature alignment, and structural correspondence learning. These three techniques differ from one another in the way they expand the feature vector, in how they measure the relatedness among words, and, finally, in the classifier used for classification. Some methods for cross-domain classification use labeled data, some use unlabeled data, and some use both. Hence, a given technique yields different results for different domains and purposes. We have discussed the pros and cons of each method and the challenges faced in cross-domain sentiment analysis.
ACKNOWLEDGMENT
I would like to thank my respected project guide, Prof. Smita Nirkhi, whose careful guidance and timely advice inspired me towards sustained effort in my work. Special thanks to Prof. Wanjari, Coordinator, M.Tech CSE, for his timely guidance. One more personality who is ever ready to listen to the difficulties of students and is always on his toes to help them out: I would like to thank our Head of Department, Dr. M. B. Chandak, and the entire Computer Science and Engineering Department of RCOEM for their assistance.
REFERENCES
[1] Danushka Bollegala, David Weir, and John Carroll, "Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, Aug. 2013.
[2] Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen, "Cross-Domain Sentiment Classification via Spectral Feature Alignment," Proc. 19th Int'l Conf. World Wide Web (WWW '10).
[3] J. Blitzer, M. Dredze, and F. Pereira, "Domain Adaptation for Sentiment Classification," Proc. 45th Ann. Meeting of the Assoc. for Computational Linguistics (ACL '07).
[4] Saeideh Shahheidari, Hai Dong, and Md Nor Ridzuan Bin Daud, "Twitter Sentiment Mining: A Multi Domain Analysis," Proc. 7th Int'l Conf. Complex, Intelligent, and Software Intensive Systems.
[5] Kranti Ghag and Ketan Shah, "Comparative Analysis of the Techniques for Sentiment Analysis," ICATE 2013.
[6] B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval, vol. 2, nos. 1/2, pp. 1-135, 2008.
[7] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs Up? Sentiment Classification Using Machine Learning Techniques," Proc. ACL-02 Conf. Empirical Methods in Natural Language Processing (EMNLP '02), pp. 79-86, 2002.
[8] M. Hu and B. Liu, "Mining and Summarizing Customer Reviews,"