Beruflich Dokumente
Kultur Dokumente
Alexandra BALAHUR
Andrs MONTOYO
Alicante, Spain
Alicante, Spain
abalahur@dlsi.ua.es
montoyo@dlsi.ua.es
Abstract:
Mining the web for customer opinion on different
products is both a useful, as well as challenging task.
Previous approaches to customer review classification
included document level, sentence and clause level
sentiment
analysis
and
feature
based
opinion
summarization. In this paper, we present a feature driven
opinion summarization method, where the term driven is
employed to describe the concept-to-detail (product class to
product-specific characteristics) approach we took. For
each product class we first automatically extract general
features (characteristics describing any product, such as
price, size, design), for each product we then extract
specific features (as picture resolution in the case of a
digital camera) and feature attributes (adjectives grading
the characteristics, as for example high or low for price,
small or big for size and modern or faddy for design).
Further on, we assign a polarity (positive or negative) to
each of the feature attributes using a previously annotated
corpus and Support Vector Machines Sequential Minimal
Optimization[1] machine learning with the Normalized
Google Distance[2]. We show how the method presented is
employed to build a feature-driven opinion summarization
system that is presently working in English and Spanish. In
order to detect the product category, we use a modified
system for person names classification. The raw review text
is split into sentences and depending on the product class
detected, only the phrases containing the specific product
features are selected for further processing. The phrases
extracted undergo a process of anaphora resolution, Named
Entity Recognition and syntactic parsing. Applying
syntactic dependency and part of speech patterns, we
extract pairs containing the feature and the polarity of the
feature attribute the customer associates to the feature in
the review. Eventually, we statistically summarize the
polarity of the opinions different customers expressed
about the product on the web as percentages of positive and
negative opinions about each of the product features. We
show the results and improvements over baseline, together
with a discussion on the strong and weak points of the
method and the directions for future work.
Keywords:
Opinion mining; summarization; Normalized Google
Distance; SVM machine learning
1.
Introduction
Background
Previous work in customer review classification
System Architecture
(small)1.52,1.87,0.82,1.75,1.92,1.93,positive
(little)1.44,1.84,0.80,1.64,2.11,1.85,positive
(big) 2.27, 1.19, 0.86, 1.55, 1.16, 1.77, negative
(bulky) 1.33, 1.17, 0.92, 1.13, 1.12,1.16, negative
The vector corresponding to the tiny attribute
feature is: (tiny) 1.51, 1.41, 0.82, 1.32, 1.60, 1.36.
This vector was classified by SVM as positive,
using the training set specified above. The precision
value in the classifications we made was between 0.72
and 0.80, with a kappa value above 0.45.
4.
5.
SA
FIP
FIR
Eng
Sp
Combined
Baseline
Eng
0.82
0.80
0.79
0.80 0.81
0.21
0.78 0.79
0.20
0.79 0.79
0.40
Table 1. System results
Baseline
Sp
0.19
0.20
0.40
Acknowledgements
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]