
Opinion Mining

Dr. Alaa El-Halees
Faculty of Information Technology
Islamic University of Gaza
Seminar, 9/9/2008

Outline

Definition
Applications
Challenges
Model
Arabic
Conclusion

Definition

Opinion mining (also called sentiment mining or opinion/sentiment extraction) is the area of research that attempts to build automatic systems that determine human opinion from text written in natural language. It seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down".

Definition

Consider, for instance, the following scenario. A major computer manufacturer, disappointed with unexpectedly low sales, finds itself confronted with this question: Why aren't consumers buying our laptop?

While concrete data such as the laptop's weight or the price of a competitor's model are obviously relevant, answering this question requires focusing more on people's personal views of such objective characteristics. Moreover, subjective judgments regarding intangible qualities --- e.g., "the design is tacky" or "customer service was condescending" --- or even misperceptions --- "updated device drivers aren't available" --- must be taken into account as well.

Definition

What other people think has always been an important piece of information for most of us during the decision-making process. Opinion mining draws on computational linguistics, information retrieval, text mining, natural language processing, machine learning, statistics and predictive analysis.

Definition
There are two main types of textual information: facts and opinions.
Most current information-processing techniques (e.g., search engines) work with facts and assume they are true.
Facts can be expressed with topic keywords.

Definition
In real life, facts are important, but opinions also play a crucial role.
A computer manufacturer, disappointed with low sales, asks itself: Why aren't consumers buying our laptop?
The Democratic National Committee, disappointed with the last election, wants to know on an ongoing basis: What is the reaction in the press, newsgroups, chat rooms, and blogs to Bush's latest policy decision?

Definition
The main advantage is speed. On average, humans process six articles per hour, against a machine's throughput of 10 per second (that is, 36,000 per hour, or 6,000 times as many).

Applications

Applications as a sub-component technology:
Recommendation systems
Summarization
Question answering:
Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe?
A: African observers generally approved of his victory while Western governments denounced it.

Applications

Applications in Business

Marketing intelligence, and product and service benchmarking and improvement.
The goal is to understand the voice of the customer as expressed in everyday communications.

Applications

Politics

As is well known, opinions matter a great deal in politics. Some work has focused on understanding what voters are thinking.

Applications

Blog analysis

Perform subjectivity and polarity classification on blog posts.
Discover irregularities in temporal mood patterns (fear, excitement, etc.) appearing in a large corpus of blogs.
Use link polarity information to model trust and influence in the blogosphere.
Analyze blog sentiment about movies and correlate it with the movies' sales.

Applications

Human Computer Interaction

Affect sensing
Human-robot interaction

Challenges

Determine whether a document or a portion of it (e.g. a paragraph or statement) is subjective.
Example: "the battery lasts 2 hours" vs. "the battery only lasts 2 hours".

Challenges
The difficulty lies in the richness of human language use. Example:
1. This is a great camera.
2. A great amount of money was spent for promoting this camera.
3. One might think this is a great camera. Well, think again, because...
A single keyword ("great") can be used to convey three different opinions: positive, neutral and negative, respectively.

Challenges

In order to arrive at sensible conclusions, sentiment analysis has to understand context. For example, "fighting" and "disease" are negative in a war context but positive in a medical one. Different domains therefore need different mining models.

Challenges

Even humans do not always agree on the sentiment of the same document: there is an 82% chance of two or more human analysts agreeing with each other.

Sentiment Analysis Model

Data Preparation

The data preparation step performs the necessary preprocessing and cleaning of the dataset for the subsequent analysis. Commonly used preprocessing steps include removing non-textual content and markup tags (for HTML pages), and removing information about the reviews that is not required for sentiment analysis, such as review dates and reviewers' names. The class distribution of the training dataset should also be balanced.
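The preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`html`, `label`, `date`, `reviewer` keys), the function names, and the choice of downsampling as the balancing method are all hypothetical and not taken from the talk.

```python
import random
import re
from collections import defaultdict
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")  # crude markup matcher, enough for a sketch


def clean_review(record):
    """Strip markup and keep only the fields needed for sentiment analysis.

    `record` is a hypothetical dict with "html", "label", "date", "reviewer".
    """
    text = TAG_RE.sub(" ", record["html"])    # remove HTML tags
    text = unescape(text)                     # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    # The date and the reviewer's name are dropped on purpose.
    return {"text": text, "label": record["label"]}


def balance(records, seed=0):
    """Downsample every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced
```

Downsampling discards data; oversampling the minority class is an equally plausible choice when the corpus is small.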

Review Analysis

The review analysis step analyzes the linguistic features of reviews so that interesting information, including opinions and/or product features, can be identified. This step often applies various computational linguistics tasks to reviews first, and then extracts opinions and product features from the processed reviews. Two commonly adopted tasks for review analysis are POS tagging and negation tagging.
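Negation tagging can be illustrated with a short Python sketch. It assumes one common convention: every word between a negation word and the next punctuation mark receives a `NOT_` prefix. The negation list, the prefix and the tokenizer are assumptions made for this example; POS tagging is not shown because it needs a trained tagger.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}  # illustrative, not exhaustive
CLAUSE_END = {".", ",", ";", ":", "!", "?"}


def tokenize(text):
    """Lower-case the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;:!?]", text.lower())


def tag_negation(tokens):
    """Prefix NOT_ to each word that follows a negation within a clause."""
    tagged = []
    negated = False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False  # the negation scope ends at punctuation
            tagged.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True   # the scope starts after the negation word
            tagged.append(tok)
        else:
            tagged.append("NOT_" + tok if negated else tok)
    return tagged


print(tag_negation(tokenize("I didn't like this camera, but the lens is great.")))
```

After tagging, "like" and "NOT_like" are distinct features, so a classifier can learn opposite polarities for them.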

Sentiment Classification

There are two main techniques for sentiment classification:
The symbolic technique uses manually crafted rules and lexicons.
The machine learning approach uses unsupervised or supervised learning to construct a model from a large training corpus.
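A minimal sketch of the symbolic technique, assuming a tiny hand-made lexicon and a single rule (a negator flips the polarity of the next word). The word lists reuse a few of the example words from the "Words" slides of this talk; the scoring rule and the function name are assumptions made for illustration.

```python
POSITIVE = {"honest", "mature", "patient", "praise", "love", "pleasure", "enjoyment"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure", "blame", "criticize", "pain"}
NEGATORS = {"not", "never", "no"}  # assumed rule: flips the next word only


def classify(text):
    """Return 'positive', 'negative' or 'neutral' from lexicon hits."""
    score = 0
    flip = False
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'")
        if word in NEGATORS:
            flip = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if polarity:
            score += -polarity if flip else polarity
        flip = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Ron Paul is the only honest man in Washington."))
print(classify("It was a macabre and hypocritical circus."))
```

Such rules are transparent and need no training data, but every new domain requires the lexicon to be revised by hand.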

What?

Find relevant words, phrases and patterns that can be used to express subjectivity.
Determine the polarity of subjective expressions.

Words

Adjectives

positive: honest, important, mature, large, patient
Ron Paul is the only honest man in Washington.
Kitchell's writing is unbelievably mature and is only likely to get better.
To humour me, my patient father agrees yet again to my choice of film.

negative: harmful, hypocritical, inefficient, insecure
It was a macabre and hypocritical circus.
Why are they being so inefficient?

Words

Verbs

positive: praise, love
negative: blame, criticize

Nouns

positive: pleasure, enjoyment
negative: pain, criticism

Phrases

Phrases containing adjectives and adverbs

positive: high intelligence, low cost
negative: little variation, many troubles

Patterns

way with <np>: to ever let China use force to have its way with
expense of <np>: at the expense of the world's security and stability
underlined <dobj>: Jiang's subdued tone underlined his desire to avoid disputes

Machine Learning

Studies have shown that standard machine learning techniques definitively outperform human-produced baselines.

Machine Learning

One approach is to treat sentiment classification simply as a special case of topic-based categorization, with the two topics being positive sentiment and negative sentiment.

Supervised Methods

To train a classifier for sentiment recognition in text, classic supervised learning techniques (e.g. Support Vector Machines, naive Bayes, Maximum Entropy) can be used. A supervised approach uses a labelled training corpus to learn a classification function. In the literature, the method that most often yields the highest accuracy is the Support Vector Machine classifier.
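As a rough illustration of the supervised setting, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing, written from scratch. Naive Bayes is used here instead of a Support Vector Machine only because it fits in a few lines. The class name, the whitespace tokenization and the four training sentences are assumptions made for this example, not data from the talk.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:             # unseen words are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled corpus (hypothetical, far too small for real use).
train_docs = [
    "this is a great camera",
    "i love the design",
    "the design is tacky",
    "customer service was condescending",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great design i love it"))
print(model.predict("tacky and condescending"))
```

A realistic experiment would use thousands of labelled reviews, held-out test data, and features such as the negation-tagged tokens produced during review analysis.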

Unsupervised Learning
A clustering algorithm partitions the adjectives into two subsets (positive and negative).
Example adjectives: scenic, nice, terrible, handsome, painful, slow, fun, expensive, comfortable
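The slide does not say which evidence the clustering uses. One possibility, assumed here, is to link adjectives by the conjunctions that join them in text: "and" suggests the same orientation, "but" suggests the opposite one. The sketch below is a simple constraint-propagation partition rather than a full clustering algorithm, and the links are invented for illustration, not observed in a corpus.

```python
from collections import defaultdict, deque

# Hypothetical links: (adjective, adjective, same_orientation)
LINKS = [
    ("scenic", "nice", True),        # e.g. "scenic and nice"
    ("nice", "comfortable", True),
    ("nice", "expensive", False),    # e.g. "nice but expensive"
    ("expensive", "terrible", True),
    ("terrible", "painful", True),
    ("painful", "slow", True),
    ("fun", "slow", False),
    ("handsome", "fun", True),
]


def partition(links):
    """Split adjectives into two groups that respect the same/opposite links."""
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))
    side = {}
    for start in list(graph):
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt, same in graph[cur]:
                if nxt not in side:
                    side[nxt] = side[cur] if same else 1 - side[cur]
                    queue.append(nxt)
                # Conflicting links are ignored in this sketch.
    group0 = sorted(w for w, s in side.items() if s == 0)
    group1 = sorted(w for w, s in side.items() if s == 1)
    return group0, group1


group_a, group_b = partition(LINKS)
print(group_a)
print(group_b)
```

The procedure only separates the two groups; deciding which group is the positive one needs extra evidence, for example a few seed words of known polarity.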

Arabic

Work of Yousif Almas and Khurshid Ahmad: "A note on extracting sentiments in financial news in English, Arabic & Urdu".
They used a pattern-based approach on financial data.

Conclusion

An important field of study
A new field
Many applications
Suitable for Arabic language research: almost no work in this area

References

Pang, B. and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, Vol. 2, Nos. 1-2, pp. 1-135. E-book: http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf
Wiebe, J., Cardie, C. and Riloff, E. (2007). Manual and Automatic Subjectivity and Sentiment Analysis. Center for Extraction and Summarization of Events and Opinions in Text, University of Utah.

References

Almas, Y. and Ahmad, K. (2007). A note on extracting sentiments in financial news in English, Arabic & Urdu. The Second Workshop on Computational Approaches to Arabic Script-based Languages, LSA 2007 Linguistic Institute, July 21, 2007, Stanford University.
Leung, C. and Chan, S. (2008). Sentiment Analysis of Product Reviews. Encyclopedia of Data Warehousing and Mining, 2nd Edition, Information Science Reference, August 2008.
