Blog Contents
Characteristics of blogs
Blog Article
Interlinking and forming communities
Highly personal
With opinions
Time
Location
Immediate response to events
With mixed topics
[Figure: a theme snapshot showing theme strength by location (United States, China, Canada)]
[Figure: # of nodes in communities vs. # of communities]
Content Analysis
Blog level topic analysis
Information diffusion through blogspace
Use topic bursting to predict sales spikes, e.g., [Gruhl et al. 2005]
[Figure: blog mentions vs. sales rank]
Summarizing topics
Monitoring public opinions
Business Intelligence
Over 184 million people currently maintain a blog or are active in social media
...about 20% of the Internet population
Over 60% read posts in blogs or social media
Trend Setters
In Real Time
There are over a million new blog and social media posts every day
Product Innovation
Consumer Insight
Trend Insight
Brand Insight
Blogger Outreach
Buzz & Sentiment
Crisis Communication
Tactical
Strategic
Opinions
In real life, facts are important, but opinion also
plays a crucial role. A computer manufacturer,
disappointed with low sales, asks itself: "Why
aren't consumers buying our laptop?" A political
party, disappointed with the last election, wants
to know on an ongoing basis: "What is the
reaction in the press, newsgroups, chat rooms,
and blogs to the latest policy decisions?"
Opinions in posts
Analysis of Posts (Tasks)
Perform subjectivity and polarity classification
on blog posts
Discover irregularities in temporal mood
patterns (fear, excitement, etc.) appearing in a
large corpus of posts
Use link polarity information to model trust
and influence in the blogosphere
Analyze sentiments about products and
correlate them with product sales
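As a rough illustration of the last task, the sketch below computes the Pearson correlation between a sentiment series and a sales series. The function name `pearson` and the weekly figures are hypothetical and only serve the example; the slides do not say which correlation measure was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither series is constant (otherwise the denominator is zero).
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: average weekly sentiment score and weekly units sold.
weekly_sentiment = [0.10, 0.40, 0.35, 0.80]
weekly_sales = [120, 150, 160, 210]
print(round(pearson(weekly_sentiment, weekly_sales), 3))
```

A value near +1 would suggest that sentiment and sales move together; it says nothing about which one drives the other.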
Challenges
Determine whether a document or a portion of it
(e.g., a paragraph or statement) is
subjective.
Example: "the battery lasts 2 hours" vs.
"the battery lasts only 2 hours"
Challenges
The difficulty lies in the richness of human
language use.
Example:
1. This is a great camera.
2. A great amount of money was spent on
promoting this camera.
3. One might think this is a great camera.
Well, think again, because...
A single keyword ("great") can convey three
different opinions: positive, neutral, and negative,
respectively.
Challenges
In order to arrive at sensible conclusions,
sentiment analysis has to understand
context. For example, "fighting" and
"disease" are negative in a war context but
positive in a medical one.
Different domains call for different mining
conditions.
Sentiment Classification
There are two main techniques for
sentiment classification:
The symbolic technique uses manually
crafted rules and lexicons.
The machine learning approach uses
unsupervised or supervised learning to
construct a model from a large training
corpus.
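A minimal sketch of the symbolic technique, assuming a tiny hand-made lexicon and a single negation rule. The word lists, the two-token negation window, and the function name `classify` are illustrative assumptions, not a published resource or the method of any particular system.

```python
import re

# Tiny illustrative lexicon (assumed for this example only).
POSITIVE = {"great", "love", "praise", "pleasure", "enjoyment", "honest"}
NEGATIVE = {"blame", "criticize", "pain", "criticism", "terrible", "poor"}
NEGATORS = {"not", "never", "no"}

def classify(text):
    """Return 'positive', 'negative', or 'neutral' using hand-written rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        # Rule: a negator within the two preceding tokens flips the polarity.
        if polarity and any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("This is a great camera."))
print(classify("This is not a great camera."))
```

Such rules are easy to inspect but brittle: this sketch would also label "A great amount of money was spent on promoting this camera" as positive, which is exactly the keyword ambiguity described under Challenges.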
Subjectivity
Find relevant words, phrases, patterns that
can be used to express subjectivity
Determine the polarity of subjective
expressions
Words
Adjectives
positive: honest, important, mature, large, patient
"Ron Paul is the only honest man in Washington."
"Kitchell's writing is unbelievably mature and is only likely to
get better."
"To humour me, my patient father agrees yet again to my
choice of film."
Words
Verbs
positive: praise, love
negative: blame, criticize
Nouns
positive: pleasure, enjoyment
negative: pain, criticism
Phrases
Phrases containing adjectives and
adverbs
positive: high intelligence, low cost, better
performance
negative: little variation, many troubles,
several excuses
Supervised Methods
To train a classifier for sentiment
recognition in text, classic supervised learning
techniques (e.g., Support Vector Machines, Naive
Bayes, Maximum Entropy) can be used. A
supervised approach uses a
labelled training corpus to learn a
classification function. Support Vector Machine
classifiers have often been reported to achieve the highest
accuracy.
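To make the supervised setting concrete, here is a minimal multinomial Naive Bayes sketch in plain Python with add-one smoothing. The four training sentences, the labels `pos`/`neg`, and the function names are invented for illustration; a real system would train on a large labelled corpus and would more likely use an established library.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (text, label) pairs. Returns a model dict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"classes": class_counts, "words": word_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["classes"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["words"][label].values())
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["words"][label][word]
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training corpus.
TRAIN = [
    ("great camera love it", "pos"),
    ("excellent battery great screen", "pos"),
    ("terrible battery poor screen", "neg"),
    ("poor quality hate it", "neg"),
]
model = train_naive_bayes(TRAIN)
print(predict(model, "great screen"), predict(model, "poor battery"))
```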
Unsupervised Learning
Clustering algorithms can be used to partition the
adjectives into two subsets (positive and negative)
Example adjectives: slow, scenic, nice, terrible, handsome, painful, fun,
expensive, comfortable
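One simple way to picture such a partition, offered here as a stand-in rather than as the clustering algorithm the slide has in mind: assume we already know, for some adjective pairs, whether they tend to share an orientation (as in "nice and comfortable") or oppose each other (as in "scenic but slow"), and propagate those links through a graph from one seed word. The links below and the function name `partition` are hypothetical.

```python
from collections import deque

# Hypothetical links: (adjective_a, adjective_b, same_orientation).
LINKS = [
    ("nice", "comfortable", True),
    ("scenic", "nice", True),
    ("fun", "scenic", True),
    ("handsome", "nice", True),
    ("slow", "scenic", False),
    ("terrible", "slow", True),
    ("painful", "terrible", True),
    ("expensive", "comfortable", False),
]

def partition(links, seed):
    """Split adjectives into two groups; the seed's group is returned first."""
    graph = {}
    for a, b, same in links:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))
    side = {seed: 0}
    queue = deque([seed])
    while queue:  # breadth-first propagation of group membership
        word = queue.popleft()
        for other, same in graph.get(word, []):
            if other not in side:
                side[other] = side[word] if same else 1 - side[word]
                queue.append(other)
    group0 = {w for w, s in side.items() if s == 0}
    group1 = {w for w, s in side.items() if s == 1}
    return group0, group1

with_seed, against_seed = partition(LINKS, "nice")
print(sorted(with_seed), sorted(against_seed))
```

The procedure only separates the words into two groups; deciding which group is the positive one still needs outside knowledge, here the assumption that the seed "nice" is positive.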
Applications / Caselets
Sentiment Analysis for Mining
Marketing Intelligence
Marketing Intelligence
MI is the process of acquiring and analyzing information in
order to understand the market (both existing and potential
customers) and to determine its current and future needs,
preferences, attitudes, and behavior (Cornish, 1997)
In consonance with Cornish's definition, we take the view that
consumer sentiments and opinions can be useful for eliciting
consumer preferences
Objective
To discover marketing intelligence like Feature Buzz related to
products and to analyze feature level opinion by sentiment
analysis and opinion mining
Developing novel approaches for analysis of opinionated text
information by bridging the gap among text mining, machine
learning and natural language processing techniques
The Framework
Textual Pre-processing
The opinionated text documents were collected and then
pre-processed to remove any non-textual information
The Vector Space Model (VSM) was adopted in order to
generate the bag of words for each document
Stemming was done to reduce words to their common
root or stem
Some of the stop words were removed, but we preserved
some useful sentiment-expressing terms such as "ok" and
"not"
The top n-ranked terms were selected using Information Gain
feature selection
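The pre-processing steps above can be sketched roughly as follows. The stop-word list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer such as Porter's), and all function names are assumptions made for this example; the slides do not specify the tools actually used.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; many real lists include "not", hence the KEEP set.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "not"}
KEEP = {"ok", "not"}  # sentiment-bearing terms preserved despite stop-word removal
SUFFIXES = ("ing", "ed", "ly", "s")  # crude stand-in for a real stemmer

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML tags (non-textual content)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t in KEEP or t not in STOP_WORDS]

def bag_of_words(text):
    return Counter(preprocess(text))

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(term, docs, labels):
    """Reduction in label entropy from knowing whether a document contains term."""
    with_term = [l for d, l in zip(docs, labels) if term in d]
    without = [l for d, l in zip(docs, labels) if term not in d]
    gain = entropy(labels)
    for subset in (with_term, without):
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def top_terms(texts, labels, n):
    docs = [bag_of_words(t) for t in texts]
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: information_gain(t, docs, labels), reverse=True)[:n]

print(preprocess("<p>The room was not cleaned</p>"))
```

In this sketch Information Gain is computed on term presence only; term frequencies from the bag of words could be used for the later classification step.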
The Framework
Room
Small (139), Hot (92), Bad (82), Smell (60), Cold (52), Problem
(39), Poor (37), Stink (30), Costly (28), Worst (27), Damp (27),
Dark (26), Complain (18), Broken (18), Leak (15)
Food
Bad (104), Worst (78), Dislike (69), Wait (62), Cold (58),
Disappoint (49), Poor (42), Expensive (39), Horrible (39), Late
(33), Worse (31), Smell (27), Refuse (26), Complain (21), Pathetic
(17)
Stay
Experience
Price
Location
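Counts like those in the Room and Food lists above could be produced in several ways. The sketch below assumes the simplest one: count a negative term for a feature whenever both appear in the same sentence. The feature set, the term list, and the function name `feature_opinions` are illustrative assumptions and not necessarily how the study computed its figures.

```python
import re
from collections import Counter, defaultdict

FEATURES = {"room", "food"}  # assumed product features
NEGATIVE_TERMS = {"small", "hot", "bad", "cold", "poor", "worst", "late"}

def feature_opinions(reviews):
    """Map each feature to a Counter of negative terms found in its sentences."""
    counts = defaultdict(Counter)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            tokens = set(re.findall(r"[a-z]+", sentence))
            for feature in FEATURES & tokens:
                for term in NEGATIVE_TERMS & tokens:
                    counts[feature][term] += 1
    return counts

# Hypothetical reviews.
REVIEWS = [
    "The room was small and hot. The food was cold.",
    "Bad food. The room was small!",
]
for feature, terms in sorted(feature_opinions(REVIEWS).items()):
    print(feature, terms.most_common())
```

Sentence-level co-occurrence is a blunt rule: a sentence that mentions two features credits every negative term in it to both.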
Summary
The study has demonstrated methods for
automatically extracting consumer opinions from
online reviews of hotels
It has shown that aggregated consumer sentiment
as well as specific opinion about product features
can be extracted using sentiment analysis
techniques
More Advanced:
Spatiotemporal Theme Mining
Given a collection of posted articles about a topic with
time and location information
Discover multiple themes (i.e., subtopics) being discussed in
these articles
For a given location, discover how each theme evolves over
time (generate a theme life cycle)
For a given time, reveal how each theme spreads over
locations (generate a theme snapshot)
Compare theme life cycles in different locations
Compare theme snapshots in different time periods
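One simple reading of the life cycle and snapshot tasks, assuming each post already carries a distribution over themes from some upstream theme model (the hard, unsupervised part, which this sketch does not attempt). The posts, theme names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: time period, location, and per-post theme weights.
POSTS = [
    {"time": "week1", "location": "US", "themes": {"storm": 0.8, "aid": 0.2}},
    {"time": "week1", "location": "UK", "themes": {"storm": 0.6, "aid": 0.4}},
    {"time": "week2", "location": "US", "themes": {"storm": 0.3, "aid": 0.7}},
    {"time": "week2", "location": "US", "themes": {"storm": 0.1, "aid": 0.9}},
]

def _theme_strengths(posts, fixed_key, fixed_value, group_key):
    """Sum theme weights per group, then normalise each group to sum to 1."""
    totals = defaultdict(lambda: defaultdict(float))
    for post in posts:
        if post[fixed_key] == fixed_value:
            for theme, weight in post["themes"].items():
                totals[post[group_key]][theme] += weight
    result = {}
    for group, weights in totals.items():
        norm = sum(weights.values())
        result[group] = {theme: w / norm for theme, w in weights.items()}
    return result

def theme_life_cycle(posts, location):
    """For one location: relative theme strength in each time period."""
    return _theme_strengths(posts, "location", location, "time")

def theme_snapshot(posts, time):
    """For one time period: relative theme strength in each location."""
    return _theme_strengths(posts, "time", time, "location")

print(theme_life_cycle(POSTS, "US"))
print(theme_snapshot(POSTS, "week1"))
```

Comparing life cycles across locations, or snapshots across periods, then amounts to calling these functions with different arguments and comparing the resulting dictionaries.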
Challenges in
Spatiotemporal Theme Mining
How to represent a theme?
How to model the themes in a collection?
How to model their dependency on time and
location?
How to compute the theme life cycles and
theme snapshots?
All these must be done in an unsupervised
way
How?
Time-stamped data sets of weblogs, each about one
event (broad topic):
Data Set    # docs   Time Span (2005)   Query
Katrina     9377     08/16 - 10/04      Hurricane Katrina
Rita        1754     08/16 - 10/04      Hurricane Rita
iPod Nano   1720     09/02 - 10/26      iPod Nano
[Figure: theme by location (China, Canada, United Kingdom), annotated with the release of the Nano]
ipod 0.2875
nano 0.1646
apple 0.0813
september 0.0510
mini 0.0442
screen 0.0242
new 0.0200
Applications / Caselets
Identifying the Target Segment
NEED
Wanted to build a marketing campaign to recruit brand advocates into an online community
ASSUMPTIONS
Knew Boomer females were a great target for sewing and crafts
SOLUTION
Baseline read of online chatter
Identify demographics
FINDINGS (surprising)
Found that Gen Y females were actually the right target
And a big issue was that online crafters could be mean
Applications / Caselets
Trend and Segmentation Analysis
Are Consumers Buying Green?
[Chart: monthly trend over 2007 and 2008, annotated with 160%]
Trend analysis
[Chart values: 156,177; 98,148; 71,882; 51,638; 37,944]
Segmentation charts (four panels, in order of appearance; axes: AGREEMENT vs. DISAGREEMENT, Social vs. Personal, INACTION)
Segment     Panel 1   Panel 2   Panel 3   Panel 4
Negator     22%       17%       14%       3%
Activist    9%        10%       8%        18%
Shifter     8%        16%       19%       27%
Rejecter    14%       12%       8%        5%
Uncertain   24%       9%        10%       10%
Idler       5%        13%       15%       21%
Skeptic     12%       11%       13%       10%
Guilty      6%        14%       13%       6%
Apathetic   (not measured)