Analysis
Dan Jurafsky
Bing Shopping
Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010
Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.
• CALM predicts
Sentiment Analysis
• Sentiment analysis is the detection of attitudes
  “enduring, affectively colored beliefs, dispositions towards objects or persons”
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
     • From a set of types
       • Like, love, hate, value, desire, etc.
     • Or (more commonly) simple weighted polarity:
       • positive, negative, neutral, together with strength
  4. Text containing the attitude
     • Sentence or entire document
Sentiment Analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types
Sentiment Analysis
A Baseline Algorithm
• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data
✓
when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . […] when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point . cool . _october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . . ]

✗
“ snake eyes ” is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it’s not just because this is a brian depalma film , and since he’s a great director and one who’s films are always greeted with at least some fanfare . and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .
Negation
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Add NOT_ to every word between negation and following punctuation:
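The NOT_ marking rule can be sketched in a few lines of Python. This is only an illustration: the function name `mark_negation`, the small negation word list, and the punctuation set are assumptions, not taken from the slides or the cited papers.

```python
# Hypothetical helper: prefix NOT_ to tokens that follow a negation word,
# stopping at the next punctuation token.
NEGATIONS = {"not", "no", "never"}      # assumed, deliberately small list
PUNCTUATION = {".", ",", ":", ";", "!", "?"}

def mark_negation(tokens):
    marked = []
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False             # punctuation ends the negated span
            marked.append(tok)
        elif negated:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
            low = tok.lower()
            if low in NEGATIONS or low.endswith("n't"):
                negated = True          # start marking from the next token
    return marked

print(mark_negation("didn't like this movie , but I".split()))
```

On the token sequence shown, the words between "didn't" and the comma become NOT_like, NOT_this, and NOT_movie, while "but I" is left unchanged.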
$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$$
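A minimal sketch of this add-one estimate in Python follows. It assumes count(c) means the total number of tokens in class c and V is the vocabulary; the function name `smoothed_likelihoods` and the toy data are hypothetical.

```python
from collections import Counter

def smoothed_likelihoods(docs_in_class, vocab):
    """Add-one estimate of P(w | c) for every word w in vocab.

    docs_in_class: list of token lists, all belonging to one class c.
    Assumption: count(c) is the total count of vocabulary tokens in class c.
    """
    counts = Counter(tok for doc in docs_in_class for tok in doc)
    class_total = sum(counts[w] for w in vocab)
    denom = class_total + len(vocab)
    return {w: (counts[w] + 1) / denom for w in vocab}

vocab = {"good", "fun", "bad"}
probs = smoothed_likelihoods([["good", "good", "fun"], ["good"]], vocab)
print(probs)
```

With this reading of count(c), an unseen word such as "bad" still receives a non-zero probability, and the estimates sum to 1 over the vocabulary.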
• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more.
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1
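The clipping step can be illustrated as follows. This is a sketch under assumptions: the function name `class_word_counts` and its `binarize` flag are hypothetical, and it covers only the counting step, not a full classifier.

```python
from collections import Counter

def class_word_counts(docs, binarize=True):
    """Accumulate word counts over the documents of one class.

    With binarize=True each word contributes at most 1 per document,
    which is the clipping described for Boolean Multinomial Naive Bayes.
    """
    totals = Counter()
    for doc in docs:
        totals.update(set(doc) if binarize else doc)
    return totals

docs = [["fantastic"] * 5 + ["plot"], ["fantastic", "boring"]]
print(class_word_counts(docs, binarize=True))
print(class_word_counts(docs, binarize=False))
```

In the toy data, "fantastic" appears six times across two documents, but the binarized count is 2, one per document in which it occurs.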
Cross-Validation
Problems: What makes reviews hard to classify?
• Subtlety:
  • Perfume review in Perfumes: the Guide:
    • “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”
  • Dorothy Parker on Katharine Hepburn
    • “She runs the gamut of emotions from A to B”
Thwarted Expectations and Ordering Effects
• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.
Sentiment Analysis
Sentiment Lexicons
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
• 6786 words
  • 2006 positive
  • 4783 negative
SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
• How likely is each word to appear in each sentiment class?
  • Count(“bad”) in 1-star, 2-star, 3-star, etc.
  • But can’t use raw counts:
  • Instead, likelihood:
    $$P(w \mid c) = \frac{f(w, c)}{\sum_{w \in c} f(w, c)}$$
• Make them comparable between words
  • Scaled likelihood:
    $$\frac{P(w \mid c)}{P(w)}$$
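As a worked illustration of the likelihood and the scaled likelihood, here is a short Python sketch. The function name, the toy frequencies, and the choice to estimate P(w) from counts pooled over all classes are assumptions made for this example.

```python
from collections import Counter

def scaled_likelihood(freq, word, cls):
    """Return P(w | c) / P(w).

    freq maps each class c to a Counter of word frequencies f(w, c).
    Assumption: P(w) is estimated from counts pooled over all classes.
    """
    p_w_given_c = freq[cls][word] / sum(freq[cls].values())
    word_total = sum(counts[word] for counts in freq.values())
    grand_total = sum(sum(counts.values()) for counts in freq.values())
    p_w = word_total / grand_total
    return p_w_given_c / p_w

# Toy frequencies for a 1-star and a 5-star class (made-up numbers).
freq = {1: Counter(bad=30, film=70), 5: Counter(bad=10, film=90)}
print(scaled_likelihood(freq, "bad", 1))
print(scaled_likelihood(freq, "bad", 5))
```

In the toy data, "bad" has a ratio above 1 in the 1-star class and below 1 in the 5-star class. Dividing by P(w) is what makes such ratios comparable between frequent and rare words.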
[Figure: scaled likelihood P(w|c)/P(w), also labeled Pr(c|w), plotted against Rating (1 to 10), one panel per word. NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]
Sentiment Analysis
Learning Sentiment Lexicons
nice, helpful
nice, classy
[Figure: the words helpful, nice, fair, classy, brutal, corrupt, and irrational, grouped under + and - labels.]
Turney Algorithm
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised
Classification of Reviews
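$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
• PMI between two words:
  • How much more do two words co-occur than if they were independent?
$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$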
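The PMI formula can be computed directly from co-occurrence counts. The sketch below assumes probabilities are estimated as relative frequencies over `total` observations; the function name and the example counts are illustrative only.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    Assumption: probabilities are relative frequencies over `total`
    observations. A zero co-occurrence count is not handled here;
    math.log2 raises ValueError in that case.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Made-up counts: the pair co-occurs 10 times in 1000 observations.
print(pmi(count_xy=10, count_x=20, count_y=50, total=1000))
```

A PMI of 0 means the two words co-occur exactly as often as independence would predict; positive values mean they co-occur more often than that.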
Sentiment Analysis
Other Sentiment Tasks
• Emotion:
  • Detecting annoyed callers to dialogue system
  • Detecting confused/frustrated versus confident students
• Mood:
  • Finding traumatized or depressed writers
• Interpersonal stances:
  • Detection of flirtation or friendliness in conversations
• Personality traits:
  • Detection of extroverts