Every day Twitter handles more than 400 million tweets. Miley Cyrus’ 2013 VMA
fiasco generated more than 17 million tweets! Many of these tweets comment
on products, TV shows, or ads. These tweets contain a great deal of information
that is valuable to marketers. For example, if you read every tweet on a Super Bowl
ad, you could determine if the United States liked or hated the ad. Of course, it is
impractical to read every tweet that discusses a Super Bowl ad. What is needed is a
method to find all tweets and then derive some marketing insights from the tweets.
Text mining refers to the process of using statistical methods to glean useful information from unstructured text. In addition to analyzing tweets, you can use text mining to analyze Facebook and blog posts; movie, TV, and restaurant reviews; and newspaper articles. The data sets from which text mining can glean meaningful
insights are virtually endless. In this chapter you gain some basic insights into the
methods you can use to glean meaning from unstructured text. You also learn about
some amazing applications of text mining.
In all of the book’s previous analysis of data, each data set was organized so that
each row represented an observation (such as data on sales, price, and advertising
during a month) and each column represented a variable of interest (sales each
month, price each month, or advertising each month). One of the big challenges
in text mining is to take an unstructured piece of text such as a tweet, newspaper
article, or blog post and transform its contents into a spreadsheet-like format. This
chapter begins by exploring some simple ways to transform text into a spreadsheet-
like format. After the text has been given some structure, you may apply many
techniques discussed earlier (such as Naive Bayes, neural networks, logistic regression, multiple regression, discriminant analysis, principal components, and cluster
analysis) to analyze the text. The chapter concludes with a discussion of several
important and interesting applications of text mining including the following:
■ Using text content of a review to predict whether a movie review was positive or negative
■ Using tweets to determine whether customers are happy with airline service
■ Using tweets to predict movie revenues
■ Using tweets to predict if the stock market will go up or down
■ Using tweets to evaluate viewer reaction to Super Bowl ads
To evaluate reaction to an ad, you first need a software package (such as SPSS or STATISTICA) that can interface with Twitter and retrieve all tweets that are relevant. Pulling relevant tweets is not as easy as you might think. You might
pull all tweets containing the tokens Sofia, Vergara, Diet, and Pepsi, but then you
would be missing tweets such as those shown in Figure 45-1, which contain the
token Sofia and not Vergara.
As you can see, extracting the relevant text documents is not a trivial matter.
To illustrate how text mining can give structure to text, consider the following example.
After the corpus of relevant tweets has undergone stemming and stopping, you
must create a vector representation for each document that associates a value with
each of the W words occurring in the corpus. The three most common vector codings
are binary coding, frequency coding, and the term frequency/inverse document frequency
score (tf-idf for short). The three forms of coding are defined as follows:
NOTE Before coding each document, infrequently occurring words are often
deleted.
After stemming and stopping are performed, the documents are transformed into
the text shown in rows 10–12. Now you can work through the three vector codings
of the text.
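The stemming and stopping step described above can be sketched in Python. The stop list and suffix rules below are illustrative assumptions, not the book's actual preprocessing (real systems typically use a Porter-style stemmer and a much fuller stop list):

```python
# A minimal sketch of stemming and stopping. The stop list and suffix
# rules here are illustrative assumptions, not the book's actual
# preprocessing pipeline.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "by", "of"}

def crude_stem(word):
    # Toy stemmer: strip a few common suffixes so that, for example,
    # "acting" and "acts" both map to "act".
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Lowercase, strip punctuation, drop stop words, then stem.
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return [crude_stem(w) for w in tokens if w not in STOP_WORDS]

print(preprocess("The acting was great, and the jokes were amusing!"))
# prints ['act', 'great', 'joke', 'amus']
```

The stemmed tokens (not the original words) become the W "words" that the vector codings below operate on.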
Binary Coding
Rows 15–17 show the binary coding of the three documents. You simply assign a
1 if a word occurs in a document and a 0 if a word does not appear in a document.
For example, cell F16 contains a 1 because Document 2 contains the word “great,” and cell G16 contains a 0 because Document 2 does not contain the word “most.”
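Binary coding can be sketched as follows. The three token lists below are hypothetical stand-ins for the worksheet's documents, chosen so that Document 2 contains "great" but not "most," as in the cells described above:

```python
# Three documents after stemming and stopping (hypothetical tokens;
# the book's actual documents appear in the worksheet rows cited above).
docs = [
    ["one", "most", "fun", "movie"],
    ["great", "act", "great", "plot"],
    ["most", "bore", "movie", "ever"],
]

# The vocabulary: the W distinct words of the corpus, in a fixed column order.
vocab = sorted({word for doc in docs for word in doc})

# Binary coding: 1 if the word appears anywhere in the document, else 0.
binary = [[1 if word in doc else 0 for word in vocab] for doc in docs]

# Document 2 contains "great" but not "most".
print(binary[1][vocab.index("great")])  # prints 1
print(binary[1][vocab.index("most")])   # prints 0
```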
Frequency Coding
Rows 21–23 show the frequency coding of the three documents. You simply count
the number of times a word appears in the document. For example, the word “great” appears twice in Document 2, so you enter a 2 in cell F22. Because the word “most” does not occur in Document 2, you enter a 0 in cell G22.
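Frequency coding differs from binary coding only in keeping the raw counts. Using the same hypothetical post-stemming documents as in the binary-coding sketch:

```python
from collections import Counter

# Same hypothetical post-stemming documents as in the binary-coding sketch.
docs = [
    ["one", "most", "fun", "movie"],
    ["great", "act", "great", "plot"],
    ["most", "bore", "movie", "ever"],
]
vocab = sorted({word for doc in docs for word in doc})

# Frequency coding: the raw count of each vocabulary word per document.
counts = [Counter(doc) for doc in docs]
frequency = [[c[word] for word in vocab] for c in counts]

# "great" occurs twice in Document 2 and "most" not at all,
# mirroring the 2 in cell F22 and the 0 in cell G22.
print(frequency[1][vocab.index("great")])  # prints 2
print(frequency[1][vocab.index("most")])   # prints 0
```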
Breen found that JetBlue (84-percent positive tweets) and Southwest (74-percent positive tweets) performed best. When Breen correlated each airline’s score
with the national survey evaluation of airline service conducted by the American Customer Satisfaction Index (ACSI), he found an amazing 0.90 correlation between
the percentage of positive tweets for each airline and the airline’s ACSI score. Because
tweets can easily be monitored in real time, an airline can track the percentage of positive tweets over time and quickly see whether its service quality improves or declines.
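Breen's check (correlating each airline's positive-tweet percentage with its ACSI score) is an ordinary Pearson correlation, sketched below. The 84 and 74 come from the text; every other number is an invented stand-in, not Breen's actual data:

```python
import math

# Hypothetical illustration only: positive-tweet percentages paired with
# ACSI scores for five airlines. Only the 84 and 74 appear in the text;
# the remaining numbers are invented stand-ins.
positive_pct = [84, 74, 62, 55, 49]
acsi_score = [79, 81, 66, 62, 60]

def pearson(x, y):
    # Pearson correlation: covariance divided by the product of the
    # root sums of squared deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson(positive_pct, acsi_score), 2))
```

A correlation near 1 means airlines with more positive tweets also score higher on the national survey, which is exactly the pattern Breen reported.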
Forecast Sales”) correctly predicted the direction of change in the Dow 84 percent
of the time. This is truly amazing because the widely believed Efficient Market
Hypothesis implies that the daily directional movement of a market index cannot
be predicted with more than 50-percent accuracy.
Summary
In this chapter you learned the following:
■ To glean insights from text, the text must be given structure through a vector representation that implements a coding based on the words present in the text.
■ Binary coding simply records whether a word is present in a document.
■ Frequency coding counts the number of times a word is present in a document.
■ The term-frequency/inverse document frequency score adjusts frequency
coding of a word to reduce the significance of a word that appears in many
documents.
■ After text is coded, many techniques, such as Naive Bayes, neural networks, logistic regression, multiple regression, discriminant analysis, principal components, and cluster analysis, can be used to gain useful insights.
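The tf-idf adjustment summarized above can be sketched as follows. The chapter does not spell out its exact formula, so this uses one common variant, raw count times ln(N / df), as a stated assumption; the documents are the same hypothetical tokens used in the earlier coding sketches:

```python
import math
from collections import Counter

# Same hypothetical post-stemming documents as in the earlier sketches.
docs = [
    ["one", "most", "fun", "movie"],
    ["great", "act", "great", "plot"],
    ["most", "bore", "movie", "ever"],
]
vocab = sorted({word for doc in docs for word in doc})
n_docs = len(docs)

# Document frequency: in how many documents does each word appear?
df = {word: sum(1 for doc in docs if word in doc) for word in vocab}

# One common tf-idf variant (an assumption, since the chapter does not
# give the formula): raw count times ln(N / df). A word appearing in
# every document gets weight ln(1) = 0, which is the "reduce the
# significance of a word that appears in many documents" adjustment.
counts = [Counter(doc) for doc in docs]
tfidf = [[c[w] * math.log(n_docs / df[w]) for w in vocab] for c in counts]

# "great" appears only in Document 2 (df = 1), so it keeps a high weight:
print(round(tfidf[1][vocab.index("great")], 3))  # 2 * ln(3) ≈ 2.197
```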
Exercises
1. Consider the following three snippets of text:
After stemming and stopping these snippets, complete the following tasks:
a. Create binary coding for each snippet.
b. Create frequency coding for each snippet.
c. Create tf-idf coding for each snippet.
2. Use your favorite search engine to find the definition of “Amazon Mechanical Turk.” If you were conducting a text mining study, how would you use Amazon Mechanical Turk workers?
3. Describe how text mining could be used to mechanically classify restaurant
reviews as favorable or unfavorable.
4. Describe how text mining could be used to determine from a member of
Congress’ tweets whether she is conservative or liberal.
5. Describe how text mining could be used to classify The New York Times stories
as international news, political news, celebrity news, financial news, science
and technology news, entertainment news, obituary, and sports news.
6. Alexander Hamilton, John Jay, and James Madison wrote The Federalist Papers,
a series of 85 essays that provide reasons for ratifying the U.S. Constitution.
The authorship of 73 of the papers is beyond dispute, but for the other 12
papers, the author is unknown. How can you use text mining in an attempt
to determine the authorship of the 12 disputed Federalist Papers?
7. Suppose you are a brand manager for Lean Cuisine. How can you use text mining of tweets on new products to predict the future success of new products?
8. Suppose that on the same day the Sofia Vergara Diet Pepsi ad with David
Beckham aired on two different shows. How would you make a decision about
future placement of the ad on the same two TV shows?
9. Two-word phrases are known as bigrams. How can coding text with bigrams
improve insights derived from text mining? What problems might arise in
using bigrams?