transcripts and demonstrate predictive cues between characters and actions by building neural language models with facility for adding side information. We show that a language model over action text obtains lower perplexity when we also make available a representation of the character who produced each token. Likewise, a language model for character descriptions benefits from information about the actions the character made. Our findings open up new possibilities for making sophisticated inference over narrative texts.
2 Related work story thread an action description. An example
In work on narratives, both characters and actions have received significant attention, albeit separately. There is work on inducing types of characters (Bamman et al., 2013, 2014) or relationships between characters (Chang et al., 2009; Elson et al., 2010; Chaturvedi et al., 2016; Iyyer et al., 2016). Often these approaches are based on probabilistic topic models or, more recently, on distributed word representations computed by neural networks. Others focus on learning regular and repetitive event sequences in stories (Chambers and Jurafsky, 2009; McIntyre and Lapata, 2009), together with some information about the agent of the actions. These extractions are fairly low-level, in the form of noun-verb pairs. There are also models for clustering stories, based either on their characters (Frermann and Szarvas, 2017) or on sentiment and topic (Elsner, 2012, 2015).

The above approaches mine types of actions or characters. This work focuses on inferring the latent ties between actions and characters, and on whether one aspect can help predict the other. Flekova and Gurevych (2015) present recent work related to this latter idea: they classify characters, based on their speech and actions, into an introvert or an extrovert class. In contrast, we focus on attributes of characters and actions beyond such coarse traits, and on cases where these attributes are expressed as complex descriptions.
3 A corpus of RPG transcripts

Traditionally, RPGs are played orally, with players seated around a table. But there are also online forums where users play RPGs by posting text descriptions instead.

We collected a corpus of RPG threads from one such website, roleplayerguild.com. Here each game play is recorded in two threads. In one of these, each player posts a detailed text description of the role (character) she is going to play in the game, which we call a character description. This description includes the character's physical appearance, personality, and family background, as well as special and supernatural powers and possessions. A second thread consists of the actual game play, where each player contributes a post when his turn comes. Each post describes how the character assumed by that specific player responds to the game situation. Thus the story develops collaboratively. We call each post in the story thread an action description. An example of a character description and an action description from our corpus is shown in Table 1.

A noteworthy aspect of these RPGs is that character attributes are determined by writing the descriptions before the game starts. The story thread itself then focuses predominantly on the actions and does not reiterate character attributes. Moreover, we know unambiguously which character is associated with each action post. Such mapped pairs of clean character descriptions and associated actions would be difficult to obtain from novels or other stories without sophisticated analysis.

Our corpus contains 1,544 RPGs spanning a variety of themes: fantasy, apocalyptic, romance, anime, military, horror, and adventure. There are a total of 56,576 posts, comprising 25.3M tokens. The maximum number of posts in a story is 753, the minimum 2, and the average 26. Note that many stories are in progress and some are long-running. There are 9,771 unique characters in the corpus, and their descriptions amount to 8.5M tokens. A single story has a minimum of 1, an average of 6, and a maximum of 24 characters.

Even though each character or action description focuses on a single character, it nevertheless contains descriptions of the background setting of the scene and of interactions with other characters (e.g., descriptions of the parents of a character). Hence we preprocess the texts to retain only the parts most related to the character in focus. To this end, in character descriptions we keep only those sentences which mention the character's name or the personal pronouns 'he' or 'she'. The use of pronouns reflects the intuition that, since the description is of one key character, a pronoun is most likely to refer to this salient entity. We also keep sentences which mention personality-describing words such as 'personality', 'skill', 'specialize', 'ability', 'profile', 'talent', etc., even when they do not contain the name.
For action descriptions, we keep only sentences which start with the character's name. We do not use pronouns here, since action text may refer to other salient characters as well.

Finally, we replace the main character (contributor) of a post with an "ENT" (for entity) token. Other proper names in a post are replaced with a "NAME" token and numbers with a "NUM" token. We drop all punctuation and any text with fewer than 5 tokens. After these preprocessing steps, we have 1,439 stories containing 1.48M tokens for action descriptions and 2.95M for characters.
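For concreteness, the following is a minimal Python sketch of these filtering and normalization heuristics. It is an illustration under stated assumptions, not the authors' code: the function names, the tokenization, and the exact keyword list (the paper's list ends with "etc.") are ours.

```python
import re

# Keywords named in the paper; the trailing "etc." means the real
# list is longer, so treat this set as illustrative.
PERSONALITY_WORDS = {"personality", "skill", "specialize",
                     "ability", "profile", "talent"}

def filter_character_description(sentences, char_name):
    """Keep sentences mentioning the character's name, 'he'/'she',
    or a personality-describing keyword."""
    kept = []
    for sent in sentences:
        tokens = {t.lower() for t in re.findall(r"[a-zA-Z']+", sent)}
        if (char_name.lower() in tokens
                or tokens & {"he", "she"}
                or tokens & PERSONALITY_WORDS):
            kept.append(sent)
    return kept

def filter_action_post(sentences, char_name):
    """Keep only sentences that start with the character's name."""
    return [s for s in sentences if s.strip().startswith(char_name)]

def normalize_tokens(tokens, char_name, other_names):
    """Map the contributor to ENT, other proper names to NAME,
    and numbers to NUM (punctuation is assumed already removed)."""
    out = []
    for t in tokens:
        if t == char_name:
            out.append("ENT")
        elif t in other_names:
            out.append("NAME")
        elif t.isdigit():
            out.append("NUM")
        else:
            out.append(t)
    return out
```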
4 Learning character-action interactions

We examine the feasibility of inferring character-action interactions from text using neural language models (LMs) with side information.
4.1 ACTION and CHAR language models

The story line, that is, the full text of a story, is the token sequence X = x1 . . . xK created by concatenating the tokens across all the action descriptions of the story. The posts are taken in time order, without any mark for post boundaries. Let C be the set of all characters in a story. For each character j, we denote the character description as the token sequence Cj = cj1 . . . cjm.

We build separate language models for action sequences and for character descriptions. The action sequence model is over the story lines (the sequence of all action descriptions in a story), i.e., X as defined above. The character description model is over individual character descriptions, i.e., Cj.

First we describe the language model P(X) for the story line. We hypothesize in this work that a better model of X can be built by taking into account the character in focus for each individual action description. First, a baseline recurrent neural network (RNN) language model, which we denote ACTION-LM, would be

hi = LSTM(hi−1, xi−1)
P(xi | x1 . . . xi−1) = softmax(Whv hi + bv)

Here xi−1 is the embedding of the input token xi−1, and hi−1 is the hidden state which summarizes the token sequence x1 . . . xi−2. LSTM computes the next hidden state using an LSTM cell (Hochreiter and Schmidhuber, 1997). The output layer produces a probability distribution over the LM vocabulary using the weight matrix Whv ∈ R^{|V|×|h|}, where |h| is the hidden size and |V| is the vocabulary size; bv is the bias vector.
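As a concrete reference point, here is a minimal sketch of such a baseline LSTM language model. The paper implements its models in TensorFlow; this sketch uses PyTorch purely for illustration, with the sizes the paper reports for ACTION-LM (2 layers, 256 units; the embedding size is our assumption).

```python
import torch
import torch.nn as nn

class ActionLM(nn.Module):
    """Baseline LM: h_i = LSTM(h_{i-1}, x_{i-1});
    P(x_i | x_1..x_{i-1}) = softmax(W_hv h_i + b_v)."""
    def __init__(self, vocab_size=20000, emb_size=256,
                 hidden_size=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, hidden_size,
                            num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)  # W_hv and b_v

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) ids of x_1 .. x_{i-1}
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state  # logits; softmax lives in the loss
```

Training maximizes the likelihood of the next token with a cross-entropy loss over these logits.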
To take the character descriptions into account when generating actions, we define a second model, ACTION-LMS, which estimates

P(X|C) = ∏_{i=1}^{K} p(xi | zi, x1 . . . xi−1, z1 . . . zi−1),

where zl is a variable indicating which character produced the token xl. For this model, we essentially augment the RNN with the character descriptions as side information. For each token xl, the side information is the character description indicated by zl, i.e., Czl. We follow the approach of Mikolov and Zweig (2012) and Hoang et al. (2016), where a feature embedding vector e representing the side information is input to the RNN's hidden layer, to its output layer, or to both. During development, we found that concatenating the feature embedding with the token embedding at the input layer, and with the hidden state at the output layer, gave the best performance. More formally, ACTION-LMS computes

hi = LSTM(hi−1, [xi−1; ei])
P(xi | x1 . . . xi−1) = softmax(Wrv [hi; ei] + bv)

where ei is a representation of the character which produced the token xi, and [·; ·] denotes concatenation. The hidden state hi−1 now summarizes both the action tokens up to i−2 and the character information up to i−1. The output layer weight matrix is Wrv ∈ R^{|V|×(|h|+|e|)}, where |h| is the size of the RNN hidden unit and |e| is the feature embedding size.

In our work, the feature embedding itself comes from a feedforward neural network trained jointly within the LM. This feature network takes as input the average value of pretrained embeddings[3] for the tokens in the character description (we remove stopwords[4]). This initial vector is passed through hidden layers to yield the feature embedding e (reminiscent of the deep averaging networks of Iyyer et al. (2015)).

[3] 300-dimension word2vec (Mikolov et al., 2013) embeddings trained on the 1-billion-word Google News corpus.
[4] We remove stopwords for the side information only.
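The following sketch makes the side-information wiring explicit: a deep-averaging feature network produces e, which is concatenated with the token embedding at the input layer and with the hidden state at the output layer. Again this is a PyTorch illustration, not the authors' TensorFlow code; for simplicity it conditions a whole sequence on one character description, whereas in the paper ei follows zi and can change at post boundaries. Sizes follow Section 4.2 where stated and are our assumptions elsewhere.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Deep-averaging feature network: mean of pretrained word vectors
    for the (stopword-free) description -> hidden layer -> e."""
    def __init__(self, word_vec_dim=300, feat_size=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(word_vec_dim, feat_size),
                                 nn.Tanh())

    def forward(self, desc_vectors):
        # desc_vectors: (batch, n_words, word_vec_dim) pretrained embeddings
        return self.mlp(desc_vectors.mean(dim=1))  # e: (batch, feat_size)

class ActionLMS(nn.Module):
    """h_i = LSTM(h_{i-1}, [x_{i-1}; e_i]);
    P(x_i | ...) = softmax(W_rv [h_i; e_i] + b_v)."""
    def __init__(self, vocab_size=20000, emb_size=50,
                 hidden_size=50, feat_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.feature_net = FeatureNet(feat_size=feat_size)
        self.lstm = nn.LSTM(emb_size + feat_size, hidden_size,
                            batch_first=True)
        self.dropout = nn.Dropout(0.65)  # dropout probability from Sec. 4.2
        self.out = nn.Linear(hidden_size + feat_size, vocab_size)  # W_rv, b_v

    def forward(self, tokens, desc_vectors, state=None):
        e = self.feature_net(desc_vectors)                  # (batch, |e|)
        e_seq = e.unsqueeze(1).expand(-1, tokens.size(1), -1)
        x = torch.cat([self.embed(tokens), e_seq], dim=-1)  # [x_{i-1}; e_i]
        h, state = self.lstm(x, state)
        h = self.dropout(h)
        logits = self.out(torch.cat([h, e_seq], dim=-1))    # [h_i; e_i]
        return logits, state
```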
The language models for character descriptions are similar in structure. First, we call the unconditioned model P(Ci) for a character description Ci CHAR-LM; this is again an LSTM language model. Second, we implement CHAR-LMS, which estimates P(Ci | XCi), where XCi is the subsequence of X containing only the tokens produced by Ci. We obtain this conditional probability with the same architecture as ACTION-LMS; here the input to the feature neural network is the average of the pretrained embeddings of the tokens (without stopwords) in XCi.
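A small sketch of how the CHAR-LMS conditioning vector could be assembled, under the assumption that posts carry a character id; the data layout and names here are hypothetical:

```python
import numpy as np

def char_lms_side_input(posts, char_id, word_vectors, stopwords):
    """Average the pretrained embeddings over X_{C_i}: the tokens of all
    action posts contributed by this character, stopwords removed.
    Assumes the character contributes at least one in-vocabulary token."""
    tokens = [t for post in posts if post["char_id"] == char_id
              for t in post["tokens"]
              if t not in stopwords and t in word_vectors]
    return np.mean([word_vectors[t] for t in tokens], axis=0)
```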
4.2 Experiments

We randomly divide our corpus into 100 stories for testing, 20 for development, and the rest, 1,319, for training. We compare the two ACTION language models using a vocabulary of 20,000 words; the CHAR models have a vocabulary of 10,000.

Some posts are long even after our filtering steps, and they create a winding story line when concatenated. So we also explore whether limits on description lengths are useful. In ACTION models, a limit of g means that only the first g words of each post are concatenated to form X. For CHAR models, only the first g words of the description Ci are used as the sequence for the LM. The same limit g is given to both the models with and without side information. When using side information, we can restrict the conditioning text as well, to a maximum of h words. We tune these limit parameters, as well as the number of hidden layers, the hidden unit sizes, and the dropout probability, on the development set.
For the ACTION models, we set g to 100 words. ACTION-LM uses 2 layers with 256 hidden units each. ACTION-LMS has 1 layer with 256 hidden units for the feature network, with h set to 25 words, and 1 layer with 50 units for the RNN part. For the CHAR models, g = 200 words. CHAR-LM has one hidden layer with 100 units. For CHAR-LMS, the best network was the same as for ACTION-LMS but with h = 100 (the first 100 words of all the action posts by that character are combined as the side information). We apply a dropout probability of 0.65, clip gradients at 5.0, and use the Adam algorithm (Kingma and Ba, 2015) for optimization. All our models can be trained in an hour: ACTION-LM in 14 epochs, CHAR-LM in 62, ACTION-LMS in 60, and CHAR-LMS in 91. We implemented the models in TensorFlow.[5]

[5] https://www.tensorflow.org
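A hedged sketch of a training step consistent with these settings (Adam, gradient clipping at 5.0; dropout is applied inside the model sketches above). The paper trains in TensorFlow; this is again an illustrative PyTorch rendering, and clipping by global norm is our assumption about what "clip gradients at 5.0" means.

```python
import torch

model = ActionLMS(vocab_size=20000)  # sketch class from Section 4.1 above
optimizer = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(tokens, desc_vectors):
    """One step: predict token i from tokens < i plus side information."""
    logits, _ = model(tokens[:, :-1], desc_vectors)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
    optimizer.step()
    return loss.item()
```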
4.3 Results

First, we provide examples of the patterns captured by ACTION-LMS and CHAR-LMS by sampling from the models (Table 2). For the side information, we use simple words (taken from the descriptions in our test corpus) for closer examination. For ACTION-LMS, we seed the story line with the priming text "⟨bos⟩ ENT called", where ENT is the token in our vocabulary referring to the main character of a post, and ⟨bos⟩ is a beginning-of-sentence marker. Different inputs for the conditioning character description are shown under "Char. context". We then sample from the LM following a greedy approach, taking the most likely token at each step until either the end token ⟨eos⟩ or a maximum of 12 tokens is reached. The sample is shown under "Generated continuation". Similarly, we sample from CHAR-LMS, where the sequence is first primed with "ENT is a". We find that both models capture interesting ties between character attributes and actions. However, there is much scope for improved models of generation.

ACTION-LMS model. Prime text: ⟨bos⟩ ENT called . . .

  Char. context             Generated continuation
  small girl cheerful       . . . her name ⟨eos⟩
  bulky male                . . . out to the group ⟨eos⟩
  hunter bow forest large   . . . over and walked over to the king had been making sure
  fear afraid               . . . her her brother ⟨eos⟩
  angry irritated           . . . back at name with her thick road with disappointment ⟨eos⟩
  brutal violent            . . . out of ENT hard to help ENT help ENT help ENT help ⟨eos⟩
  school student romantic   . . . out in the way of the conversation ⟨eos⟩

CHAR-LMS model. Prime text: ⟨bos⟩ ENT is . . .

  Action context              Generated continuation
  appeared disappeared flew   . . . a very young man who has a few scars on his body ⟨eos⟩
  walked looked               . . . a very friendly person ⟨eos⟩
  stayed waited               . . . a little girl who is a little girl who is a little
  pause stare                 . . . a very very young woman ⟨eos⟩
  strike slap                 . . . a bad boy ⟨eos⟩
  follow creep                . . . a slim and slim but slim physique ⟨eos⟩

Table 2: Samples from our language models
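The greedy sampling procedure is straightforward; a sketch (names are illustrative, and the model is the ActionLMS sketch from Section 4.1):

```python
import torch

def greedy_continuation(model, prime_ids, desc_vectors, eos_id, max_len=12):
    """Take the most likely token at each step until <eos> or 12 tokens."""
    model.eval()
    tokens = list(prime_ids)
    state = None
    inp = torch.tensor([tokens])
    with torch.no_grad():
        for _ in range(max_len):
            logits, state = model(inp, desc_vectors, state)
            next_id = int(logits[0, -1].argmax())
            tokens.append(next_id)
            if next_id == eos_id:
                break
            inp = torch.tensor([[next_id]])
    return tokens
```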
In this work, we have focused on the possibility of capturing the interactions. For that, we compare the impact of side information using perplexity on held-out data (Table 3). For both character and action LMs, adding side information leads to a significant decrease in perplexity, showing that the interdependence between the two aspects can be learned computationally. Again, there is a lot of scope for improving the language models, given that the development and test perplexities are much higher than those during training.

  Model         Train    Dev      Test
  ACTION-LM     82.56    106.83   105.06
  ACTION-LMS    57.38    94.95    96.91
  CHAR-LM       69.45    118.78   106.12
  CHAR-LMS      61.84    110.13   100.86

Table 3: Perplexities of our models
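For reference, held-out perplexity is the exponential of the average per-token negative log-likelihood, e.g.:

```python
import math

def perplexity(total_neg_log_likelihood, num_tokens):
    """Perplexity over a held-out set; lower is better (cf. Table 3)."""
    return math.exp(total_neg_log_likelihood / num_tokens)
```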
References

Micha Elsner. 2015. Abstract representations of plot structure. Linguistic Issues in Language Technology, 12(5):1–31.

David K. Elson, Nicholas Dames, and Kathleen R. McKeown. 2010. Extracting social networks from literary fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 138–147.

Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. In Proceedings of the IEEE Spoken Language Technology Workshop, pages 234–239.