
Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts

Annie Louis, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, alouis@inf.ed.ac.uk
Charles Sutton, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, sutton@inf.ed.ac.uk

Abstract

An essential aspect to understanding narratives is to grasp the interaction between characters in a story and the actions they take. We examine whether computational models can capture this interaction, when both character attributes and actions are expressed as complex natural language descriptions. We propose role-playing games as a testbed for this problem, and introduce a large corpus[1] of game transcripts collected from online discussion forums. Using neural language models which combine character and action descriptions from these stories, we show that we can learn the latent ties. Action sequences are better predicted when the character performing the action is also taken into account, and vice versa for character attributes.

Character description
Name: Ana Blackclaw; Age: 27; Gender: Female
Appearance: Standing at a mighty 6’5, she is a giant among her fellow humans. Her face is light, though paler than the average man or woman’s, and is marked by scars. ... Her body is muscular, as it would have to be to carry both her armor and the hammer. Her light grey eyes nearly always keep a bored expression. Her canines seem a tad larger than the normal person’s.
Preferred Weapon: Hammer. Preferred Armor: Heavy. Gift: Binoculars. Darksign: No.

Action description
She stopped dead in her tracks as the hissing began. A grumble escaped her as it did so, and she looked over to make sure the other woman was doing fine. Seeing that all was not entirely well, she allowed herself to slide down, her hand gripping the slope side once more to slow herself. Once that was accomplished, she reached out and grabbed the back of the girl’s neck, pulling her back to steady herself. The giant remained silent as she did so, and then glanced over to the nearby skeletons. They would be upon them soon. Her grip tightened on the hammer as she glanced from side to side. It would not be a fun fight.

Table 1: Example descriptions from our RPG corpus
1 Introduction
Imagine a giant, a dwarf, and a fairy in a combat situation. We would expect them to act differently, and conversely, if we are told of even a few actions taken by a character in a story, we naturally start to draw inferences about that character’s personality. Communicating narrative is a fundamental task of natural language, and understanding narrative requires modelling the interaction between events and characters.[2]

In this paper, we propose that collaboratively-told stories that arise in certain types of games provide a natural test bed for the problem of inferring interactions between characters and actions in narratives. We present a corpus of role-playing game (RPG) transcripts where characters and action sequences are described with complex natural language texts. Table 1 shows an example character description, and an action text for the same character. This example shows how the ties between characters and their actions are subtly present in the text descriptions, and learning the latent ties between them is a difficult task. Based on our corpus, and using neural language models, this work demonstrates an initial success on this problem.

The ability to understand and generate narratives is a useful skill for natural language systems, for example, to plan a coherent answer to a question, or to generate a summary of a document. Prior work on narrative processing has focused on inducing disjoint sets of character and event types (as topic models), capturing the relationship between characters in the same story, or extracting character-action pairs as low level noun-verb tuples. However, these models do not aim to match or infer characters and actions from each other.

[1] http://groups.inf.ed.ac.uk/cup/ddd/
[2] In this paper, the word character will always be used in the sense of “character in a story” rather than the sense of “character in a token”.

Proceedings of NAACL-HLT 2018, pages 708–713, New Orleans, Louisiana, June 1–6, 2018. © 2018 Association for Computational Linguistics
We make two contributions towards closing this gap. We introduce a corpus of thousands of RPG transcripts and demonstrate predictive cues between characters and actions by building neural language models with facility for adding side information. We show that a language model over action text obtains lower perplexity when we also make available a representation of the character who produced each token. Likewise, a language model for character descriptions benefits from information about the actions the character made. Our findings open up new possibilities for making sophisticated inference over narrative texts.

2 Related work

In work on narratives, both characters and actions have received significant attention, albeit separately. There is work on inducing types of characters (Bamman et al., 2013, 2014) or relationships between characters (Chang et al., 2009; Elson et al., 2010; Chaturvedi et al., 2016; Iyyer et al., 2016). Often these approaches are based on probabilistic topic models or, more recently, distributed word representations computed by neural networks. Others focus on learning regular and repetitive event sequences in stories (Chambers and Jurafsky, 2009; McIntyre and Lapata, 2009), together with some information about the agent of the actions. These extractions are fairly low-level, in the form of noun-verb pairs. There are also models for clustering stories either based on their characters (Frermann and Szarvas, 2017), or sentiment and topic (Elsner, 2012, 2015).

The above approaches mine types of actions or characters. This work focuses on inferring the latent ties between actions and characters, and whether one aspect can help predict the other. Flekova and Gurevych (2015) present recent work related to this latter idea. They classify characters based on their speech and actions into an introvert or extrovert class. In contrast, we focus on attributes of characters and actions beyond such coarse traits, and on settings where these attributes are expressed as complex descriptions.

3 A corpus of RPG transcripts

Traditionally, RPGs are played orally with players seated around a table. But there are also online forums where users play RPGs by posting text descriptions instead.

We collected a corpus of RPG threads from one such website, roleplayerguild.com. Here each game play is recorded in two threads. In one of these, each player posts a detailed text description of the role (character) she is going to play in the game, which we call a character description. This description includes the character’s physical appearance, personality, family background, as well as special and supernatural powers, and possessions. A second thread consists of the actual game play, where each player contributes a post when his turn comes. Each post describes how the character that is assumed by that specific player responds to the game situation. Thus the story develops collaboratively. We call each post in the story thread an action description. An example from our corpus of a character description and an action description is shown in Table 1.

A noteworthy aspect of these RPGs is that character attributes are determined by writing the descriptions before the game starts. The story thread itself then focuses predominantly on the actions and does not reiterate character attributes. Moreover, we know unambiguously which character is associated with each action post. Such mapped pairs of clean character descriptions and associated actions would be difficult to obtain from novels or other stories without sophisticated analysis.

Our corpus contains 1,544 RPGs spanning a variety of themes—fantasy, apocalyptic, romance, anime, military, horror, and adventure. There are a total of 56,576 posts, comprising 25.3M tokens. The maximum number of posts in a story is 753, the minimum 2, and the average 26. Note that many stories are in progress and some are long running. There are 9,771 unique characters in the corpus, and their descriptions amount to 8.5M tokens. There is a minimum of 1, an average of 6, and a maximum of 24 characters in a single story.

Even though each character or action description focuses on a single character, it nevertheless contains descriptions of background settings of the scene, and interactions of other characters (e.g. descriptions of the parents of a character). Hence we preprocess the texts to only retain the parts most related to the character in focus. To this end, in character descriptions, we only keep those sentences which mention the character’s name or the personal pronouns ‘he’ or ‘she’. The use of pronouns reflects an intuition that since the description is of one key character, a pronoun is most likely to refer to this salient entity. We also keep sentences which mention personality-describing words such as ‘personality’, ‘skill’, ‘specialize’, ‘ability’, ‘profile’, ‘talent’, etc., even when they do not contain the name. For action descriptions, we only keep sentences which start with the character’s name. We do not use pronouns since action text may refer to other salient characters as well. Finally, we replace the main character (contributor) of a post with an “ENT” (for entity) token. Other proper names in a post are replaced with a “NAME” token and numbers with a “NUM” token. We drop all punctuation and any text with fewer than 5 tokens. After these preprocessing steps, we have 1,439 stories containing 1.48M tokens for action descriptions and 2.95M for characters.
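As a rough illustration of the filtering and token replacement just described, the sketch below shows one way these steps could be implemented. It is not the released pipeline for the corpus; the regex-based sentence splitting, the exact keyword handling, and the function names are assumptions made for this example.

```python
import re

PERSONALITY_WORDS = {"personality", "skill", "specialize", "ability", "profile", "talent"}
PRONOUNS = {"he", "she"}

def sentences(text):
    # Crude sentence splitter; the paper does not specify the tool it used.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def filter_character_description(text, char_name):
    # Keep sentences mentioning the character's name, 'he'/'she',
    # or a personality-describing keyword.
    kept = []
    for sent in sentences(text):
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", sent)}
        if (char_name.lower() in sent.lower()
                or words & PRONOUNS or words & PERSONALITY_WORDS):
            kept.append(sent)
    return kept

def filter_action_description(text, char_name):
    # Keep only sentences that start with the character's name.
    return [s for s in sentences(text) if s.lower().startswith(char_name.lower())]

def normalize(kept_sentences, char_name, other_names):
    # Replace the contributor with ENT, other proper names with NAME,
    # numbers with NUM, and strip punctuation.
    text = " ".join(kept_sentences)
    text = re.sub(re.escape(char_name), "ENT", text, flags=re.IGNORECASE)
    for name in other_names:
        text = re.sub(re.escape(name), "NAME", text, flags=re.IGNORECASE)
    text = re.sub(r"\d+", "NUM", text)
    tokens = re.findall(r"[A-Za-z']+", text)
    return tokens if len(tokens) >= 5 else []  # drop any text with fewer than 5 tokens
```

How other proper names are detected is not specified in the paper; the sketch simply assumes a list of known names per story is available.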

4 Learning character-action interactions

We examine the feasibility of inferring character-action interactions from text using neural language models (LM) with side information.

4.1 ACTION and CHAR language models

The story line, that is, the full text of a story, is the token sequence X = x_1 ... x_K created by concatenating the tokens across all the action descriptions of the story. The posts are taken in time order without any mark for post boundaries. Let C be the set of all characters in a story. For each character j, we denote the character description as the token sequence C_j = c_{j1} ... c_{jm}.

We build separate language models for action sequences and for character descriptions. The action sequence model is over the story lines (the sequence of all action descriptions in a story), i.e. X as defined above. The character description model is over individual character descriptions, i.e., C_j.

First we describe the language model P(X) for the story line. We hypothesize in this work that a better model of X can be built by taking into account the character in focus for each individual action description. First, a baseline recurrent neural network (RNN) language model, which we denote ACTION-LM, would be

    h_i = LSTM(h_{i-1}, x_{i-1})
    P(x_i | x_1 ... x_{i-1}) = softmax(W_{hv} h_i + b_v)

Here x_{i-1} is the embedding of the input token x_{i-1}, and h_{i-1} is the hidden state which summarizes the token sequence x_1 ... x_{i-2}. LSTM computes the next hidden state using an LSTM cell (Hochreiter and Schmidhuber, 1997). The output layer produces a probability distribution over the LM vocabulary using weight matrix W_{hv} ∈ R^{|V| × |h|}, where |h| is the hidden size and |V| is the vocabulary size; b_v is the bias vector.

To take the character descriptions into account when generating actions, we define a second model ACTION-LMS which estimates

    P(X | C) = ∏_{i=1}^{K} p(x_i | z_i, x_1 ... x_{i-1}, z_1 ... z_{i-1}),

where z_l is a variable indicating which character produced the token x_l. For this model, we essentially augment the RNNs with the character descriptions as side information. For each token x_l, the side information is the character description indicated by z_l, i.e., C_{z_l}. We follow the approach by Mikolov and Zweig (2012), and Hoang et al. (2016), where a feature embedding vector e representing side information is input to both the RNN’s hidden and output layers, or to one of them. During development, we found that concatenating the feature embedding with the token embedding at the input layer, and with the hidden state at the output layer, gave the best performance. More formally, ACTION-LMS computes:

    h_i = LSTM(h_{i-1}, [x_{i-1}; e_i])
    P(x_i | x_1 ... x_{i-1}) = softmax(W_{rv} [h_i; e_i] + b_v)

where [a; b] denotes vector concatenation and e_i is a representation of the character which produced the token x_i. The hidden state h_{i-1} now summarizes both the action tokens up to i−2 and the character information up to i−1. The output layer weight matrix is W_{rv} ∈ R^{|V| × (|h|+|e|)}, where |h| is the size of the RNN hidden unit, and |e| the feature embedding size.

In our work, the feature embedding itself comes from a feedforward neural network trained jointly within the LM. This feature network takes as input the average value of pretrained embeddings[3] for the tokens in the character description (we remove stopwords[4]). This initial vector is passed through hidden layers to yield the feature embedding e (reminiscent of deep averaging networks by Iyyer et al. (2015)).

[3] 300 dimension word2vec (Mikolov et al., 2013) embeddings trained on the 1 billion word Google News Corpus.
[4] We remove stopwords for side information only.
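To make the side-information architecture concrete, here is a minimal sketch of an ACTION-LMS-style model: the averaged pretrained embeddings of the character description pass through a feedforward feature network, and the resulting feature embedding is concatenated with the token embedding at the input layer and with the hidden state at the output layer. The paper's implementation is in TensorFlow; this sketch uses PyTorch purely for illustration, and the class name, layer sizes, and feature-network depth are assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn

class SideInfoLM(nn.Module):
    """LSTM language model conditioned on a per-token feature embedding
    (a sketch of the ACTION-LMS / CHAR-LMS architecture; sizes are illustrative)."""

    def __init__(self, vocab_size, emb_dim=300, hidden_dim=256,
                 side_dim=300, feat_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Feature network: averaged pretrained embeddings -> feature embedding e
        self.feature_net = nn.Sequential(
            nn.Linear(side_dim, 256), nn.Tanh(),
            nn.Linear(256, feat_dim), nn.Tanh())
        # Token embedding is concatenated with e at the input layer
        self.lstm = nn.LSTM(emb_dim + feat_dim, hidden_dim, batch_first=True)
        # Hidden state is concatenated with e at the output layer
        self.out = nn.Linear(hidden_dim + feat_dim, vocab_size)

    def forward(self, tokens, side_vectors, state=None):
        # tokens: (batch, seq) token ids
        # side_vectors: (batch, seq, side_dim) averaged word2vec vectors of the
        #   character description associated with each token position
        e = self.feature_net(side_vectors)              # (batch, seq, feat_dim)
        x = torch.cat([self.embed(tokens), e], dim=-1)  # input-layer concatenation
        h, state = self.lstm(x, state)
        logits = self.out(torch.cat([h, e], dim=-1))    # output-layer concatenation
        return logits, state
```

During training, the logits at each position would be scored against the next token with a cross-entropy loss, exactly as for a standard LSTM LM; dropping the e inputs reduces the model to the unconditioned ACTION-LM baseline.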

The language models for character descriptions are similar in structure. First, we refer to the unconditioned model P(C_i) for a character description C_i as CHAR-LM; this is again an LSTM language model. Second, we implement CHAR-LMS, which estimates P(C_i | X_{C_i}), where X_{C_i} is the subsequence of X containing only the tokens produced by C_i. We obtain this conditional probability based on the same architecture as ACTION-LMS. Here the input to the feature neural network is the average of the pretrained embeddings of the tokens (without stopwords) in X_{C_i}.

4.2 Experiments

We randomly divide our corpus into 100 stories for testing, 20 for development, and the rest, 1,319, for training. We compare the two ACTION language models using a vocabulary size of 20,000; the CHAR models have a vocabulary of 10,000.

Some posts are long even after our filtering steps, and create a winding story line when concatenated. So we also explore whether limits on description lengths are useful. In ACTION models, a limit of g means that only the first g words of each post are concatenated to form X. For CHAR models, only the first g words of the description C_i are used as the sequence for the LM. The same limit g is given to both the models with and without side information. When using side information, we can restrict the conditioning text as well, to a maximum of h words. We tune these limit parameters, as well as the number of hidden layers, hidden unit sizes, and dropout probability on a development set.

For the ACTION models, we set g to 100 words. ACTION-LM uses 2 layers with 256 hidden units each. ACTION-LMS has 1 layer with 256 hidden units for the feature network, with h set to 25 words, and 1 layer with 50 units for the RNN part. For the CHAR models, g = 200 words. CHAR-LM has one hidden layer with 100 units. For CHAR-LMS, the best network was the same as ACTION-LMS but with h = 100 (the first 100 words of all the action posts by that character are combined as the side information). We apply a dropout probability of 0.65, clip gradients at 5.0, and use the Adam algorithm (Kingma and Ba, 2015) for optimization. All our models can be trained in an hour: ACTION-LM takes 14 epochs, CHAR-LM 62, ACTION-LMS 60, and CHAR-LMS 91 epochs. We implemented the models in TensorFlow[5].

[5] https://www.tensorflow.org

4.3 Results

First, we provide examples of the patterns captured by ACTION-LMS and CHAR-LMS by sampling from the models (Table 2). For side information, we use simple words (taken from the descriptions in our test corpus) for closer examination.

For ACTION-LMS, we seed the story line with the priming text “<bos> ENT called”, where ENT is the token in our vocabulary referring to the main character of a post and <bos> is a beginning-of-sentence marker. Different inputs for the conditioning character description are shown under “Char. context”. We then sample from the LM following a greedy approach, taking the most likely token at each step until either the end token <eos> or a maximum of 12 tokens is reached. The sample is shown under “Generated continuation”. Similarly, we sample from CHAR-LMS where the sequence is first primed with “ENT is a”. We find that both models capture interesting ties between character attributes and actions. However, there is much scope for improved models of generation.

Action-LMS Model
Prime text: <bos> ENT called . . .

Char. context | Generated continuation
small girl cheerful | . . . her name <eos>
bulky male hunter bow | . . . out to the group <eos>
forest large king | . . . over and walked over to the had been making sure
fear afraid | . . . her her brother <eos>
angry irritated | . . . back at name with her thick road with disappointment <eos>
brutal violent | . . . out of ENT hard to help ENT help ENT help ENT help <eos>
school student romantic | . . . out in the way of the conversation <eos>

Char-LMS Model
Prime text: <bos> ENT is . . .

Action context | Generated continuation
appeared disappeared flew | . . . a very young man who has a few scars on his body <eos>
walked looked stayed | . . . a very friendly person <eos>
waited | . . . a little girl who is a little girl who is a little
pause stare | . . . a very very young woman <eos>
strike slap | . . . a bad boy <eos>
follow creep | . . . a slim and slim but slim physique <eos>

Table 2: Samples from our language models
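The greedy decoding used to produce the continuations in Table 2 can be sketched as follows, reusing the SideInfoLM class from the earlier sketch. Again, this is illustrative rather than the authors' code: the vocabulary is assumed to be a word-to-id dictionary containing <eos>, and side_vec is assumed to be the averaged pretrained embedding of the conditioning words (e.g. “small girl cheerful”).

```python
import torch

def greedy_sample(model, prime_ids, side_vec, vocab, max_new_tokens=12):
    # prime_ids: list of token ids for the priming text, e.g. "<bos> ENT called"
    # side_vec: tensor of shape (side_dim,), averaged embedding of the context words
    itos = {i: w for w, i in vocab.items()}
    eos_id = vocab["<eos>"]
    tokens = list(prime_ids)
    state = None
    model.eval()
    with torch.no_grad():
        inp = torch.tensor([tokens])                    # (1, prime_len)
        for _ in range(max_new_tokens):
            side = side_vec.expand(1, inp.size(1), -1)  # same side info at every step
            logits, state = model(inp, side, state)
            next_id = int(logits[0, -1].argmax())       # most likely next token
            tokens.append(next_id)
            if next_id == eos_id:                       # stop at <eos> or token cap
                break
            inp = torch.tensor([[next_id]])             # feed back the new token only
    return [itos[i] for i in tokens]
```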

In this work, we have focused on the possibility of capturing the interactions. For that, we compare the impact of side information using perplexity on held-out data (Table 3). For both character and action LMs, adding side information leads to a significant decrease in perplexity, showing that the interdependence between the two aspects can be learned computationally. Again, there is a lot of scope for improving the language models, given that the development and test perplexities are much higher than those during training.

Model | Train | Dev | Test
ACTION-LM | 82.56 | 106.83 | 105.06
ACTION-LMS | 57.38 | 94.95 | 96.91
CHAR-LM | 69.45 | 118.78 | 106.12
CHAR-LMS | 61.84 | 110.13 | 100.86

Table 3: Perplexities of our models

5 Conclusions

We have proposed and demonstrated the feasibility of capturing interactions between characters and their actions in stories. While our neural models show that the data can be better modeled by combining both aspects, one might eventually want to infer a missing modality by sampling or generation from the model. We plan to work on these improvements in future work, and also to explore evaluation methods which go beyond language model perplexities and capture model aspects closer to the task and domain.

References

David Bamman, Brendan O’Connor, and Noah A. Smith. 2013. Learning latent personas of film characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. pages 352–361.

David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian mixed effects model of literary character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. pages 370–379.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. pages 602–610.

Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. 2009. Connections between the lines: Augmenting social networks with text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pages 169–178.

Snigdha Chaturvedi, Shashank Srivastava, Hal Daumé III, and Chris Dyer. 2016. Modeling evolving relationships between characters in literary novels. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pages 2704–2710.

Micha Elsner. 2012. Character-based kernels for novelistic plot structure. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. pages 634–644.

Micha Elsner. 2015. Abstract representations of plot structure. Linguistic Issues in Language Technology 12(5):1–31.

David K. Elson, Nicholas Dames, and Kathleen R. McKeown. 2010. Extracting social networks from literary fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pages 138–147.

Lucie Flekova and Iryna Gurevych. 2015. Personality profiling of fictional characters using sense-level links between lexical resources. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. pages 1805–1816.

Lea Frermann and György Szarvas. 2017. Inducing semantic micro-clusters from deep multi-view representations of novels. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. pages 1873–1883.

Cong Duy Vu Hoang, Trevor Cohn, and Gholamreza Haffari. 2016. Incorporating side information into recurrent neural network language models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pages 1250–1255.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daumé III. 2016. Feuding families and former friends: Unsupervised learning for dynamic fictional relationships. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pages 1534–1544.

Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. pages 1681–1691.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations.

Neil McIntyre and Mirella Lapata. 2009. Learning to tell tales: A data-driven approach to story generation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. pages 217–225.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the Conference on Neural Information Processing Systems.

Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. In Proceedings of IEEE Spoken Language Technology Workshop. pages 234–239.

