
Articles

Can Computers
Create Humor?

Graeme Ritchie

Despite the fact that AI has always been adventurous in trying to elucidate complex aspects of human behavior, only recently has there been research into computational modeling of humor. One obstacle to progress is the lack of a precise and detailed theory of how humor operates. Nevertheless, since the early 1990s, there have been a number of small programs that create simple verbal humor, and more recently there have been studies of the automatic classification of the humorous status of texts. In addition, there are a number of advocates of the practical uses of computational humor: in user interfaces, in education, and in advertising. Computer-generated humor is still quite basic, but it could be viewed as a form of exploratory creativity. For computational humor to improve, some hard problems in AI will have to be addressed.

Although the terms creative and humor are both very broad in what they can cover, there is little doubt that the construction of humor is generally regarded as creative, with the most successful creators of humor sometimes being hailed as "comic geniuses." This suggests that the task of getting a computer to produce humor falls within the area of computational creativity, and any general theory of creativity should have something to say about humor.

This article reviews some of the work in computational humor, makes some observations about the more general issues that these projects raise, and considers this work from the viewpoint of creativity, concluding with an outline of some of the challenges ahead.
The Place of Humor within AI
The use of humor is a major factor in social and cultural life
throughout the world, influencing human behavior in various
ways (as well as giving rise to the whole industry of comedy).
Despite this, the cognitive and emotional processes involved in
creating or responding to humor are still unexplained. Thus
humor constitutes a complex but baffling human behavior, inti-
mately related to matters such as world knowledge, reasoning,
and perception; on the face of it, this should be an obvious tar-
get for AI research. After all, the remit of artificial intelligence,
since its very beginning, has been to make computers behave in
the ways that humans do. The proposal for the historic Dart-
mouth Conference in 1956, often taken as the founding event
in AI, argued:
    … that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. (Charniak and McDermott 1985, 11)

Copyright © 2009, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602
Despite this, "intelligence" was rather narrowly understood as being primarily about reasoning and about acquiring concepts (learning). Emotional aspects of human behavior were not deemed a suitable matter for AI research until about 1990, and mainstream work in AI completely ignored humor for the first 35 years or so of the field's existence. Two major encyclopedias in AI and cognitive science (Shapiro 1992; Wilson and Kidd 1999) have no sections or index entries for "humo(u)r," and there is no sign of this topic in the table of contents of a more recent reference work (Dopico, Dorado, and Pazos 2008). Of course, it could be argued that the creation of humor is not truly "intelligence," but it is certainly a complex and subtle form of human behavior, with both cognitive and emotional aspects, the mechanisms for which are far from obvious. AI no longer avoids the topic of humor completely, but computational modeling of humor is still at an early stage, with few established precepts or results. In assessing the state of the field, it is important to realize that this subarea of AI has received few resources and relatively little effort. Hardly any computational humor projects have received external funding from mainstream grant-awarding bodies (probably the largest examples are those described by Stock and Strapparava [2003] and by Manurung et al. [2008]).

Where's the Theory?

The centuries preceding the birth of AI witnessed discussions of humor by such famous names as Cicero, Kant, Schopenhauer, and Freud (see Morreall [1987] and Attardo [1994] for historical reviews), so it might seem that the starting point for computational work should be the current theory of humor. Why should AI researchers try to reinvent what these great minds have provided? Unfortunately, it is not as simple as that. Although there are many books and articles on humor from the perspective of philosophy and literature (and—more recently—of psychology), none of that has resulted in agreement about a theory of humor. Nor is it the case that there are several competing theories to choose from, if "theory" means a detailed, precise, coherent, empirically grounded set of principles that explain the known phenomena. Although the published works on humor provide stimulating ideas about what factors may be involved in humor, they do not provide the level of detail, or the clear definition of concepts, that would be required to formulate a computer model. In view of this lack of a directly usable framework, researchers have applied conventional AI methodologies to the problem of humor, developing models that are very limited in their scope, but that can at least be implemented. Such programs typically make little or no use of theories of humor. Occasionally, computational humorists will acknowledge some theoretical framework, but it can be hard to see how the actual details of the implementations follow from the theoretical ideas cited. An exception to this trend is the program of Tinholt and Nijholt (2007), which uses a mechanism derived from the Raskin (1985) criterion for a text being a joke; this is discussed in more detail later on.

Humor-generating programs are typically designed by examining some extremely small area of humor, abstracting some structural patterns from existing examples (for example, jokes found in joke books), formulating computationally manageable rules that describe these patterns, and implementing an algorithm to handle these rules.

Does this mean that computational projects cannot contribute to wider theories of humor? Opinions are divided on this. The implementors of the existing programs might argue that even very small studies can throw up interesting insights into the nature of humor, gradually chipping away at the mystery and slowly extending the scope of the investigation to wider and more general areas; this position is espoused by Ritchie (2004). Sceptics would argue that these tiny implementations are overly narrow and based on extremely ad hoc mechanisms, which means that they tell us nothing about any other forms of humor. The latter position has been vigorously expressed by Raskin (1996, 2002), who argues that only by adopting linguistically general and theoretically based concepts can computational humor progress.

Humor in Language

Humor out in the everyday world appears in many forms: visual gags, physical slapstick, funny sound effects, unexpected twists in situation comedies, and so on. However, it is very common for humor to be conveyed in language: funny stories, witty sayings, puns, and so on. A great deal of academic research into humor has focused on humor expressed in language, and this is not surprising, as there are some very good reasons for concentrating on this genre. Even without the added complications of vision, physical movement, and so on, verbally expressed humor still displays a very rich and interesting range of phenomena to study, and the surrounding culture provides a wide variety of established genres for textual humor, with lavish quantities of data available for study (for example, joke books, collections of aphorisms).
This means not only that there is a plentiful supply of data, but also that the examples are attested, in the sense that the researcher knows that the particular texts are recognized by society as actually being instances of humor. Data (that is, examples of humor) are much easier to record and pass around in the form of texts. In contrast, complete depictions of acts of slapstick (particularly if not performed for the camera) are harder to set down. Moreover, there are relatively good formal models of the more obvious aspects of language, because linguists have developed a wealth of terminology and concepts for describing the structure of chunks of text (spoken or written). This means it is much easier to provide a detailed account of the contents of a humorous passage.

These advantages apply even more strongly when it comes to the computational treatment of humor, as it is important to be able to represent the data in machine-readable form, and it is helpful to have some formal model of the structure of the data items. All the working humor programs discussed here handle textual humor only; there are almost no computer programs dealing with nontextual humor (but see Nack [1996] and Nack and Parkes [1995] for another approach).

Even within verbally expressed humor, there is a further subclassification routinely made by humor scholars: humor created by the language form itself (often called verbal humor) and humor merely conveyed by language (often called conceptual or referential humor). Attardo (1994) cites many authors, as far back as Cicero, who have made this distinction. Verbal humor depends directly on the linguistic features of the words and phrases used: puns, use of inappropriate phrasings, ambiguities, and so on. Such humor relies on the particular characteristics of one language, and so may be difficult or (usually) impossible to translate. In contrast, conceptual or referential humor involves all sorts of events and situations and may be conveyed in any medium: cartoons, video, text, speech, and so on. Where a textual joke depends on referential humor, it can generally be translated into another language without distorting or losing its humorous properties.

Verbal Humor

Traditional symbolic AI can be seen as an attempt to reduce some complex behavior (game playing, reasoning, language understanding, and so on) to the manipulation of formal relationships between abstract entities. Computational humor programs are no exception. They are all based on the assumption that the humorous items under consideration can be characterized by a statement of various formal relations between parts of those items or some underlying abstract entities. The interesting question is: what sorts of entities and relations are involved? The earliest humor-generating programs (see Ritchie [2004, chapter 10] for a review) were founded on the observation that some very simple types of wordplay jokes (puns) rely on linguistic properties that are structural and relatively close to the "surface" (that is, actual words and grammatical units) rather than being "deeper" (for example, abstract concepts of incongruity or of social appropriateness).

Probably the earliest actual joke generator is that of Lessard and Levison (1992), which built very simple puns such as this one, in which a quoted utterance puns with an adverb:

    "This matter is opaque," said Tom obscurely.

The next stage of development was punning riddles, which began emerging from computer programs in 1993. A punning riddle is a short joke that consists of a question and a one-line answer, with some sort of wordplay involving the answer. Another very small text generator by Lessard and Levison (1993) produced examples such as the following:

    What kind of animal rides a catamaran? A cat.

Around the same time, the joke analysis and production engine (JAPE) program (Binsted 1996; Binsted and Ritchie 1994, 1997) was constructed. It could spew out hundreds of punning riddles, one of the better ones being:

    What is the difference between leaves and a car? One you brush and rake, the other you rush and brake.

JAPE was a conventional rule-driven program, implemented in Prolog, using a relatively small number of pattern-matching rules to compose the riddle structures and using a conventional dictionary based on WordNet (Miller et al. 1990; Fellbaum 1998) to supply words and two-word phrases. The large size of the dictionary (more than 100,000 entries) meant that JAPE could generate a vast number of texts, although only a few of them were good jokes, and many were incomprehensible (Binsted, Pain, and Ritchie 1997).

A few years later, Venour (1999) built a pun generator that could construct simple two-word phrases punning on a preceding setup sentence, as in the following:

    The surgeon digs a garden. A doc yard.

Another small-scale pun generator was that of McKay (2002), whose program could build simple one-line puns like the following one:

    The performing lumberjack took a bough.
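In hindsight, generators of this kind can be seen as instantiating a schema: find words that stand in a fixed lexical relationship, then build a fixed text form around them. The following Python fragment is a minimal illustration of that idea, not a reconstruction of any of the systems above; the three toy tables stand in for the large WordNet-based lexicon that JAPE actually used, and all names are invented.

```python
# A toy schema in the spirit of JAPE-style riddle generation.
# The humor-relevant relation: a familiar two-word phrase whose first
# word has a homophone with an unrelated meaning.

HOMOPHONES = {"serial": "cereal"}          # sound-alike pairs (toy data)
PHRASES = {"serial killer": "murderer"}    # familiar phrase -> rough gloss
GLOSS = {"cereal": "breakfast food"}       # word -> rough gloss

def punning_riddles():
    """Instantiate the schema for every phrase whose first word puns."""
    for phrase, phrase_gloss in PHRASES.items():
        first, rest = phrase.split(" ", 1)
        pun = HOMOPHONES.get(first)
        if pun is None:
            continue
        question = f"What do you call a {GLOSS[pun]} that is a {phrase_gloss}?"
        answer = f"A {pun} {rest}."
        yield question, answer

for question, answer in punning_riddles():
    print(question)
    print(answer)
```

Scaled up with a real dictionary supplying the three relations, the same control structure yields thousands of candidate riddles, which is consistent with the pattern reported for JAPE: vast output, of which only a few items are good jokes.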
All the early pun generators were programmed without reference to any general plan of how a punning program should be constructed (since there was no precedent), and there was no obvious commonality in their components. The architecture adopted by the JAPE program, however, has a relatively clean modular structure (and formed the basis for a later, larger system, System to Augment Nonspeakers Dialogue Using Puns (STANDUP) [Manurung et al. 2008]). With hindsight, this architecture could be used quite naturally to produce the output of the other simple pun builders.

Although most computer-generated verbal humor consists of puns, there has also been a nonpunning system that may lie on the boundary of verbal and referential humor: the HAHAcronym program (Stock and Strapparava 2003, 2005). This system can perform two tasks: given an existing acronym (for example, DMSO, for defense modeling and simulation office), it computes alternative words with these initials to form a whimsical phrase (for example, defense meat eating and salivation office). Alternatively, given two words that indicate some concept, it finds an English word pertinent to the concept whose letters are the initials of a phrase also (humorously) relevant to the concept. For example, {fast, processor} yields TORPID, for Traitorously Outstandingly Rusty Processor for Inadvertent Data processing.

Referential Humor

On the whole, computational humor has concentrated, so far, on verbal humor, although there have been a few small forays into referential humor. Bergen and Binsted (2003) examine a subclass of joke based on exaggerated statements on some scale, such as the following, and Binsted, Bergen, and McKay (2003) briefly allude to implementing a generator of such jokes.

    It was so cold, I saw a lawyer with his hands in his own pockets.

Stark, Binsted, and Bergen (2005) describe a program that can select (from a short list of options) the most suitable punchline for a joke of the type illustrated as follows:

    I asked the bartender for something cold and filled with rum, so he recommended his wife.

However, this appears to be a very small implementation that works on one or two examples using a very specific knowledge base.

Tinholt and Nijholt (2007) describe a small prototype system for generating jokes in the form of deliberately misunderstood pronoun references (see also Nijholt [2007]). Such humor is exemplified by the following exchange within a Scott Adams cartoon:1

    "Our lawyers put your money in little bags, then we have trained dogs bury them around town." "Do they bury the bags or the lawyers?"

(The actual cartoon has a further punchline: "We've tried it both ways.") Tinholt and Nijholt's process works as follows. If a pronoun occurs within a text, a general antecedent-finding program (Lappin and Leass 1994) fetches all candidate antecedents, including the proposed correct one. The correct antecedent is compared with each of the others to find an incorrect candidate that is "opposed" to the correct one. Two potential referents are deemed to be opposed if they have most of their properties in common according to ConceptNet (Liu and Singh 2004) but also have at least one property with opposite (antonymous) values. If these criteria are met, the program generates a pseudoclarificatory question asking which of these two potential antecedents is involved in the relation mentioned in the text. Tinholt and Nijholt tested this method on chatterbot transcripts and story texts, comparing its identification of joke opportunities with a manual annotation of the texts. They state that it was "still quite susceptible to errors" so that it was "unfeasible to use the system in existing chat applications," but opine that this method "does provide future prospects in generating conversational humor."
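The opposition test at the heart of this method is straightforward to state in code. The sketch below is schematic rather than a reconstruction of the actual system: the property table is a hand-made stand-in for ConceptNet lookups, properties are reduced to boolean values rather than antonym pairs, and the antecedent-finding step is assumed to have already produced the two candidates.

```python
# Schematic version of the Tinholt and Nijholt opposition check.
# Toy property table standing in for ConceptNet queries.
PROPERTIES = {
    "bags":    {"concrete": True, "found_in_office": True, "animate": False},
    "lawyers": {"concrete": True, "found_in_office": True, "animate": True},
}

def opposed(correct, candidate):
    """Mostly shared properties plus at least one clashing property."""
    a, b = PROPERTIES[correct], PROPERTIES[candidate]
    common = set(a) & set(b)
    shared = [k for k in common if a[k] == b[k]]
    clashing = [k for k in common if a[k] != b[k]]
    return len(shared) > len(clashing) and len(clashing) >= 1

def clarification_joke(verb, correct, candidate):
    """Emit the pseudoclarificatory question if the referents qualify."""
    if opposed(correct, candidate):
        return f"Do they {verb} the {correct} or the {candidate}?"
    return None

print(clarification_joke("bury", "bags", "lawyers"))
```

On this toy data the call reproduces the question from the cartoon quoted above; the real prototype, as the text notes, proved too error-prone for live chat applications.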
Could a Computer Learn to Make Jokes?

A dominant paradigm in AI at present is that of machine learning (in various forms). This raises the question: could a learning program develop joke-making skills? There are certainly plenty of positive instances (jokes) to use for training. Mihalcea and Strapparava (2006) carried out a number of studies in which various machine-learning and text-classification methods were applied to large quantities of textual data, both humorous and nonhumorous. The aim was to see whether it was possible to evolve classifiers (of various kinds) that distinguish humorous texts from nonhumorous ones. They used a collection of one-liner jokes found on the web and four nonhumorous datasets: Reuters news titles, proverbs, ordinary sentences from a large corpus of English (the British National Corpus [Leech, Rayson, and Wilson 2001]), and "commonsense" statements from a knowledge base (Singh 2002). Using certain stylistic features (alliteration, antonymy, adult slang), their program automatically constructed a decision tree for each pairing of the humorous data with a nonhumorous data set (that is, there were four different comparisons). The accuracy of the classification learned varied from 54 percent (humor versus proverbs) to 77 percent (humor versus news titles). They also tried using aspects of the content of the text, with two other classifier techniques (naïve Bayes, support vector machine), obtaining broadly similar results in the two cases: accuracy of 73 percent for humor versus ordinary sentences, and 96 percent for humor versus news titles. Combining the stylistic and content features hardly improved performance at all. Mihalcea and Pulman (2007) discuss these results, concluding that certain features show up as more salient in the computed classifications: human-centered vocabulary, negation, negative orientation, professional communities, and human weakness.

Mihalcea and Strapparava's experiments suggest that humorous texts have certain gross textual properties that distinguish them from other texts, to the extent that it might well be possible to have a program guess, with much better accuracy than chance, whether a text was a joke or not. While this is interesting, there is some way to go before such findings can give rise to a humor-creation system (and this is not what Mihalcea and Strapparava were proposing). It is possible to imagine some form of "generate-and-test" architecture, or an evolutionary algorithm, into which a text classifier might be fitted. Even if such an arrangement could be made to work, it would be quite hard to argue that this demonstrates creativity of any depth.

The main problem is that the automatic classifiers detect correlations (loosely speaking) between certain features and humor but do not determine causality from features to humor. A text could be replete with the types of feature listed by Mihalcea and Pulman without being funny.

The Methodology of Humor Generation

Although it is exciting to think of humorous programs as being models of creative or witty humans, perhaps even displaying a "sense of humor," there is a less glamorous, but potentially very helpful, perspective on the relationship between computation and humor research. Most of the generating programs built so far have not been embedded in some larger cognitive model, do not function in an interpersonal context, and perform no specific application task (although Tinholt and Nijholt's pronoun misunderstander attempts to move outside these limitations). Instead, they simply follow symbolic rules to produce output. In this way, they function as testbeds for whatever formal models of joke structure are embodied in their implementation. Just as there was once a vogue for "grammar-testing" programs that checked the consequences of a linguist's proposed formal grammar (Friedman 1972), a joke generator tests the implications of a formal model of joke mechanisms. Although the programs do not yet connect to general theories of humor, this testbed role would be very valuable where the implementation was based on a proposed theory.

When testing a formal model, the program can be run under laboratory conditions, since it is not intended as a simulation of a human joke teller in a real context. Under these circumstances, certain methodological principles follow (some of which are more generally applicable to creative programs).

The first issue is the status of humorous input to the program. If the system is some sort of learning program, which takes in a collection of existing humorous items and, by some process of generalization, manages to construct further humorous items that are noticeably distinct from the original input, then that is a significant achievement. (It is also one of the types of situation that stimulates interesting debates about ascribing creativity to programs [Ritchie 2007].)

On the other hand, if the program merely recycles the input items, perhaps with some superficial rearrangement or packaging, then that is not very interesting, as it casts no light on how humor can be created. Hence, programs that do not aspire to the kind of learning model just sketched should not accept humorous items within their input. This is because a program can embody a computational model of humor creation only if it constructs the humor from nonhumorous material. The whole point of developing models of humor, in the general tradition of reductionist science, is to decompose the nature of humor creation into some configuration of other (nonhumorous) processes. Furthermore, there should not be some module within the system whose function is to create humor and that is a "black box," in the sense that its inner workings are not known, or not documented, or not regarded as part of the proposed model. If the reason that the program manages to produce humor is because it has a box inside it that (mysteriously) "creates humor," that would not illuminate the problem of humor creation. Instead, this would introduce a form of regress, in which there is now a new problem: how does that black box achieve its results?

This is subtly different from allowing the program to have knowledge that is about how to create humor, perhaps localized to one module. That is completely acceptable: somewhere in the program there must be a capability to form humorous items. If the model is to cast light on the nature of humor, it is essential to know what that knowledge is and how it is used. This means that the builder of the humor generator should be quite clear about which parts of his system, or which of the knowledge sources upon which the program relies, or which interactions between modules are regarded as central to creating the humor and which are merely supporting facilities.
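This division of labor can be shown in miniature. The sketch below is purely illustrative (it echoes the cat/catamaran riddle quoted earlier): only find_pun_pairs holds anything humor-specific, while realize is a generic template filler that any text generator might use; the spelling-based similarity test and the hypernym table are crude stand-ins for the phonetic matching and lexical resources a real system would need.

```python
# Illustrative split between a humor-specific core module and
# general-purpose supporting modules (toy data throughout).

WORDS = ["cat", "catamaran", "dog"]
HYPERNYM = {"cat": "animal"}  # ordinary lexical knowledge, nothing humorous

def find_pun_pairs(words):
    """Core module: the only humor-specific knowledge in the program.
    Pairs a short word with a longer word it begins (a crude,
    spelling-based stand-in for phonetic similarity)."""
    return [(short, full) for short in words for full in words
            if short != full and full.startswith(short)]

def realize(template, **slots):
    """Supporting module: generic template filling."""
    return template.format(**slots)

for short, full in find_pun_pairs(WORDS):
    print(realize("What kind of {category} rides a {vehicle}? A {answer}.",
                  category=HYPERNYM[short], vehicle=full, answer=short))
```

Framed this way, an evaluator can ask exactly where the humor comes from: here it lies entirely in find_pun_pairs plus the fixed template, while realize and the hypernym table are general-purpose resources.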
Articles

structed, “a pa-lon,” and the question for the riddle from the set of all possible values, or the data items
can be framed by asking about some entity that might be selected according to some principles,
has properties of both the named concepts. This such as featuring in certain linguistic sources in
can be done using other words that are semanti- some particular way (for example, frequency of
cally related to these two words, such as hyper- occurrence in a large collection of naturally occur-
nyms or synonyms, but the question can be ring texts). What should not be done, in the inter-
formed in various ways, such as “what kind of tow- ests of objective testing, is for the parameters to be
er is a dad?” “what do you get when you cross a tower selected in an ad hoc manner by hand, as there is
with a father?” “what do you call a tower with a fam- the risk of subjective estimates of “suitable values”
ily?” and so on. influencing the choice.
Hence the pun generators can be viewed as hav- If a system produces only a small number of
ing central modules that select suitable words and items, then there is no problem about deciding
set out the minimum needs for the setup, but there which of these to test: all of them can be tested,
are also basic text-building modules that supply giving an exhaustive assessment of the program’s
the surrounding context to meet that requirement. output. If, on the other hand, the system can pro-
Any purely supporting facilities should not only duce large amounts of data then the next question
be nonhumorous, they should, as far as possible, is: which output items are to be subjected to test-
be general purpose. That is, they should not be too ing? To avoid bias creeping in, some form of ran-
finely tuned for performing the supporting role in dom selection is probably necessary. If the testing
the program, since there is a risk that such tuning mechanism is constrained in some way (for exam-
might stray accidentally into covert injection of ple, only certain other texts are available as control
humorous content in a way that would obscure the items), it may be necessary to restrict the choice of
real contributions of the parts of the model. output data for compatibility with the available
Although the text-generation capabilities of past control items, but the selection should be as objec-
pun generators are usually not wholly general text- tive as possible.
generation programs, they are not fine-tuned for The most obvious, and still the best, way to test
puns, in that there are conventional language-gen- humor output is with human judges. The exact
eration programs, for real practical applications, details of how to elicit judgments about the texts
that use equally crude and narrowly targeted rules from appropriate humans (for example, children
(Reiter and Dale 2000). Lessard and Levison, and for child-oriented humor) need to be planned, but
Venour, made use of a general text generator, Vin- this area is not mysterious, as this type of study is
ci (Levison and Lessard 1992), and the the lexicon routine within psychology. There should be con-
used by Binsted was a wholly general-purpose dic- trol items alongside the program output items, the
tionary (Miller et al. 1990) containing the kind of order of presentation of items should be varied, the
information that linguists would normally associ- mix of types of items should be balanced, the ques-
ate with words and phrases (grammatical cate- tions to the judges should be carefully phrased,
gories, relations such as synonymy, and so on). It there should be a sufficiently large number of sub-
might be that all the knowledge resources are “sup- jects, and so on. There is thus no obstacle in prin-
porting,” in the sense that the humor is created ciple to assessing the quality of computer-generat-
entirely from the interaction of various types of ed humor, but most implementations have been
nonhumorous knowledge; in that case, the “recipe on such a small and exploratory scale (for exam-
for humor” would be in the interaction (as defined ple, as student projects) that there have been few
by the program designer), and this should be made thorough evaluations. Probably the fullest and
completely explicit and open to scrutiny. most systematic testing was the evaluation of the
The testing of humor-creating programs is also a JAPE program (Binsted, Pain, and Ritchie 1997).
matter worth reflecting upon, if such evaluations
are to illuminate the phenomenon of humor. The
pun generators summarized earlier do not have
Practical Applications
any real notion of an “input parameter”—they Science is not the only motivation for computa-
simply run with all their knowledge bases every tional modeling of humor: there is also engineer-
time. In such cases, there is no question of select- ing. From an early stage, there has been the idea
ing values for the input (beyond the general that computer-generated humor might be of prac-
hygiene principles outlined above, concerning the tical use. It is clear that AI is a long way from the
nature of the knowledge bases). However, if the automated writing of comedy scripts or writing
program can be run on specific data items—as in gags for comedians, but there have been some pro-
the case of the HAHAcronym system—then any posals for the use of humor for real-world tasks.
trials of the system must have a rational and docu- Strapparava, Valitutti, and Stock (2007) describe
mented way of selecting the values used. For exam- some explorations of the possibility of using com-
ple, the input values might be randomly selected putational pun making in the “creative” develop-

76 AI MAGAZINE
Articles

ment of an advertising campaign, but as yet that area is unexploited. The two main types of application that have been advocated and at least tentatively explored are friendlier user interfaces and educational software.

Binsted (1995) considered whether a natural language user interface for a computer system would be more congenial if the system could make use of humor to lessen the impact of inadequacies in its performance (for example, lack of factual knowledge, poor language understanding, slow response). She suggested a number of types of humor that might work: self-deprecating humor (the system mocking its own failings), wry observations (for example, about the system's lack of speed), and whimsical referring expressions. Stock (1996) developed this theme and, in passing, mentioned the possible application areas of advertising and "edutainment."

These two essays were essentially speculative and preceded the experimental results of Morkes, Kernal, and Nass (1999), who found that users reacted more positively to a computer in which the user interface contained examples of humor (not computer generated, sadly). Nevertheless, Binsted and Stock raise some valid general points. The humor will probably be more effective if it is attributed to some sort of quasi-human agent that is not synonymous with the entire computer system. In that way, frivolous wisecracks can be seen as separate from the main task, and any failings of the computer system are not associated with the purveyor of humor, and vice versa—poor attempts at humor are not seen as symptoms of a useless overall system. Conversely, if there are independent reasons for having a lifelike artificial agent as part of the interaction, then imbuing it with humorous skills will help to make it more lifelike and likeable (Nijholt 2002). Humorous remarks that are of poor quality, or are contextually inappropriate, will be counterproductive, being likely to irritate rather than engage the user. This is particularly pertinent to the argument for implementing humorous interfaces at the moment, because the current state of computational humor makes it extremely hard to generate humor automatically that is of good quality and moderate relevance. This means that the arguments in favor of having humor-capable interfaces are in some contrast to the ability of the field to provide humor generation of a sufficient quality to make implementation a reality.

It is often argued that the use of humor within education can have beneficial effects, although the empirical evidence is mixed (Martin 2007, chapter 11). The idea that computational humor might be used in interactive teaching software was mentioned in passing by Binsted (1996), and Stock (1996) alludes to the scope for edutainment. This motivation is also invoked by McKay (2002) and by Binsted, Bergen, and McKay (2003), although neither provides a design for a real tutoring system. Since existing computational humor is mostly concerned with language, it is relatively plausible to argue that a program that manipulates textual items for humorous effect may well help a student to learn about language. For example, a simple pun is usually based on similarity of sound, often uses ambiguity, and may rely on the fact that two different words mean the same. Thus a schoolchild who encounters such a pun may absorb some understanding of the workings of the language, while being motivated by the "fun" aspect of the experience. It is also possible to automate the explanation of why a particular text qualifies as a pun; for example, McKay (2002) shows how his pun generator could outline the links behind the jokes. Such explicit pedagogy, sweetened by the amusing material, might be quite effective.

One educationally motivated joke-generation system is STANDUP (Manurung et al. 2008), in which the ideas of Binsted's JAPE were reimplemented as the core of an interactive, child-friendly, robust, and efficient riddle generator. The aim was to create a "language playground" for children with complex communication needs (that is, impairments of motor or cognitive ability that impede normal language use). The system was evaluated by a number of children with cerebral palsy who had limited literacy and conversational skills. Overall, the children were greatly engaged by being able to interact with the software and build simple puns, and they appeared to become more lively and communicative (Black et al. 2007; Waller et al. 2009). However, that study was small, and so it is hard to draw firm conclusions. The STANDUP system is capable of producing literally millions of different jokes, but the overall quality of its output has not been tested.

As with user interfaces, the arguments in favor of joking software tutors are plausible, but the state of the art does not make it easy to build working systems, let alone perform rigorous trials.

How Creative Are These Systems?

A central question remains: to what extent are computational humor programs creative? If one adopts a rather trivial notion of creativity, which demands only the production of novel items (compare Chomsky's arguments that routine human language use is inherently creative), then the verdict is simple. But creativity means more than simply producing hitherto unseen material. It is not possible to do justice here to the long-standing debate about when a computer program can be seen as creative (Boden 1998, 2003; Ritchie 2007; Colton 2008), but a few humor-specific remarks are in order.
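Before those remarks, it is worth noting that the punning mechanisms described in the education discussion above (a familiar phrase, a homophone of one of its words, and glosses of both readings) are concrete enough to sketch in a few lines of code. The schema, mini-lexicon, and glosses below are invented for illustration; they are not the actual rules or data of JAPE or STANDUP, which are far larger and linguistically richer.

```python
# A toy, JAPE-style punning-riddle schema. Illustrative sketch only:
# the schema and all lexical data below are invented for this example
# and do not reproduce the real JAPE or STANDUP systems.

# Hypothetical mini-lexicon: (pun word, word it sounds like) pairs,
# plus short glosses of phrases and words.
HOMOPHONES = [("cereal", "serial"), ("hare", "hair")]
GLOSSES = {
    "serial killer": "a murderer",
    "cereal": "breakfast food",
    "hair dryer": "a blow-dryer",
    "hare": "a fast rabbit",
}
PHRASES = ["serial killer", "hair dryer"]  # familiar two-word phrases

def make_riddles():
    """Apply one hand-crafted rule to different data values:
    swap a homophone into a familiar phrase, then pose a question
    that evokes both readings at once."""
    riddles = []
    for phrase in PHRASES:
        head, rest = phrase.split(" ", 1)
        for pun_word, original in HOMOPHONES:
            if original == head and pun_word in GLOSSES and phrase in GLOSSES:
                question = (f"What do you call {GLOSSES[pun_word]} "
                            f"that is {GLOSSES[phrase]}?")
                answer = f"A {pun_word} {rest}."
                riddles.append((question, answer))
    return riddles

for question, answer in make_riddles():
    print(question, answer)
# e.g. "What do you call breakfast food that is a murderer? A cereal killer."
```

The point of the sketch is that a single hand-crafted rule, applied to different data values, yields structurally well-formed riddles; whether any of them is actually funny depends entirely on the lexical data supplied.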
FALL 2009 77
For humans, simply appreciating humor—of which everyone is capable—is not usually regarded as creative. Hence, even where a computer program manages a first step toward appreciation (namely, recognizing that an item is humorous), it is hard to argue that the software is being creative, no matter how effectively it performs this step.

Humans who produce novel humor (particularly humor of a high standard) are usually deemed to be creative, so perhaps humor-generating programs should have similar status. For programs, the results so far show some promise in the area of novelty, an important component of creativity. Boden (2003) makes the useful distinction between artifacts that are novel for the creator (for example, where an inexperienced mathematician proves a result, not realizing that it is already known in the field), as opposed to results that are novel for the whole culture (such as the development of cubist art). The former type of achievement can lead to P-creativity ("personal"), whereas the latter are candidates for H-creativity ("historical"). Where do joke-generating programs stand in this respect? Some of the programs definitely produce jokes that have not been seen before, so they have at least the minimum qualification for an accolade of H-creativity. Unfortunately, these programs score very poorly on the scale of quality, another factor that is regarded as highly important for an overall verdict of being creative. The created texts generally consist of relatively feeble humor.

Whereas "novelty" could simply mean that the specific created entity has not been encountered before, there is a more demanding criterion, which concerns the extent to which the created entity is radically different from previous artifacts of this kind; this could be labelled "originality." Boden has proposed that there are two qualitatively different forms of creativity, determined by the extent to which the act of creation overturns previous norms. The first, exploratory creativity, involves the navigation of a structured space of possibilities, forming new artifacts according to existing guidelines (perhaps abstract, unwritten, unacknowledged). More striking results will come from transformational creativity, in which the existing structured space is significantly altered. From this perspective, the joke machines can at best aspire to exploratory creativity: they rely on a small set of hand-crafted rules, and the joke texts are the results of applying these rules in different combinations or with different data values. However, much human-created humor can also be viewed in this way—it is rare for humor to cross established boundaries or restructure the space of humorous possibilities. Exploratory creativity, in this sense, is not trivial and can even reach high levels of quality. This means that there are still demanding standards for computer generation of humor to aim for, even if the programs are exploring an already structured space. In the current state of the art, there is plenty of room for improvement within this type of humorous creativity, without yet considering what it would mean to be radically original ("transformational"). Humor generators should show that they are proficient at walking before they attempt to run.

The Challenges Ahead

Twenty years ago, computational humor was unheard of, and—as noted at the start of this article—little attention was being paid to humor within AI. In such a setting, it was an interesting step to set down the simple structural properties of certain kinds of basic puns, and hence automatically construct texts that had the gross attributes of jokes. This showed that not all humor was beyond the reach of AI and that there was an "abstract syntax" to certain classes of joke. However, it is difficult to argue now that building another pun generator would, in itself, be a significant step forward, unless it introduced some nontrivial advance in the direction of wider generality.

Even in the case of just verbal humor (wordplay, and so on), there are two ways in which computer-generated jokes such as puns could be more interesting.

First, there is the issue of contextual relevance. Although attempts at joke generation have implicitly been tackling the issue of "quality"—since the aim is to create jokes that are as good as possible—the joke generators are usually designed to create single texts in isolation. That is, these are "self-contained" puns, in which the central words or phrases are directly related to other words in the same text. In none of these cases is an utterance created that somehow links to a realistically complex context of use. But such contextually related wisecracks are part of the essence of punning in everyday life (at least in those cultures where punning occurs freely). The existing pun generators are competing with "canned" puns, such as those found in some children's comics or—in Britain—Christmas crackers, but have not begun to imitate contextual wisecracking. Loehr (1996) presented a preliminary study in using keyword matching to select JAPE-generated jokes for insertion in a dialogue, but that was a very rough and basic trial that did not really grapple with the complexities of contextual relevance.

There might be some ways to widen the range of computational punning (see Ritchie [2005] for some suggestions), but at the moment the state of the art is still limited. This has consequences for the short-term prospects of using current software in applications where impromptu and natural joke making might provide an appearance of lifelike
behavior, as discussed earlier. Relevance of content is just one of two criteria that Nijholt (2007) lists as central to the contextually appropriate use of humor; there is also the larger question of whether it is acceptable to use humor at a particular stage of the conversation. This latter issue is extremely difficult, requiring high-level cultural and social knowledge, and goes far beyond the very low-level mechanical approach currently used in computational humor. It is still difficult for designers of interactive computer systems to ensure that human-computer interactions flow naturally when all that is required is nonhumorous discourse. If the interface is to allow spontaneous humor, which may violate the norms of informative communication, the task becomes even more complicated.

The second possible improvement would be in the quality of the humor. Although existing programs produce structurally well-formed puns, most of the examples are not at all funny, and some barely qualify as jokes (as opposed to unmotivated ad hoc wordplay). If some abstract model could be devised that resulted in a program consistently producing genuinely funny output, that would be a step forward not only for possible applications but also for humor theory. That is, research must go beyond mere structural well-formedness in order to produce funny output consistently. What might lead to such improvements in funniness? It has often been argued that incongruity is a vital factor in humor, possibly combined with some resolution of that incongruity (Attardo 1994; Ritchie 2004), but there is still no formal and precise definition of what constitutes incongruity (or resolution). A typical dictionary explanation of incongruity is "want of accordance with what is reasonable or fitting; unsuitableness, inappropriateness, absurdity; want of harmony of parts or elements; incoherence" (Little et al. 1969). The knowledge representation techniques employed in AI should be a fruitful framework for stating a more precise and formal definition of incongruity (and perhaps also "resolution"). If such a notion of incongruity resolution could be shown empirically to result in humor (by using it in a joke generator), that would be a major advance.

One widely proposed account of humorous incongruity has its origin in the AI of the 1970s, when frames (Minsky 1975) and scripts (Schank and Abelson 1977) were proposed as knowledge structures for stereotypical situations and events. The application of these ideas to humor comes in the semantic script theory of humor, or SSTH (Raskin 1985), which in its more recent versions (Raskin 1996, 2002) continues to connect to this strand of AI work through ontological semantics (Nirenburg and Raskin 2004). The central idea of the SSTH is that a text is a joke if it is associated with two scripts (complex knowledge structures) that, although having a large portion in common (an overlap), also differ crucially in certain aspects (are opposite). This could come about, for example, by the text being ambiguous. The SSTH then provides a list of what counts as "opposite." The pronoun misunderstander of Tinholt and Nijholt, discussed earlier, is heavily influenced by the SSTH in its method for comparing linguistic structures for humorous potential.

Any solution to the incongruity question, whether from the SSTH or elsewhere, should help to widen the focus of computational humor to other forms of humor. As noted earlier, computer generation of humor has largely concentrated on "verbal" humor, because the crucial relationships in the textual structures are primarily lexical and hence can be computed. However, referential humor constitutes a very large proportion of the humor that appears in our culture.

Even if there were a good definition of incongruity (and its resolution), referential humor is still vastly more challenging than simple punning. Stock (1996) refers to fully developed humor processing as being "AI-complete"—that is, a full computational account of humor will not be possible without a solution to the central problems of AI. This view is echoed by Nijholt (2007). The problems become clear with even a superficial inspection of quite simple jokes, which typically involve cultural and real-world knowledge, sometimes in considerable quantities. Most referential humor consists of short stories (ending in a punchline), but AI has yet to produce a high-quality, fully automatic story generator, let alone one that can perform subtle tricks of narrative. It is common to observe that many jokes rely on ambiguity, but current text generators have nothing like the sophistication that would be needed to create exactly the right ambiguities for humor. Subtle manipulation of the audience's expectations, indirect allusions, and carefully timed revelations, all of which are central to many jokes, require inference mechanisms of some sophistication. For example, the following joke, consisting of just one short sentence, requires considerable hypothetical reasoning:

    A skeleton walked into a bar and said: "I'll have a Budweiser and a mop, please." (Tibballs 2000, No. 297, 49)

An even more vast and challenging aspect of humor is still to be tackled by AI. Science fiction might tell us that one crucial distinction between computers and humans is that machines (or software) can never have a sense of humor. It would be hard to collect firm evidence for or against such a view, given the lack of a single accepted definition of "sense of humor" (Ruch 1996, 2007), but computational humor is still far from addressing these
deeper issues. Research still focuses narrowly on the structural properties of the humorous items (jokes, and so on) without considering any of the human cognitive or emotional processes. There is therefore an exciting but unexplored area: the modeling of the effects of humor on the thoughts and emotions of the recipient.

Acknowledgements

The writing of this article was partly supported by grants EP/E011764/01 and EP/G020280/1 from the Engineering and Physical Sciences Research Council in the UK.

Note

1. www.dilbert.com.

References

Attardo, S. 1994. Linguistic Theories of Humour. Number 1 in Humor Research. Berlin: Mouton de Gruyter.
Bergen, B., and Binsted, K. 2003. The Cognitive Linguistics of Scalar Humor. In Language, Culture and Mind, ed. M. Achard and S. Kemmer. Stanford, CA: CSLI Publications.
Binsted, K. 1996. Machine Humour: An Implemented Model of Puns. Ph.D. dissertation, University of Edinburgh, Edinburgh, Scotland.
Binsted, K. 1995. Using Humour to Make Natural Language Interfaces More Friendly. Paper presented at the International Joint Conference on Artificial Intelligence Workshop on AI and Entertainment, Montreal, Quebec, 21 August.
Binsted, K., and Ritchie, G. 1997. Computational Rules for Generating Punning Riddles. Humor: International Journal of Humor Research 10(1): 25–76.
Binsted, K., and Ritchie, G. 1994. An Implemented Model of Punning Riddles. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94). Menlo Park, CA: AAAI Press.
Binsted, K.; Bergen, B.; and McKay, J. 2003. Pun and Nonpun Humour in Second-Language Learning. Paper presented at the CHI 2003 Workshop on Humor Modeling in the Interface, Fort Lauderdale, April 6.
Binsted, K.; Pain, H.; and Ritchie, G. 1997. Children's Evaluation of Computer-Generated Punning Riddles. Pragmatics and Cognition 5(2): 305–354.
Black, R.; Waller, A.; Ritchie, G.; Pain, H.; and Manurung, R. 2007. Evaluation of Joke-Creation Software with Children with Complex Communication Needs. Communication Matters 21(1): 23–27.
Boden, M. A. 2003. The Creative Mind, 2nd ed. London: Routledge.
Boden, M. A. 1998. Creativity and Artificial Intelligence. Artificial Intelligence 103(1–2): 347–356.
Charniak, E., and McDermott, D. 1985. Introduction to Artificial Intelligence. Reading, MA: Addison-Wesley.
Colton, S. 2008. Creativity Versus the Perception of Creativity in Computational Systems. In Creative Intelligent Systems: Papers from the AAAI Spring Symposium, ed. D. Ventura, M. L. Maher, and S. Colton. AAAI Press Technical Report SS-08-03. Menlo Park, CA: AAAI Press.
Dopico, J. R. R.; Dorado, J.; and Pazos, A., eds. 2008. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global.
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: The MIT Press.
Friedman, J. 1972. Mathematical and Computational Models of Transformational Grammar. In Machine Intelligence 7, ed. B. Meltzer and D. Michie, 293–306. Edinburgh: Edinburgh University Press.
Lappin, S., and Leass, H. J. 1994. An Algorithm for Pronominal Anaphora Resolution. Computational Linguistics 20(4): 535–561.
Leech, G.; Rayson, P.; and Wilson, A. 2001. Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.
Lessard, G., and Levison, M. 1993. Computational Modelling of Riddle Strategies. Paper presented at the Association for Literary and Linguistic Computing (ALLC) and the Association for Computers and the Humanities (ACH) Joint Annual Conference, Georgetown University, Washington, DC, 15–19 June.
Lessard, G., and Levison, M. 1992. Computational Modelling of Linguistic Humour: Tom Swifties. In Research in Humanities Computing 4: Selected Papers from the 1992 Association for Literary and Linguistic Computing (ALLC) and the Association for Computers and the Humanities (ACH) Joint Annual Conference, 175–178. Oxford, UK: Oxford University Press.
Levison, M., and Lessard, G. 1992. A System for Natural Language Generation. Computers and the Humanities 26(1): 43–58.
Little, W.; Fowler, H. W.; Coulson, J.; and Onions, C. T., eds. 1969. The Shorter Oxford English Dictionary on Historical Principles, revised 3rd ed. London: Oxford University Press.
Liu, H., and Singh, P. 2004. Commonsense Reasoning in and over Natural Language. In Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 2004), volume 3213 of Lecture Notes in Computer Science, ed. M. Negoita, R. Howlett, and L. Jain, 293–306. Heidelberg: Springer.
Loehr, D. 1996. An Integration of a Pun Generator with a Natural Language Robot. In Proceedings of the International Workshop on Computational Humor, 161–172. Enschede, Netherlands: University of Twente.
Manurung, R.; Ritchie, G.; Pain, H.; Waller, A.; O'Mara, D.; and Black, R. 2008. The Construction of a Pun Generator for Language Skills Development. Applied Artificial Intelligence 22(9): 841–869.
Martin, R. A. 2007. The Psychology of Humor: An Integrative Approach. London: Elsevier Academic Press.
McKay, J. 2002. Generation of Idiom-Based Witticisms to Aid Second Language Learning. In The April Fools' Day Workshop on Computational Humor, 77–87. Enschede, Netherlands: University of Twente.
Mihalcea, R., and Pulman, S. 2007. Characterizing Humour: An Exploration of Features in Humorous Texts. In Proceedings of the Conference on Computational Linguistics and Intelligent Text Processing (CICLing), Lecture Notes in Computer Science 4394, ed. A. F. Gelbukh, 337–347. Berlin: Springer.
Mihalcea, R., and Strapparava, C. 2006. Learning to Laugh (Automatically): Computational Models for
Humor Recognition. Computational Intelligence 22(2): 126–142.
Miller, G. A.; Beckwith, R.; Fellbaum, C.; Gross, D.; and Miller, K. 1990. Five Papers on WordNet. International Journal of Lexicography 3(4). Revised March 1993.
Minsky, M. 1975. A Framework for Representing Knowledge. In The Psychology of Computer Vision, ed. P. H. Winston, 211–277. New York: McGraw-Hill.
Morkes, J.; Kernal, H. K.; and Nass, C. 1999. Effects of Humor in Task-Oriented Human-Computer Interaction and Computer-Mediated Communication: A Direct Test of SRCT Theory. Human-Computer Interaction 14(4): 395–435.
Morreall, J., ed. 1987. The Philosophy of Laughter and Humor. Albany, NY: SUNY Press.
Nack, F. 1996. AUTEUR: The Application of Video Semantics and Theme Representation for Automated Film Editing. Ph.D. dissertation, Lancaster University, Bailrigg, Lancaster, UK.
Nack, F., and Parkes, A. 1995. Auteur: The Creation of Humorous Scenes Using Automated Video Editing. Paper presented at the International Joint Conference on Artificial Intelligence Workshop on AI and Entertainment, Montreal, Quebec, 21 August.
Nijholt, A. 2007. Conversational Agents and the Construction of Humorous Acts. In Conversational Informatics: An Engineering Approach, ed. T. Nishida, chapter 2, 21–47. London: John Wiley and Sons.
Nijholt, A. 2002. Embodied Agents: A New Impetus to Humor Research. In The April Fools' Day Workshop on Computational Humor, 101–111. Enschede, Netherlands: University of Twente.
Nirenburg, S., and Raskin, V. 2004. Ontological Semantics. Cambridge, MA: The MIT Press.
Raskin, V. 2002. Quo Vadis Computational Humor? In The April Fools' Day Workshop on Computational Humor, 31–46. Enschede, Netherlands: University of Twente.
Raskin, V. 1996. Computer Implementation of the General Theory of Verbal Humor. In Proceedings of the International Workshop on Computational Humor, 9–20. Enschede, Netherlands: University of Twente.
Raskin, V. 1985. Semantic Mechanisms of Humor. Dordrecht: Reidel.
Reiter, E., and Dale, R. 2000. Building Natural Language Generation Systems. Cambridge, UK: Cambridge University Press.
Ritchie, G. 2007. Some Empirical Criteria for Attributing Creativity to a Computer Program. Minds and Machines 17(1): 67–99.
Ritchie, G. 2005. Computational Mechanisms for Pun Generation. Paper presented at the Tenth European Workshop on Natural Language Generation, Aberdeen, Scotland, 8–10 August.
Ritchie, G. 2004. The Linguistic Analysis of Jokes. London: Routledge.
Ruch, W., ed. 2007. The Sense of Humor: Explorations of a Personality Characteristic. Mouton Select. Berlin: Mouton de Gruyter.
Ruch, W., ed. 1996. Humor: International Journal of Humor Research 9(3/4).
Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Lawrence Erlbaum.
Shapiro, S. C., ed. 1992. Encyclopedia of Artificial Intelligence, 2nd ed. New York: John Wiley.
Singh, P. 2002. The Public Acquisition of Commonsense Knowledge. In Acquiring (and Using) Linguistic (and World) Knowledge for Information Access: Papers from the AAAI Spring Symposium. AAAI Technical Report SS-02-09. Menlo Park, CA: AAAI Press.
Stark, J.; Binsted, K.; and Bergen, B. 2005. Disjunctor Selection for One-Line Jokes. In Proceedings of the First International Conference on Intelligent Technologies for Interactive Entertainment, volume 3814 of Lecture Notes in Computer Science, ed. M. T. Maybury, O. Stock, and W. Wahlster, 174–182. Berlin: Springer.
Stock, O. 1996. "Password Swordfish": Verbal Humor in the Interface. In Proceedings of the International Workshop on Computational Humor, 1–8. Enschede, Netherlands: University of Twente.
Stock, O., and Strapparava, C. 2005. The Act of Creating Humorous Acronyms. Applied Artificial Intelligence 19(2): 137–151.
Stock, O., and Strapparava, C. 2003. HAHAcronym: Humorous Agents for Humorous Acronyms. Humor: International Journal of Humor Research 16(3): 297–314.
Strapparava, C.; Valitutti, A.; and Stock, O. 2007. Automatizing Two Creative Functions for Advertising. Paper presented at the 4th International Joint Workshop on Computational Creativity, Goldsmiths, University of London, London, UK, 17–19 June.
Tibballs, G., ed. 2000. The Mammoth Book of Jokes. London: Constable and Robinson.
Tinholt, H. W., and Nijholt, A. 2007. Computational Humour: Utilizing Cross-Reference Ambiguity for Conversational Jokes. In 7th International Workshop on Fuzzy Logic and Applications (WILF 2007), Camogli (Genova), Italy, volume 4578 of Lecture Notes in Artificial Intelligence, ed. F. Masulli, S. Mitra, and G. Pasi, 477–483. Berlin: Springer Verlag.
Venour, C. 1999. The Computational Generation of a Class of Puns. Master's thesis, Queen's University, Kingston, Ontario.
Waller, A.; Black, R.; O'Mara, D.; Pain, H.; Ritchie, G.; and Manurung, R. 2009. Evaluating the STANDUP Pun Generating Software with Children with Cerebral Palsy. ACM Transactions on Accessible Computing (TACCESS) 1(3), Article 16. New York: Association for Computing Machinery.
Wilson, R. C., and Kidd, F. C., eds. 1999. The MIT Encyclopedia of the Cognitive Sciences. Cambridge, MA: The MIT Press.

Graeme Ritchie has degrees in pure mathematics, theoretical linguistics, and computer science and has published in various areas of AI and computational linguistics over the past four decades. He has worked on models of verbally expressed humor since 1993 and at present is a senior research fellow in the Natural Language Generation group at the University of Aberdeen.
