@inproceedings{Iyyer:Boyd-Graber:Claudino:Socher:Daume-III-2014,
Author = {Mohit Iyyer and Jordan Boyd-Graber and Leonardo Claudino and Richard Socher and Hal {Daum\'e III}},
Url = {docs/2014_emnlp_qb_rnn.pdf},
Booktitle = {Empirical Methods in Natural Language Processing},
Location = {Doha, Qatar},
Year = {2014},
Title = {A Neural Network for Factoid Question Answering over Paragraphs},
}
Links:
Code/Data [http://www.cs.umd.edu/~miyyer/qblearn]
Introduction
Dependency-Tree Recursive Neural Networks
Model Description
As in other rnn models, we begin by associating each word w in our vocabulary with a vector representation $x_w \in \mathbb{R}^d$. These vectors are stored as the columns of a $d \times V$ dimensional word embedding matrix $W_e$, where V is the size of the vocabulary. Our model takes dependency parse trees of question sentences (De Marneffe et al., 2006) and their corresponding answers as input.
Each node n in the parse tree for a particular sentence is associated with a word w, a word vector $x_w$, and a hidden vector $h_n \in \mathbb{R}^d$ of the same dimension as the word vectors. For internal nodes, this vector is a phrase-level representation, while at leaf nodes it is the word vector $x_w$ mapped into the hidden space. Unlike in constituency trees, where all words reside at the leaf level, internal nodes of dependency trees are associated with words. Thus, the dt-rnn has to combine the current node's word vector with its children's hidden vectors to form $h_n$. This process continues recursively up to the root, which represents the entire sentence.
We associate a separate $d \times d$ matrix $W_r$ with each dependency relation r in our dataset and learn these matrices during training.1 Syntactically untying these matrices improves composition.
1 We had 46 unique dependency relations in our quiz bowl dataset.
[Figure 1: dependency parse of the example sentence "This city's economy depended on subjugated peasants called helots," with edges labeled by dependency relation (ROOT, NSUBJ, POSS, POSSESSIVE, DET, PREP, POBJ, AMOD, VMOD, DOBJ).]
The leaf-level representation of a node n associated with word w is its word vector mapped into the hidden space,

$$h_n = f(W_v \cdot x_w + b), \quad (1)$$

where $W_v$ is a $d \times d$ transformation matrix, b is a bias vector, and f is a nonlinearity (Eq. 8). At interior nodes, each child's hidden vector is transformed by the matrix tied to its dependency relation and added in; for example, in Figure 1,

$$h_{\text{called}} = f(W_{\text{DOBJ}} \cdot h_{\text{helots}} + W_v \cdot x_{\text{called}} + b). \quad (2)$$

In general, a node n with word vector $x_{w(n)}$ and children K(n) receives the representation

$$h_n = f\Big(W_v \cdot x_{w(n)} + b + \sum_{k \in K(n)} W_{R(n,k)} \cdot h_k\Big), \quad (3)$$
where R(n, k) is the dependency relation between node n and child node k.
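To make the recursion concrete, here is a minimal numpy sketch (not the authors' released implementation); the `node` type with a `word` field and `(relation, child)` pairs, the `vocab` index, and the parameter containers are hypothetical stand-ins:

```python
import numpy as np

def tree_vector(node, We, Wv, W_rel, b, vocab, f):
    """Compute h_n for `node` by the dt-rnn recursion (Eqs. 1-3).

    We     -- d x V word embedding matrix (columns are word vectors)
    Wv     -- d x d matrix mapping a word vector into the hidden space
    W_rel  -- dict from dependency relation name to its d x d matrix W_r
    b      -- bias vector in R^d
    f      -- nonlinearity, e.g., the normalized tanh of Eq. 8
    """
    # Leaf case: just the word vector mapped into the hidden space (Eq. 1).
    total = Wv @ We[:, vocab[node.word]] + b
    # Internal nodes also fold in each child's hidden vector, transformed
    # by the matrix tied to the dependency relation on that edge (Eq. 3).
    for relation, child in node.children:
        total += W_rel[relation] @ tree_vector(child, We, Wv, W_rel, b, vocab, f)
    return f(total)
```

Calling `tree_vector` on the root of a parse tree yields the sentence representation $h_s$ used in the training objective below.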
Training
Our goal is to map questions to their corresponding answer entities. Because there are
a limited number of possible answers, we can
view this as a multi-class classification task.
While a softmax layer over every node in the
tree could predict answers (Socher et al., 2011;
Iyyer et al., 2014), this method overlooks that
most answers are themselves words (features)
in other questions (e.g., a question on World War II might mention the Battle of the Bulge and vice versa). We therefore train answer vectors in the same space as the question text and use a contrastive max-margin objective: for each sentence s in the set S of a question's sentences, with root representation $h_s$, correct answer c, and a set Z of randomly sampled incorrect answers, we minimize

$$C(S, \theta) = \sum_{s \in S} \sum_{z \in Z} L(\mathrm{rank}(c, s, Z)) \max(0,\, 1 - x_c \cdot h_s + x_z \cdot h_s), \quad (5)$$

where rank(c, s, Z) is the rank of the correct answer c among the candidates in Z and L converts that rank into a weight (Usunier et al., 2009).
Averaging over the N sentence/answer pairs t in the training set T gives the overall objective

$$J(\theta) = \frac{1}{N} \sum_{t \in T} C(t, \theta), \quad (6)$$

whose gradient decomposes over the same pairs,

$$\frac{\partial J}{\partial \theta} = \frac{1}{N} \sum_{t \in T} \frac{\partial C(t, \theta)}{\partial \theta}. \quad (7)$$
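A minimal sketch of the per-sentence cost that Eq. 6 averages, assuming answer embeddings stored in a hypothetical dict `x_ans`; for simplicity the rank weight L(rank(c, s, Z)) is set to 1 here, whereas the full objective weights each violation by the correct answer's rank:

```python
def sentence_cost(h_s, correct, wrong, x_ans):
    """Unweighted contrastive max-margin cost for one sentence (cf. Eq. 5).

    h_s     -- numpy hidden vector of the sentence's root node
    correct -- the sentence's answer entity c
    wrong   -- iterable of sampled incorrect answers (the set Z)
    x_ans   -- dict from answer entity to its numpy embedding
    """
    score_c = x_ans[correct] @ h_s
    # Hinge: the correct answer should beat every wrong one by a margin of 1.
    return sum(max(0.0, 1.0 - score_c + x_ans[z] @ h_s) for z in wrong)
```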
While past work on rnn models has been restricted to the sentential and sub-sentential levels, we show that sentence-level representations can be easily combined to generate useful representations at the larger paragraph level. The simplest and best aggregation method is just to average the representations of each sentence seen so far in a particular question. As we show in Section 4, this method is very powerful and performs better than most of our baselines. We call this averaged dt-rnn model qanta: a question answering neural network with trans-sentential averaging.
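The aggregation itself is a one-liner; a sketch assuming `sentence_vectors` holds each sentence's dt-rnn root vector in question order:

```python
import numpy as np

def qanta_vector(sentence_vectors, upto):
    """Paragraph-level representation after the first `upto` sentences:
    the mean of their sentence-level dt-rnn representations."""
    return np.mean(sentence_vectors[:upto], axis=0)
```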
Experiments
Datasets
Because some categories contain substantially fewer questions than others (e.g., astronomy has only 331 questions), we consider only
literature and history questions, as these two
categories account for more than 40% of the
corpus. This leaves us with 21,041 history questions and 22,956 literature questions.
Data Preparation
https://pypi.python.org/pypi/Whoosh/
9 Code and non-naqt data available at http://cs.umd.edu/~miyyer/qblearn.
Baselines
DT-RNN Configurations
The nonlinearity f is the normalized tanh function,

$$f(v) = \frac{\tanh(v)}{\lVert \tanh(v) \rVert}. \quad (8)$$
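In code, the activation of Eq. 8 might look like the following sketch (the epsilon guard is our addition for numerical safety, not from the paper):

```python
import numpy as np

def normalized_tanh(v, eps=1e-8):
    """tanh followed by L2 normalization, which keeps hidden vectors on the
    unit sphere and counteracts saturation higher in the tree (footnote 13)."""
    t = np.tanh(v)
    return t / (np.linalg.norm(t) + eps)
```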
Human Comparison
Previous work provides human answers (Boyd-Graber et al., 2012) for quiz bowl questions. We use human records for 1,201 history guesses and 1,715 literature guesses from twenty-two of the quiz bowl players who answered the most questions.15
The standard scoring system for quiz bowl is 10 points for a correct guess and -5 points for an incorrect guess. We use this metric to compute a total score for each human. To obtain the corresponding score for our model, we force it to imitate each human's guessing policy. For example, Figure 3 shows a human answering in the middle of the second sentence. Since our model only considers sentence-level increments, we compare the model's prediction after the first sentence to the human prediction, which means our model is privy to less information than humans.
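A sketch of this evaluation, under assumed record fields (`buzz_sentence`, the 1-indexed sentence during which the human buzzed; `model_guesses[k]`, the model's guess after k complete sentences; `answer`):

```python
def quiz_bowl_score(records):
    """Total score under the standard quiz bowl metric: +10 correct, -5 incorrect."""
    score = 0
    for rec in records:
        # The human buzzed partway through sentence `buzz_sentence`, so the
        # model is held to its guess at the previous sentence boundary, i.e.,
        # with strictly less of the question than the human heard (floored at
        # one sentence, the model's earliest possible prediction).
        k = max(rec.buzz_sentence - 1, 1)
        score += 10 if rec.model_guesses[k] == rec.answer else -5
    return score
```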
The resulting distributions are shown in Figure 4: our model does better than the average player on history questions, tying or defeating sixteen of the twenty-two players, but it does worse on literature questions, where it only ties or defeats eight players. The figure indicates that literature questions are harder than history questions for our model, which is corroborated by the experimental results discussed in the next section.
Discussion
In this section, we examine why qanta improves over our baselines by giving examples
of questions that are incorrectly classified by
all baselines but correctly classified by qanta.
We also take a close look at some sentences that
all models fail to answer correctly. Finally, we
visualize the answer space learned by qanta.
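The visualization can be reproduced, under assumptions about how the learned answer embeddings and their names are stored, with scikit-learn's implementation of t-SNE (Van der Maaten and Hinton, 2008):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_answer_space(answer_vectors, answer_names):
    """Project learned answer embeddings to 2-d and label each point."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(answer_vectors)
    plt.figure(figsize=(12, 12))
    plt.scatter(coords[:, 0], coords[:, 1], s=2)
    for (x, y), name in zip(coords, answer_names):
        plt.annotate(name, (x, y), fontsize=6)
    plt.xlabel("TSNE-1")
    plt.ylabel("TSNE-2")
    plt.show()
```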
13 The standard tanh function produced heavy saturation at higher levels of the trees, and corrective weighting as in Socher et al. (2014) hurt our model because named entities that occur as leaves are often more important than non-terminal phrases.
14 Initial experiments with L2 regularization hurt performance on a validation set.
15 Participants were skilled quiz bowl players and are not representative of the general population.
Model          |       History        |      Literature
               | Pos 1 | Pos 2 | Full | Pos 1 | Pos 2 | Full
bow            |  27.5 |  51.3 | 53.1 |  19.3 |  43.4 | 46.7
bow-dt         |  35.4 |  57.7 | 60.2 |  24.4 |  51.8 | 55.7
ir-qb          |  37.5 |  65.9 | 71.4 |  27.4 |  54.0 | 61.9
fixed-qanta    |  38.3 |  64.4 | 66.2 |  28.9 |  57.7 | 62.3
qanta          |  47.1 |  72.1 | 73.7 |  36.4 |  68.2 | 69.1
ir-wiki        |  53.7 |  76.6 | 77.5 |  41.8 |  74.0 | 73.3
qanta+ir-wiki  |  59.8 |  81.8 | 82.3 |  44.7 |  78.7 | 76.6
Table 1: Accuracy for history and literature at the first two sentence positions of each question
and the full question. The top half of the table compares models trained on questions only, while
the IR models in the bottom half have access to Wikipedia. qanta outperforms all baselines
that are restricted to just the question data, and it substantially improves an IR model with
access to Wikipedia despite being trained on much less data.
Figure 4: Comparisons of qanta+ir-wiki to human quiz bowl players. Each bar represents an
individual human, and the bar height corresponds to the difference between the model score and
the human score. Bars are ordered by human skill. Red bars indicate that the human is winning,
while blue bars indicate that the model is winning. qanta+ir-wiki outperforms most humans
on history questions but fails to defeat the average human on literature questions.
A minor character in this play can be summoned by a bell that does not always work; that character also doesn't have eyelids. Near the end, a woman who drowned her illegitimate child attempts to stab another woman in the Second Empire-style room in which the entire play takes place. For 10 points, Estelle and Ines are characters in which existentialist play in which Garcin claims "Hell is other people," written by Jean-Paul Sartre?
Experimental Results
or equivalent analogues in the training question data. With that said, ir methods can
also operate over data that does not follow the
special constraints of quiz bowl questions (e.g.,
every sentence uniquely identifies the answer,
answers don't appear in their corresponding
questions), which qanta cannot handle. By
combining qanta and ir-wiki, we are able to
leverage access to huge knowledge bases along
with deep compositional representations, giving us the best of both worlds.
Related Work
[t-SNE visualization (Van der Maaten and Hinton, 2008) of the answer space learned by qanta, plotted along axes TSNE-1 and TSNE-2. Related history answers cluster together: U.S. presidents (e.g., george_washington, james_monroe, calvin_coolidge), world leaders (e.g., napoleon_iii, otto_von_bismarck, mao_zedong), explorers (e.g., vasco_da_gama, henry_hudson), and battles, treaties, and other events (e.g., battle_of_gettysburg, treaty_of_utrecht, boxer_rebellion).]
Factoid Question-Answering
Future Work
While we have shown that dt-rnns are effective models for quiz bowl question answering,
other factoid qa tasks are more challenging.
Questions like "What does the aarp stand for?" from trec qa data require additional infrastructure. A more apt comparison would be to IBM's proprietary Watson system (Lally et al., 2012) for Jeopardy, which is limited to single sentences, or to models trained on Yago (Hoffart et al., 2013).
We would also like to fairly compare qanta
Conclusion
Acknowledgments
We thank the anonymous reviewers, Stephanie
Hwa, Bert Huang, and He He for their insightful comments. We thank Sharad Vikram, R.
Hentzel, and the members of naqt for providing our data. This work was supported by
nsf Grant IIS-1320538. Boyd-Graber is also
supported by nsf Grant CCF-1018625. Any
opinions, findings, conclusions, or recommendations expressed here are those of the authors
and do not necessarily reflect the view of the
sponsor.
References
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. JMLR.

Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In ACL.

Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, and Tom M. Mitchell. 2009. Zero-shot learning with semantic output codes. In NIPS.

Panupong Pasupat and Percy Liang. 2014. Zero-shot entity extraction from web pages. In ACL.

Asad B. Sayeed, Jordan Boyd-Graber, Bryan Rusk, and Amy Weinberg. 2012. Grammatical structures for word-level sentiment detection. In NAACL.

Dan Shen. 2007. Using semantic roles to improve question answering. In EMNLP.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In EMNLP.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013a. Parsing with compositional vector grammars. In ACL.

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Y. Ng. 2013b. Reasoning with neural tensor networks for knowledge base completion. In NIPS.

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013c. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP.

Richard Socher, Quoc V. Le, Christopher D. Manning, and Andrew Y. Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. TACL.

Nicolas Usunier, David Buffoni, and Patrick Gallinari. 2009. Ranking with ordered weighted pairwise classification. In ICML.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR.

Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy model? A quasi-synchronous grammar for QA. In EMNLP.