Graeme Hirst,* Vanessa Wei Feng,* Christopher Cochrane,† and Nona Naderi*
*Department of Computer Science and † Department of Political Science
University of Toronto, Toronto, Ontario, Canada
*{gh,weifeng,nona}@cs.toronto.edu
† christopher.cochrane@utoronto.ca
Figure 1: Overall framework of our research on argumentation schemes.

argument from example, argument from cause to effect, practical reasoning, argument from consequences, and argument from verbal classification. Casting the problem as one of text classification, we built a pruned C4.5 decision tree (Quinlan, 1993) for both one-against-others classification of each scheme and for pairwise classification of each possible pairing of schemes. We used a variety of textual features, some of them specific to a particular argument scheme and others identical across schemes. They ranged from specific keywords and phrases to word-pair similarity between the premise and the conclusion, the starting point of the premise or conclusion in its sentence, and various syntactic dependency relations. Additionally, we used one feature that cannot at present be automatically derived from text, but which we assume may be determined by cues such as discourse relations: whether the argument is linked or convergent; that is, whether all or just one of the premises suffice for the conclusion.

Using Araucaria for both training and testing, we achieved high accuracy in one-against-others

The rhetorical or discourse structure of an argumentative text contributes to (or is, in part, determined by) the structure of the argument that it expresses. Consequently, much of our recent work has focused on discourse parsing, that is, determining the hierarchical rhetorical structure of the text: the logical relationships between sentences. Following the tenets of Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), this is a tree structure that covers the text whose leaves are the elementary discourse units (EDUs) of text (roughly speaking, clauses and clause-like constituents) and whose edges are the RST relations that hold between EDUs or spans of related text. The set of relations includes many that are pertinent to the structure of argumentation, such as CONTRAST, CAUSE, SUMMARY and ENABLEMENT. Also, as we noted above, an analysis of discourse structure may help us to discriminate convergent from linked arguments. So while an RST structure is not an argumentation structure per se, it clearly contains information that contributes to building an argumentation structure.

Our research on discourse parsing has three facets: improving the initial segmentation of text into EDUs (Feng and Hirst, 2014b); improving the parsing itself by using rich linguistic features (Feng and Hirst, 2012); and technically improving the parser both in accuracy and in efficiency by separating the parsing of intra-sentence and multi-sentence structures into separate processes (following Joty et al. (2013)), and adding a post-editing pass to each process (Feng and Hirst, 2014a). Bringing the improvements together, and training and testing on the RST Discourse Treebank (Carlson et al., 2001), we achieved an F1 score of 92.6% on discourse segmentation, and an accuracy of 58.2% (against a baseline of 29.6%)² on recognizing discourse relations on a gold-standard segmentation.

² This is the majority baseline of always labeling the resulting subtree with the relation ELABORATION with the current span as the nucleus and the next span as the satellite.

Our next task will be to combine our discourse parser with our earlier work on identifying argumentation schemes. We will augment our classifier with new features derived from the discourse structure in order to improve its accuracy. We will also use discourse structure features to improve the upstream classification that feeds into the argumentation scheme classifier, and to begin the task of further downstream analysis. In particular, this will include analysis of arguments to determine the underlying ideology of a text.

4 Ideology and issue framing

Social scientists usually define ideology as a belief system: “a configuration of ideas and attitudes in which the elements are bound together by some form of constraint or functional interdependence” (Converse, 1964, p. 207). The left / right political divide is a systematic and enduring ideological cleavage that divides “the world of political thought and action” in democratic countries (Bobbio, 1996). Systematic left / right differences appear in the voting records of politicians in legislative assemblies (Hix et al., 2006), in the election platforms of political parties (Budge et al., 2001; Klingemann et al., 2006), and in the patterns of public opinion (Jost, 2006). The left / right divide is so pervasive and enduring that many now wonder whether these political differences are manifestations of deeply rooted, and perhaps heritable, psychological traits (Alford et al., 2005; Carney et al., 2008; Haidt, 2012).

Several computational studies have looked at the question of whether a political speaker’s ideological position on the left / right spectrum can be determined just from a quantitative analysis of the vocabulary that they use, both from the way they talk about particular topics and (in some contexts) from the topics that they tend to talk about (Lin et al., 2006; Mullen and Malouf, 2006; Yu et al., 2008; Diermeier et al., 2012; Zirn, 2014). Typically, these studies attempt to induce a classifier from word-frequency vectors. Results have been mixed; for example, extreme positions in the U.S. Congress can be distinguished from those of the other side, sometimes by the use of topic-dependent shibboleths such as gay (liberal Democrat) or homosexual (conservative Republican), but more-moderate positions cannot be (Yu et al., 2008).

In our earlier work (Hirst et al., 2010; Hirst et al., 2014), we showed that the U.S. results do not apply to the Canadian Parliament. On one hand, we were able to classify party membership more reliably overall than the U.S. research did, but on the other hand we also showed that distinctions in the vocabulary of the speakers depend far more upon whether their party was in government or in opposition than upon their ideological position. The differences reflect primarily defence (government) and attack (opposition), a feature inherent to parliamentary governments in general, and especially to the Canadian parliament where party discipline is particularly strict (Savoie, 1999). When we applied classification methods based on word-frequency to the proceedings of the European Parliament, in which the factor of government–opposition status is absent, we achieved a more-accurate ideological classification of speakers from the five major parties across the left / right spectrum (Hirst et al., 2014). This confounding role of institutions on left / right differences aligns with what others have recently uncovered in cross-national analysis of legislative voting patterns (Hix and Noury, 2013).

Casual observers of politics recognize left / right differences when they see them, but even experts struggle to define these terms. The root of the problem is the effort to define left and right by reducing each side to a single idea or “essential core”. The morphology of left and right is inconsistent with such a specification. Rather, left and right describe “family resemblances” between the systems of political ideas that actors on each side advance on the questions of political disagreement (Cochrane, 2014). Although no single idea defines the left or the right, ideas are more or less central to one of these resemblances to the extent that they are more common among the belief systems of actors that are inside each category than they are among the belief systems of actors that are outside each category. From this vantage point, the central ideas on the political left are commitments to equality, pacifism, and, more recently, the environment. The distinguishing ideas on the right are support for capitalist economic orthodoxy, law and order, and patriotic militarism (Cochrane, 2014). The differences between political parties in their support for these ideas explain more than two-thirds of the variation in how citizens and experts position the parties on a left / right dimension (Cochrane, 2014).

The “content” of a belief system is the set of preferences that an actor harbours about political issues. The “structure” of a belief system is the way in which an actor puts different political issues together into bundles of constrained preferences. Actors that think about politics from the vantage point of altogether different ideas not only disagree in their positions on issues, they also disagree in their views of how different issues fit together logically in the political world around them. Thus, the content and the structure of belief systems vary on the left and the right (Cochrane, 2013).

Because of these differences, individuals from different ideological positions will often frame things differently in argumentation on any particular issue. For example, on the issue of how much immigration should be allowed into their country, one person might frame the argument as one of economic benefit or detriment, a second person as one of the benefits or problems of multiculturalism, and a third person as one of social justice.³ These differences will be reflected in the vocabulary that each of these people uses, which accounts for the results presented above on identifying ideology based on vocabulary alone; in the absence of confounding factors, as we saw most clearly in the case of the European Parliament, vocabulary is a strong indicator all by itself.

³ Immigration is in fact the particular topic on which we will conduct our case study on the framing of arguments; see section 5 below.

So we see that the framing of an issue by a speaker in an argumentative text is not, ultimately, a linguistic entity; it’s an ideological viewpoint or perspective: a set of beliefs, assumptions and precompiled arguments.⁴ Nonetheless, for automatic text analysis, quantifiable semantic characteristics of the speaker’s presentation of a position are indicators or proxies of the framing, which can then be interpreted qualitatively (by a human). In a simple analysis, this might be a statistical analysis of the key concepts of the text, as denoted by content words, significant collocations of words, and syntactic structures, much as in the simple text-classification–based ideology studies mentioned above, or a topic-model–based analysis, as in the work of Nguyen et al. (2013).

⁴ A fortiori, framing is a political action: “Framing essentially involves selection and salience. To frame is to select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993). But here, we focus on the linguistic and argumentative aspects of framing.

In our research, however, we are proposing a novel, more-sophisticated analysis in which we also look at the actual argumentation structures and discourse relationships of the text and how the concepts adduced by the lower-level linguistic components are used in these structures. We will describe these proposals in the next section.

5 Argumentation and issue framing in parliamentary speech

Left / right speech is a subset of ideological speech, which is in turn a subset of political speech. As we noted above, previous analyses of political speech attempt to induce left / right classifiers from analyses of vocabulary across all of the many topics of discussion in a dataset. But this approach disregards the results of an extensive body of political science research that analyzes left / right ideological disagreement in legislative voting records (Poole and Rosenthal, 2007; Hix and Noury, 2013), party election manifestos (Budge et al., 2001; Klingemann et al., 2006), and opinions (Jost, 2006). A key finding from these studies concerns the varying centrality of specific actors, ideas, and topics to left / right political disagreement. Some actors are more central to the left or to the right than are other actors. Some ideas are more central to the left or to the right than are other ideas. Left / right disagreements implicate some political issues and not others. This provides an informative prior for models that seek to uncover left / right differences from the patterns of vocabulary and argumentation in political text. The likelihood that speech conveys information about left / right argumentation is a function of the speaker and the topic.

Thus, the goal of our work, broadly speaking, is to develop computational models for the automatic analysis of ideology and issue framing in political speech that are better informed than the simple vocabulary-based models and that draw on automatic discourse parsing and automatic analysis of argumentation as their primary mechanism. We would like to look more narrowly and more deeply at argumentation on specific issues by individuals across the left / right spectrum, and develop automatic methods of analysis that will identify, or help analysts to identify, different frames and ideological positions. Our “help to” hedge reflects the difficulty of the goal and the context of our research as part of a much-larger project that is building datasets and tools to assist political scientists and political historians in their analyses.

The primary data for our work is the annotated parliamentary proceedings, from the present back to the mid-1800s or earlier, that are being produced by the Dilipad project (see section 1 above), from which we will draw speech⁵ on specific topics for diachronic and cross-national analysis of argumentation and framing. Immigration is a topic of special interest here, as it has been an important and recurring issue since the nineteenth century in all three participating countries. We hope to identify national and temporal differences and similarities in the frames used to discuss the issue.

⁵ Although we refer to political and parliamentary speech and speakers, as is conventional, we are working only with the published textual transcriptions of the parliamentary debates. We are not using audio data or any kind of automatic speech recognition.

In our models, we will bring together, and extend, the work on discourse parsing and argumentation scheme identification described in sections 2 and 3 above. Although these techniques are far from perfect, we hypothesize that typical political speech contains a sufficiently well-cued discourse structure that the analyses that we can achieve, although still quite imperfect, will be usefully indicative of issue framing and other ideological signals, and will be more immune to confounding factors, such as the attack-and-defence dynamics of parliamentary debates, than simple vocabulary classification. In particular, we will use features from discourse units and rhetorical relations to find claims and analyze the reasoning structure that is used to justify, support, and derive the claims. In addition, we will take into account how the concepts adduced by lower-level linguistic components (phrases, syntactic dependency structures) are used in the actual argumentation structures and discourse relationships of the text. We hope to be able to recognize instances of known frames in the text, and possibly even discover new ones. Because we will be developing deeper and hence more tentative methods of computational linguistic analysis, we do not expect to provide a complete automated analysis of text in the first instance, but rather to provide data that can then be interpreted by a human analyst.

In parallel with this approach, we will also develop text-classification methods for identifying ideological positions in speech that will look beyond vocabulary and also take into consideration frequent collocations and lexicalized syntactic dependency structures as features. This will allow us to include differences in the way that particular words are used (even where speakers use the word with the same frequency) as a feature of the classification. This will provide a new, higher baseline against which the results of the discourse- and argumentation-based analysis can be evaluated. It may also provide information that can itself be a component of that analysis. In addition, the words, collocations, and dependency structures that are most informative for classification will, as with our other methods, be available for human interpretation.

6 Conclusion

Our work focuses on the structure of discourse and arguments to better understand ideological positions and issue framing through their linguistic realizations. By applying discourse parsing and the analysis of argumentation to parliamentary debates, we hope to determine how speakers with various ideologies argue on a range of issues. Ideologies are manifested not only by the vocabularies used, but also by how the differing beliefs of political speakers lead to different framing of issues. Ideology detection can therefore benefit from argumentation and discourse analysis techniques.
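As an illustration only, the vocabulary-plus-collocation baseline outlined in section 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the method used in this work: the classifier is a small multinomial naive Bayes chosen for brevity, "collocations" are approximated by adjacent word pairs, syntactic dependency features are omitted, and the toy speeches, the two class labels, and all function and class names are invented for the example.

```python
import math
from collections import Counter


def features(text):
    """Unigrams plus adjacent word pairs (a crude stand-in for collocations)."""
    words = text.lower().split()
    pairs = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + pairs


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative choice)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for f in features(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Invented toy data: two hypothetical framings of the immigration issue.
train_texts = [
    "immigration brings economic benefit to our country",
    "newcomers bring economic benefit and growth",
    "immigration is a matter of social justice",
    "we owe refugees social justice and fairness",
]
train_labels = ["economic", "economic", "justice", "justice"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("the economic benefit of immigration"))
print(model.predict("a question of social justice"))
print(majority_baseline(train_labels))
```

The word-pair features are what let such a classifier separate speakers who use the same words with the same frequency but in different combinations; the majority baseline gives the floor against which any such classifier would be compared.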
Acknowledgements

The Digging Into Linked Parliamentary Data project is funded through the Digging Into Data Challenge. The Canadian arm of the project is funded by the Natural Sciences and Engineering Research Council of Canada through its Discovery Frontiers program and by the Social Sciences and Humanities Research Council. We are grateful to Kaspar Beelen for helpful comments on an earlier draft of this paper.

References

Alford, John R.; Funk, Carolyn L.; and Hibbing, John R. (2005). Are political orientations genetically transmitted? American Political Science Review, 99(2), 153–167.

Bobbio, Norberto (1996). Left and Right: The Significance of a Political Distinction. Cambridge, UK: Polity Press.

Budge, Ian; Klingemann, Hans-Dieter; Volkens, Andrea; and Bara, Judith (2001). Mapping Policy Preferences: Estimates for Parties, Electors, and Governments 1945–1998. Oxford, UK: Oxford University Press.

Carlson, Lynn; Marcu, Daniel; and Okurowski, Mary Ellen (2001). Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. Proceedings of Second SIGDial Workshop on Discourse and Dialogue (SIGDial 2001), Aalborg, 1–10.

Carney, Dana R.; Jost, John T.; Gosling, Samuel D.; and Potter, Jeff (2008). The secret lives of liberals and conservatives: Personality profiles, interaction styles, and the things they leave behind. Political Psychology, 29(6), 807–840.

Cochrane, Christopher (2013). The asymmetrical structure of left / right disagreement: Left-wing coherence and right-wing fragmentation in comparative party policy. Party Politics, 19(1), 104–121.

Cochrane, Christopher (2014). Left and Right: The Small World of Political Ideas. MS under review.

Converse, Philip E. (1964). The nature of belief systems in mass publics. In David E. Apter, ed. Ideology and Discontent. London, UK: Collier-MacMillan, 206–261.

Diermeier, Daniel; Godbout, Jean-François; Yu, Bei; and Kaufmann, Stefan (2012). Language and ideology in Congress. British Journal of Political Science, 42(1), 31–55.

Entman, Robert M. (1993). Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4), 51–58.

Feng, Vanessa Wei and Hirst, Graeme (2011). Classifying arguments by scheme. Proceedings, 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, 987–996.

Feng, Vanessa Wei and Hirst, Graeme (2012). Text-level discourse parsing with rich linguistic features. Proceedings, 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Korea, 60–68.

Feng, Vanessa Wei and Hirst, Graeme (2014a). A linear-time bottom-up discourse parser with constraints and post-editing. Proceedings, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore.

Feng, Vanessa Wei and Hirst, Graeme (2014b). Two-pass discourse segmentation with pairing and global features. Submitted.

Haidt, Jonathan (2012). The Righteous Mind: Why Good People are Divided by Politics and Religion. New York, NY: Pantheon Books.

Hirst, Graeme; Riabinin, Yaroslav; and Graham, Jory (2010). Party status as a confound in the automatic classification of political speech by ideology. Proceedings, 10th International Conference on Statistical Analysis of Textual Data / 10es Journées internationales d’Analyse statistique des Données Textuelles (JADT 2010), Rome, 731–742.

Hirst, Graeme; Riabinin, Yaroslav; Graham, Jory; Boizot-Roche, Magali; and Morris, Colin (2014). Text to ideology or text to party status? In: Kaal, Bertie; Maks, E. Isa; and van Elfrinkhof, Annemarie M.E. (editors), From Text to Political Positions: Text analysis across disciplines, Amsterdam: John Benjamins, 93–115.

Hix, Simon; and Noury, Abdul (2013). Government-opposition or left-right? The institutional determinants of voting in legislatures. Working paper, retrieved from http://personal.lse.ac.uk/hix/Research.HTM, 2014-06-14.

Hix, Simon; Noury, Abdul; and Roland, Gérard (2006). Democratic politics in the European Parliament. American Journal of Political Science, 50(2), 494–520.

Jost, John T. (2006). The end of the end of ideology. American Psychologist, 61(7), 651–670.

Joty, Shafiq; Carenini, Giuseppe; Ng, Raymond; and Mehdad, Yashar (2013). Combining intra- and multi-sentential rhetorical parsing for document-level discourse analysis. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, 486–496.

Klingemann, Hans-Dieter; Volkens, Andrea; Bara, Judith; Budge, Ian; and McDonald, Michael (2006). Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union, and OECD 1990–2003. Oxford, UK: Oxford University Press.
Lin, Wei-Hao; Wilson, Theresa; Wiebe, Janyce; and Hauptmann, Alexander (2006). Which side are you on? Identifying perspectives at the document and sentence levels. Proceedings of the 10th Conference on Natural Language Learning (CoNLL-X), 109–116.

Mann, William and Thompson, Sandra (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

Mochales, Raquel and Moens, Marie-Francine (2008). Study on the structure of argumentation in case law. Proceedings of the 2008 Conference on Legal Knowledge and Information Systems, Amsterdam: IOS Press, 11–20.

Walton, Douglas; Reed, Chris; and Macagno, Fabrizio (2008). Argumentation Schemes. Cambridge University Press.

Yu, Bei; Kaufmann, Stefan; and Diermeier, Daniel (2008). Classifying party affiliation from political speech. Journal of Information Technology in Politics, 5(1), 33–48.

Zirn, Cäcilia (2014). Analyzing positions and topics in political discussions of the German Bundestag. Proceedings of the ACL 2014 Student Research Workshop, Baltimore, 26–33.