
PALGRAVE STUDIES IN TRANSLATING AND INTERPRETING
Series Editor: Margaret Rogers

MULTIMODAL PRAGMATICS AND TRANSLATION
A New Model for Source Text Analysis

Sara Dicerto
Palgrave Studies in Translating and Interpreting

Series editor
Margaret Rogers
Department of Languages and Translation
University of Surrey
Guildford, UK
This series examines the crucial role which translation and interpreting in their myriad forms play at all levels of communication in today’s world, from the local to the global. Whilst this role is being increasingly recognised in some quarters (for example, through European Union legislation), in others it remains controversial for economic, political and social reasons. The rapidly changing landscape of translation and interpreting practice is accompanied by equally challenging developments in their academic study, often in an interdisciplinary framework and increasingly reflecting commonalities between what were once considered to be separate disciplines. The books in this series address specific issues in both translation and interpreting with the aim not only of charting but also of shaping the discipline with respect to contemporary practice and research.

More information about this series at http://www.palgrave.com/series/14574
Sara Dicerto
King’s College London
London, UK

Palgrave Studies in Translating and Interpreting


ISBN 978-3-319-69343-9    ISBN 978-3-319-69344-6 (eBook)
https://doi.org/10.1007/978-3-319-69344-6

Library of Congress Control Number: 2017960751

© The Editor(s) (if applicable) and The Author(s) 2018


This work is subject to copyright. All rights are solely and exclusively licensed by the
Publisher, whether the whole or part of the material is concerned, specifically the rights of
translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover pattern © Melisa Hasan

Printed on acid-free paper

This Palgrave Macmillan imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Acknowledgements

If there is one thing I have learnt from writing this book, it is that life is multimodal. Meaning comes to us in all forms at all times, and getting the best out of it depends entirely on our ability to make sense of our acquaintances, readings, experiences and circumstances.
As the product of a multimodal life, a book on multimodality (or on any other topic, for that matter) is hardly the result of one person’s effort; rather, it is the outcome of a community’s work. For this reason, I would like to thank the following people for their special contributions:

Ark Globe Academy and King’s College for granting me the time and the
resources to work on this publication;
Prof Sabine Braun and Dr Dimitris Asimakoulas for their continued help
and support when this work was in its infancy;
Prof Margaret Rogers for her extensive feedback and lots of food for
thought;
Giacinto Palmieri for being my academic sparring partner;

Filon, for reasons he knows very well.

Contents

1 A New Model for Source Text Analysis in Translation
2 On the Road to Multimodality: Semiotics
3 Multimodal Meaning in Context: Pragmatics
4 Analysing Multimodal Source Texts for Translation: A Proposal
5 Multimodal ST Analysis: The Model Applied
6 Multimodal ST Analysis: Current Status, Opportunities, Ways Forward
References
Index
List of Figures

Fig. 3.1 Meaning detection scheme, after Levinson (2000: 188)
Fig. 4.1 Lord Kitchener poster, Alfred Leete, 1914
Fig. 4.2 Relationships of status, after Martinec and Salway (2003: 351)
Fig. 4.3 Visual-verbal logico-semantic relationships, after Martinec and Salway (2005: 360)
Fig. 4.4 Cross-media interaction relations (COSMOROE), after Pastra (2008: 308)
Fig. 4.5 Visual-verbal relations, full diagram
Fig. 5.1 Steve Bell (2013) The Backbone. The Guardian, 22 May 2013
Fig. 5.2 Latymer restaurant review, Surrey Life
Fig. 5.3 Dr Seuss (2004) The Cat in the Hat, p. 1. London: Harper Collins
Fig. 5.4 Pat Bagley (2013) Bush Library. Salt Lake Tribune, 24 April 2013
Fig. 5.5 Arthur Thomson (1964), A Handbook of Anatomy for Art Students, p. 34
Fig. 5.6 EPA—Climate Concepts, Student Guide to Global Climate Change
Fig. 5.7 Yalta Conference entry, from Wikipedia
Fig. 5.8 Save the Children
Fig. 5.9 UNICEF (2007) Water Campaign
Fig. 5.10 WWF (2012) Earth Hour
Fig. 5.11 Reproduction of American Red Cross leaflet
Fig. 5.12 Coldwater Creek advertisement
List of Tables

Table 3.1 Pragmatic approaches to meaning, after Levinson (2000: 195)
Table 4.1 Sender’s meaning
Table 4.2 Sender’s meaning, first elaboration
Table 4.3 Tripartite structure of the model for the analysis of multimodal STs
Table 4.4 Transcription table for static multimodal texts, after Baldry and Thibault (2005: 29)
Table 4.5 Transcription table for dynamic multimodal texts, after Baldry and Thibault (2005)
Table 4.6 Table of transcription and analysis for dynamic multimodal texts
Table 4.7 Table of transcription and analysis for static multimodal texts
Table 5.1 Summary table, The Backbone
Table 5.2 Summary table, Latymer restaurant review
Table 5.3 Summary table, The Cat in the Hat
Table 5.4 Summary table, Bush Liebury
Table 5.5 Summary table, Anatomy for Art Students
Table 5.6 Summary table, Climate Concepts
Table 5.7 Summary table, Yalta Conference
Table 5.8 Summary table, Save the Children
Table 5.9 Summary table, Unicef Water Campaign
Table 5.10 Summary table, WWF Earth Hour campaign
Table 5.11 Summary table, American Red Cross leaflet
Table 5.12 Summary table, Coldwater Creek
CHAPTER 1

A New Model for Source Text Analysis in Translation

Abstract  What is considered important in translation has undergone several changes over time. The way translation is approached has changed, partly because source texts themselves have changed. Modern translators more than ever find themselves working on texts that communicate by more than ‘just’ words. Translation is an activity that is growing ever more complex and can no longer be accounted for in purely linguistic terms. Given the lack of a general picture of multimodal translation in the literature, a new study is needed to move towards a more comprehensive understanding of multimodal translation. This book offers a model for multimodal ST analysis that can be used as a tool to improve our understanding of how multimodal texts are organised to convey meaning, and of what this means for their translation.

Keywords  Translation theory • Multimodality • Translation • ST analysis • Equivalence

Scholars in translation studies have debated for decades what the informing principle of the activity of translation should be. The roots of this debate, however, date back to long before the advent of the discipline itself.

© The Author(s) 2018
S. Dicerto, Multimodal Pragmatics and Translation,
Palgrave Studies in Translating and Interpreting,
https://doi.org/10.1007/978-3-319-69344-6_1
What was and is considered important in translation has undergone several changes over time, leading to the literature in the field making different, and sometimes contrary, claims on the subject. The way translation is approached has changed, not only because of the influence of new translation approaches that have informed translator training but also because source texts have changed, too. Different theories of translation aim to provide translators with varying views of the translation process that can help them make translation choices based on different principles. Translation guidelines have a very long history, starting from the likes of Cicero, Horace and St Jerome all the way until today, when the discipline is investigating areas such as audiovisual translation (e.g. Díaz-Cintas 2004), advertising (e.g. Munday 2004) and social media (Desjardins 2017). As outlined by Munday, technology, for example, has played an important role in the evolution of texts and the development of new approaches to tackle their translation (2012: 267–283).
Until relatively recently, translation theory has evolved with a strong focus on the verbal component of texts, whether from a linguistic or a cultural viewpoint; however, modern translators more than ever find themselves working on texts in which the message is communicated by more than ‘just’ words. In an age of technological advancements that are providing people with new forms of communication, or increasing the communicative potential of forms previously available, the combined use of words and images, that is, multimodality, is increasingly coming to the fore. This is now widely acknowledged directly and indirectly by the presence in translation studies of a wealth of research investigating non-linguistic textual resources (e.g. Orero 2004; Ventola et al. 2004; Chiaro et al. 2008; O’Sullivan and Jeffcote 2013). Source texts (STs) nowadays are increasingly multimodal, as modern technology provides users with the ability to weave into their texts resources other than language ever more simply and cost-effectively. For a number of communicative forms, multimodality is no longer just an option—rather, for some types of media (e.g. web pages), it is becoming a prerequisite. For example, a study carried out by BuzzSumo on 100 million online articles points out that an article is almost twice as likely to be shared on social media if it includes at least one image than if it does not; therefore, authors of content with an aspiration to go ‘viral’ are recommended by BuzzSumo to ‘add a photo to EVERY post’ as these ‘determine what potential readers see before they even visit your article’ (Kagan 2014). As the use of non-verbal sources of meaning in a variety of texts for all sorts of purposes (e.g. technical texts, illustrated books, comics, websites) is ubiquitous in today’s world, it is worth considering carefully how these resources (i.e. images and sounds) interact with the verbally communicated message (written or spoken), sometimes even changing its meaning drastically, as will be shown. More specifically, different textual resources can be said to influence each other and create a multimodal message, the interpretation of which requires different types of literacy and the ability to combine them. Examples of these types of text come from all domains, and the influence of the multimodal phenomenon on translation is pervasive—medical texts, promotional material, catalogues, webpages, advertisements, newspaper articles, comics, user manuals are all translatable materials, and they are just a few examples of potential STs likely to include elements of multimodality the translator needs to take into account.
In line with the continuous effort in translation studies to develop relevant frameworks in order to support developments in the discipline with adequate theoretical tools, this book intends to offer a model for multimodal ST analysis that can be used as a tool to improve our understanding of how multimodal texts are organised to convey meaning and of what this means when it comes to rendering them into a target text (TT). Therefore, the central focus of this work is ST analysis—this is defined by Williams and Chesterman as the area of translation studies that consists of a careful analysis of the text’s potentially problematic aspects as a step in the preparation for translation (2014: 9).
The multimodal focus of this book distinguishes it from other work in the same area, shifting the spotlight from language to a detailed analysis of how a variety of multimodal text types convey meaning. This work includes theoretical provision for both static and dynamic multimodal texts (i.e. respectively, texts including images and written language, and texts which also make use of moving images, spoken language and/or sound sources), irrespective of their genre, following Munday’s suggestion that concepts from research on visual and multimodal communication need to be incorporated into the study of all types of translation (2004: 199). The more detailed discussion in later chapters is, however, focused on the analysis of verbal-visual interactions in static multimodal texts, for reasons discussed later in this chapter.
1.1   Moving Towards Multimodality


Translation studies scholars have only relatively recently started to be alert to the particular problem of the interaction between different semiotic sources of meaning and the impact of this interaction on translation activity. Semiotic resources other than language, which can and do intervene in the composition of texts, are largely under-researched in translation studies, with a few notable exceptions regarding specific sub-areas of the field, such as audiovisual translation (AVT) and the translation of comics.
As observed at the beginning of this introductory chapter, translation theories have had different orientations over time. The long-standing debate around the nature of translation often seems to have worked on the basis of dichotomies: translation can be ‘free’ or ‘literal’, ‘overt’ or ‘covert’, ‘semantic’ or ‘communicative’; equivalence in translation can be ‘formal’ or ‘dynamic’. However, all these concepts were largely elaborated from a verbal point of view, mostly without explicitly addressing the contributions made to a message by other textual resources.
The notion of equivalence is a good example of this: some theoretical frameworks (e.g. Jakobson 1959; Nida 1964; Newmark 1981; Baker 2011) have presented equivalence as the key to achieving effective translation, and even though the various authors offer different takes on the subject, they work with a concept of equivalence that is mostly verbal. Equivalence has been studied at different levels (e.g. word equivalence, grammatical equivalence, textual equivalence) and from various angles, but mostly in relation to the verbal features of texts.
A similar discussion could take place about the concept of translation norms (cf. Toury 1995; Chesterman 1997; Hermans 1999; Pedersen 2011); whether norms are seen as prescriptive or descriptive, bottom-up or top-down, identified translation norms are investigated with a strong emphasis on the linguistic component of a text in translation, even if the text in itself includes other sources of meaning.
Skopos theory (Vermeer 1996), which sees translation as a goal-oriented activity, does not address explicitly the multimodal aspect of meaning production either. The central tenet of Skopos theory is that all aspects of translation should be governed by the purpose of the translation activity itself. As shown later in this work, translation as a process and as a product is indeed highly influenced by its purpose; however, Skopos theory does not cater explicitly for scenarios in which the translation includes ST elements that cannot be changed or that may be difficult to change in light of the translation’s purpose. This is often the case with texts including visual content; for example, translators of comics generally have to fit their work around the visual content of the original cartoon, and subtitlers of audiovisual texts by and large can only intervene in the verbal content they need to translate, as the visual component cannot be easily adapted.
Among the translation models that explicitly mention multimodal meaning, Reiss’ research on text types (1977/1989), a precursor to Skopos theory, is worthy of note. However, Reiss claims that in multimodal texts, the verbal content is somehow supported by the presence of other textual resources. This view seems to have limitations: the role of other semiotic resources is not only to support the verbal content (or vice versa), but rather to merge with it to produce a multimodal message. Although Reiss at first claimed that there were four text types (informative, operative, expressive and the ‘multimedial’), she later modified her position, claiming that multimedial texts are actually a ‘hyper-type’, a ‘super-structure for the three basic types’ that ‘possesses its own regularities, which ought to be taken into account when translating, besides – and above – the regularities of the three basic forms of written communication’ (1981/2004: 164). However, these regularities are not investigated in any detail by Reiss, who in her work was more concerned with the analysis of the three basic text types she identified than with issues regarding multimodality.
The approach proposed by Snell-Hornby (1995), one that aims to integrate approaches from linguistics and translation, has as its main focus the linguistic aspect of texts as well; Snell-Hornby, however, acknowledges the importance of investigating what she terms ‘audiomedial’ texts in later work, in which she mentions how this ‘might well prove to be a topic worth resurrecting’ (1997: 288). She later goes on to discuss a few aspects related to multimodality (2006: 84–90), albeit briefly, pointing out how virtually no research on multimodal aspects of translation was carried out until the 1980s; in the same context, Snell-Hornby (2006) proposes a classification of texts that depend on non-verbal elements (which, using her terminology, are divided into ‘multimedial’, ‘multimodal’, ‘multisemiotic’ and ‘audiomedial’) and reviews studies that deal with translation challenges closely connected to specific genres of such texts (e.g. the rhetoric and speakability of texts that are ‘written to be spoken’). However, her focus is mainly on the audiomedial category, that is, texts that are essentially language-based and whose multimodal component lies in the fact that they are written to be performed (e.g. theatre play scripts). Such texts do not necessarily show meaningful interaction between semiotic modes in their textual organisation, but rather they show issues connected to the linguistic content and its delivery in performance that can influence translation; therefore, the review provided by Snell-Hornby’s contribution has a different focus from the one adopted in this book, which nevertheless acknowledges issues of multimodal interaction as a reality worth exploring.
While certainly not intended as an exhaustive list of the major works on translation theory (for a more complete overview, see Pym 2010), these few examples show how translation studies has had its focus set on meaning as a linguistic product; as O’Sullivan claimed recently,

translation is usually thought of as being about the printed word […]. [Translation] theory remained until recently almost exclusively word and script-based, [and translation] is generally conceived as the rendering of written text into written text; the particular resources used to write the text, and the other semiotic modes used to construct meaning around the text, have been all but ignored. (2013: 2)

Simply looking at how translation theory has dealt so far with multimodal meaning would produce a rather incomplete result; indeed, the problem might also be looked at from an alternative perspective, that is, reviewing research focusing on multimodality that has also dealt with translation issues. However, existing research on multimodality seems mostly to address the way visual and verbal content create mutual connections, without any mention of translation (e.g. Lemke 1998; Marsh and White 2003; Baldry and Thibault 2005; Martinec and Salway 2005; Salway and Martinec 2005; Hughes et al. 2007; Pastra 2008; Liu and O’Halloran 2009; Bateman 2014). In some cases, these studies have only addressed the topic of visual/verbal relations; in other cases, they have started a discussion on how these relations are connected with meaning production. In particular, the work by Bateman (2014) represents a comprehensive resource on the various approaches to visual/verbal relations and the production of multimodal narratives. However, to my knowledge, at the time of writing, no extensive study on multimodality has taken one further step, linking visual/verbal relations not only to issues connected to meaning but also to translation matters.
Research trying to build this connection in relation to written texts still seems to be in its infancy. In some literature on multimodal translation, scholars have expressed the view that the larger picture of multimodal translation would be better explained with a multidisciplinary approach (Cattrysse 2001; Remael 2001), but without any indication of how this multidisciplinary approach should be organised; also, to date there is no consensus as to what disciplines should be involved in this approach. The vast majority of studies on multimodal translation conducted to date address specific areas of multimodal translation (e.g. studies on screen translation, such as Gambier and Gottlieb 2001; Chaume 2004; Pérez-González 2014, and studies on comics, e.g. Kaindl 2004; Zanettin 2004). The picture provided in each case is informative with regard to the area addressed, but not general enough to account for the activity of multimodal translation irrespective of ST genre. Some studies on the topic of multimodal translation are even more specialised, dealing with multimodal meaning in relation to specific aspects of AVT, such as verbal-visual reference (e.g. Baumgarten 2008); these works seem particularly focused on a practice-based approach dealing with a specific type of audiovisual translation (e.g. Díaz-Cintas and Remael 2007; Pedersen 2008, 2011 for subtitling) or even specific texts as case studies (e.g. Taylor 2003), adopting a narrow scope. Studies on AVT thus show a lack of general theoretical orientation, which results in the discipline suffering from a degree of fragmentation. Examples of all these types of work can be found in the edited volume on AVT by Orero (2004) and in the special issue of JoSTrans on multimodality edited by O’Sullivan and Jeffcote (2013). Although this type of research offers analytical insights and specialist support that can be useful for specific translation tasks, the issues they deal with can and should also be looked at from a wider perspective to further our knowledge of multimodal texts and their translation in general terms. Indeed, it is not only audiovisual translators who find themselves dealing with texts that are about ‘more than words’; given the ubiquitous nature of multimodality in modern texts, an understanding of the multimodal phenomenon in general terms should be part of every contemporary translator’s ‘toolbox’, regardless of their specialist focus.
Given the lack of a generally applicable framework for the analysis of multimodal texts in translation regardless of genre in the current literature, the present volume aims to bring together the findings of previous, more specialised work, at the same time moving towards a more comprehensive understanding of multimodality in translation from the perspective of its starting point, the ST, and the challenges this poses. In order to do this, genre-specific viewpoints will be reframed, and it will be argued that a composite approach is needed, looking beyond social semiotics.
The need for new perspectives in multimodality has been stated in a study by Mubenga (2009), although his own study is more narrowly focused on the development of a model for film discourse analysis called ‘Multimodal Pragmatic Analysis (MPA)’. In Mubenga’s own words, ‘MPA […] is the confluence of pragmatics […] and multimodality’ (2009: 467). However, the pragmatic component in Mubenga’s model is limited, and the model itself appears to be more like an application of the Hallidayan systemic functional theory to multimodal texts than a multidisciplinary approach centred on pragmatics. Nevertheless, Mubenga’s claim that pragmatics should play a more central role in the study of audiovisual STs is in itself an important innovation and an attempt to point explicitly towards approaches that can contribute to shaping the future of multimodal text analysis in translation. The validity of Mubenga’s suggestion is also supported by more recent observations from other researchers (see, e.g. Clark 2011) who have claimed that integrating pragmatics and studies on multimodality could benefit both fields of research. At present, research is being carried out, for example, on the possibility of expanding Relevance Theory to embrace visual aspects of ostensive communication (Forceville and Clark 2014), which is in itself an implicit acknowledgement of the requirement to integrate a component of multimodality in cognitive models of communication. Also, studies on AVT have made use of concepts taken from pragmatics to investigate case studies of screen translation (e.g. Pedersen 2008; Desilla 2014), suggesting that a synergy between the two disciplines can be suitable for field research in translation studies and that this synergy should be investigated in further depth at a theoretical level. However, the focus of theories in pragmatics, such as the ones based on cooperativeness and relevance (respectively Grice 1989 and Sperber and Wilson 1986/1995), has traditionally been on the (spoken) word. Therefore, in order to propose an extensive application of pragmatic theory to multimodal STs which not only ‘borrows’ individual concepts but also uses pragmatic models as pillars of the analysis, it is important to ascertain the applicability of pragmatic analysis to multimodal communication.

1.2   Overview of the Book


In order to map out the path through the theoretical concepts relevant to this interdisciplinary study, Chap. 2 provides a selective review of existing literature focusing on social semiotics and multimodality. A brief review of semiotic research on sign systems sets out the first theoretical foundations of this book. Indeed, a general understanding of the role of different types of signs in communication is an important first step towards grasping the general multimodal picture. Major research in the area, such as the studies carried out by Peirce (1960), Jakobson (1968), Eco (1976) and Barthes (1977), is used to provide the semiotic basis for the study of meaning as produced in the different semiotic systems.
Literature on social semiotics is acknowledged for its fundamental
role in influencing current views of multimodality and of the interaction
between signs from different semiotic modes. Social semiotics has
engaged with how semiotically different modes can produce meaning in
cooperation with one another, why this happens and what the possible
implications in terms of text comprehension can be. Studies on this topic
include Van Leeuwen (1999), Kress and Van Leeuwen (2001, 2006),
Norris (2004), Machin (2007) and Liu and O’Halloran (2009). Studies
and reviews on multimodality with specific reference to visual/verbal
relations and their connection to meaning (mainly Marsh and White
2003; Martinec and Salway 2005; Pastra 2008; Bateman 2014) are also
reviewed here.
However, it will be argued that defining multimodal text comprehension as the comprehension of the message carried by the single modes and by their interaction would mean taking too narrow a view of how multimodal texts convey meaning. Understanding individual textual resources does not explain fully how meaning is derived from a text, and nor does understanding their interaction. Indeed, the same message can be communicated with different intentions under different circumstances, and the context in which a message is communicated influences its meaning. Part of the literature review is therefore dedicated to discussing contextual influence as explained in pragmatics (Chap. 3), through the work of scholars like Grice (1989) and Sperber and Wilson (1995).
It is well known that pragmatics has already been applied to translation,
but the relevant literature directly addressing multimodal translation is
relatively scarce and fragmented. Pragmatics has influenced some models
in translation studies (see House 1997; Baker 2011 for notable examples),
but in spite of this, the only full-fledged attempt towards a translation
theory in which pragmatics occupies a central role comes from Gutt
(2000). The insights pragmatics can offer into the activity of translation
are analysed to establish to what extent this discipline can influence the
current study on multimodal translation (Chap. 3).
Chapter 4 presents the development of the proposed interdisciplinary model, grounding it in the considerations made in Chaps. 2 and 3 and building progressively on this foundation to establish its full form. Some of the frameworks of analysis for visual-verbal relations already mentioned, such as those by Martinec and Salway (2005) and by Pastra (2008), are used extensively for the purpose of building the model. Martinec and Salway offer a classification of visual-verbal relations based on the work by Halliday and Barthes, and Pastra makes an interesting attempt to connect visual-verbal relationships with multimodal meaning in a systematic way.
The model proposed in this book is then applied to authentic examples
of written multimodal texts in order to show the applicability of its theo-
retical tenets (Chap. 5). To this end, a set of multimodal texts from various
genres is analysed by applying the proposed model. This analysis is pre-
sented in three sections based on the notion of text type after Reiss
(1977/1989): expressive, operative and informative texts. Although the
model developed in this study sets out to include provision for the analysis
of all types of multimodal STs, as a first step, its application is exemplified
using only static multimodal texts (i.e. multimodal texts which include
images and written language, but not dynamic texts, which also make use
of moving images, spoken language and/or sound sources). The reasons
for the focus on static texts include the following. The range of genres is
very broad in both static and dynamic multimodal texts. A comprehensive
discussion of both dynamic and static multimodal texts would therefore
go beyond the scope of a single volume of the current kind. As dynamic
texts develop over both time and space, and often involve language, images
and sounds, their analysis is more complex and lengthier, both in terms of
processing time and space occupied by the final result, than that of static
texts. These latter only develop over space, making use of language and
images, meaning that their analysis can be presented more concisely,
allowing for more examples from various genres to be presented.
Nevertheless, given the aim of including theoretical provision for all types of STs, the temporal aspect will still be discussed in Chap. 4, bearing
in mind the need to provide the basis for future research on dynamic texts.
Chapter 6 offers some reflections on the analysis of the selected texts,
both in general terms regarding the model’s validity and specifically
regarding the model’s application to the three text types. The model’s
contribution to the current research on multimodality and translation is
also discussed, at the same time acknowledging its limitations, suggesting
ideas for its further improvement and for the research that could follow,
including also its potential applicability to dynamic multimodal texts.
References
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Bateman, J.  A. (2014). Text and Image: A Critical Introduction to the Visual/
Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information
in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Cattrysse, P. (2001). Multimedia & Translation: Methodological Considerations.
In Y.  Gambier & H.  Gottlieb (Eds.), (Multi) Media Translation: Concepts,
Practices and Research (pp. 1–12). Amsterdam/Philadelphia: Benjamins.
Chaume, F. (2004). Cine y Traducción. Madrid: Ediciones Cátedra.
Chesterman, A. (1997). Memes of Translation. Amsterdam/Philadelphia:
Benjamins.
Chiaro, D., Heiss, C., & Bucaria, C. (Eds.). (2008). Between Text and Image:
Updating Research in Screen Translation. Amsterdam/Philadelphia: Benjamins.
Clark, B. (2011, September). Relevance and Multimodality. Paper Presented at
Analysing Multimodality: Systemic Functional Linguistics Meets Pragmatics,
Loughborough University.
Desilla, L. (2014). Reading Between the Lines, Seeing Beyond the Images: An
Empirical Study on the Comprehension of Implicit Film Dialogue Meaning
Across Cultures. The Translator, 20(2), 194–214.
Desjardins, R. (2017). Translation and Social Media in Theory, in Training and in
Professional Practice. London: Palgrave Macmillan.
Díaz-Cintas, J., & Remael, A. (2007). Audiovisual Translation: Subtitling.
Manchester: St Jerome.
Díaz-Cintas, J.  (2004). In Search of a Theoretical Framework for the Study of
Audiovisual Translation. In P. Orero (Ed.), Topics in Audiovisual Translation
(pp. 21–34). Amsterdam/Philadelphia: Benjamins.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Forceville, C., & Clark, B. (2014). Can Pictures Have Explicatures? Linguagem
em (Dis)curso, 14(3), 451–472.
Gambier, Y., & Gottlieb, H. (Eds.). (2001). (Multi) Media Translation: Concepts,
Practices and Research. Amsterdam/Philadelphia: Benjamins.
Grice, H.  P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard
University Press.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Hermans, T. (1999). Translation in Systems: Descriptive and System-Oriented Approaches Explained. Manchester: St. Jerome.
House, J. (1997). Translation Quality Assessment: A Model Revisited. Tübingen:
Gunter Narr.
Hughes, M., Salway, A., Jones, G., & O’Connor, N. (2007). Analysing Image-Text Relations for Semantic Media Adaptation and Personalisation. In Second
International Workshop on Semantic Media Adaptation and Personalization,
Uxbridge.
Jakobson, R. (1959). On Linguistic Aspects of Translation. In L. Venuti (Ed.), The
Translation Studies Reader (2nd ed.). London: Routledge.
Jakobson, R. (1968). Language in Relation to Other Communication Systems. In
R. Jakobson (1971) Selected Writings. The Hague: Mouton.
Kagan, N. (2014). Why Content Goes Viral: What Analyzing 100 Million Articles
Taught Us [blog post]. Available at: http://www.huffingtonpost.com/noah-
kagan/why-content-goes-viral-wh_b_5492767.html. Last accessed 19 June
2017.
Kaindl, C. (2004). Multimodality in the Translation of Humour in Comics. In
E. Ventola, C. Cassily, & M. Kaltenbacher (Eds.), Perspectives on Multimodality
(pp. 173–192). Amsterdam/Philadelphia: Benjamins.
Kress, G., & Van Leeuwen, T. (1996/2006). Reading Images: The Grammar of
Visual Design. London: Routledge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and
Media of Contemporary Communication. London: Hodder Arnold.
Lemke, J.  L. (1998). Multiplying Meaning: Visual and Verbal Semiotics in
Scientific Text. In J. Martin & R. Veel (Eds.), Reading Science: Critical and
Functional Perspectives on Discourses of Science (pp.  87–113). London:
Routledge.
Liu, Y., & O’Halloran, K. L. (2009). Intersemiotic Texture: Analyzing Cohesive
Devices Between Language and Images. Social Semiotics, 19(4), 367–388.
Machin, D. (2007). Introduction to Multimodal Analysis. New  York: Oxford
University Press.
Marsh, E.  E., & White, M.  D. (2003). A Taxonomy of Relationships Between
Images and Text. Journal of Documentation, 59(6), 647–672.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New
(and Old) Media. Visual Communication, 4(3), 337–371.
Mubenga, K. S. (2009). Towards a Multimodal Pragmatic Analysis of Film Discourse in Audiovisual Translation. Meta: Translators’ Journal, 54(3), 466–484.
Munday, J.  (2004). Advertising: Some Challenges to Translation Theory. The
Translator, 10, 199–219.
Munday, J.  (2012). Introducing Translation Studies: Theories and Applications
(4th ed.). Oxon: Routledge.
Newmark, P. (1981). Approaches to Translation. Oxford: Pergamon.
Nida, E. (1964). Toward a Science of Translating. Leiden: E.J. Brill.
Norris, S. (2004). Analysing Multimodal Interaction: A Methodological Framework.
London: Routledge.
O’Sullivan, C. (2013). Introduction: Multimodality as Challenge and Resource
for Translation. In C.  O’Sullivan & C.  Jeffcote (Eds.), Special Issue on
Translating Multimodalities, JoSTrans (Vol. 20, pp. 2–14).
O’Sullivan, C., & Jeffcote, C. (Eds.). (2013). Special Issue on Translating
Multimodalities, JoSTrans (Vol. 20).
Orero, P. (Ed.). (2004). Topics in Audiovisual Translation. Amsterdam/
Philadelphia: Benjamins.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for
Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Pedersen, J. (2008). High Felicity: A Speech Act Approach to Quality Assessment
in Subtitling. In D. Chiaro, C. Heiss, & C. Bucaria (Eds.), Between Text and
Image: Updating Research in Screen Translation. Amsterdam/Philadelphia:
Benjamins.
Pedersen, J.  (2011). Subtitling Norms for Television. Amsterdam/Philadelphia:
Benjamins.
Peirce, C. S. (1960). Collected Writings. Cambridge: Harvard University Press.
Pérez-González, L. (2014). Audiovisual Translation: Theories, Methods and Issues.
London: Routledge.
Pym, A. (2010). Exploring Translation Theories. London: Routledge.
Reiss, K. (1977/1989). Text Types, Translation Types and Translation Assessment.
In A.  Chesterman (Trans. & Ed.), Readings in Translation Theory
(pp. 105–115). Helsinki: Oy Finn Lectura Ab.
Reiss, K. (1981/2004). Type, Kind and Individuality of Text: Decision Making in
Translation. In S. Kitron (Trans.) & L. Venuti (Ed.), The Translation Studies
Reader (pp. 160–171). London: Routledge.
Remael, A. (2001). Some Thoughts on the Study of Multimodal and Multimedia
Translation. In Y. Gambier & H. Gottlieb (Eds.), (Multi) Media Translation:
Concepts, Practices and Research (pp.  13–22). Amsterdam/Philadelphia:
Benjamins.
Salway, A., & Martinec, R. (2005). Some Ideas for Modelling Image-Text
Combinations (Department of Computing Technical Report CS-05-02).
Guildford: University of Surrey.
Snell-Hornby, M. (1995). Translation Studies: An Integrated Approach.
Amsterdam/Philadelphia: Benjamins.
Snell-Hornby, M. (1997). Written to Be Spoken: The Audio-medial Text in
Translation. In A. Trosborg (Ed.), Text Typology and Translation (pp. 277–290).
Amsterdam/Philadelphia: Benjamins.
Snell-Hornby, M. (2006). The Turns of Translation Studies: New Paradigms or
Shifting Viewpoints? Amsterdam/Philadelphia: Benjamins.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and Cognition (2nd ed.). Oxford: Blackwell.
Taylor, C. J. (2003). Multimodal Transcription in the Analysis, Translation and
Subtitling of Italian Films. The Translator, 9(2), 191–205.
Toury, G. (1995). Descriptive Translation Studies—and Beyond. Amsterdam/
Philadelphia: Benjamins.
Van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.
Ventola, E., Cassily, C., & Kaltenbacher, M. (Eds.). (2004). Perspectives on
Multimodality. Amsterdam/Philadelphia: Benjamins.
Vermeer, H. (1996). A Skopos Theory of Translation. Heidelberg: TEXTconTEXT.
Williams, J., & Chesterman, A. (2014). The Map: A Beginner’s Guide to Doing
Research in Translation Studies. Oxon: Routledge.
Zanettin, F. (2004). Comics in Translation Studies. An Overview and Suggestions
for Research. In Tradução e interculturalismo. VII Seminário de Tradução
Científica e Técnica em Língua Portuguesa (pp. 93–98). Lisboa: União Latina.
CHAPTER 2

On the Road to Multimodality: Semiotics

Abstract  Given the various perspectives that can be adopted to understand multimodal texts, a theoretical model intending to study multimodality for translation purposes needs to combine different approaches.
Indeed, literature capable of contributing to the multimodal debate comes
from various fields: semiotics—and particularly social semiotics—analyses
how different modes are organised, their similarities/differences, and their
meaning-making abilities; literature on visual-verbal relationships accounts
for the way modes interact; pragmatics has relatively recently joined the
multimodal debate, offering support to understand multimodality in con-
text; translation studies has investigated to some extent the challenges
posed by non-linguistic textual resources. This chapter reviews the rele-
vant literature on semiotics (in particular social semiotics) to pave the way
to the construction of the model.

Keywords  Multimodality • Translation • Social semiotics • ST analysis • Visual • Verbal • Aural

Given the complexities of multimodal issues and the various perspectives that can be adopted to analyse and understand multimodal texts, a theoretical model aiming to study multimodality for translation purposes is
likely to be one that combines different approaches to produce a

multidisciplinary result. As discussed in Chap. 1, literature with the potential to contribute to the debate on multimodality in translation comes
from various fields. Semiotics—and in particular social semiotics—analyses
how different modes are organised, the similarities and differences in their
organisation, and their meaning-making abilities; literature expressly on
multimodality can contribute to the analysis of intersemiotic relationships
to account for the way the modes interact; pragmatics has more or less
recently joined the academic debate on multimodality, having been identi-
fied as a discipline likely to contribute to the understanding of multimodal
meaning in context; and work in translation studies has investigated, at
least to some extent and for some specific genres, the challenges posed to
translators by the visual and sound contributions to message formation.
The literature relevant to the development of the model for the study
of multimodality in translation is therefore rich and varied. To begin, this
chapter reviews literature on social semiotics and multimodality, starting
from an investigation of the organisation of individual modes, their differ-
ences and overlaps (Sect. 2.1) in order to then discuss the meaningful
interaction between modes and general multimodal message formation as
seen in social semiotics and studies on multimodality (Sect. 2.2).
Contextual meanings are examined in the following chapter.

2.1   The Organisation of Signs: The Realm of Semiotics

The chosen starting point of this journey through multimodal meaning is the realm of semiotics. Semiotics, according to Eco, is ‘concerned with
everything that can be taken as a sign’ (1976: 7), and signs, in Chandler’s
view, are ‘anything which “stands for” something else’ (2007: 3).
The reason why this multimodal journey is starting from semiotics is
that the very definition of ‘multimodal text’ adopted in this study, to
which we return below, is based on the notion of ‘semiotic system’ (or
‘semiotic mode’), namely, ‘a system of meaning’ (Halliday and Webster
2003: 2). Indeed, Anstey and Bull define multimodal texts as follows: ‘[a]
text may be defined as multimodal when it combines two or more semiotic
systems’ (2010: online).
This definition of multimodal text is arguably too broad: any written text
is conveyed by means of language and typography, and any spoken text by
means of language and paralinguistic features; therefore, even texts exclu-
sively language-based would fall into the multimodal category according to
this definition. In my view, the differentiating characteristic of ‘multimodal’ texts is in the relationship between the semiotic systems at play. It is certainly
the case that apparently ‘monomodal’ texts make use of more than one
semiotic system to convey meaning; however, in a language-­based ‘mono-
modal’ text, systems other than language are ancillary to language itself and
cannot be independent of it: typography or paralinguistic features cannot
exist without language. The same does not hold for multimodal texts as, for
example, language and images are potentially capable of producing mes-
sages independent of each other, such as dictionaries and paintings. In a
multimodal text, their use becomes complementary, and as will be seen, the
various multimodal components in a multimodal message cannot be extri-
cated from one another while still maintaining cohesion; nevertheless, in
principle language and images have the theoretical ability to produce narra-
tives independent of each other. Therefore, Anstey and Bull’s definition may
be redrawn as follows: a text may be defined as multimodal when it combines
at least two semiotic systems that are not necessarily ancillary to one another.
For the purposes of a work on multimodality, it is self-evidently impor-
tant to understand what semiotic modes are included in this category as,
for example, typography and paralinguistic features have already been
excluded. The number of existing semiotic systems is an ongoing subject
of debate within the semiotic community. As Chandler points out, ‘[d]ifferent theorists favour different taxonomies, and while structuralists
often follow the “principle of parsimony” – seeking to find the smallest
number of groups deemed necessary – “necessity” is defined by purposes’
(Chandler 2007: 149, italics in original). Considering that the current
study looks at semiotics in the context of multimodality in translation,
previous research in both fields needs to be taken into consideration when
choosing a taxonomy suitable for the stated research purpose. The selected
taxonomy also needs to comply with the definition of multimodal text
provided above, that is, it needs to include semiotic systems that are in
theory capable of producing messages independent of one another.
The seminal work by Barthes (1977) looks at images and sounds (i.e.
visual and non-verbal aural signs) as the two broad categories worth
exploring for regularities in multimodal meaning production, along with
language (i.e. verbal signs). The modes intervening in multimodal mes-
sage formation will very much depend on the nature of the multimodal
text itself, which can comprise verbal and visual signs (e.g. a poster
­advertisement), verbal and aural signs (e.g. radio programme), visual and
aural signs (e.g. dance performance) or all three (e.g. film).
While the reference to Barthes’ work is an early one, it is important to note that current research in multimodality is still consistent with his point
of view: for example, the recently released software for multimodal analysis
developed by a team led by Kay O’Halloran was created to help under-
stand ‘how linguistic, visual and audio resources function to attract atten-
tion and create particular views of the world’ (Multimodal Analysis
Company 2013: online, my emphasis), implicitly confirming that verbal,
visual and aural signs are the resources leading recipients towards a certain
interpretation of a multimodal text. As indicated in Chap. 1, multimodal
research looking at static multimodal texts is also consistent with Barthes’
categorisation, as its main focus is on investigating relationships between
visual and verbal content (Marsh and White 2003; Martinec and Salway
2005; Pastra 2008; Bateman 2014). Likewise, research on the translation
of multimodal texts has investigated topics such as visual-verbal references
in films (e.g. Baumgarten 2008), audiovisual translation and subtitling
(e.g. Díaz-Cintas and Remael 2007) and the application of systems of
multimodal transcription to translation with an emphasis on the visual,
aural and linguistic components of a text (e.g. Taylor 2003), also echoing
Barthes’ view. Research on multimodal translation acknowledges that a
multimodal text does not have to make use of all three semiotic modes,
but at the same time, differently from work on multimodality, it needs to
restrict its scope to multimodal texts that make use of language; as this
book deals with multimodal texts for translation purposes, only multi-
modal texts that make use of the verbal mode fall within its scope of
interest.
As language, images and sounds can be used to create texts indepen-
dent of each other (for instance, language-based documents, paintings
and instrumental music), they are three recognisably independent semi-
otic systems whose investigation is acknowledged by the relevant literature
to be fundamental to the understanding of multimodal texts. For the spe-
cific purpose of this study, therefore, the above definition of multimodal
text can be further specified as follows: ‘a text may be defined as multi-
modal when it combines at least two of the verbal, visual and/or aural
semiotic systems’.
As stated in Chap. 1, the ‘applied’ part of this study explores the appli-
cation of the proposed model by discussing static multimodal text case
studies, that is, texts with visual and verbal components only. Nevertheless,
as the model is ultimately intended to be potentially applicable to all types of multimodal texts, including dynamic texts, theoretical provisions on the
aural system are also included in this work, to support the development of
a comprehensive approach. As the three systems are important for the
theoretical framework of this book, the following indicates what types of
signs are grouped in each category:
–– Verbal: signs belonging to oral and written language
–– Visual: visual signs other than language
–– Aural: aural signs other than language
By and large, it is possible to assign particular signs to the relevant semi-
otic system, that is, ‘mode’. However, the three categories are not neces-
sarily mutually exclusive or entirely clear-cut; examples of borderline cases
include written words that are part of an image, or have image-like quali-
ties, and instances of speech in which the person nearly growls, for exam-
ple, because of pain, making noises that are somewhere between verbal
and aural signs.
From a semiotic perspective, the verbal, visual and aural systems differ
in the way they convey meaning, and their differences are connected with
the nature of the signs that comprise them. One of the most well-known
taxonomies of signs (Peirce 1960: 2.247) divides them into three catego-
ries: symbols, icons and indices. To use the Saussurean terminology
(1916/1983), these ‘signifiers’ show different types of relationship with
the respective ‘signifieds’, namely, the concepts they represent. Drawing
on Chandler’s summary of the Peircean taxonomy, signs can be said to
have the following characteristics:

– Symbols have an arbitrary relation to their signifieds: they represent them only because it has been conventionally established that they do. The connection between a symbol and its signified does not necessarily have any
other basis than the convention that ties them together.
– Icons have a relation of resemblance to their signifieds, or they try to imi-
tate them in some way, possessing some of the qualities that are associated
with the signifieds themselves.
– Indices have a direct connection with their signifieds, either physically or
causally. They do not necessarily require a communicative intention as the
basis of their production, and they may just be the consequence of a natu-
ral phenomenon. (after Chandler 2007: 36–37)
Mapping this taxonomy onto the three semiotic modes investigated in this study, language is regarded as mostly symbolic (Peirce 1960: 2.292;
Chandler 2007: 38) as words have an arbitrary connection with their sig-
nifieds; visual communication is mainly iconic or indexical (Chandler
2007: 43), as images tend to establish relationships of resemblance or
causality with the objects they represent; sounds are mostly indexical, as
they are always in a causal relationship with the objects that produce them.
Verbal, visual and aural signifiers thus differ in the type of relationship they
create with their signifieds, the verbal having a conventionalised connec-
tion with them (due to their symbolic nature) and the visual and aural
showing some sort of similarity or contiguity to them because of their
iconicity/indexicality. In either of the latter cases, this is a more ‘direct’
relation than the mostly arbitrary symbolic one.
There are some notable and well-known exceptions to the general
trends described above: for example, onomatopoeias in language are
indexical in their attempt to imitate a sound connected to their meaning;
logos often have a purely arbitrary and symbolic relation with the com-
pany they represent; national anthems represent a nation again in an arbi-
trary and symbolic way that bears little or no relation to the means of
production of the sounds they are composed of. It also has to be high-
lighted that the Peircean taxonomy does not imply that a sign needs to
belong to one category only. As Jakobson points out, Peirce’s categorisa-
tion should be understood as a hierarchy of functions rather than a tidy
division (1968: 700).
In general, verbal, visual and aural signs play a different role in com-
munication, and are more or less suited to conveying information of a
diverse nature on the basis of the distinct relations they tend to establish
with their signifieds. Due to their characteristic features, it is likely that
they will be exploited according to which of them is best suited to the
establishment of the type of relation to the signified required by the com-
municative situation at hand. Support for this ‘principle of specialisation’
of the signs in terms of usage comes from manifold sources. To quote a
few examples:
–– Hagan supports the view that images specialise in concrete scenes. Verbal signs, on the other hand, specialise in concrete statements, questions and demands (Hagan 2007: 51). Images and verbal content are, however, also said to differ in terms of the steadiness of the
meaning interpretation they trigger: images trigger a broader set of
potential interpretations, even for expert ‘readers’, whereas verbal content is associated with more stable interpretations (Hagan 2007:
52).
–– Along similar lines, but this time from the perspective of the type of
sign, Aguiar and Queiroz (2010: online) claim that ‘icons play a
central role in sensory tasks since they are associated with the quali-
ties of objects. Thus, they are present in the sensorial recognition of
external stimuli of any modality, as well as in the cognitive relation of
analogy’.
–– From yet another perspective, namely that of the medium of verbal
communication, it has been argued that specialisation seems to
operate even within the verbal system. Writing about the differences
between speaking and writing, Olson maintains that written tran-
scriptions tend to be underdeterminative of the illocutionary force
of a speech act because they are not capable of representing it com-
pletely, and in this sense ‘[t]he history of reading may be seen, in
part, as a series of attempts to recognize and to cope with what is not
represented in a script’ (1994: 93, italics in original). This claim
supports the view that not all the elements of a speech act related to
its illocutionary force can be represented by written language, and
thus implicitly that some of the meanings expressed by an oral state-
ment cannot be (entirely) represented in writing.
Summarising, the visual, verbal and aural semiotic systems—seen also
from different perspectives—are different in the way they connect to their
signifieds, in the type of meaning they are most suited to conveying and in
the stability of the interpretation they trigger, and they tend to be used for
different kinds of information. However, these different modes are often
found together, and they cooperate in message formation in spite of their
differences. As Marsh and White argue, ‘[t]he message is a wedding of
[its] components, and the interplay among all elements is a critical con-
cern to people who need to convey information effectively’ (2003: 1).
Similar claims can be found in other studies, such as the one by Durant
and Lambrou (2009: 40), in which the combined use of words, images
and sounds in a text is described as forming a superordinate ‘media
language’.
The different semiotic systems can nevertheless be analysed individu-
ally, as a precursor to any analysis of their interaction. Studies on the gram-
mar of language are a good example of how this type of analysis can
concentrate on specific features of one semiotic system only, exploring its organisation and its relation to meaning under different aspects; whether
analysing individual modes separately can lead the recipient to understand
the full meaning of a multimodal message as intended by the sender, a
central concern of pragmatics, is a different question. Kress notes how it
‘is now impossible to make sense of texts, even of their linguistic parts
alone, without having a clear idea of what […] other features might be
contributing to the meaning of a text’ (Kress 2000: 337, my emphasis).
Baldry and Thibault support this view, asserting that multimodal meaning
is not merely an addition of meanings coming from different semiotic
resources, but rather the result of their combination under what they call
the ‘resource integration principle’, according to which selected semiotic
resource systems produce meaning in combination with one another. It is
in this sense that Baldry and Thibault describe multimodal meaning as
multiplicative rather than additive (Baldry and Thibault 2005: 21).
Other features of the communicative situation external to the individ-
ual semiotic systems are also said to contribute to the meaning conveyed
by a text. More precisely, textual coherence, namely, the ability of a text to
be meaningful, is said to be actually ‘made’ by the reader, who makes sense
of a text in relation to its context (linguistic and non-linguistic) and their
knowledge of the world (Bublitz 1999: 2), foreshadowing Gutt’s concern
with the importance of cognitive context in translation (2000).
Having reviewed some key items of the literature on signs and their
organisation, it is possible to draw some interim conclusions in order to
make progress towards a new model for multimodal ST analysis for trans-
lation purposes. Based on the aspects reviewed thus far, the model must
take into account the fact that different semiotic systems—or aspects of
these systems—are likely to be exploited to convey different types of
meaning in source texts, and that they make different individual contribu-
tions to the text. These can be analysed separately in terms of the regulari-
ties around which the semiotic systems are organised, where possible. The
text, however, is not just a sum of these individual contributions, as noted,
but the result of their interactions within the text itself, as suggested by
Baldry and Thibault. Overall textual coherence will be achieved with the
reader’s ‘input’, relating the text to its context and their personal knowl-
edge of the world to come to an interpretation of the text. Therefore, a
comprehensive model for multimodal ST analysis needs to be multilayered
and to take into account all these different ‘dimensions’ of the text by:
–– Investigating in depth the verbal, visual and aural semiotic modes in order to understand their individual organisation, their potential and
limitations;
–– Understanding how semiotic systems differing in their resources and
their specialisation cooperate in conveying multimodal meaning in
order to map their relationships;
–– Studying multimodal texts in terms of usage in context, including in
the analysis of a multimodal ST, the meaning contributed by the
recipient in relation to their knowledge of the world and the contex-
tual information they are aware of.
In this particular instance of a model intended for translation purposes,
these three dimensions of analysis will have to be investigated in terms of
how they are likely to influence possible translations of the multimodal
text.
In the following, we go on to consider literature on multimodality in
order to discuss interaction models for the various semiotic modes and the
meaning-making capabilities of each mode in order to address the first two
points above.

2.2   One Text, Many Semiotic Modes: Social Semiotics
State-of-the-art research in multimodality has not yet established a model
suited to the study of how semiotic modes interact with each other to
form multimodal texts. This lack of a ‘general’ model has been noted by
Stöckl, who claims that ‘we seem to know more about the function of
individual modes than about how they interact and are organized in text
and discourse’ (2004: 10). While pointing out the gap in the literature,
Stöckl still maintains that there may be ‘overriding principles [that] gov-
ern and guide all modes simultaneously’ (2004: 25). Research in the field
has been developing towards a higher level of generality, with academic
attention being increasingly drawn towards the specific topic of the rela-
tionships between the verbal and the visual (see Sect. 2.1).
Two isolated attempts to build a model of multimodal meaning by looking at potential overriding principles of multimodality, discussed here in chronological order, come from Kress and Van Leeuwen (2001)
and Baldry and Thibault (2005).
2.2.1  Kress and Van Leeuwen (2001): Multimodal Discourse


The purpose of Kress and Van Leeuwen’s work is to ‘explore the common
principles behind multimodal communication […] [moving] away from
the idea that the different modes in multimodal texts have strictly bounded
and framed specialist tasks’ (2001: 1–2).
Kress and Van Leeuwen acknowledge the possibility of exploring these
common principles by working out what they call different ‘grammars’ for
each semiotic mode (such as the ‘grammar’ of visual design), in order to
research their differences and their areas of overlap. Nevertheless, they
argue that a full picture of multimodality cannot be painted in ‘grammati-
cal’ terms only. Thus, they take a novel approach towards the problem,
setting aside a discussion of theoretical principles and focusing on the
product of the interaction of several semiotic resources in practice—
namely, in authentic multimodal texts. In so doing, they attempt to anal-
yse meaning production by dividing the practice of multimodal
communication into different ‘strata’, namely ‘domains of [multimodal]
practice’ (2001: 4). Their argument is that meaning is made in the strata
of a message, and the interaction of modes can happen in all or some of
these. Kress and Van Leeuwen’s basic distinction is between the strata of
content and expression, each showing two sub-strata: discourse and design as
sub-strata of content, and production and distribution as sub-strata of
expression. The purpose of the analysis is to show how meaning is ‘made’
in multimodal terms at these levels of articulation, and in what terms the
different semiotic resources can contribute to the meaning-making pro-
cess. Kress and Van Leeuwen’s framework of analysis can be summarised
as follows (2001: 4–8):
–– Content:

• Discourses are socially situated forms of knowledge about reality
providing information about a certain process or event, often along
with a set of interpretations and/or evaluative judgements.
• Design is the abstract conception of what semiotic resources to use
for the production of a message about a certain discourse.

–– Expression:

• Production is the material articulation of the message, the actual
realisation of design.
  ON THE ROAD TO MULTIMODALITY: SEMIOTICS    25

• Distribution is the ‘technical’ side of production and relates to the
actual means exploited for the articulation of the message, as the act
of distribution can convey or influence meaning.

The production of a text will then start with the identification of its
discourse of reference: for example, a text on fashion would draw on the
social knowledge about fashion and contribute to the debate on that
aspect of our reality. Once the appropriate discourse is identified, there will
be a choice of what semiotic resources to use to convey the message; this
choice may be partly influenced by the genre(s) that are typically associ-
ated with the specific discourse of fashion. The choice of semiotic resources
is then enacted during the process of design, when the most adequate
channel(s) and/or material(s) to convey the message are selected. In this
example, the channel could be a fashion magazine and the materials may
include pictures of models and related items. The actual production of the
message will then happen in various steps (e.g. writing, editing, printing),
and the text so produced will be distributed. The aspect of distribution
addresses the circulation of the message, selecting at what time the mes-
sage should be sent out, to whom, how often it will be recirculated (if
ever) and other distributional details. Kress and Van Leeuwen argue that
each stratum contributes meaning, and notably, that each semiotic mode
can be more or less ‘in the spotlight’ at each stage.
In Kress and Van Leeuwen’s work, these concepts are considered in
multimodal terms to show that the simultaneous presence of different
semiotic resources influences meaning in the four strata of the multimodal
message; at the same time, their discussion of meaning production through
strata remains quite generic, as it does not go into the details of how
exactly different semiotic resources interact in the various domains of
practice and how the analysis of multimodal texts can be practically carried
out using their theoretical framework. The authors move directly from a
generic discussion of the strata to their practical application using a selec-
tion of texts. The practical angle of analysis, albeit interesting and novel,
produces findings that are applicable to the texts under consideration but
are difficult to generalise further in the absence of a detailed supporting
framework explaining how modes interact. The level of interaction
between semiotic resources remains under-researched in Kress and Van
Leeuwen’s work; this aspect of their theoretical proposal finds its justifica-
tion in the inadequacy of existing theoretical frameworks for semiotic
modes other than language:
[L]anguage is still the mode which is foregrounded in terms of the potentials
for analysis and critique, both in academic and in popular discussion,
while modes such as colour are not. The possibilities of gaining understand-
ing through forms of analysis are therefore readily available for language and
are less so, at this time, for, say, colour. (Kress and Van Leeuwen 2001: 34)

Kress and Van Leeuwen’s work does good service in providing evidence
that an integrated multimodal analysis is possible, but at the same time, it
does not venture into a systematic description of the common principles
of multimodality; their work indirectly supports the existence of regularities
in multimodal communication, but by and large, it does not set out
generalisable principles of multimodal interaction in relation to meaning.
Their main contribution seems to be to indicate that the simultaneous use
of specific semiotic systems can be analysed in the context of particular
dimensions of practice, in which genre-based differences can be realised.
This approach certainly offers a way of looking at the multimodal phe-
nomenon as the product of ‘layers’ of meaning. At the same time, it does
not provide the systematic guidance on how to understand the interaction
between different modes which was laid out as a prerequisite for building
a model for multimodal analysis in Sect. 2.1.

2.2.2  Baldry and Thibault (2005): Multimodal Transcription and Text Analysis
The work by Baldry and Thibault, on the other hand, acknowledges from
the very start the possibility that a single theoretical framework may not be
able to adequately describe how multimodal meaning is conveyed; at the
same time, however, it pledges to ‘analyse [multimodal texts] through
detailed multimodal transcription […] linked to the notion of multimodal
grammar and a scalar approach to multimodal meaning making that is
designed to explore the organisation of multimodal texts in terms of dif-
ferent levels’ (2005: 1). With their contribution, Baldry and Thibault pro-
duce a framework meant to describe general multimodal principles
applicable to multimodal texts, ‘seek[ing] to reveal the multimodal basis of
a text’s meaning in a systematic rather than an ad hoc way’ (2005: 21,
emphasis in the original).
Their framework pivots around two main theoretical constructs. The
first is the concept of ‘cluster’, which refers to a local grouping of items
and is used in particular in the analysis of static texts (2005: 31) to indicate
that two or more signs from different modes form part of the same unit of
meaning due to their proximity and are therefore to be analysed together.
The second is the concept of ‘phase’—namely, a time-based grouping of
items ‘codeployed in a consistent way over a given stretch of text’ (Baldry
and Thibault 2005: 47). The concept of ‘phase’ is used to signal that two
or more signs from different modes are to be analysed in conjunction due
to their simultaneous or near-simultaneous use. This idea is mostly appli-
cable to dynamic texts, as these show a development over time. The idea
of ‘phase’ is taken from Gregory’s work on phasal analysis, in which phases
are described as characterising ‘stretches within discourse […] exhibiting
their own significant and distinctive consistency and congruity’ (2002:
321).
While spatial proximity and chronological co-deployment of signs
certainly contribute to the meaning of a multimodal text, these relationships
only explain a small part of the meaningful interaction between semiotic
modes. For example, Baldry and Thibault’s multimodal transcription
scheme has been applied to the translation of multimodal texts by Taylor
(2003); however, in Taylor’s work the practical support offered by multi-
modal transcription to two subtitling case studies is limited to choices
concerning the timing of the subtitles (‘spotting’, tied to the concept of
phase) and to suggestions on how to edit down text in subtitles to erase
information already provided by other modes, tied to the concept of clus-
ter. Baldry and Thibault’s model offers little guidance in terms of the
contextual challenges that could be faced by the subtitler, and it only
partly explains the interaction between signs from different modes.
Although this framework of multimodal transcription and text analysis
represents a contribution to studies in multimodality, it seems to fall short
of its ambitious goal to reveal the multimodal basis of a text’s meaning for
two main reasons. Firstly, Baldry and Thibault hint at the influence con-
text exerts on the meaning of any text, but the idea of multimodal mean-
ing being influenced by the circumstances in which the text is produced is
not pursued in their framework, leaving that area of multimodal meaning
unaccounted for. Secondly, the relationship between semiotic modes is
somewhat underdeveloped, as Baldry and Thibault’s work only offers
tools to analyse the spatial proximity and simultaneity of deployment of
resources from different semiotic systems. Reducing the analysis of the
interaction between signs from different semiotic modes to an analysis of
their co-deployment, either in space or in time, fails to capture the
full range of relationships between signs. For example, it is true that
the meaning of a sentence can be informed by the relative position of
words and clauses: the place occupied in a sentence may well tell us that a
word is a subject, or that a clause is subordinate. At the same time, the
seminal work by Halliday (1985/1994) has highlighted how clauses not
only show relationships of taxis (i.e. main/subordinate), but also
logico-semantic relationships. These links mean that a clause may, for example,
expand on the meaning of another, or enhance its content by virtue of its
own contribution. Critically, Halliday’s discussion of logico-semantic rela-
tionships between clauses is not grounded in matters of relative position
(or, more in general, co-deployment), but on considerations of logic and
meaning; therefore, clauses interact with each other by means of their rela-
tive position, but also form logical links with one another that make a
significant contribution to meaning. As this is true of clause relationships,
it seems reasonable to explore the possibility that signs from different
semiotic modes could also have logical relationships with each other going
beyond simple coexistence or positioning in the same text, a possibility
that Baldry and Thibault’s framework does not specifically address.

2.2.3  Taking Stock
The problem of identifying general principles explaining how meaning
is conveyed by multimodal texts has not found a fully satisfactory solution
to date through ‘top-down’ approaches such as the ones described above.
As discussed, Kress and Van Leeuwen, as well as Baldry and Thibault,
realise the necessity of using an approach that analyses various layers of
meaning-making in multimodal texts, albeit proposing different views on
how this approach should be systematised. As argued here, Kress and Van
Leeuwen do not engage with principles of interaction between semiotic
modes, and Baldry and Thibault do so purely from a point of view of spa-
tial and chronological co-deployment. Such studies are important steps
towards a more complete view of multimodality, but it is important to
acknowledge that their limitations do not allow them to build a compre-
hensive picture of how multimodal meaning is conveyed and therefore
how to approach the analysis of multimodal texts for translation.
Approaching the problem ‘bottom-up’ by looking at specific types of
multimodal interaction to identify possible homogeneous traits across
modes that can be further generalised does not seem particularly fruitful
either. In such studies, researchers have so far had to face the issue of
diversity in potential for analysis among the various modes; as noted by
Kress and Van Leeuwen, non-linguistic modes have received far less atten-
tion than language in academic and professional discussion fora. This has
led to analytical resources being widely available for language and less so
for other modes, resulting in comparisons across modes being difficult to
conduct due to the lack of analytical tools (see, e.g. the work by Norris
(2004) on the topic of multimodal interaction and by Machin (2007) on
the interplay between images and language). This issue means that a
‘bottom-up’ approach based on current knowledge might well produce
unbalanced analytical frameworks focused on the analysis of one semiotic system,
which might work to explain particular types of multimodal interaction
without grasping the overarching communicative principle of multimodal
texts.
Nevertheless, studies on multimodality are growing in number, and
research on the regularities of the single modes is growing, too. Before
Kress and Van Leeuwen’s proposal on multimodal discourse, analytical
frameworks for non-linguistic semiotic resources (some of which are out-
lined below) had already been gaining ground. The dominance of lan-
guage over all the other semiotic modes has arguably started to fade in the
last decades, during which studies on aural and visual meaning have started
attracting the academic spotlight (e.g. Kress and Van Leeuwen 2006 on
visual meaning and Van Leeuwen 1999 on aural meaning). This perhaps
follows the realisation that modern means of communication, that is,
‘new’ media such as the internet, nowadays allow a growing number of
users to create messages that contain much more than ‘just’ words.
If a form of communication exists, the assumption is that it is likely to
involve the existence of a shared code allowing users to communicate with
each other; different forms of communication are thus likely to require
different forms of ‘literacy’. On this basis, research has been carried out to
establish if some sort of ‘grammar’ for semiotic modes other than lan-
guage exists, given their growing importance in a number of modern
genres (and, consequently, their translation). The word ‘grammar’ is used
to suggest a parallel between the formal mechanisms of language and
those of other semiotic systems, without this implying a similarity in these
regularities or the assumption that an analysis of these regularities should
be carried out by similar means.
Following this assumption, Kress and Van Leeuwen (1996/2006)
studied and proposed what they call a ‘grammar of images’, following
Halliday’s definition of grammar as ‘a means of representing patterns of
experience’ (1985: 101). In order not to create confusion between their
usage of the word ‘grammar’ and its traditional language-based meaning,
their approach to the investigation of the regularities of images starts out
by setting the boundaries of what their project shares or does not share
with Linguistics. In their view, the visual domain should not be analysed
in terms of syntax or semantics, and their purpose is not to look for more
or less direct equivalents to verbs and nouns; each semiotic system in their
view has a different organisation, with only partial overlaps, and care
should be taken in drawing unwarranted parallelisms between the mechan-
ics of each mode. Nevertheless, these forms and mechanisms should still
allow a semiotic system to perform the three metafunctions that Halliday
assigned to language, and which Kress and Van Leeuwen believe can be
generalised further to apply to the visual world as well. The three
well-known metafunctions described in Halliday’s work (1985/1994) are:
–– Ideational, which deals with aspects of human perception and con-
sists in the ability to represent human experience;
–– Interpersonal, which deals with creating a relation between pro-
ducer and receiver of the sign;
–– Textual, which deals with the ability of a text to be internally and
externally coherent through the establishment of logical ties.
In their work, Kress and Van Leeuwen discuss the means by which
images perform these metafunctions, also making comparisons with the
means used by language. For example, they discuss how images can show
concrete scenes, being able to represent human experience with a high
degree of detail (ideational metafunction); features such as perspective can
be used to create a relationship with the recipient, by making them ‘part
of the scene’ (interpersonal metafunction); finally, visual characteristics
such as composition (viz. the position occupied by the different elements
in an image) can be used for the rendition of the active and passive voice
of verbs and to influence coherence within the text (textual
metafunction).
Even though they are both able to perform the three metafunctions,
language and images represent different things by different means. Kress
and Van Leeuwen’s discussion of ‘narrative structures’ in images (and in
images in comparison with verbal texts) highlights this aspect very clearly
(2006: 59–78): the type of narration images can perform efficiently, they
argue, mostly involves physical action; as stated by Kress and Van Leeuwen,
‘[m]ental processes form […] only a minor category in the visual semiotic;
[…] there are no structural devices for making the strong distinction
between “cognition” and “affection” processes’ (2006: 77). On the basis
of Kress and Van Leeuwen’s work, images and language can be said to
suggest meaning by different means, in accordance with our earlier discus-
sion in Sect. 2.1 above. Even though they are both capable of representing
human experience, engaging with the audience and producing a coherent
text, they are different in terms of how this happens, of what kind of
meaning they can represent and of the level of detail and focus they can
achieve. Summarising Kress and Van Leeuwen’s claim, the visual mode has
more limited representational capabilities than the verbal mode, but its
ability to perform the Hallidayan metafunctions still sets it apart as a mode
with a narrative potential independent of other modes and capable of per-
forming with the highest degree of detail the narration of physical action,
being at the same time less suited than language to describing cognitive
processes.
Machin’s discussion of Kress and Van Leeuwen’s work offers a detailed
explanation about the differences in meaning-making between language
and images; nevertheless, Machin reaches the conclusion that, contrary to
what Kress and Van Leeuwen claim, the visual mode cannot satisfy the
requisites set by Halliday for a ‘complex semiotic system’, and hence its
regularities cannot be properly called a ‘grammar’ (2007: 159–188).
While this may sound like a purely terminological problem, we will come
back to this point and its importance in the multimodal debate in Chap. 4
(Sect. 4.3).
As for the aural mode, Van Leeuwen (1999) outlines an attempt to
integrate speech, music and other types of sound in order to describe what
can be communicated with sound and how this can be interpreted. As
stated earlier, only a brief account of the meaning-making capabilities of
the aural mode is provided here primarily for purposes of completeness
and to better frame the complexity of multimodal matters; texts including
sounds are, as already noted, not the specific focus of this study.
Van Leeuwen interprets his work as something that ‘should describe
sound as a semiotic resource offering its users a rich array of semiotic
choices’ (1999: 6). In spite of this undoubtedly rich array, Van Leeuwen
himself admits that the representational possibilities of sound are different
from those of images and language (and somewhat more restricted):
non-verbal sounds are generated by actions and can only represent actions—in
his own words, sound messages only have ‘verbs’ (1999: 93).
Although lacking a full ‘grammar’, sound in Van Leeuwen’s opinion
still shows regularities suggesting meaning. He claims, for example, that
the perceived distance of sounds can be used as a means for foregrounding
meaning: sound dubbing, for instance, is based on a tripartite approach
that divides the sonic information contained in a soundtrack into three
‘zones’—close, middle and far distance. These correspond, respectively, to
the most important information, to information relating to the listener’s
social world and to information relating to their physical world.
The most significant information that needs to be conveyed is placed in
the close distance, whereas the other two zones are used for less important
sounds, which are mostly meant to establish the setting for the sounds in
close proximity. Van Leeuwen also examines other types of regularities in
sound; their detailed description, however, would go beyond the scope of
this study. In Van Leeuwen’s view, sound is not structured as well as images
or language along the lines of Halliday’s metafunctions. In his opinion,
sound as a semiotic mode has a different ‘metafunctional configuration’
(1999: 190) from the others. In fact, this absence of structure is so evident
in Van Leeuwen’s view as to lead him to claim that sound ‘is not, or not
yet, a “mode”, and it has therefore not or not yet reached the levels of
abstraction and functional structuration that (written) language and image
have reached […]. Sound is a “medium” or perhaps part (already) “mode”,
part (still) “medium”’ (1999: 192).
Van Leeuwen claims that those modes that do not have a certain level
of recognisable organisation and abstraction are somewhat comparable to
‘[c]hildren’s meaning-making’, being ‘flexible, unsystematic, not yet for-
malized’ (1999: 192). Accordingly, organisation can be taken as one of
the defining characteristics of a mode for it to be considered as such.
However, while the very designation ‘semiotic system’ implies a potential
for analysis and abstraction, it seems hardly possible at this stage to define
what level of organisation a semiotic system needs to achieve for it to be
considered as such and to justify this choice on a semiotic basis. The
required level of organisation seems in Van Leeuwen’s view to be also con-
nected to meaning-making ability: the most organised mode—namely,
language—is able to represent all sorts of human experience, though with
varying degrees of detail; the visual mode, which has achieved some form
of organisation, albeit not as complete as that which characterises lan-
guage, specialises in some meanings, but is still able to build a narrative
structure and perform the three metafunctions; the least organised of the
three semiotic modes analysed—namely, sound—can only represent, or in
some cases suggest, actions and feelings, thus showing evident
meaning-making limitations that put its status of ‘mode’ into doubt.
The difference in the level of organisation of these three semiotic modes
could be just an ‘evolutionistic’ matter due to the dominance of language
over the other modes, which has meant that there was no possibility for
the others to receive adequate attention and to develop to their full poten-
tial; however, it may also be an intrinsic characteristic of the visual and
aural modes that no analysis could ever find them to have, or show them
able to acquire, as complex an organisation as language. What all three
modes share seems to be no more than their semiotic nature—namely, the
ability to convey meanings through signs. However, these signs are pro-
duced and brought to the recipient’s attention by means of different tech-
niques to achieve communicative effects that are only partly similar, and
they build different types of relation with their signifieds (as discussed in
Sect. 2.1), being more or less suited to different communicative
situations.

2.3   Beyond the ‘Multimodal Code’


To summarise, the existence of a ‘multimodal code’, an overarching set of
principles coordinating multimodal interaction as envisioned by Stöckl
(2004), is not demonstrated by the literature generated to date, in spite of
attempts towards general frameworks for multimodal analysis. A bottom-up
approach based on the analysis of any common regularities among
modes does not seem to be possible at this stage either, given the very
different characteristics of each semiotic mode and the differences in their
potential for analysis described by Kress and Van Leeuwen.
However, the use of different semiotic modes in combination with each
other is something humans have practised now for millennia, from early
tribal songs combining instruments and voice to communication in the
digital era. Since none of the modes can be said to fulfil all the require-
ments of human communication in all communicative situations, even just
because of the different level of detail the different modes can achieve and
their different specialisation, the need for combining modes arises sponta-
neously. The multimodal message so composed draws on the semiotic
resources available from the different modes, whose degree of organisa-
tion for communicative purposes varies greatly. Yet, semiotic modes
can clearly interact efficiently with each other for the production of even
very complex messages (e.g. in films, which combine visual, aural and
verbal signs to generate articulate narratives), and humans seem to be
instinctively quite proficient in the use of combinatory techniques that
allow these meanings to be conveyed through multimodal textual
organisation. Likewise, translators have to date been largely operating at an
intuitive level in interpreting and translating multimodal texts, as their
analytical understanding of multimodality has had a weak basis (if any at
all).
The lack of common ground in the organisation of the modes and the
impossibility of dealing with them homogeneously suggest that the ‘com-
municational glue’ that binds together the different systems, the overarch-
ing multimodal principle, is to be sought elsewhere. It is here that
pragmatics enters the analysis of multimodal texts.

References
Aguiar, D., & Queiroz, J. (2010). Modeling Intersemiotic Translation: Notes Toward a Peircean Approach [online]. Available at: http://french.chass.utoronto.ca/as-sa/ASSA-No24/Article6en.htm. Last accessed 26 June 2017.
Anstey, M., & Bull, G. (2010). Helping Teachers to Explore Multimodal Texts. Curriculum Leadership [online]. Available at: http://cmslive.curriculum.edu.au/leader/default.asp?id=31522&issueID=12141. Last accessed 26 June 2017.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis. London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text (pp. 32–51). London: Fontana.
Bateman, J. A. (2014). Text and Image: A Critical Introduction to the Visual/Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Bublitz, W. (1999). Introduction: Views of Coherence. In W. Bublitz, U. Lenk, & E. Ventola (Eds.), Coherence in Spoken and Written Discourse. How to Create It and How to Describe It (pp. 1–7). Amsterdam/Philadelphia: Benjamins.
Chandler, D. (2007). Semiotics: The Basics. New York: Routledge.
Díaz-Cintas, J., & Remael, A. (2007). Audiovisual Translation: Subtitling. Manchester: St Jerome.
Durant, A., & Lambrou, M. (2009). Language and Media: A Resource Book for Students. New York: Routledge.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Gregory, M. (2002). Phasal Analysis Within Communication Linguistics: Two Contrastive Discourses. In P. H. Fries, M. Cummings, D. Lockwood, & W. Sprueill (Eds.), Relations and Functions in Language and Discourse (pp. 316–345). London: Continuum.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.). Manchester: St. Jerome.
Hagan, M. S. (2007). Visual/Verbal Collaboration in Print: Complementary Differences, Necessary Ties, and Untapped Rhetorical Opportunity. Written Communication, 24(1), 49–83.
Halliday, M. A. K. (1985/1994). An Introduction to Functional Grammar (2nd ed.). London: Edward Arnold.
Halliday, M. A. K., & Webster, J. J. (2003). On Language and Linguistics. London: Continuum.
Jakobson, R. (1968). Language in Relation to Other Communication Systems. In R. Jakobson (1971), Selected Writings. The Hague: Mouton.
Kress, G. (2000). Multimodality: Challenges to Thinking About Language. TESOL Quarterly, 34(2), 337–340.
Kress, G., & Van Leeuwen, T. (1996/2006). Reading Images: The Grammar of Visual Design. London: Routledge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and Media of Contemporary Communication. London: Hodder Arnold.
Machin, D. (2007). Introduction to Multimodal Analysis. New York: Oxford University Press.
Marsh, E. E., & White, M. D. (2003). A Taxonomy of Relationships Between Images and Text. Journal of Documentation, 59(6), 647–672.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New (and Old) Media. Visual Communication, 4(3), 337–371.
Multimodal Analysis Company. (2013). Concept [online]. Available at: http://multimodal-analysis.com/about/concept/. Last accessed 16 June 2017.
Norris, S. (2004). Analysing Multimodal Interaction: A Methodological Framework. London: Routledge.
Olson, D. R. (1994). The World on Paper: The Conceptual and Cognitive Implications of Writing and Reading. Cambridge: Cambridge University Press.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Peirce, C. S. (1960). Collected Writings. Cambridge: Harvard University Press.
Saussure, F. (1916/1983). Course in General Linguistics (R. Harris, Trans.). London: Duckworth.
Stöckl, H. (2004). In Between Modes: Language and Image in Printed Media. In E. Ventola, C. Cassily, & M. Kaltenbacher (Eds.), Perspectives on Multimodality (pp. 9–30). Amsterdam/Philadelphia: Benjamins.
Taylor, C. J. (2003). Multimodal Transcription in the Analysis, Translation and Subtitling of Italian Films. The Translator, 9(2), 191–205.
Van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.
Van Leeuwen, T. (2006). Typographic Meaning. Visual Communication, 4, 137–143.
CHAPTER 3

Multimodal Meaning in Context: Pragmatics

Abstract  This chapter continues the journey through multimodal meaning,
dealing with pragmatics and how this discipline can contribute to the
study of multimodal communication. In this context, literature on the
relationship between multimodality and pragmatics is presented and dis-
cussed (Sect. 3.1) with particular reference to major pragmatic theories
such as Grice’s theory of cooperativeness and Sperber and Wilson’s
Relevance Theory. This chapter also references and analyses pre-existing
work on pragmatics and translation studies and investigates the connec-
tions with multimodal issues found in this literature (Sect. 3.2).

Keywords  Pragmatics • Relevance theory • Sperber and Wilson
• Explicature • Implicature • Gutt • Interpretive resemblance

Until recently, multimodality and pragmatics have tended to be unrelated subjects, as pragmatics, the branch of linguistics that studies meaning in
context, has traditionally mostly been concerned with the study of the
spoken word. More recent literature has witnessed a number of scholars
calling for the development of new interdisciplinary studies using
cognitive-pragmatic approaches to improve our understanding of multi-
modal matters (see Mubenga 2009; Clark 2011; Braun 2016). This chap-
ter presents and discusses literature on the relationship between


multimodality and pragmatics (Sect. 3.1), also referencing work on pragmatics and translation studies and the connections with multimodal issues
found in this literature (Sect. 3.2).

3.1   Cooperativeness, Relevance and Multimodality


The interest currently shown by scholars in expanding the application
of pragmatics to embrace non-verbal aspects of communication should
not be seen as surprising. As pointed out by Orlebar (2009: online), the signs and symbols composing a multimodal text often have a polysemic nature, which the recipient disambiguates according to their personal interpretation of those signs and symbols and their own cultural background, namely, according to contextual factors. If only for this reason, contextual interpretation—and therefore pragmatic processes—plays a role in the comprehension of multimodal texts that needs to be acknowledged. But to what extent can pragmatics contribute to a model of multimodal meaning for translation purposes?
Currently, pragmatics is still widely applied in order to analyse what a
speaker means by their utterances in a specific context (‘speaker’s mean-
ing’). This probably happens on a practical rather than theoretical basis,
there not being a theoretical reason why pragmatics should not apply to
written texts; indeed, these are communicative acts whose interpretation
has a pragmatic component, in that they are interpreted in light of their
context of reference (e.g. political cartoons are seen in the light of political
events and states of affairs), and whose information structure has long
been understood by some scholars as a pragmatic matter (e.g. Thompson
1978).
A first, more ‘practical’ reason for this analytical preference might be
that brief talk exchanges in the context in which they happen are the ideal
‘work material’ for arguing pragmatic points, since their analysis is less
time-consuming and more focused than the analysis of, for example, a
lengthy novel. Second, and from a more theoretical point of view, talk
exchanges can also be easily related to the shared cognitive environment
of the speakers, namely, ‘the set of facts that are manifest to [them]’
(Sperber and Wilson 1995: 39). This is paramount in order to study mean-
ing in context, since the knowledge that participants contribute to a com-
munication is fundamental for the application of pragmatic principles.
Identifying the shared cognitive environment can be a very different task in the case of oral and written texts, since these are different systems
of linguistic expression. As aptly summarised by Crystal (2010: 187),

[t]he differences of structure and use between spoken and written language
are inevitable, because they are the product of radically different kinds of
communicative situation. Speech is time-bound, dynamic, transient – part of
an interaction in which, typically, both participants are present, and the
speaker has a specific addressee (or group of addressees) in mind. Writing is
space-bound, static, permanent – the result of a situation in which, typically,
the producer is distant from the recipient – and, often, may not even know
who the recipient is (as with most literature). Writing can only occasionally
be thought of as an ‘interaction’ in the same way as speech […].

As the relationship between writer and reader is characterised by a temporal and/or spatial mismatch, it can be very difficult to identify a
shared cognitive environment to use as the basis for pragmatic analysis.
The writer may be no longer living at the time when a reader is faced with
their message, or reader and writer may be contemporaries but belong to
different cultures. As evidenced by Jucker and Smith (1995), even family
members who live in the same culture, in the same period, and are suppos-
edly in an intimate relationship continuously renegotiate their common
ground in conversation for them to be able to communicate efficiently.
Since this is not possible in the writer-reader relationship, it is legitimate
to wonder how much information reader and writer can reasonably be
assumed to share and how we might come to establish this with any cer-
tainty. The problem in establishing the extent of the shared cognitive envi-
ronment between writer and reader appears to be enough of a reason why
pragmaticians might shy away from written texts.
To help pragmatics approach the realm of the written word, Cooren
(2008) has rather controversially proposed the introduction of the notion
of textual agency. Cooren’s argument is that textual entities have an ability
to produce speech acts independently of human actors, and that they
should then be considered as agents per se, capable of affecting the sur-
rounding world. To illustrate this point, he calls on several everyday
expressions, like ‘the recipe calls for…’, ‘the signature commits you to…’,
‘the manual suggests to…’, in which agency is more or less consciously
attributed to the text rather than to the author of the text itself (although
this is largely dependent on the language used, and may not be applicable for texts in languages that do not normally attribute agency to the subjects
of certain verbs). Through textual agency, pragmatics could be used to
analyse a text based on the text-reader shared cognitive environment
rather than the writer-reader shared cognitive environment. This would
bring ‘speaker’ and reader together in the same place, at the same time
eliminating de facto some of the aforementioned problems relating to
temporal/spatial mismatch between writer and reader.
From a practical point of view, the notion of textual agency intends to
bring text and reader close together. However, the text still only makes
sense in relation to its author, the context in which it was produced, the
purpose for which it was produced and the audience addressed by the
writer, and these are elements that cannot be ignored in a textual analysis.
Therefore, the notion of textual agency does not bridge the shared envi-
ronment gap for analytical purposes, and is not upheld by this study. As
Hyland (2005) states, writers compose their messages with a reader-­
oriented focus, aiming to guide them towards a certain action or under-
standing; the text is therefore not an independent entity, but rather a tool
created by its writer as the carrier of that intention. Also, assigning agency
to texts clashes with the view of coherence supported by this study, which
sees coherence as user-based (as opposed to text-inherent); the source
of coherence is the user who interprets the text in connection with ‘the
linguistic context, the socio-cultural environment, the valid communica-
tive principles and maxims and the interpreter’s encyclopedic knowledge’
(Bublitz 1999: 2), rather than the text itself. This is all the more important
in a study on translation, where differences between the context of the ST
and that of the TT are a major concern.
The point of view proposed in this study, then, is that there is no need
for a notional ‘patch’ that would allow pragmatics to be applied to written
texts, as Cooren has argued: these texts consist of verbal content designed
to carry a ‘writer’s meaning’ whose interpretation by the reader is possible
and is partly dependent on usage in context. Thus, pragmatics can and
should be applied to written texts. However, suggesting such an approach
means coming to terms with its inherent limitations: for the reasons previ-
ously mentioned, establishing the shared cognitive environment between
writer and reader is far more problematic than for speaker and hearer, and
this represents a difficulty that needs to be acknowledged. From the point
of view of an external observer of the writer-reader communication, the
readership of a certain text is made up of a group of people whose indi-
vidual cognitive environments are unknown for the most part and not
necessarily homogeneous; similarly, information on the writer’s cognitive environment may be only partly available, if at all, as the author of a text
could be in some cases unknown. Therefore, a pragmatic approach to the
study of written texts can be proposed with the proviso that the shared
cognitive environment can only be inferred and only to a certain extent.
Mostly, the inference will be based on what the writer manifestly knows
(because they have written it, or their writing implies it) and what they
expect their readership to know (inferring it from the background knowl-
edge on which their text is based and which is required for its understand-
ing). Limited availability of information about the author’s cognitive
environment is, in any case, the operational context in which any reader
faces any text trying to understand the writer’s meaning. This is already
acknowledged in the Socratic dialogues:

Writing, Phaedrus, has this strange quality, and is very like painting; for the
creatures of painting stand like living beings, but if one asks them a ques-
tion, they preserve a solemn silence. And so it is with written words; you
might think they spoke as if they had intelligence, but if you question them,
wishing to know about their sayings, they always say only one and the same
thing. And every word, when once it is written, is bandied about, alike
among those who understand and those who have no interest in it, and it
knows not to whom to speak or not to speak; when ill-treated or unjustly
reviled it always needs its father to help it; for it has no power to protect or
help itself. (Plato, Phaedrus: 275d–e)

Given that writer and reader often do not communicate in any way
external to the written words of a message, a writer will write being aware
that the knowledge available to the audience who will process the message
is limited. This influences the composition of the message itself in the
form of a conscious effort to make it understandable ‘as is’ (referred to as
‘recipient design’ by Sacks and Schegloff 1979). As written texts may be
the sole form of communication between writer and reader (e.g. advertise-
ments are often a one-way channel between a company and potential cus-
tomers), multimodal content can be advantageous in their composition: as seen in the previous chapter, semiotic modes other than the verbal can
provide information that language would convey perhaps less efficiently or
with more difficulty (e.g. the way an object looks), helping the message
sender guide potential recipients towards their intended interpretation
more quickly and precisely. Therefore, in my view, the tendency of some
message senders to communicate multimodally should be seen as a pragmatic strategy to cope with the lack of a substantial shared cognitive environment with their recipients, by including in the text elements that would
be considered contextual to a talk exchange—for example, images.
On the other hand, the question is still open about how the contribu-
tions by non-verbal semiotic modes to the communicative act could be
accounted for by a pragmatic model. Even though applications of prag-
matics to multimodal texts are not common in the literature, some
attempts in this direction have been made: most notably, the work by Yus
Ramos (1998) uses pragmatics as the basis for a taxonomy of the commu-
nicative situations that arise in verbal-visual media discourse, while Tanaka
(1999) applies it to the analysis of advertisements in Britain and Japan.
While their work has a different focus than the one adopted in this study,
it does confirm that it is possible to apply pragmatics to multimodality. It
is my belief that the application of pragmatics to multimodal issues is not
only possible—rather, it is the key to the world of multimodality.
Sperber and Wilson’s work on Relevance Theory (1986/1995), one of
the two most important pragmatic frameworks alongside Grice’s (1989),
was originally designed to account for linguistic communication.
Nevertheless, the design of Relevance Theory (henceforth RT) is compat-
ible with its applicability to other semiotic modes, and in what follows, I
argue for its application to multimodal texts.
Sperber and Wilson maintain that, from Aristotle to Grice, language
was only explained in terms of its code, and communication was thought
to be achieved by encoding and decoding messages; Gricean thought revolutionised linguistics by arguing that language is actually inferential and communication is achieved by producing and interpreting
evidence (Sperber and Wilson 1995: 2). In fact, Sperber and Wilson sup-
port the view that language is a complex form of communication in which
these two realities can and do coexist: language must be interpreted by
means of a coding/decoding process, but this is subservient to the infer-
ential activity through which we understand the intentions of the speaker
(1995: 27). Thus, interpreting an utterance in a language requires knowl-
edge of the code, but also the inferential abilities that will allow the hearer
to make sense of it in context to access the speaker’s intention, and RT was
developed to account for this ‘duality’ of linguistic communication.
By its nature, a multimodal text will include signs from more than one
mode, each of which has a code of its own. Chandler has successfully
argued that signs of any nature are to be interpreted in terms of both code
and usage in context (2007: 8), exactly as observed by Sperber and Wilson
for language. Therefore, it can be concluded that the same (de)coding and
inferential processes apply to linguistic as well as other types of sign and
that in this sense the way we communicate through language is no differ-
ent than through other modes. As Sperber and Wilson’s theory was devel-
oped based on coding and inferentiality as the two ‘gears’ of communication,
RT is a good candidate for application to the interpretation of texts includ-
ing a variety of types of signs.
It must also be noted that, in a multimodal text, each mode finds its
most immediate context of reference in the other mode(s), and this influ-
ences the usage of each semiotic system. Since information can be drawn
from different sources, in multimodal texts, the message communicated
by a single mode is incomplete without the remaining information, as a
mode can rely on the other to express what it has left unexpressed or to
enhance its meaning: usage in context—and hence pragmatics—becomes
then a key factor in multimodality, possibly even more so than in ‘mono-
modal’ texts.
Nevertheless, RT is not the only pragmatic approach available, and sug-
gesting an application of pragmatics to multimodality means having to
face a choice about which theoretical framework should be applied. The
two major pragmatic approaches to meaning explanation are the Gricean
model on cooperativeness (developed in 1967, and subsequently elabo-
rated by Levinson 1983, among others) and the already mentioned RT, a
relevance-based framework first published by Sperber and Wilson in 1986.
While differing in the view they take towards the principle that informs
the inferential system, the two approaches show an array of commonali-
ties; arguably, this still does not allow for the identification of enough
common ground to proceed to an analysis that uses either approach inter-
changeably without significant variations. Whether communication is ana-
lysed in terms of cooperativeness or in terms of relevance, both frameworks
are meant to cover the meaning of utterances in context as a whole, and
from a strictly practical point of view, they differ mainly in what they
ascribe to explicit or implicit meaning production. However, the focus
here is not on the pragmatic question of whether certain meanings are
explicitly or implicitly communicated, but rather on the overall meaning
assigned by the audience to a text and how this can be dealt with by a
translator operating in yet another cognitive environment.
According to both Grice and Sperber and Wilson, inferentiality is the
principle that informs communication itself and is at the basis of any
interpretation of a communicative act. Sperber and Wilson justify their view of communication as mainly inferential by suggesting that, since the
purpose of communication is the recognition of intentions, analysing it in
terms of the code(s) used and not in terms of the inferential activity that
links those codes to the recognition of intentions would mean taking a
narrow view of communicative matters. They support this statement with
real-life examples of communication, saying that even when a message is
encoded in an incorrect way (e.g. because of the speaker’s inability to
encode it correctly, which is the case for children and foreigners), the
recipient tends to apply an automatic correction, provided that the speaker’s intention is still evidently recognisable. If the overriding principle of
communication were a code, an error in its use would prevent the message
from going through at all.
The interpretation of utterances is, thus, first and foremost inferential.
Indeed, Sperber and Wilson regard the use of codes as widespread, repre-
senting the vast majority of communicative acts (a view also supported by
Searle 1969: 38), but they also acknowledge that the use of a code is not
strictly necessary for communication. When looking at spoken texts, then,
we should do so from a pragmatic point of view and ask ourselves what
intention we can recognise in the speaker putting together that message.
Communication is said in RT to involve the production of ‘stimuli’,
namely, ‘any modification of the physical environment designed by a com-
municator to be perceived by an audience and used as evidence of the
communicator’s intentions’ (Sperber and Wilson 1995: 29). Stimuli,
which for reasons of simplicity we might equate to texts, are seen in RT as
conveying an informative intention (inform the audience of something)
and a communicative intention (inform the audience of the informative
intention). While, as mentioned above, RT is meant for linguistic com-
munication, any communicative act satisfies the above definition; Sperber
and Wilson’s perspective on communication could then be extended to
include not only linguistic texts, but multimodal texts as well. This view of
communication, however, is not original to Sperber and Wilson and goes
back to Grice, who applies the term ‘utterance’ to what Sperber and
Wilson call ‘stimulus’ (1995: 29). Multimodal texts are certainly ‘utterances’ (Grice) or ‘stimuli’ (Sperber and Wilson) and are, in either case, eligible
for pragmatic analysis. Pragmatics can, then, be applied to multimodal
texts without enlarging its theoretical scope. This view is also supported by
Forceville, who states that the very generality of the claim by Sperber and
Wilson that the principle of relevance is essential to explain human
communication (Sperber and Wilson 1995: vii) implies applications beyond the realm of verbal communication (Forceville 1996: 83).
Although differing in their approach to explaining meaning, RT and
Gricean Pragmatics both stress the concept of understanding the speaker’s
intentions as the ultimate goal of the hearer; disagreement only arises in
how these intentions are mapped by the two theories. Grice claims that:

our talk exchanges do not normally consist of a succession of disconnected remarks, and would not be rational if they did. They are characteristically, to
some degree at least, cooperative efforts; and each participant recognizes in
them, to some extent, a common purpose or set of purposes, or at least a mutu-
ally accepted direction. (Grice 1989: 45, my emphasis)

Grice’s cooperative principle is then explained in more detail with a set of well-known conversational maxims, which Grice claims to be the
frame of reference in conversation.

1. Maxim of Quality
Try to make your contribution one that is true
2. Maxim of Quantity
Make your contribution as informative as is required
3. Maxim of Relation
Be relevant
4. Maxim of Manner
Be perspicuous.
(Grice 1989: 26–27)

According to Grice, these maxims are, however, not always followed in conversation; their deliberate violation does not signal a breakdown in communication, but rather produces different types of implied meaning (implicatures). For example, in a conversation like

A: How are you and Emma doing?


B: Beautiful day, isn’t it?

B’s irrelevant answer to A’s question should not be construed as the speaker’s lack of cooperativeness; rather, B is trying to communicate his
unwillingness to discuss his relationship with Emma without having to do
so explicitly. This could be due to a variety of reasons, for example, social
norms preventing B from being explicit about his personal preference to
leave the topic aside as far as possible. Regardless of B’s rationale for his
communicative choice, A finds herself in a position where her question has
received what at first seems like an irrelevant answer; however, instead of
perhaps choosing to reiterate the question to obtain the desired informa-
tion, A may try to reconcile the violation of the maxim of relation with an
assumption that B is trying to be cooperative and that therefore his answer
is designed to be relevant to the communicative exchange—this will allow A to grasp what B intends to communicate by means of an implicature, that is, an
additional meaning communicated by an utterance in light of its context.
Grice’s view on communication hints at a clear-cut division between
the roles of semantics and pragmatics in language comprehension: when
the semantic content of an utterance does not obey the conversational
maxims, an inferential analysis aimed at reconciling the semantic meaning
with the cooperative principle is needed. During this inferential analysis,
the hearer may be able to identify additional meanings (implicatures)
which will help them understand the speaker’s meaning.
Grice’s ‘division of labour’ has been attacked and criticised on many an
occasion, even by scholars supporting a neo-Gricean approach. It is nowa-
days widely accepted that Grice’s theory of communication implies a few
inner contradictions, most notably what Levinson calls ‘Grice’s circle’
(2000: 186). Levinson sets out to demonstrate that Grice’s claim about
the division between semantics and pragmatics is not tenable: inferential
processes of the same type as the ones that determine implicatures are
involved in the determination of ‘what is said’, which Grice says semantics
should account for. For example, pronouns (and, in general, deictics) are
part of ‘what is said’, but the determination of their reference is context-
bound and inferential, in the same way as implicatures are. Levinson
attempts to resolve this inner contradiction of the Gricean theory by sug-
gesting two levels of intervention of inferential processes in meaning
detection, as summarised by Fig. 3.1.
This line of thought still sees meaning interpretation as a step-by-step
process starting with the semantic representation of an utterance and end-
ing with the determination of the speaker’s meaning by inferential means.
The division between the role played by semantics and pragmatics is no
longer clear-cut, as inferential processes intervene in the semantic inter-
pretation, but meaning analysis as a whole is still seen as progressive and
made up of sequential steps.
A different view is offered by Sperber and Wilson. As anticipated, they
oppose the Gricean and neo-Gricean view that communication is ruled by
a principle of cooperativeness and a set of maxims that accompany it, claiming
Fig. 3.1  Meaning detection scheme, after Levinson (2000: 188)

that communication is based on a principle of relevance. Although relevance is one of the maxims proposed by Grice, for Sperber and Wilson,
relevance is not a maxim; rather, according to RT, no communication can
exist without relevance, and therefore relevance cannot be violated. What
is worth noting is that while relevance cannot be violated, communication
according to RT happens through attempts to reach a high level of rele-
vance; indeed, Sperber and Wilson claim not only that any utterance is
necessarily designed to be relevant but also that any utterance carries the
implicit assumption of its own optimal relevance. This means that any
message we produce gets to the recipient with the implication that we
have put it in the form we thought was optimal for the recipient to process
in order to obtain information about what we wanted to communicate
with it. In RT terms, our message will make use of stimuli (i.e. utterances
in the case of a linguistic message), which communicate at an explicit level;
information that is not explicitly communicated by the stimuli (like B’s
unwillingness to discuss his relationship with Emma in the example above)
is regarded as implicit and termed an implicature. However, in contrast to
Levinson’s viewpoint, the information derived from deictics and pronouns
is not considered as implicitly communicated, as these elements of the
message still form part of the stimulus and do not provide information
beyond the stimulus itself. The information they convey is nevertheless the result of an inferential process, and therefore they are considered
as a separate type of explicit meaning called explicatures.
To sum up the differences between the two major theories in pragmat-
ics, the Gricean model suggests a division of labour in which ‘what is said’
prepares the ground for ‘what is implicated’. In this view, ‘what is said’
roughly corresponds to the semantic meaning of an utterance, while impli-
catures are its inferential components. Modern developments of Grice’s
theory, such as Levinson’s work, have conceded that inferential processes
intervene in the semantic interpretation (e.g. when we need to assign a
reference to a pronoun) and that the information that can be derived from
these processes is not to be considered as part of the explicit content of the
utterance, that is, it is implicit. In Sperber and Wilson’s view, although the
retrieval of the information connected to processes such as reference
assignment happens through inferential means, the role of this informa-
tion has no other function than to ‘flesh out’ the blueprint delivered by
grammar (Blakemore 1992: 59). For this reason, RT sees these processes
as forming part of what is explicitly communicated by an utterance, called
by Blakemore the ‘proposition expressed’ (1992: 65–90). In RT, implica-
tures only account for additional meanings not stated in the semantic
content.1

1  It must be acknowledged that the division between explicit and implicated meaning is not
agreed upon by all pragmaticians: Bach’s contribution to this debate was to propose the idea
of a third category of meaning falling between the two sides of the ‘pragmatic fence’, namely,
the category of impliciture (1994). In Bach’s view, developments of the logical form of an
utterance are not to be considered as part of either the explicit or the implicated meaning,
falling somewhere in between: these elements of meaning are not uttered (and hence not said
in a Gricean view), but at the same time, they do not contribute any additional proposition
either, thus not qualifying fully as either explicit or implicit meaning.
An example of a brief conversation between A and B can help further clarify the differences in how these accounts have so far been applied to spoken discourse, in order to prepare for a discussion of
how pragmatics is applied to multimodal texts in the next chapter.

A: Shall we get started?


B: Well, he’s not going to be here for another half hour.

In this case, a Gricean approach would suggest that to come to an understanding of this talk exchange, we should first be aware of its semantic content. The deictics ‘he’, ‘we’ and ‘here’, whose reference is known to
A and B, are part of the semantic content of the utterance, and they are
considered as part of ‘what is said’. Here, B’s response is apparently
irrelevant to A’s remark, in that it is not a direct answer to the question,
and hence it violates Grice’s maxim of relation. If B’s remark is to be made
relevant, it must carry an implicature: B’s statement is actually a sugges-
tion implicating that A and B should not wait for ‘him’ to join them and
that they should get started on their task without the third person.
Levinson would map the meaning of this exchange in a different way:
while the above-mentioned deictics are part of the semantic meaning, the
assignment of their reference is an inferential, context-bound process;
based on the inferential nature of the reference, deictics should be treated
as implicatures, not as part of what is said. The implicature related to the
irrelevance of B’s remark is still considered an implicature, albeit of a dif-
ferent nature than the one related to the deictics.
Sperber and Wilson would dissect the meaning of this exchange in yet
another way. The semantic meaning staying the same, Sperber and Wilson
would claim that even though the reference of deictics is derived through
inferential processes, the information these references provide does not
constitute a meaning additional to the utterance itself, and is therefore to
be considered an explicature. The implicature related to B’s remark will be
retrieved by A in her effort to understand the relevance of B’s statement:
if B has decided to reply the way he did, it means that this form is the one
he considers as most relevant for A’s question to be answered. Table 3.1
summarises the differences among these approaches to meaning.
Given the difference between the Gricean and the RT approach to dis-
secting meaning, discussed in detail by Carston in its many aspects (2004),
a suitable criterion needs to be found in order to select the most appropri-
ate model for the stated purpose, that is, the analysis of multimodal texts.
Table 3.1  Pragmatic approaches to meaning, after Levinson (2000: 195)

(Columns: semantic representation; deictic and reference resolution; minimal proposition; enriched proposition; additional propositions)

Grice (1967): ‘what is said’ covers the semantic representation up to the minimal proposition; implicatures cover the remaining levels.
Levinson (2000): ‘the coded’ covers the semantic representation; implicatures intervene from deictic and reference resolution onwards, contributing to ‘what is said’.
Sperber and Wilson (1986): the logical form covers the semantic representation; explicatures cover deictic and reference resolution and enrichment, yielding the propositional form; implicatures cover additional propositions.

Grice’s cooperative principle and its maxims seem to be quite well suited
for the analysis of stimuli that come in the form of utterances. However,
when talking about static texts, and of multimodal texts as a sub-category
of the static type, it is difficult to imagine how a principle of cooperative-
ness would work between a sender and a recipient who do not have a
chance to interact, for the reasons of spatial and temporal mismatch con-
sidered previously. How could a sender be cooperative with an audience
with whom they cannot have a contemporaneous dialogic interaction, and
who is likely to approach the text from different backgrounds, with differ-
ent cognitive environments, and perhaps at different historical moments?
This is even more so the case when translation is added to the mix.
Also, as considered in Sect. 2.2, not all semiotic modes have the same
levels of organisation, which means that in some cases, there might not be
a clear code, a standard against which a stimulus can be compared in order
to identify possible communicative violations. The types of stimuli at
play in a multimodal text are many more than those in single utterances or
talk exchanges, and a Gricean model may therefore not be easily applicable
to the type of analysis attempted here. The information non-linguistic
signs provide might be difficult to quantify and/or qualify in terms of its
truth value or clarity. It could, however, be relevant or irrelevant to each
message recipient depending on their personal interpretation.
In Sperber and Wilson’s work, it is possible to find an enlightening
example in this sense:

For example, Peter asks Mary, […] How are you feeling today?
Mary responds by pulling a bottle of aspirin out of her bag and showing
it to him. Her behaviour is not coded: there is no rule or convention which
says that displaying a bottle of aspirin means that one is not feeling well.
(Sperber and Wilson 1995: 25, my emphasis)
  MULTIMODAL MEANING IN CONTEXT: PRAGMATICS    51

The information conveyed by displaying a bottle of aspirin is difficult to
qualify and quantify; also, analysing this stimulus in terms of possible vio-
lation of one or more maxims would be problematic, given the absence of
a code the stimulus can be judged against. For the same reason, account-
ing for other gestures, images or sounds contained in a multimodal mes-
sage could be problematic in terms of a theory based on conversational
maxims. Also, from a more practical point of view, it must be acknowl-
edged that the Gricean view presents an inherent difficulty, namely, the
possibly endless multiplication of the conversational maxims: after Grice,
many scholars felt that the maxims as he had outlined them did not account
for all the implicatures produced in conversation and attempted to complete
the set by adding more.2
Rather than assuming communication based on an alleged coopera-
tiveness among individuals mostly unknown to each other and unlikely to
be able to interact, an RT approach places emphasis on the recipient;
indeed, they will use the perceived relevance of the text as the principle
informing their interpretation activity, without necessarily assuming that
the message sender is being ‘cooperative’ and without relying on a set of
conversational ‘maxims’ whose applicability to non-linguistic modes is far
from proven, given their lack of clear communicative standards. In this
view, the multimodal message sender can draw on
varied and inhomogeneous textual resources to get their intentions across,
according to what best suits their communicational needs, irrespective of
whether the material they intend to use as a textual resource has a code to
support its usage in communication or not, so long as they believe that this
is the optimally relevant communicative choice.
The most viable and logically justifiable approach to multimodality in
the context of this study, then, is a relevance theoretic one. Clark (2011)
claims that both RT and multimodal discourse could benefit from some
form of synergy, as the former could be expanded through its application
to the latter, and the latter could be better understood thanks to the
former, suggesting this as a desirable path for the development of the two
topics. The significance of pragmatics in terms of the analysis of multi-
modal meaning, and hence the support it can offer to multimodal transla-
tion activities, has, however, not been investigated in depth before this
study, which builds on Clark’s suggestion, taking it in a direction that also
includes translation studies.

2  Leech (1983), for example, discusses maxims related to tact and politeness: debate about
these additional maxims is still open, as the much more recent article by Pfister (2010)
demonstrates.
The next section therefore deals with pragmatics and translation stud-
ies. Even though the literature on this topic is somewhat scarce, it is worth
investigating to understand what points of contact can be built, or are
already available, between the considerations made thus far on the subject
of pragmatics applied to multimodality and the existing work on pragmat-
ics applied to translation studies.

3.2   Pragmatics and Translation: Understanding and Rendering Meaning

In the previous section, it was argued that a pragmatic approach can be
applied to the analysis of multimodal texts, which can be considered as
communicative stimuli and addressed as such from an RT viewpoint to
understand how they convey meaning (explicitly and implicitly). The link
between a pragmatic analysis of multimodal STs and their translation, on
the other hand, is yet to be fully addressed. The first step to be taken in
this direction is to discuss the pre-existing links between pragmatics and
translation studies. Those scholars who have investigated the matter seem
to consider the connection between the two disciplines almost self-evident.
Kitis, for example, discusses how she has

often read […] that in recent theories of translation linguistics is left behind.
This may be right if by the term ‘linguistics’ one understands the more for-
mal or syntactic oriented theories […]. However, it cannot be more untrue
if one considers pragmatics […] as that component of linguistics that can
greatly inform, and has indeed informed, various approaches in translation
studies. (2009: 64)

As Kitis says, the literature on translation models has certainly drawn on
pragmatic approaches (e.g. Nord 2005; Baker 2011); however, her claim
that pragmatics has played a more extensive role in informing approaches
in translation studies than other areas of linguistics seems debatable, as
pragmatics has not usually had a prominent role in the formation of theo-
retical frameworks for translation (with one notable exception, i.e. the
work of Gutt 2000). Nevertheless, other authors have also supported the
view that pragmatics has had a greater influence on translation than other
areas of linguistics. According to Hatim and Mason, for example, the
influence of pragmatic issues on translation and on the translator is
pervasive:

In most cases, the translator, as a receiver of ST but not specifically an
addressee (in the sense of the intended receiver of ST), is an observer of the
text-world environment of ST. The role of the translator as reader is then
one of constructing a model of the intended meaning of the ST and of form-
ing judgements about the probable impact of the ST on intended receivers.
As a text producer, the translator operates in a different socio-cultural envi-
ronment, seeking to reproduce his or her interpretation of ‘speaker mean-
ing’ in such a way as to achieve the intended effects on TT readers. (Hatim
and Mason 1990: 91–92, my italics)

To be able to reproduce his or her interpretation of a ST for a target
audience, according to Hatim and Mason, a translator must be familiar at
some level with pragmatic concepts. The so-called speaker’s meaning (or
rather, writer’s meaning in the case of written STs) is therefore considered
as a key notion for the activity of translation; in their view, however, the
pragmatic intervention in the translation activity is not only limited to the
‘passive’ comprehension of the writer’s meaning in a text. The purpose of
the translator is primarily to produce a target text conveying their interpre-
tation of the writer’s meaning. To reach this goal, the translator needs to
take into account ‘several layers of problematic areas’ (Bernardo 2010:
107). In defining these problematic areas, Bernardo follows Beaugrande
and Dressler’s model of textuality, identifying these as text type, text genre
and textuality features (intentionality, acceptability, situationality, informa-
tivity, coherence, cohesion and intertextuality), which she analyses from a
pragmatic standpoint.
Being able to deal with these characteristics of a text, understand them
and reproduce them is what Bernardo calls ‘textual competence’, which she
claims ‘can only be achieved if the translator is trained to interpret the syn-
tactic and semantic marks in the source text from a pragmatic point of
view’, as ‘the communicative value, which involves both the semantic and
pragmatic meaning with all its allusions, symbolism and connotations, must
be kept homologous’ (Bernardo 2010: 114–115). A similar view from a
different angle is held by Kitis, who claims that the translator ‘must be fully
aware of the many layers that envelop each linguistic event, that is, language
use, which feed into its interpretation simultaneously’ (Kitis 2009: 70).
However, Kitis’s claim that the purpose of translation is to create a TT
reflecting the same cognitive world as the original (2009: 70) is hardly
tenable, if only because of the temporal and/or spatial mismatch between
the ST and the TT (see also Sect. 3.1). Nevertheless, Kitis’s view of texts as
instances of communication embedded in a series of contextual and cultural
layers supports the idea that translation should be seen mainly from a prag-
matic point of view, although her comments are confined to language.
Both Bernardo and Hatim and Mason argue in favour of seeing transla-
tion as a pragmatic process. Guidelines on the scope and boundaries of
such a pragmatic approach are provided by Hickey, who maintains that a
thorough pragmatic framework for translation should

attempt to explain translation – procedure, process and product – from the
point of view of what is (potentially) done by the original author in or by the
text, what is (potentially) done in the translation as a response to the origi-
nal, how and why it is done in that way in that context. (Hickey 1998: 4)

Hickey does not suggest that a pragmatic approach would provide
definitive answers for translation; rather, it would try to explain how a TT
relates to its ST, how it constitutes a response to the original communicative
act, and what the logical basis is for organising a TT according to certain
criteria.
The only fully developed attempt in this sense has been carried out by
Gutt (2000), whose aim was to use RT to produce a general theory of
translation. Somehow, by his own admission, he both failed and succeeded
in his attempt. In his preface he says that he ‘had expected that relevance
theory would help […] to formulate a general theory of translation.
However, within a year it became increasingly clear that relevance theory
alone is adequate – there seems to be no need for a distinct general transla-
tion theory’ (Gutt 2000: vii).
Gutt’s suggestion is that translation should be looked at as a communi-
cative process following the view of communication as a mainly inferential
process provided by RT.  Accordingly, Gutt sees translation as a case of
interpretive use of language and more specifically of interlingual interpre-
tive use. This is a concept based on the notion of interpretive resemblance
expounded by Wilson and Sperber:

Two propositional forms P and Q (and by extension, two thoughts or utter-
ances with P and Q as their propositional forms) interpretively resemble one
another in a context C to the extent that they share their analytic and con-
textual implications in the context C. (1988: 138, italics in original)
In Gutt’s work, to be called a translation of the original, a text must
interpretively resemble its ST, namely, share its analytic and contextual
implications. However, original and translation do not share a context,
given that the purpose of translation itself is to produce a TT to allow
users from a different linguaculture to access the content of the ST. Since
ST and TT are assumed not to share a context, Gutt’s argument is that a
translation should interpretively resemble its original in its own context,
bringing about similar contextual effects in return for a similar level of
processing effort on the side of the audience.
The notion of interpretive resemblance is the closest notion to that of
equivalence in translation that Gutt claims can be maintained. Interpretive
resemblance in translation is not determined by considerations of style,
form or adherence to the original, but rather by considerations of rele-
vance. In Gutt’s own words (2000: 106),

[i]n interpretive use the principle of relevance comes across as a presumption
of optimal resemblance: what the report intends to convey is (a) presumed
to interpretively resemble the original […] and (b) the resemblance it shows
is to be consistent with the presumption of optimal relevance.

These conditions seem to provide translators with some leeway, as they do
not impose any sort of rule as to how the conditions themselves should be
satisfied or any sort of strict hierarchy about the different elements of the
ST that translators should strive to reproduce. Translators, according to
Gutt, should have relevance as a guiding principle: their choices are and
always will be context-dependent, and hence, it will not be possible to
identify specific textual factors, much less a hierarchy of textual factors
which should determine the translator’s choices. Gutt’s theory may be
somewhat intimidating for translators who, following this approach,
would not be able to rely on a precise hierarchy of textual factors helping
them in their choices. However, partly in order to address this concern,
Gutt discusses different stylistic elements, and how these can be seen from
an RT point of view, to help translators gain an understanding of individ-
ual translation challenges from an RT perspective:

[T]he following important points can be drawn out. Firstly, the translator
must be seen and must see himself clearly as a communicator addressing the
receptor language audience: whatever his view of translation […], he always
has an informative intention which the translated text is to convey to the
receptor language audience. […] Secondly, […] whatever he does in his
translation matters primarily not because it agrees with or violates some
principle or theory of translation, but because of the causal interdependence
of cognitive environment, stimulus and interpretation. […] Thirdly, since
the phenomena of translation can be accounted for by this general theory of
ostensive-inferential communication, there is no need to develop a separate
theory of translation. The success or failure of translations, like that of other
instances of ostensive-inferential communication, depends causally on con-
sistency with the principle of relevance. (Gutt 2000: 199)

If this is the perspective from which all translation should be consid-
ered, there does not appear to be any reason why this should not be the
case for the translation of multimodal texts as well. A relevance theoretic
account of translation represents a promising approach, as the perspective
it adopts is founded in communication theory and in what we know about
how human communication works.
Considering that Sperber and Wilson’s theory, as previously observed,
does not exclude the analysis of stimuli other than language, it could rea-
sonably be assumed that Gutt’s application of RT to translation could
apply to texts whose composite nature comprises different sorts of stimuli.
Even though Gutt’s attention is not focused on multimodal texts, his
work incidentally mentions non-verbal communication, which he acknowl-
edges also works on an inferential basis:

Of course there is a difference between non-verbal and verbal or linguistic
communication, but this difference lies not in the presence or absence of
inference, but rather in the degree of explicitness which the stimulus can
achieve. (Gutt 2000: 25)

If semiotic resources other than language also communicate inferen-
tially, it can reasonably be expected that explicit and implicit meanings can
be conveyed by those, too, following the model provided by Sperber and
Wilson, as will be seen in detail in the next chapter. The presence of more
than one semiotic mode in multimodal texts and the ability of modes to
communicate both explicitly and implicitly constitute at the same time
extra opportunities and extra challenges in the translator’s ‘mission’ to
make sure her TT interpretively resembles its ST.
Following Gutt’s work (2000), the notion of interpretive resemblance
is based on a set of analytic and contextual attributes which the ST and TT
need to share. Relevance will therefore have to be the guiding principle
that leads translators in the selection of the analytic and contextual attri-
butes a TT needs to feature in order to interpretively resemble its ST.
The strategies employed by the translator for the production of a mul-
timodal TT and any required reorganisation of the text will therefore
depend on their ability to achieve optimal interpretive resemblance. Any
constraints that might hinder this ability need to be taken into account,
and they might result in the need to reorganise the TT’s explicit and
implicit content. Therefore, the notion of interpretive resemblance will
constitute the frame of reference for how the translator manages the indi-
vidual textual resources and their interaction.
In the next chapter, Gutt’s notion of interpretive resemblance is used as
the cornerstone to develop the model for multimodal text
analysis along three different dimensions, reflecting the need for the trans-
lator to address the analytic and contextual attributes of the text in her
work. As the pragmatic view adopted in this work is central to the model
proposed here, the first section further develops the idea introduced in
Sect. 3.1 of using RT to account for multimodal communication and sets
the boundaries and scope of the pragmatic dimension of the model. Then,
existing frameworks aiming to explain the logico-semantic relationships
that can be created between visual and verbal content are discussed as a
second step towards the full development of the model, to support trans-
lators in understanding such relationships in the ST and modelling them
in the TT. Lastly, the considerations made so far concerning the qualities
of the different individual modes and on communication through sign
systems are integrated into the model to discuss the use of individual tex-
tual resources according to different communicative purposes.

References
Bach, K. (1994). Conversational Impliciture. Mind & Language, 9(2), 124–162.
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Bernardo, A.  M. (2010). Translation as Text Transfer—Pragmatic Implications.
Estudos Linguisticos / Linguistic Studies, 5, 107–115.
Blakemore, D. (1992). Understanding Utterances: An Introduction to Pragmatics.
Oxford: Blackwell.
Braun, S. (2016). The Importance of Being Relevant? A Cognitive-Pragmatic
Framework for Conceptualising Audiovisual Translation. Target, 28(2),
302–313.
Bublitz, W. (1999). Introduction: Views of Coherence. In W. Bublitz, U. Lenk, &
E. Ventola (Eds.), Coherence in Spoken and Written Discourse. How to Create It
and How to Describe It (pp. 1–7). Amsterdam/Philadelphia: Benjamins.
Carston, R. (2004). Relevance Theory and the Saying/Implicating Distinction. In
L.  Horn & G.  Ward (Eds.), The Handbook of Pragmatics (pp.  633–656).
Oxford: Blackwell.
Chandler, D. (2007). Semiotics: The Basics. New York: Routledge.
Clark, B. (2011, September). Relevance and Multimodality. Paper Presented at
Analysing Multimodality: Systemic Functional Linguistics Meets Pragmatics,
Loughborough University.
Cooren, F. (2008). Between Semiotics and Pragmatics: Opening Language Studies
to Textual Agency. Journal of Pragmatics, 40, 1–16.
Crystal, D. (2010). The Cambridge Encyclopedia of Language. Cambridge: Cambridge
University Press.
Forceville, C. (1996). Pictorial Metaphor in Advertising. London: Routledge.
Grice, H.  P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard
University Press.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Hatim, B., & Mason, I. (1990). Discourse and the Translator. London: Longman.
Hickey, L. (1998). The Pragmatics of Translation. Clevedon: Multilingual Matters.
Hyland, K. (2005). Metadiscourse: Exploring Interaction in Writing. London:
Continuum.
Jucker, A. H., & Smith, S. W. (1995). Explicit and Implicit Ways of Enhancing
Common Ground in Conversations. Pragmatics, 6(1), 1–18.
Kitis, E. (2009). The Pragmatic Infrastructure of Translation. Tradução &
Comunicação, 18, 63–85.
Leech, G. (1983). Principles of Pragmatics. London: Longman.
Levinson, S. C. (1983). Pragmatics. Cambridge: Cambridge University Press.
Levinson, S.  C. (2000). Presumptive Meanings: The Theory of Generalized
Conversational Implicature. Cambridge: MIT Press.
Mubenga, K.  S. (2009). Towards a Multimodal Pragmatic Analysis of Film
Discourse in Audiovisual Translation. Meta: Translators’ Journal, 54(3), 466–484.
Nord, C. (2005). Text Analysis in Translation: Theory, Methodology and Didactic
Application of a Model for Translation-Oriented Text Analysis (2nd ed.).
Amsterdam/New York: Rodopi.
Orlebar, J. (2009). Understanding Media Language. MediaEdu [online]. Available
at: http://media.edusites.co.uk/article/understanding-media-language/.
Last accessed 26 June 2017.
Pfister, J. (2010). Is There a Need for a Maxim of Politeness? Journal of Pragmatics,
42, 1266–1282.
Sacks, H., & Schegloff, E.  A. (1979). Two Preferences in the Organization of
Reference to Persons in Conversation and Their Interaction. In G.  Psathas
(Ed.), Everyday Language: Studies in Ethnomethodology (pp. 15–21). New York:
Irvington.
Searle, J. (1969). Speech Acts. Cambridge: Cambridge University Press.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and
Cognition (2nd ed.). Oxford: Blackwell.
Tanaka, K. (1999). Advertising Language: A Pragmatic Approach to Advertisements
in Britain and Japan. London: Routledge.
Thompson, S. (1978). Modern English from a Typological Point of View: Some
Implications of the Function of Word Order. Linguistische Berichte, 54, 19–35.
Wilson, D., & Sperber, D. (1988). Representation and Relevance. In R.  M.
Kempson (Ed.), Mental Representations: The Interface Between Language and
Reality. Cambridge: Cambridge University Press.
Yus Ramos, F. (1998). Relevance Theory and Media Discourse: A Verbal-Visual
Model of Communication. Poetics, 25, 293–309.
CHAPTER 4

Analysing Multimodal Source Texts for Translation: A Proposal

Abstract  The approach used in this chapter first sets the most general
dimension of analysis, and subsequently identifies more specific dimen-
sions, looking into more detail at the various components of a ST and
their interaction. A pragmatic perspective is adopted as the most general
dimension of multimodal analysis; indeed, the ultimate purpose of a mul-
timodal message recipient is to recognise the sender’s intention.
Multimodal meaning is therefore analysed according to the distinction
between explicit and implicit meaning in Relevance Theory. The second
dimension of analysis concerns the visual-verbal relations and how these
contribute to multimodal message formation. Finally, the third dimension
of analysis regards the meaning of individual modes.

Keywords  Text analysis • ST analysis • Multimodal translation • Visual-verbal relations • Optimal relevance • Interpretive resemblance

The proposed model for translation-oriented multimodal text analysis is
developed in this chapter progressing from the most general to the most
specific dimension of meaning in multimodal texts discussed so far in
Chaps. 2 and 3. Accordingly, it first elaborates on the relevance-theoretic
approach to multimodal meaning considered in the previous chapter,
adopting this as the pivot around which the model revolves and discussing

© The Author(s) 2018 61
S. Dicerto, Multimodal Pragmatics and Translation,
Palgrave Studies in Translating and Interpreting,
https://doi.org/10.1007/978-3-319-69344-6_4
in detail the application of RT to multimodal meaning for translation pur-
poses (Sect. 4.1). Following this general analytical dimension, more spe-
cific dimensions are identified, that is, the meaningful interaction among
modes (Sect. 4.2) and the meaning contributed by individual modes (with
specific reference to the visual and verbal modes, Sect. 4.3). These more
specific dimensions are elaborated upon by ensuring that they are consis-
tent with the more general relevance-theoretic orientation.
Section 4.1 sets the general boundaries for the scope of application of the
model. As noted, in the light of the literature reviewed in Chap. 3, a prag-
matic perspective is adopted as the most general dimension of multimodal
analysis; indeed, the ultimate purpose of a multimodal message recipient is to
recognise the sender’s intentions, inferring them from the text and its rela-
tionship with its context. Following this pragmatic perspective, multimodal
meaning is categorised at the most general analytical dimension in terms of
explicit and implicit meaning, looking at how the explicitly and implicitly
communicated message contributes to leading the recipient towards the rec-
ognition of the sender’s intentions. The explicit/implicit categorisation
adopted is that outlined by Sperber and Wilson (1995) and in particular its
application to translation by Gutt (2000), as discussed in Chap. 3.
Within the area of explicit meaning, Sect. 4.2 deals with the second
dimension of analysis; this examines the relationships between visual and
verbal content, building on two frameworks of analysis for such relations
(Martinec and Salway 2005; Pastra 2008). The two frameworks are dis-
cussed, further developed and integrated to return a picture of the multi-
modal textual organisation reflecting the meaningful interaction among
textual resources. This is done in a way which is coherent with the prag-
matic viewpoint and its general scope, ensuring that the framework for
visual-verbal relations resulting from the elaboration of these proposals
and the view it proposes fit well within the larger pragmatic picture.
The third dimension of analysis also forms part of the realm of explicit
meaning and is discussed in Sect. 4.3. As seen in previous chapters, a
pragmatics-based model meant to account for multimodal meaning can-
not analyse only contextual issues and the relations between visual and
verbal elements, but should also consider the meaning carried by the indi-
vidual modes. The observations made in Sect. 2.2 about different modes
and their ability to convey distinct types of meaning are particularly rele-
vant to this dimension of analysis. Given once more the requirements set
by the pragmatic viewpoint adopted here, these observations are exam-
ined through a pragmatic lens.
  ANALYSING MULTIMODAL SOURCE TEXTS FOR TRANSLATION: A PROPOSAL    63

The resulting model for analysis of a multimodal ST is hence formed of
three dimensions of increasing specificity:
–– Multimodal pragmatic meaning (Sect. 4.1)
–– Interaction of the modes (Sect. 4.2)
–– Meaning in individual modes (Sect. 4.3)
Such a complex scalar model, made up of three different yet interre-
lated dimensions, requires for its application a transcription tool capable of
organising the results of a textual analysis in a clear, orderly and accessible
fashion suitable for its purpose of translation-oriented ST analysis. This is
dealt with in Sect. 4.4. The model will need to be able to account for the
development of the text in space and time, enabling the user to allocate
textual resources and their related meaning(s) at the correct point of tex-
tual development. Therefore, the concepts of cluster and phase (Baldry and
Thibault 2005) are used to underpin the transcription of multimodal texts
and the representation of the results in the three analytical dimensions.
The model so constructed supports an analytical method that brings
together the content of the individual dimensions, looking at how these
are interrelated and what this means in terms of general text interpreta-
tion; the analytical process aims to return a comprehensive, detailed and
organised ‘map’ of the perceived meaning of a text in its context of refer-
ence. This map serves a double purpose: during its drafting, it is a self-­
reflection tool guiding the user in raising awareness of their intuitive
interpretation of the text, as they pinpoint the salient textual and contex-
tual elements used in their interpretation process; in its developed form,
the textual map can be used as a starting point to:
–– Identify potential translation issues;
–– Ascribe them to one or more specific analytical dimensions (i.e. the
role of individual resources, their interaction or inferential
meaning);
–– Estimate the likely impact of these issues on the dimension to which
they belong, the other analytical dimensions and the general inter-
pretation of the TT in its context of reference;
–– Consider potential strategies that could be adopted to resolve those
issues in the TT, also estimating the impact of any potential changes
on the textual organisation in the various dimensions.
As mentioned above, the model is first and foremost an analytical tool,
finding its natural application in the work of researchers requiring a tool
for the systematic analysis of multimodal texts and their translation,
regardless of text type. The model’s scope of applicability could include
research on localisation, AVT, and any other areas within or outside of
translation studies dealing with a systematic investigation of multimodal
meaning. While its application to the everyday work of translators would
not be time-efficient, the model has potential as a teaching tool for the
training of new translators, in that it would support tutors in raising aware-
ness of the complex organisation of multimodal texts, of their manifold
communicative possibilities, of the potential challenges they can present,
of the translation solutions that might help overcome those challenges and
of their effect on the communicative organisation of multimodal TTs.

4.1   Multimodal Pragmatic Meaning


The pragmatic dimension of analysis, being the most general dimension of
the model, is the first one to be discussed in order to set the model’s own
boundaries. To model this dimension, it is first of all important to recap
the basic principles of RT and expand on
these, discussing in detail their application to the particular case of multi-
modal texts.
Arguably the most important principle outlined by Sperber and Wilson
is that messages are communicated with a presumption of optimal rele-
vance, namely, the producer of a message wants the recipient to believe
that the form in which s/he decided to convey the message is worth
processing and is the most cost-effective way for the recipient to access
the sender’s intention (1995: 270).
The application of this principle to multimodal messages is no different
from its application to monomodal messages (or their respective transla-
tions). The choice on the part of the message producer to ‘build’ the mes-
sage as multimodal comes with the inherent presumption that this is the
optimal form for the message to take, for the intended explicit and implicit
meanings to be derived from it by the message receiver against a process-
ing effort deemed reasonable for the quality and quantity of information
conveyed. However, this presumption of optimal relevance operates in a
multimodal message at a different level.
A monomodal message, such as a verbal message, is an instantiation of
a specific mode (and any associated code, in this case language). As
previously observed, Sperber and Wilson claim that meaning in language
is derived both from code analysis (e.g. grammar) and inferential activity.
The existence of a grammar is what provides the end receiver with guid-
ance on how the individual words of a linguistic message are to be inter-
preted when put together and it informs the inferential activity. On the
other hand, as previously argued, multimodal messages are the product of
the interaction between elements from different modes, some of which
might not have a grammar (see Sect. 2.2 regarding the visual and verbal
modes).
If these messages were dealt with in the same way as monomodal ones,
each mode would be dealt with separately in terms of its identifiable regu-
larities (regardless of whether these can be defined as amounting to a
‘grammar’), its usage in context and the inferential information that can
be derived from it (as if it were a monomodal text in its own right). As
already observed in Sect. 2.2, however, a separate analysis of the different
modes in a multimodal text does not satisfactorily account for the message
conveyed, as this is to be interpreted as a single entity. The presence of
multiple modes influences the meaning which each suggests to the receiver,
and the information they provide separately is often partial, different or
even irrelevant to the audience if the multimodal interaction is not anal-
ysed. These elements can be compared to cogs in a clock: their presence is
as important as their interaction in terms of the overall functioning of the
clock, and analysing each of them separately is unlikely to determine
whether the clock works and what time it marks.
However, if it is true that no set of rules connects items belonging to
different modes, it is legitimate to wonder why end receivers consider the
different components of a multimodal message as part of a single textual
unit, how the unit’s internal coherence is perceived and analysed and what
types of connections exist between signs from different modes. It is my
claim that what makes the end receiver analyse the content of the various
modes as part of a single, multimodal structure, is indeed Sperber and
Wilson’s presumption of optimal relevance: the presumption that the multi-
modal form is considered by the sender as the optimal form of communi-
cation of a particular message in a particular context, indeed, entails that
the different modes are to be considered as conveying a single message,
and that they have to be processed as interrelated components of a single
textual unit.
If what binds together the different modes is this pragmatic presump-
tion of optimal relevance, the interaction of the modes does not happen
thanks to a grammar but rather has an inferential basis. Far from pro-
viding strict combinatory rules, the presumption of optimal relevance sug-
gests that connections between the different modes must exist and that
they are to be identified in order to lead us to the recognition of inten-
tions. It is then to the presumption of optimal relevance that a multimodal
analysis has to make reference in order to identify the sender’s intention
and the textual organisation chosen by the sender as optimally relevant to
communicate the intention itself. If the ‘communicative glue’ of multi-
modal texts is the presumption of optimal relevance, then relevance is the
general principle all the other dimensions of analysis have to refer to and
comply with, and multimodal texts can be profitably analysed from a
relevance-theoretic perspective.
As discussed in Chap. 3, RT supports the view that, in addition to
the semantic meaning derived from linguistic encoding/decoding, recipi-
ents also identify the sender's intentions through explicatures and impli-
catures, namely, inferential meanings brought about by the use of language
in a certain context. These inferential meanings are retrieved on the basis
of the semantic representation of utterances among other factors; they
are so important in communication that their existence can dramatically
change the meaning that would normally be associated with the semantic
representation.
As seen in Chap. 3, RT was developed mostly on a linguistic basis—
therefore, the concepts of ‘semantic representation’, ‘explicature’ and
‘implicature’ were elaborated for language only. It is thus important to
assess how adaptable to multimodal communication such concepts are.
Do multimodal messages have a semantic representation? Are they capable
of suggesting explicatures and implicatures? In what follows, we start from
the presence of a multimodal semantic representation and move on to
discussing the ability of multimodal texts to generate explicatures and
implicatures.
The semantic representation of an utterance is also called its ‘logical
form’ in Sperber and Wilson’s work, and it is defined as follows:

[…] a logical form is a well-formed formula, a structured set of constituents,
which undergoes formal logical operations determined by its structure. […]
for a representation to be amenable to logical processing, all that is necessary
is for it to be well formed, whereas to be capable of being true or false, it
must also be semantically complete […]. We take it that an incomplete
conceptual structure can nevertheless be well formed, and can undergo logi-
cal processing. (1995: 72)
From the above, it is possible to infer that, in order to be interpretable
from an RT perspective, a multimodal text must have a semantic representa-
tion, amenable to logical processing. Modes in multimodal texts do show
a semantic representation, or logical form: signs belonging to multimodal
texts form structured sets of constituents—language is most certainly
structured, in that it is organised for communicative purposes, and the
works by Kress and Van Leeuwen (2006) and Van Leeuwen (1999) have
successfully argued that images and sounds also show forms of organisa-
tion, although by different means and with different levels of solidity.
However, as previously observed, the semantic representation of a multi-
modal text is not equivalent to the simple sum of the semantic content of
the various modes, but rather it is the product of the interaction of the
modes.
Sperber and Wilson only specify that some kind of structuration is
required in order for a message to have a semantic representation; within
the scope of their work, however, they mean the notion of ‘structured set
of constituents’ to apply to utterances and hence to units of meaning
whose organisation is determined by the rules of grammar. However, a
multimodal text does not have a grammar establishing rules of interaction
for its components, and as pointed out previously, the present study sup-
ports the view that the ‘communicational glue’ that ties together the dif-
ferent modes in a multimodal text is the presumption of optimal relevance,
rather than a set of grammatical rules. This means that the end receiver of a
multimodal message has to reconstruct the semantic representation of a mul-
timodal text through the presumption of optimal relevance rather than on a
grammatical basis in order to be able to process it logically, but the semantic
representation still exists. The above definition of semantic representation
can then be applied to the logical form of a multimodal text with one clari-
fication, that is, that while multimodal texts do have a semantic represen-
tation amenable to processing, the type of processing this requires is not
grammar-based, but rather inference-based, and it entails connections
among signs from different modes.
According to Sperber and Wilson’s definition, the semantic representa-
tion of utterances may be incomplete in some cases, requiring the hearer
to complete it before it can be assigned a meaning; this activity of comple-
tion of the semantic representation finds a practical example in the
reference resolution of deictics. If somebody encounters a claim like 'She
caught a cold’, the recipient will need to know who the pronoun ‘she’
refers to before they are able to assign a truth value to the utterance or
form an opinion on it, although the utterance is in itself well-formed.
While assigning reference to a deictic expression certainly requires contex-
tual knowledge, and can hence be considered a pragmatic activity, as we
have seen, Sperber and Wilson claim that completing the semantic repre-
sentation of an utterance gives the recipient access to further explicit
meaning only, since it does not reveal anything about the implicit value of an
utterance. Indeed, completing the sentence above, replacing 'she' with,
for example, ‘your sister Sofia’ is essential for us to process logically the
semantic representation, so that we can say whether it is true or false. In
turn, being able to process the semantic representation will be helpful in
understanding what the sentence is implicitly intended to communicate,
leading us towards an interpretation of the sender’s intention in letting us
know that Sofia caught a cold. The act of completion in itself is not what
communicates implicit meaning, as a development of a logical form is
considered an explicitly communicated assumption, namely, an explicature
(1995: 182). While required developments of semantic representations
such as the assignment of a reference to deictics are easily identifiable in
language, in that they are often signalled by specific grammatical catego-
ries or sentence structures (e.g. pronouns or adverbs of place, elliptical
sentences), it is somewhat harder to establish if and when the semantic
representation of a multimodal text is incomplete and requires develop-
ment on the part of the end receiver. Let us examine the example in
Fig. 4.1.
Consider Lord Kitchener’s pointing finger. The reference the finger
makes can vary according to the person who is in front of the image,
which makes the pointing index what we can call a 'visual deictic'. To
derive meaning from the text, the message receiver needs to complete the
semantic representation of the multimodal message, identifying them-
selves as the reference of the deictic (and of the verbal deictic pronoun
‘you’). The presence of the visual deictic, whose reference resolution is
easy for the message receiver since Lord Kitchener’s finger is pointing at
them while they are analysing the message, also helps the recipient assign
a reference to the linguistic deictic. Without the image, the linguistic ref-
erence would be less clear, and readers may come to the conclusion that
the message is not necessarily directed at them.
Assigning reference to these two deictics does not add any implicit
information to what the communicated message is, but without reference
assignment, the semantic representation of the multimodal text would be
incomplete.

Fig. 4.1  Lord Kitchener poster, Alfred Leete, 1914

This example suggests that the semantic representation of a
multimodal message can be incomplete and that explicatures can be
brought about by multimodal texts, too, either by means of individual
modes or through their interaction.
Implicit meanings that cannot be considered as developments of the
semantic representation of a message, that is, implicatures, are defined in
RT as contextual assumptions or implications which a sender manifestly
intends to make manifest to the hearer (1995: 194). RT distinguishes two
kinds of implicature: implicated premises and implicated conclusions.
Implicated premises are supplied by the recipient, thanks to their knowl-
edge of the world relevant to the message, whereas implicated conclusions
are derived from the message, its explicatures and implicated premises (if
any). In both cases, their retrieval is connected to the recipient’s presump-
tion of optimal relevance with respect to the sender’s message. To better
explain these concepts, Sperber and Wilson use the following example
(1995: 194). In exchange (1),

(1) (a) Peter: Would you drive a Mercedes?
    (b) Mary: I wouldn't drive ANY expensive car.

Mary's remark is relevant to Peter if and only if Peter can retrieve impli-
catures (2) and (3).

(2) A Mercedes is an expensive car.
(3) Mary wouldn't drive a Mercedes.

Neither (2) nor (3) is a development of the semantic representation of
utterance (1b). As they are implicit meanings Mary manifestly intended
to communicate by her answer, they are implicatures connected to
Mary’s utterance. The difference between them is that (2) is an impli-
cated premise, as it is provided by Peter based on his encyclopaedic
knowledge related to the semantic representation (as the latter mentions
expensive cars, of which Peter knows Mercedes are a subcategory),
whereas (3) is derived logically from the ‘sum’ of Mary’s utterance and
the implicated premise (Mary would not drive any expensive cars + a
Mercedes is an expensive car = Mary would not drive a Mercedes), and
hence it is an implicated conclusion. Language is certainly capable of
inducing the retrieval of both implicated premises and implicated conclu-
sions, as shown by this example, but the concept of implicature needs to
be applicable to multimodal texts as well if these are to be analysed in
relevance-theoretic terms.
Returning to the Lord Kitchener example, the image can certainly be
said to be involved in the retrieval of an implicated premise: in the illustra-
tion, Lord Kitchener is wearing a uniform defining him as a member of the
British Army. While this multimodal text never makes explicit reference to
what the country needs the reader for, the intended recipients of the origi-
nal message can be assumed to have known of the ongoing war, must have
been familiar with the army uniform (and possibly with Lord Kitchener’s
image) and hence realised the character’s connection with the armed
forces, much in the same way as anybody who knows Mercedes also knows
that a Mercedes is an expensive car. Thanks to the retrieval of this impli-
cated premise, and along with the explicatures of the multimodal content,
the audience can reach the implicated conclusion: the British Army needs
them for military purposes. This implicated conclusion is the fruit borne
by the interaction of language and image, their explicatures and the impli-
cated premises they help to retrieve. Implicated conclusions in this multi-
modal text are produced by the message as a whole, with the contribution
of both visual and verbal content. The analysis of the Lord Kitchener
poster suggests that such static multimodal texts share features such as
semantic representation, explicatures and implicatures with monomodal
spoken texts such as the ones accounted for by Sperber and Wilson, lead-
ing the recipient to the sender’s meaning by the same means (see
Table 4.1).
The possibility of analysing a multimodal text for translation purposes
in terms of its semantic representation, explicatures and implicatures has a
strong link with the notion of translation as a case of interlingual interpre-
tive use proposed by Gutt (see Sect. 3.2). In a monolingual context,
Wilson and Sperber maintain that two utterances interpretively resemble
each other in a certain context ‘to the extent that they share their analytic
and contextual implications’ (Wilson and Sperber 1988: 138, my
­emphasis). In a translation context, what this means is that in order to
interpretively resemble the ST, a TT needs to share with it a similar ana-
lytic structure (semantic representation) and produce similar contextual
effects (explicatures and implicatures).

Table 4.1  Sender's meaning

Sender's meaning
  Semantic representation
  Inferential meanings
    Explicatures
    Implicatures
Gutt argues that a total transfer of a message for a different audience is
not always possible (2000: 103); a high level of interpretive resemblance
between ST and TT may not be achieved in some cases for several reasons,
for example, the lack of a shared context, the different semantic represen-
tation imposed by the two linguistic codes or differences in the style or
genre of the TL that may require changes reducing interpretive resem-
blance. Hence, the ST and the TT can often resemble each other only to
a certain extent, showing a similar semantic representation producing
similar implicatures and explicatures to the extent to which this is
possible.
By definition, implicatures are retrieved on the basis of the explica-
tures, the semantic representation of a message and the recipient’s knowl-
edge of the world; explicatures are dependent on the semantic
representation only. The contextual implications of a message as a whole,
then, can be said to be based on its semantic representation. Given this
relationship, it is essential to establish how the semantic representation of
multimodal texts is organised under the presumption of optimal relevance
in order to model a pragmatic analysis of multimodal texts. The identifi-
cation of the semantic representation of a message will provide insights
into the formation of contextual implications and hence into how a mul-
timodal ST can be approximated through the production of a TT with a
similar semantic representation on which similar explicatures and implica-
tures are based.
Summarising, several claims have been made so far in this chapter:
–– multimodal texts convey the sender’s meaning in a very similar way
to verbal monomodal ones, showing a semantic representation capa-
ble of suggesting the retrieval of explicatures and implicatures based
on the visual as well as verbal content;
–– a translation of a multimodal ST needs to interpretively resemble it,
showing a similar semantic representation that suggests the retrieval
of similar explicatures and implicatures;
–– explicatures and implicatures (and hence the sender’s meaning) are
based on the semantic representation of a message. While grammar
helps a recipient (in the case of written texts, a reader) navigate the
semantic representation of verbal messages, multimodal texts have
no grammar to explain their logical form; rather, this needs to be
reconstructed by the recipient through the presumption of optimal
relevance.
This last point introduces the next section. While the presumption of
optimal relevance governs the reconstruction of the semantic representa-
tion in multimodal texts, what this means in practical terms still needs to
be explained. Section 4.2 therefore analyses some frameworks meant to
explain the relationships occurring between visual and verbal content to
gain an insight into multimodal textual organisation. It shows how the
presumption of optimal relevance is the key to the identification of the
logico-semantic relationships between visual and verbal content that play
a role in shaping the semantic representation of a multimodal text.

4.2   The Semantic Representation of Multimodal Texts: Visual-Verbal Relationships

Until relatively recent times, the verbal content of texts was considered as
the main source of meaning. All other contributions to communication
were thought of as ‘satellites’ of language whose non-central role was to
add further meaning to the predominantly verbal text, to the point that
the very word ‘text’ is often used loosely as a synonym of ‘linguistic text’.
The question of whether language was really the dominant form of textual
expression went for some time unasked, and some studies have directly or
indirectly supported this view over the years (e.g. Eco 1976; Levin and
Mayer 1993; Berinstein 1997). However, in a more recent view, visual-
verbal relationships are examined with an approach that allows for the pos-
sibility of images having a dominant role in a text, with language being
subservient to them, or of images and language co-determining the text.
Many of the most recent visual-verbal relationship taxonomies function
according to this tenet (e.g. Nikolajeva and Scott 2000; Marsh and White
2003; Martinec and Salway 2005; Hughes et  al. 2007; Pastra 2008);
scholars dealing with the topic have raised the question of how images and
verbal content interact, rather than how images can contribute meaning to
the verbal text. Understanding in detail how signs from different semiotic
sources interact to generate meaning is fundamental for a study on multi-
modality, as it provides a picture of how a multimodal text is organised to
convey messages (i.e. its semantic representation). This is even more
important in a study of multimodality for translation purposes, as the goal
in translation is not only to understand the semantic representation of the
multimodal ST, but also to be able to replicate it as far as possible in the
TT in order to produce a text that interpretively resembles its original.
The taxonomy developed by Martinec and Salway (2005) tries to
answer the question of how images and language interact by providing a
framework that puts images and verbal content in a double relationship
system made up of status and logico-semantic connections. These rela-
tionships were originally elaborated respectively by Barthes (1977) and
Halliday (1994), whose work was combined by Martinec and Salway to
produce a new classification.
Barthes’s work identifies three possible visual-verbal relations based on
status:

– Anchorage: the verbal content supports the visual content
– Illustration: the visual content supports the verbal content
– Relay: visual and verbal content have equal status (Barthes 1977: 32–51)

While anchorage and illustration are left unchanged in Martinec and
Salway’s work, relay is divided into two instances: the case in which visual
and verbal content are equal and independent and the case in which they
are equal and complementary. Their categorisation is shown in Fig. 4.2.
As mentioned, the categories outlined by Barthes are meant to identify
visual-verbal relations on the grounds of the communicative status that
images and language occupy with respect to each other. Following this
categorisation, Barthes also hints at the fact that different relationships of
status may be associated with different types of meaning (1977: 40–41);
for example, a relationship of illustration could be used when the author
needs to exemplify an abstract concept, and a relationship of relay might
not usually be employed to perform the logico-semantic function of
exemplification.

Fig. 4.2  Relationships of status, after Martinec and Salway (2003: 351)

Encouraged by Barthes's own claim, Martinec and Salway
argue that his categorisation can be integrated with Halliday's model of
logico-semantic relationships to achieve a picture of the links between
relationships of status and the meaning they generate.
Halliday’s taxonomy (the other major component of Martinec and
Salway’s model, represented in Fig.  4.3) was originally developed as a
model of clause relationships, and as such, it applied only to language. Its
founding criterion is the type of meaning contribution a clause can add to
another clause in logico-semantic terms. The relations described by
Halliday are relations of projection and expansion. Projection deals with
events previously represented linguistically, whereas expansion deals with
relations between ‘new’ events, not yet represented linguistically (Halliday
1994: 252–253). In a relation of projection, one clause is a report of a
thought or an utterance, and the other attributes it to an agent; in a rela-
tion of expansion, the clauses describe events and states of affairs, and one
clause adds to the other.
Halliday describes two possibilities for projection:
–– Locution: the reported event is an utterance;
–– Idea: the reported event is a thought.
He also identifies three ways in which a clause can expand the content
of another clause:

Fig. 4.3  Visual-verbal logico-semantic relationships—after Martinec and Salway (2005: 360)

–– Elaboration: the clause restates the main clause in other words, re-
exposing or exemplifying it;
–– Extension: the expansion is carried out with the addition of new ele-
ments, alternatives or exceptions;
–– Enhancement: the clause embellishes its main clause adding circum-
stantial features like time, space and causal relations.
Martinec and Salway argue that Halliday’s model of clause relationships
can also account for the logico-semantic contributions images and lan-
guage can add to each other. Halliday’s clause relationships, summarised
in the following diagram, are applied by Martinec and Salway to visual-­
verbal relationships as in Fig. 4.3.
Martinec and Salway’s claim is that the two systems (status and logico-­
semantic relationships) complement each other in describing the types of
relationship existing between images and language; therefore, in their
view, a visual-verbal relation should be identified in terms not only of the
relative status of signs but also of their logico-semantic connection, select-
ing a relationship from each of the two systems. Halliday originally pro-
posed his taxonomy to account for both paratactic and hypotactic relations,
and therefore, his taxonomy can apply equally well to anchorage, illustra-
tion and relay as the relevant status; indeed, these are the multimodal
equivalents to paratactic (relay) and hypotactic (anchorage and illustra-
tion) clause relations, due to the difference in status between the contribu-
tions made by the different modes to the message.
However, it is worth asking whether selecting from the two systems
simultaneously, as Martinec and Salway suggest, might be useful for
describing multimodal relations for the purpose of translation analysis. If
texts are thought of not as the result of an addition, but as the result of a
multiplication of semiotically different meanings, it seems unlikely that
there is a well-demarcated relationship of equal or subordinate status
between textual resources in multimodal texts. While the ‘quantity’ of
information provided by each mode, assuming that this could be mea-
sured in some way, could be a criterion to determine status, there can be
texts in which a single sign from one mode can influence to a great extent
the meaning of an even relatively large set of signs from another mode; for
example, the meaning of a page full of verbal instructions can be entirely
reframed, or anyway drastically altered, if the page is crossed out in red
pen. In this example, the outnumbered single sign could, from a certain
perspective, be claimed to be ‘dominant’ because of the influence it exerts
on a number of other signs in spite of its isolation. However, the outnum-
bered but 'powerful' sign would not communicate the same full message
without the signs it influences, and therefore, it is somewhat dependent on
them, again calling dominance into question.
As an all-encompassing hierarchical criterion taking into account all the
different facets of interaction between modes appears to be a rather unlikely
theoretical construct, it is worth wondering whether determining the sta-
tus of a sign in a multimodal text contributes information that can be use-
ful for our purposes, that is, understanding how multimodal texts convey
meaning in order to inform their translation. To answer this question, we
might return to Marsh and White’s metaphor of the message as a wedding
of components (see Sect. 2.1). Building on its social implications to con-
tinue the metaphor, we could say that establishing a hierarchy of signs in a
text can be likened to establishing a hierarchy between husband and wife.
Even in the event that this objective could be attained by choosing a crite-
rion producing clear results (e.g. salary), such a criterion would consider
only one aspect of the marital relation, ignoring others (e.g. who usually
makes decisions). Knowing which partner has a ‘dominant’ role in one
aspect of the marriage only contributes partial information about the
‘semantic representation’ of the relationship, and it is unlikely to provide
substantial indications of the overall nature of the individuals, of whether
their relationship is well formed and meaningful and of how the couple
interact with their social context. As the purpose of analysing visual-verbal
relationships here is to come to an understanding of the overall multi-
modal logical form of a text, status relationships seem to be able to con-
tribute only very little to our purpose, if anything at all, and will not be
applied in this study.
By contrast, the logico-semantic part of Martinec and Salway’s model,
that is, the interpretation of Halliday’s taxonomy and its application to
visual-verbal relationships, aims to explain what the logico-semantic rela-
tions between signs from different modes can be, and its scope is much
closer to the intent of understanding the multimodal semantic representa-
tion of a text. As noted, Halliday developed his clause relationship model
to explain logico-semantic ties between clauses, that is, the semantic and
logical links these form to convey meaning together, which resonates well
with the aims of a study analysing STs for translation purposes.
However, it is important to note that Halliday’s clause relation model
is concerned more with the product, with what is produced through the
interaction between clauses, than with how it is produced. This is because
clauses within a sentence belong to a specific code (language), which has
precise grammatical rules determining how clauses interact with each
other (e.g. word order, punctuation, use of prepositions and conjunc-
tions) and ensuring that their relationships are encoded correctly and
understandable as such. This is due to language having a high degree of
organisation, as discussed in Sect. 2.2. By contrast, we can recall that the
relation between an image and verbal content, the multimodal semantic
representation, has no grammar to help its identification (see Sect. 4.1). If
the verbal content extends the meaning of an image, how do recipients
understand that? What is in an image that can enhance verbal meaning?
Being able to answer these questions is fundamental to building an ana-
lytical tool geared to describing the organisation of multimodal texts for
translation: indeed, the inability to identify the source of logico-semantic
relationships would cripple the analyst’s attempt to systematise the reading
of a text all the way down to how individual resources are connected.
Consequently, the result of the analysis would only partially help the trans-
lation of a ST, as it would be impossible to know how meaning is conveyed
by the interaction of resources and hence how this interaction can be
mimicked in a potential TT.
Intersemiotic interaction needs to be investigated at a more ‘basic’ level
to understand how logico-semantic relations are brought about. This issue
has been addressed by Pastra’s study, which aimed to build ‘a corpus-­
based framework for describing semantic interrelations between images,
language and body movements’ from a message-formation perspective
(2008: 299). Pastra’s framework is used in the current study to comple-
ment Halliday’s logico-semantic relationships, as discussed below, pre-
cisely because of its focus on how multimodal texts are formed. In Pastra’s
words, ‘the relations attempt to capture the semantic aspects of the mes-
sage formation process itself and thus, facilitate inference mechanisms in
their task of revealing (or expressing) intentions in message-formation’
(2008: 306). This view about the purpose of semantic analysis matches
with what is stated by RT, in so far as the ultimate purpose of communica-
tion is seen by RT to be the identification and recognition of intentions
through the use of code as well as inference. Therefore, Pastra’s view is
coherent with the overall pragmatic approach adopted in this work. The
types of relationship described in Pastra’s work and called by the author
‘cross-media interaction relations’ (COSMOROE) can then be seen as the
how that complements Halliday’s what, and on this basis, a detailed
discussion of cross-media interaction relations is important to move forward
towards the final form of the model for multimodal ST analysis. The
COSMOROE relationships are summarised in Fig. 4.4.
As mentioned, this taxonomy looks at message formation rather than at
the relative status of the signs. As shown in Fig. 4.4, the main categories
included in the COSMOROE are:
–– Equivalence: the information expressed by the different modes
(which Pastra terms ‘media’) is semantically equivalent: it refers to the
same entity (e.g. the image shows a river and the verbal content tells
us its name);
–– Complementarity: the information expressed in one mode comple-
ments the information expressed in another mode (e.g. the image
shows a river and the verbal content provides information describing
its features);
–– Independence: each mode carries an independent message, which
is, however, coherent (or strikingly incoherent) with the document
topic. Each can stand on its own, but their combination creates a
larger multimodal message (e.g. the verbal content describes a river
and the text is decorated with images of droplets of water). (after
Pastra 2008: 307–308)

Fig. 4.4  Cross-media interaction relations (COSMOROE)—after Pastra (2008: 308)
Each relationship type is further divided by Pastra into subtypes with
specific features (2008: 308–314; examples added). Specifically, the rela-
tionship of equivalence is divided into:
–– Token-token: the modes refer exactly to the same entity (2008:
308), for example, the image shows St Peter’s Basilica and the verbal
content reads ‘St Peter’s Basilica’;
–– Type-token: one mode provides the class of the entity expressed by
the other (2008: 308), for example, the verbal content mentions
churches and the image shows St Peter’s Basilica;
–– Metonymy: each mode refers to a different entity, but the user’s
intention is to consider the two entities as semantically equal. The
two entities come from the same domain and there is no transfer of
qualities between them (2008: 309), for example, an image of
Michelangelo’s Pietà with a legend reading ‘art in St Peter’s
Basilica’.
–– Metaphor: a similarity is drawn between entities belonging to differ-
ent domains, and there is a transfer of qualities between them (2008:
310), for example, the image shows St Peter’s Basilica and the verbal
content reads ‘a paradise of the arts’.
The relation of complementarity, in turn, is divided into two subcategories,
according to whether the relationship itself is essential or non-essential to
forming a coherent message. Both sets of relationships feature three subtypes:
–– Exophora: one mode contains a reference whose resolution is pro-
vided by another mode, for example, the verbal content says ‘you are
here’ and the visual content shows a map of England pinpointing
London.
–– Agent-object: one mode reveals the subject/object of an action/
event/state expressed by another (2008: 310), for example, a speech
balloon connects the speaker to their utterances in comics.
–– Apposition: one mode provides additional information on the
other—this information identifies or describes something or some-
one (2008: 311). This relation is different from the token-token
kind as it is linked to a specific context and not promoted as gener-
ally valid (2008: 311). An example could be an image of Barack
Obama with a tagline describing him as ‘the most loved
U.S. President’.
In addition to these three relationships, the ‘non-essential’ cluster of
relationships of complementarity features one more relationship:
–– Adjunct: one mode functions as adjunct to the information carried
by another mode (2008: 312), adding non-essential information.
The signature on a painting, for example, could play the role of an
adjunct, as it adds information that is not (always) necessary in order
to understand the message carried by the painting.
Lastly, the relation of independence shows three subtypes:
–– Contradiction: one mode refers to the exact opposite of another, or
the two references are semantically incompatible (2008: 313).
Contradiction shows the same subtypes as equivalence. An example
of contradiction could be a picture of the Pope with a tagline reading
‘the Italian parliament’, since the linguistic and visual meanings are
incompatible (as the Pope is not an Italian citizen, and cannot even
be part of the Italian parliament, let alone represent its entirety).
Contradiction, as discussed in some of the examples in Chap. 5, is
mainly used for humorous/satirical purposes, unless it is the result of
a factual mistake.
–– Symbiosis: one mode provides information and the other shows
something thematically related, which, however, does not refer to or
complement that information (2008: 313), as is the case of visual
decorations to verbal content (see above example on
independence).
–– Meta-information: one mode reveals additional information
through its means of realisation. This information is independent yet
inherently related to the information expressed by the other mode
(2008: 314). For example, the font or the style of visual decoration
of a text may reveal extra information on the place and time of its
production.
Halliday’s categorisation as adapted by Martinec and Salway and the
COSMOROE taxonomy seem to be complementary in the explanation of
message formation: the former describes what is being done at a commu-
nicative level, while the latter looks at how this is done, namely, what cross-­
media relations are used towards expanding or projecting meaning. These
two taxonomies can then be integrated to offer a complementary view of
the interaction of signs in multimodal message formation. Although
Pastra’s taxonomy successfully describes intersemiotic relationships from a
rather ‘mechanist’ viewpoint, arguably coming closer to the multimodal
equivalent of a ‘grammar’, she still does not take a further step to investi-
gate such relationships on the side of the recipient. She does not com-
ment, for example, on how the audience of a message may identify the
presence of, say, a relationship of equivalence between two signs. As dis-
cussed, in language this is done by means of a grammar that determines
how clauses are supposed to link to each other, which is not the case
regarding the organisation of multimodal texts.
To recap, in the light of the relevance-theoretic approach adopted here,
the claim of this study is that the mechanisms of detection of multimodal
relationships are reconstructed by the recipient by means of the presump-
tion of optimal relevance. What this means is that, conscious that the
sender must have intended the message to be optimally relevant to their
audience, the recipient goes about multimodal message interpretation
actively looking for potential cross-media interaction relations that would
produce an optimally relevant semantic representation; the selection hap-
pens based on the relevance of the information that can be derived from
the relationships and the processing effort they involve. The fact that
pragmatic processes play such an important role in the reconstruction of the
multimodal semantic representation implicitly supports Levinson’s argument
that there is no consistent way of, in his own words, ‘cutting the semiotic
pie’ so that semantics and pragmatics account for two separate areas of
meaning and have no influence over each other (Levinson 2000: 198).
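The comparative logic of this selection can be sketched in miniature: candidate cross-media relations compete, and the recipient settles on the one offering the best balance of contextual effects against processing effort. In the Python sketch below, the numeric scores and the function name are entirely hypothetical stand-ins of my own; the code illustrates only the trade-off, not an implementation of Relevance Theory.

```python
# Toy illustration of relevance-driven selection among candidate
# cross-media interaction relations. 'effects' and 'effort' are
# hypothetical numeric stand-ins for contextual effects and
# processing effort; real interpretation is not numerically scored.

def most_relevant(candidates):
    """Pick the candidate relation with the best effects/effort balance."""
    return max(candidates, key=lambda c: c["effects"] - c["effort"])

candidates = [
    {"relation": "equivalence (token-token)", "effects": 5, "effort": 1},
    {"relation": "metaphor",                  "effects": 8, "effort": 6},
    {"relation": "independence (symbiosis)",  "effects": 2, "effort": 1},
]

best = most_relevant(candidates)  # equivalence: 5 - 1 = 4, the best balance
```

The point of the sketch is that a reading with rich effects may still lose to a plainer one if it demands disproportionate effort, mirroring the presumption of optimal relevance.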
Figure 4.5 summarises the integration of Martinec and Salway’s adapta-
tion of Halliday’s clause relations with Pastra’s work to produce a single
framework. The part of the diagram related to Pastra’s work is here repro-
duced with two important modifications. The first is that relations of con-
tradiction are not treated as a subtype of relations of independence, but as
a category in its own right, whose subtypes coincide with the ones shown
by the relation of equivalence. Indeed, in my view only symbiosis and
meta-information constitute relationships of true independence, since the
information provided by the different modes does not need to be analysed
jointly. In the case of contradiction, the information provided by the two
modes cannot stand on its own, as two pieces of information contradicting
each other need to be analysed jointly to even establish the existence of the
relationship itself. Therefore, contradiction is here considered on the same
level as equivalence, complementarity and independence. Since the
relationship subtypes of contradiction are the same as for equivalence, for
purposes of economy and clarity, these are assimilated in Fig. 4.5.

Fig. 4.5  Visual-verbal relations, full diagram
The second modification relates to the addition of a new subcategory
within the category of complementarity. Pastra’s model does not include a
category identifying a visual-verbal relation in which a textual element is
crucial for the recipient to be able to select one of two (or more) meanings
associated with a polysemic word/image as the intended interpretation
within that context. While this is certainly a relation of complementarity,
as the visual meaning complements the information provided by verbal
means, none of the categories within the relevant branch of Pastra’s model
covers this type of relationship. As the ability to interpret a polysemic word
appropriately is fundamental to the overall functioning of a text, a new
category called ‘sense selection’ is added to the branch of essential rela-
tions of complementarity. The part of the diagram related to Martinec and
Salway’s adaptation of Halliday’s work is kept unchanged from the origi-
nal. As argued above, the two parts of the diagram are complementary, as
logico-semantic outcomes result from the formation of cross-media inter-
action relations. The identification of COSMOROE as well as logico-semantic
relationships, as argued above, follows the recipient’s application
of the presumption of optimal relevance; while no grammar will be at the
basis of the unequivocal identification of relations between signs, the idea
that the sender constructs the message as optimally relevant will guide the
reader towards the relationships that are, in their view, most compatible
with this presumption.
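In the absence of Fig. 4.5, one way to visualise the modified taxonomy is as a nested data structure. The Python sketch below is purely illustrative (the dictionary layout and helper function are mine), but the category and subtype names follow the discussion above: contradiction promoted to a top-level category sharing the subtypes of equivalence, and ‘sense selection’ added to the essential relations of complementarity.

```python
# A sketch of the modified COSMOROE taxonomy as discussed in this section.
# The structure is illustrative only; names follow the text.

EQUIVALENCE_SUBTYPES = ["token-token", "type-token", "metonymy", "metaphor"]

COSMOROE = {
    "equivalence": EQUIVALENCE_SUBTYPES,
    "complementarity": {
        "essential": ["exophora", "agent-object", "apposition",
                      "sense selection"],   # new category added in this study
        "non-essential": ["exophora", "agent-object", "apposition",
                          "adjunct"],
    },
    # promoted from subtype of independence to a category in its own right;
    # its subtypes coincide with those of equivalence
    "contradiction": EQUIVALENCE_SUBTYPES,
    "independence": ["symbiosis", "meta-information"],
}

def subtypes(category):
    """Flatten the subtypes of a top-level category."""
    entry = COSMOROE[category]
    if isinstance(entry, dict):
        return sorted({s for group in entry.values() for s in group})
    return list(entry)
```

Rendering the hierarchy this way makes the two modifications immediately visible: `COSMOROE["contradiction"]` reuses the equivalence subtypes, and `"sense selection"` appears only in the essential branch of complementarity.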
Assuming that, as previously discussed, in order to interpretively resem-
ble a ST, a TT must show a similar analytical organisation and contextual
implications, translating a multimodal text means being able to reproduce
as closely as possible the visual-verbal relations that shape the semantic
representation of a ST. Given that the contextual implications of a text,
namely, its explicatures and implicatures, are based on its semantic repre-
sentation, analysing an ST by means of the above taxonomy offers a view
of the semantic representation that is also fundamental in terms of its
contribution to implicit meanings (and, as a consequence, the sender’s
general meaning). Therefore, based on the study of the multimodal
semantic representation carried out in this chapter, Table 4.1 can be fur-
ther elaborated upon, as shown in Table 4.2.
It should be noted that, while it is true that explicatures and implicatures
are based on the semantic representation, they also depend on the recipi-
ent’s contextual and encyclopaedic knowledge. Therefore, it can be argued
that problems regarding the interpretive resemblance between ST and TT
will fall into two categories:

Table 4.2  Sender’s meaning, first elaboration

Sender’s meaning
Semantic representation                   Inferential meanings
COSMOROE | Logico-semantic relations      Explicatures | Implicatures
–– Problems deriving from the translator’s inability to lead the reader
towards the intended interpretation of the semantic representation
(and hence, in the case of multimodality, the relationships described
above), for example, due to restrictions in the manipulation of tex-
tual resources. In some translation scenarios, for example, the visual
resources cannot be altered (e.g. in AVT) or the ST contains a pun
referring to one of the visual elements that cannot be replicated in
the TT.  This inability to manipulate individual textual resources is
likely to affect interpretive resemblance, both in terms of the overall
semantic representation (including visual-­verbal relations) and the
contextual implications of the TT, as the latter depend on the
former.
–– Problems deriving from the translator’s awareness that the target
audience is not likely to possess the contextual and/or encyclopaedic
knowledge to which the textual resources appeal in order to suggest
explicatures and implicatures meant to lead the recipient towards a
certain interpretation. In this case, the problem is of an inferential
nature; yet, while interpretive resemblance is affected mainly in con-
textual terms, in order to overcome the obstacle, modifications will
have to be applied to the textual resources to compensate for the
expected loss of information.
Given the two points above and the involvement of individual textual
resources in each of them, considerations on how meaning is conveyed at
the level of each mode made in Chap. 2 need to be woven into the model’s
fabric. This task is addressed in the following section.

4.3   Individual Modes and Pragmatic Processes


At this point, it is important to equip the model for multimodal ST analy-
sis with analytical tools based on the current views in semiotics about the
meaning in individual modes. The considerations presented in Chap. 2
cover many of the analytical requirements for this meaning dimension, in
that they state that signs belonging to different modes are used to convey
different types of meaning, the same meaning in a different light or with a
varying degree of detail, outlining the communicative situations each type
of sign is more suited to describing. During any application of the model,
it is then up to the analyst to identify the meaning carried by semiotically
different signs in a multimodal text. Nevertheless, in order to make further
progress with the development of our model of multimodal ST analysis, it
is necessary to consider some further factors regarding both the verbal and
visual aspects, with particular reference to the visual mode.
As considered in Sect. 2.2, the various modes have different levels of
organisation, all being more or less capable of performing the three
Hallidayan metafunctions. While language shows a rather solid grammatical
organisation, images are said to be organised only to a certain extent;
due to its lack of any evident organisation, sound can barely be identified
as a ‘mode’ at all, in spite of the regularities in meaning production
that it shows.
Verbal meaning has already been widely explored in the past decades
through general and language-specific studies, not least in a Hallidayan
systemic-functional grammar framework, as noted above. Therefore, as far
as the analysis of language is concerned, the model can draw on a wealth
of existing research. Hence the analysis of a multimodal text from the
point of view of its linguistic component is to be based on a case-by-case
approach which takes into account the language of the ST to be analysed
and of the TT to be produced. Conversely, considerations on the visual
mode seem more generalisable as, while the use of images is certainly
culture-bound, the techniques to create visual narratives identified by
Kress and Van Leeuwen are by and large cross-cultural (e.g. background-
ing/foregrounding information, use of vectors, degree of realism). We
can recall from Sect. 2.2 that the visual mode offers a level of observable
organisation different from language; this organisation allows images to
be used to build narrative structures that can efficiently describe physical
action, but are less suited to other types of meaning (Kress and Van
Leeuwen 2006: 59–78). In the same context, I hinted at how Kress and
Van Leeuwen claim that the level of organisation shown by the visual
mode can be described by their ‘grammar of images’, postponing until
now the discussion of why the term ‘grammar’ might not be appropriate
to define what Kress and Van Leeuwen propose. Machin, for example,
objects to Kress and Van Leeuwen’s view that they have developed a
‘grammar of images’, claiming that the visual mode does not satisfy the
requisites set by Halliday for a ‘complex semiotic system’ and that there-
fore a study of its regularities cannot be called a ‘grammar’. This viewpoint
is explained in some detail in the quotation below:

[W]e looked at two characteristics that would need to be found in visual
communication in order for us to say there was indeed a visual grammar […].
This visual grammar must have a lexicon of elements that can be chosen to
create meaning in combinations [and] there must be a finite system of rules
for combination of elements. In the first case we have found that it is hard to
identify visual elements in themselves. […] In the second case we have not
found an arbitrary code that is the first layer of meaning. (Machin 2007: 176)

While Kress and Van Leeuwen’s substantial endeavour to study the
organisation of the visual mode does return some interesting results in
terms of insights into the way its regularities suggest meaning, calling it a
‘grammar’ implicitly suggests a parallel with the organisation of language
which is then not upheld in the analysis, as summarised by Machin (2007:
186). True enough, images have narrative potential; they can be used to
identify subject, object and ‘verb’ in an action (e.g. in a hunting scene, the
hunter would be the subject, the prey would be the object and the imagi-
nary line between hunter and prey would be the verb that links subject and
object); their composition can appeal to the recipient’s knowledge of the
world; at the same time, there is no finite set of rules that can be used to
combine visual elements, and sometimes the elements comprising a visual
message are difficult to tell apart as individual items (e.g. an object’s image
may blur and merge with the background, and no definite line separating
the two may be identifiable).
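The narrative reading just described (subject, object and the vectorial ‘verb’) can be sketched as a minimal record. The field names below loosely echo Kress and Van Leeuwen’s actor/vector/goal terminology, but the class itself is a hypothetical illustration of the structure, not part of their framework:

```python
from dataclasses import dataclass

@dataclass
class NarrativeProcess:
    """A minimal rendering of a visual narrative structure."""
    actor: str   # the participant from whom the vector departs (the 'subject')
    vector: str  # the imaginary line realising the action (the 'verb')
    goal: str    # the participant at which the vector is directed (the 'object')

# The hunting-scene example from the text:
hunt = NarrativeProcess(actor="hunter",
                        vector="imaginary line from hunter to prey",
                        goal="prey")
```

What the sketch cannot capture is precisely the point made above: unlike clauses, the ‘fields’ of a real image have no finite combination rules and may not even be separable as discrete items.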
The standpoint which I adopt here agrees with Machin’s view that
‘grammar’ is not an adequate term to capture regularities in visual com-
munication. Elsewhere, it has been pointed out how much of the visual
interpretation offered by Kress and Van Leeuwen by means of their ‘grammar’
actually relies heavily on contextual knowledge, although the importance
and influence of context on visual interpretation are only very generally
acknowledged by Kress and Van Leeuwen (Forceville 1999:
171). Thus, the inextricability of semantics and pragmatics in the analysis
of the semantic representation of a message appears to apply equally to the
visual mode and the verbal, with one major difference: as images do not
have an established grammar, the coded contribution towards message
interpretation is bound to be much reduced in the case of visual content
as compared to its verbal counterpart, consequently leaving greater room
for interpretation on the part of the recipient (including the translator, if
the visual content is part of a text that needs to be relayed in a different
language).
In order to bridge the gap left by the lack of a real grammar to support
the interpretation of visual content, Forceville suggests that Kress
and Van Leeuwen’s work should be embedded in a cognitive approach
such as the one provided by RT (Forceville 1999: 174). Consistent with
Forceville’s suggestion, the approach adopted in this volume is that the
regularities identified by Kress and Van Leeuwen are detected by recipi-
ents under the presumption of optimal relevance in their quest for the
sender’s intention, in the same fashion as for visual-verbal relations. This
approach does not negate the existence of the regularities identified by
Kress and Van Leeuwen; rather, it aims to assign them to an appropriate
place in the interpretation of a multimodal message, namely, one that is
subservient to the presumption of optimal relevance and holds no claim
to represent a ‘visual grammar’. The semantic representation of visual
messages is thus identified by the receiver through both semantic and
pragmatic means, just as for verbal messages; nevertheless, the pragmatic
contribution to the interpretation of the semantic meaning is a lot more
apparent in the case of visual content. As can be noted, reliance on the
presumption of optimal relevance recurs throughout the model, charac-
terising not only the visual semantic representation but also the detec-
tion of visual-verbal relations and that of the overall multimodal logical
form (as described in Sect. 4.2). Therefore, this approach to the inter-
pretation of visual meaning shows consistency with the rest of the mod-
el’s organisation.
The same considerations applicable to the visual mode regarding the
contribution from pragmatic processes to the interpretation of its seman-
tic representation also apply to the aural mode, important for the analysis
of dynamic audiovisual texts. As noted, this mode shows an even more
basic level of organisation, and the interpretation of the meaning it con-
veys is less ‘stable’ than that of the visual or verbal system. Hence, its
(perhaps fewer) regularities could also be profitably explored from a point
of view of relevance.
Having concluded these remarks on the meaning of individual modes,
the structure of the model proposed by this study for ST analysis of the
sender’s meaning in multimodal texts appears now clearly in its tripartite
shape, summarised in Table 4.3.
As the dimensions of analysis go from the ‘micro’ level of the
semantic representation of individual modes to the ‘macro’ level of
inferential meanings, feeding into the interpretation of the sender’s
meaning, the model can be defined as scalar, similarly to what has been
proposed by Kress and Van Leeuwen (2001) and Baldry and Thibault
(2005).

Table 4.3  Tripartite structure of the model for the analysis of multimodal STs

Sender’s meaning
Semantic representation      Semantic representation of     Inferential meanings
of individual modes          multimodal text
Verbal | Visual | Aural      COSMOROE | Logico-semantic     Explicatures | Implicatures
                             relations

However, the model proposed here emphasises how meaning
is conveyed through individual modes, multimodal interaction and
inference—this latter dimension in particular is not accounted for by
the works mentioned. While Table  4.3 progresses left to right from
individual modes to inferential meanings, this should not be seen as
supporting the view that text interpretation starts from one point to
proceed progressively to the other dimensions. Rather, the text will be
processed moving continuously between dimensions: the information
the recipient derives from each area of meaning will influence their
interpretation of the other dimensions until the recipient believes they
have interpreted the text according to its optimal relevance in their
current context. It is important to underline again how the presump-
tion of optimal relevance intervenes in each and every dimension of
analysis as outlined in this chapter, confirming its status as the overrid-
ing principle at the basis of the model.
While the model’s dimensions of analysis and the general principle gov-
erning its application are now fully outlined, the model still needs to be
shaped in a way that is actionable for its analytical purpose. Indeed, as it
stands, the model’s general structure is theoretically complete but not
fully developed as an analytical tool, lacking guidance on how to separate
different parts of a multimodal text for analysis and ‘slot’ information into
the different dimensions, also indicating their interaction. As stated by
Baldry and Thibault (2005), multimodal texts develop over space and
time; therefore, the applicable form of the model needs to include these
aspects, which are not yet addressed by the general structure reported
above and are discussed in the next section.

4.4   Development Over Space and Time: Cluster and Phase

The concept of multimodal text development over space and time is cen-
tral to Baldry and Thibault’s work on multimodal transcription and text
analysis. These two forms of multimodal text development are based on
the concepts of cluster and phase, respectively, as discussed in Chap. 2.
We can recall that a cluster is a local grouping of items. The concept of
cluster is particularly relevant to the analysis of static multimodal texts
(Baldry and Thibault 2005: 31), which only develop over space, as these
tend to show more or less well-defined groupings of items useful for guid-
ing the message recipient through the various textual ‘areas’. Elements
belonging to different clusters in the same text are normally separated
visually, for example, with a line demarcating the top part of a leaflet from
the bottom, indicating that the two parts are to be considered as different
‘spatial units’. Without the use of clear markers of separation, a cluster may
be identified on the basis that all its elements are found in close proximity
to each other, and further removed from others. These elements can
be from different semiotic modes, and a cluster may contain images and
verbal content.
Leaflets, for example, show clusters marked by paper folds; within
the same cluster, there may be further subclusters, for example, at the top,
centre and bottom of each cluster. Some genres within static multimodal
texts are more reliant on a cluster-based organisation than others. We can
think of comics as clear examples of multimodal texts in which clusters are
used to help the reader to follow the development of the narrative effec-
tively through frames. The heavy use of clusters in comics is so embedded
in the genre that comic readers expect to receive this type of organisational
guidance. The use of this technique in comics also gives the impression of
text development over time, as cluster sequences are strongly associated
with the passing of time, and exceptions (e.g. flashbacks) may need to be
signalled to the reader.
This very specific use of clusters makes comics into one of the few genres
of static multimodal texts in which spatial and temporal development are
claimed to coincide (McCloud 1994: 100), although it has been success-
fully argued that panel sequences are not always used to communicate a
temporal development (Cohn 2007). However, in the context of static
multimodal texts, development over time does not really happen—we may
only have the impression of it. Indeed, the analytical framework for static
multimodal texts by Baldry and Thibault, shown in Table 4.4, only
accounts for space and not for time development.
While elements in a static multimodal text only show item groupings
divided spatially (clusters), dynamic multimodal texts (such as films) orga-
nise their textual resources both over space (e.g. split screen in the same
frame to create two clusters) and in temporal succession (one frame is
presented after the other).
We can also recall that temporal development is divided by Baldry and
Thibault into phases, the second item of interest here. A phase is a time-based
grouping of items that are ‘codeployed in a consistent way over a
given stretch of text’ (Baldry and Thibault 2005: 47). Therefore, a phase
groups together multiple textual resources to be analysed in conjunction
due to their temporally close use. Phases can also be demarcated more or
less clearly: subsequent shots in a film represent different phases with a
clear start/end; on the other hand, shorter dynamic texts, like commercial
jingles, may show a single phase corresponding to the entire text.
Given its applicability only to multimodal texts that show a temporal
development, the concept of phase is only relevant for the analysis of
dynamic multimodal texts. In this case, therefore, this component needs
to be added to the schema presented in Table 4.4, positioned to the left of
‘clusters’, reflecting a progression from generic to specific (see Table 4.5).
Indeed, clusters are parts of phases, for example, frames are parts of shots.
The concepts of phase and cluster as outlined by Baldry and Thibault
can be useful to complete the structure of the analytical model with tools
to divide multimodal texts into smaller units, transcribe and analyse them
in an orderly fashion. Therefore, the adaptation of Baldry and Thibault’s
model shown in Table 4.5 needs to be integrated with Table 4.3, which
illustrates the tripartite structure of the model proposed in this study, so
that a dynamic multimodal text can be ordered by temporal and spatial
development, to then be analysed in terms of the content of individual

modes, its multimodal semantic representation and the inferential meanings
it suggests.

Table 4.4  Transcription table for static multimodal texts, after Baldry and Thibault (2005: 29)

Cluster                  Textual resources used in the clusters
Image of a cluster       Information about the textual resources contained in the
within the text          cluster, for example, wordings, visual images, spatial
                         disposition of the items, size of the cluster

The component ‘textual resources used in the cluster’
from Baldry and Thibault’s model (Table  4.5) is replaced by ‘semantic
representation of individual modes’ and ‘semantic representation of mul-
timodal text’ from Table  4.3, as these latter categories also cover the
deployment of textual resources, but with the approach elaborated in this
study; an extra column is added for inferential meanings, unaccounted for
in Baldry and Thibault’s work. The integration of the two tables results in
Table 4.6.
In this way, the analytical table progresses from left to right starting
from the identification of item groupings, moving on to a breakdown of
the grouping’s visual, verbal and aural logical forms, proceeding to the
relationships the components of a cluster enjoy intermodally (multimodal
semantic representation) and eventually concluding with the individuation
of any explicatures and implicatures triggered by one or more resources.
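One row of this left-to-right progression could be sketched as a record type. The field names below paraphrase the column headings of Table 4.6; the class and the sample values are a hypothetical illustration of the table’s shape, not part of Baldry and Thibault’s framework:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ClusterAnalysis:
    """One row of the transcription/analysis table for a single cluster."""
    phase: Optional[str]       # phase number or start-end times (dynamic texts only)
    cluster: str               # identifier/image of the cluster
    verbal: str = ""           # semantic representation of the verbal content
    visual: str = ""           # semantic representation of the visual content
    aural: str = ""            # semantic representation of the aural content
    cosmoroe: List[str] = field(default_factory=list)        # cross-media relations
    logico_semantic: List[str] = field(default_factory=list) # expansion/projection
    explicatures: List[str] = field(default_factory=list)
    implicatures: List[str] = field(default_factory=list)

# Sample row built from the 'you are here' example discussed earlier:
row = ClusterAnalysis(
    phase="00:00-00:05",
    cluster="cluster 1",
    verbal="'you are here'",
    visual="map of England pinpointing London",
    cosmoroe=["complementarity: exophora (essential)"],
)
```

A static text would simply leave `phase` as `None`, reflecting the point that clusters alone organise texts without temporal development.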
What could be seen as the somewhat arbitrary nature of phases and
clusters, and of their subphases and subclusters, deserves some comment;
as groupings are not always clearly identifiable, at the beginning of any
‘dissection’ of a text into parts, it is important to select criteria to deter-
mine the number of clusters/phases consistently within the same text and
to choose how many sublevels will be used (if any). This largely depends
on the size of the text and the purpose of the analysis. If the object of
analysis is a book, a chapter may correspond to a ‘cluster’—pages may be
‘subclusters’ and a table within a page may be a ‘sub-subcluster’. In the
case of much shorter texts, the text itself may be a cluster, and no division
into further subclusters may be necessary (or even possible). It should also
be noted that cluster/phase size may not be comparable across texts; indi-
vidual shots in a film or in a television advertisement may well be rather

Table 4.5  Transcription table for dynamic multimodal texts, after Baldry and
Thibault (2005)
Phase Cluster Textual resources used in the clusters

Phase number Image of Information about the textual resources contained in


or start-end cluster 1 cluster 1, for example, wordings, visual images, sounds,
times spatial disposition of the items, size of the cluster
Image of Information about the textual resources contained in
cluster 2 cluster 2, for example, wordings, visual images, sounds
spatial disposition of the items, size of the cluster
Table 4.6  Table of transcription and analysis for dynamic multimodal texts
Sender’s meaning

Grouping of items Semantic representation of individual modes Semantic representation of Inferential meanings
multimodal text

Phase Cluster Verbal Visual Aural COSMOROE Logico-­ Explicatures Implicatures


semantic
relations

Phase Image Semantic Semantic Semantic Relations of Relations Explicatures Implicatures


number of representation representation representation of equivalence, of triggered by triggered by
or cluster of the verbal of the visual the aural content contradiction, expansion, the textual the textual
start-­ 1 content content complementarity, projection resources resources
end Image Semantic Semantic Semantic independence
times of representation representation representation of
cluster of the verbal of the visual the aural content
2 content content
  ANALYSING MULTIMODAL SOURCE TEXTS FOR TRANSLATION: A PROPOSAL   
93
94   S. DICERTO

different in terms of length. The texts analysed in Chap. 5 show practical


examples of different text sizes, determining differences in the size of item
groupings.
As discussed, the concept of phase applies only to dynamic multimodal
texts, which show a development over time; therefore, the ‘phase’ column
will not appear in the adaptation of the table for static multimodal texts
(Table 4.7), the focus of attention in the following chapter. The ‘aural’
component also becomes redundant for this narrower purpose.
Tables 4.6 and 4.7 are the analytical tools resulting from this study,
respectively, for dynamic and static multimodal texts, representing the
model’s full structure in a format suitable for practical application to mul-
timodal STs.
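For readers who wish to transcribe such analyses digitally rather than on paper, the column structure of the static-text table (Table 4.7) can be mirrored in a simple data structure. The following Python sketch is purely illustrative and not part of the model itself; all names are hypothetical.

```python
# Illustrative sketch only: field names are hypothetical and simply mirror
# the columns of Table 4.7 (static multimodal texts).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cluster:
    image: str    # reference to the image of the cluster, e.g. 'CL1'
    verbal: str   # semantic representation of the verbal content
    visual: str   # semantic representation of the visual content

@dataclass
class StaticTextAnalysis:
    clusters: List[Cluster] = field(default_factory=list)
    # Relations and inferential meanings are kept at text level, not per
    # cluster, since relationships can hold between elements belonging to
    # different groupings and affect the whole text.
    cosmoroe: List[str] = field(default_factory=list)         # e.g. '1VER-1VIS: equivalence—metaphor'
    logico_semantic: List[str] = field(default_factory=list)  # e.g. '1VER-1VIS: extension'
    explicatures: List[str] = field(default_factory=list)
    implicatures: List[str] = field(default_factory=list)

analysis = StaticTextAnalysis()
analysis.clusters.append(Cluster(image="CL1", verbal="...", visual="..."))
analysis.cosmoroe.append("1VER-1VIS: complementarity—essential agent")
```

Keeping the relational and inferential fields outside the per-cluster record reflects the point made later in this chapter that those columns are not divided by cluster.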
It should be noted that the model can be further adapted and/or
expanded to accommodate different semiotic modes. Indeed, for the
study of specific genres, columns can be added and removed, with particu-
lar reference to those concerning the semantic representation of individual
modes. For example, the analysis of genres which are heavily reliant on
visual meanings may require further columns to investigate separately
semiotic modes like colour; the study of dynamic texts not including
images (e.g. radio advertisements) would require the model’s ‘dynamic’
setup, as the text develops over time, without the column for visual con-
tent—in the same context, an analyst may decide to split the sound col-
umn into ‘music’ and ‘environmental sounds’ to be able to record more
detail from the different audio tracks. As noted at the beginning of this
journey, identifying the semiotic modes relevant to a textual analysis is
a matter of necessity. Analytical necessities are dependent on the nature of
the materials to be analysed and on the analytical purpose; therefore, the
number of modes to be analysed is entirely dependent on the context of
analysis. In fields such as machine learning, in which the analytical goal
might be to focus on the interaction among textual resources without
reference to the context, the columns on explicatures and implicatures
might also be redundant.
In the next chapter, I apply the model to the translation of several static
multimodal texts to analyse their message formation, identifying their
semantic representation, explicatures and implicatures. Therefore, I adopt
the blueprint for static multimodal texts as set out in Table 4.7 and
exemplify how the analysis can be carried out practically, including
transcription details. The purpose of my ST analysis is to arrive at an
interpretation of the ST sender’s communicative intention, based on the
semantic representation of the text as well as my own contextual and world
knowledge. This is done in order to inform the translation into Italian
(my mother tongue) of the English STs analysed, pinpointing potential
translation issues and considering the strategies that can be applied to
solve them.

Table 4.7  Table of transcription and analysis for static multimodal texts

Sender’s meaning

| Cluster | Verbal | Visual | COSMOROE relations | Logico-semantic relations | Explicatures | Implicatures |
| Image of cluster 1 | Semantic representation of the verbal content | Semantic representation of the visual content | Relations of equivalence, contradiction, complementarity, independence | Relations of expansion, projection | Explicatures triggered by the textual resources | Implicatures triggered by the textual resources |
| Image of cluster 2 | Semantic representation of the verbal content | Semantic representation of the visual content | | | | |
It is worth remarking once more that the process of analysis presented
in Chap. 5 is not recommended for professional translators in their every-
day activity, as this would be too time-consuming for the rhythms of pro-
fessional life. The model is meant first and foremost as a research and
analytical tool that can be helpful for gaining a deeper understanding of
the issues related to multimodality. The application of the model to a large
sample of multimodal texts could help identify trends in the difficulties
translators encounter in their work on different text types, and be useful in
producing generalisations applicable to new texts. It can also be used as a
self-development tool by professionals interested in knowing more about
how the multimodal texts they translate convey meaning. Translation
tutors may also consider the model (perhaps in a reduced version) as a
didactic resource to support their students’ self-reflection or a classroom
discussion on multimodal texts and the potential pitfalls in their transla-
tion. Ultimately, the process of multimodal reflection the model supports
is not meant to identify the ‘most appropriate’ solution to a translation
problem, but only to enable the user to break down systematically the
meaning they themselves assign to a text, in order to understand what
potential difficulties they may have to face in their personal attempt to
achieve the closest possible interpretive resemblance in their
TT. Overcoming such difficulties may involve more or less substantial
modifications to the content/organisation of the textual resources, or an
effort to guide the reader towards different relationships between textual
elements or towards other contextual meanings.

References

Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis. London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text (pp. 32–51). London: Fontana.
Berinstein, P. (1997). Moving Multimedia: The Information Value in Images. Searcher, 5(8), 40–49.
Cohn, N. (2007). A Visual Lexicon. The Public Journal of Semiotics, 1(1), 35–56.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Forceville, C. (1999). Educating the Eye? Kress and Van Leeuwen’s Reading Images: The Grammar of Visual Design (1996). Language and Literature, 8, 163–178.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.). Manchester: St. Jerome.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar (2nd ed.). London: Edward Arnold.
Hughes, M., Salway, A., Jones, G., & O’Connor, N. (2007). Analysing Image-Text Relations for Semantic Media Adaptation and Personalisation. In Second International Workshop on Semantic Media Adaptation and Personalization, Uxbridge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and Media of Contemporary Communication. London: Hodder Arnold.
Levin, J. R., & Mayer, R. E. (1993). Understanding Illustrations in Text. In B. K. Britton, A. Woodward, & M. Binkley (Eds.), Learning from Textbooks: Theory and Practice. Hillsdale: Lawrence Erlbaum Associates.
Levinson, S. C. (2000). Presumptive Meanings: The Theory of Generalized Conversational Implicature. Cambridge: MIT Press.
Machin, D. (2007). Introduction to Multimodal Analysis. New York: Oxford University Press.
Marsh, E. E., & White, M. D. (2003). A Taxonomy of Relationships Between Images and Text. Journal of Documentation, 59(6), 647–672.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New (and Old) Media. Visual Communication, 4(3), 337–371.
McCloud, S. (1994). Understanding Comics: The Invisible Art. New York: HarperPerennial.
Nikolajeva, M., & Scott, C. (2000). The Dynamics of Picturebook Communication. Children’s Literature in Education, 3(3), 225–239.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and Cognition (2nd ed.). Oxford: Blackwell.
Van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.
Van Leeuwen, T. (2006). Typographic Meaning. Visual Communication, 4, 137–143.
Wilson, D., & Sperber, D. (1988). Representation and Relevance. In R. M. Kempson (Ed.), Mental Representations: The Interface Between Language and Reality. Cambridge: Cambridge University Press.
CHAPTER 5

Multimodal ST Analysis: The Model Applied

Abstract  This chapter applies the model for multimodal ST analysis to inves-
tigate the organisation of some real multimodal texts for translation purposes.
Given the general scope of the book, several multimodal texts are used, select-
ing them from the different types outlined by Reiss (Text Types, Translation
Types and Translation Assessment. In A. Chesterman (Trans. & Ed.),
Readings in Translation Theory (pp. 105–115). Helsinki: Oy Finn Lectura
Ab, 1977/1989). Based on this choice, texts are organised into expressive,
informative and operative. At the beginning of each textual analysis, some
contextual information on the circumstances of publication of the message
and its author is included. Then, the model is applied to investigate the mul-
timodal organisation of the text, in order to identify potentially problematic
areas for its translation. Finally, potential solutions to the individual transla-
tion challenges are discussed in terms of how the applicable strategies may
affect the level of interpretive resemblance between source and target text.

Keywords  Multimodal translation • Multimodality • Reiss • Translation


issues • Interpretive resemblance

© The Author(s) 2018
S. Dicerto, Multimodal Pragmatics and Translation,
Palgrave Studies in Translating and Interpreting,
https://doi.org/10.1007/978-3-319-69344-6_5

In order to explore the model proposed in Chap. 4 for the analysis of static
multimodal texts for translation purposes, this chapter gives a
demonstration of its application to a set of multimodal texts. The chapter
starts with a description of the model’s method of application in Sect. 5.1,
including details on the selection of multimodal texts, the analytical
procedures and the coding system used. It then moves on to the actual
analysis of multimodal texts, divided into three sections: expressive texts
(Sect. 5.2), informative texts (Sect. 5.3) and operative texts (Sect. 5.4).
Conclusions are drawn on the analysis of all three text types in Sect. 5.5.

5.1   Method of Application


The method of application of the model is described in this section, with
particular reference to the selection of the examples of multimodal texts to
be analysed (Sect. 5.1.1), the analytical procedures adopted (Sect. 5.1.2)
and the coding system used to ‘map’ the multimodal texts (Sect. 5.1.3).

5.1.1  Selection of Material
We begin the description of the selection process by outlining the selec-
tion criteria for the texts that comprise the sample used to explore the
application of the model. Firstly, as the model is intended to be applicable
to a broad range of texts, in terms of genres and purposes, the initial crite-
ria for selection are text types as outlined by Reiss (1977/1989). The
reason for choosing Reiss’s categorisation is that her taxonomy is based on
textual functions, and choosing to analyse texts from each of her three
categories—expressive, informative and operative—ensures a discussion of
STs with different goals, conveying messages whose content is organised
artistically, to communicate knowledge or information or to persuade
readers of a certain viewpoint or action. This will help readers gain an
understanding of what the multimodal organisation of different types of
text might look like, and how this organisation might be linked with the
function(s) the text performs.
However, texts are rarely of a ‘pure’ type, as Reiss herself explicitly
acknowledges: for example, a political leaflet will often convey information
while at the same time being an expression of the author’s point of view
on political matters and being ultimately intended to persuade the reader-
ship to vote for a certain party, therefore being informative, expressive and
operative at the same time with an arguable prevalence of this latter func-
tion. It is common for texts not to be entirely expressive, informative or
operative, and as the example shows, it is likely for them to be positioned
somewhere within the ‘triangle’ made of the three functions.

Multimodal texts, just like their monomodal counterparts described by


Reiss, are also intended for a variety of functions, showing features from
the expressive, informative and/or operative types (Reiss 2000: 164–165),
but unlike monomodal texts, they have an added layer of complexity, in
that they combine signs from different semiotic modes. In the present
work, multimodal texts are assigned to a category based on their domi-
nant function following Reiss’s categorisation.
‘Stand-alone’ and complete multimodal units including visual and ver-
bal elements, such as poster advertisements, were the preferred format in
the selection process. However, in order for the model to be tested on a
range of multimodal texts for which there is a realistic translation need, in
some cases the analysis had to be limited to a particular area of a text.
Indeed, the analysis of complete larger multimodal texts (e.g. manuals,
children’s books) would have been prohibitive in terms of length. Thus,
multimodal texts or text extracts of more or less comparable size were
selected; where the chosen texts are extracts, any co-text deemed relevant
for the purpose of analysis is provided for reference.
The entire selection consists of texts in English, which are analysed and
then translated into Italian. Extending the application of the model to
other translation pairs must remain an empirical exercise for future
research.

5.1.2  Analytical Procedure
At the beginning of each textual analysis, some contextual information on
the setting of the message and its author is provided as appropriate. The
model is then applied in order to investigate the multimodal organisation
of the text in each analytical dimension, before the results of the analysis
are reported in the relevant summary table, with a view to identifying
potentially problematic areas for the translation into Italian (these are
underlined and in bold in the summary table). Finally, I propose solutions
to the translation challenges identified, commenting on what changes
these entail in terms of the TT’s semantic representation and inferential
meanings if interpretive resemblance to the ST is to be maintained.
Each analysis follows the model outlined in Chap. 4 and the relevant
analytical table (Table 4.7), dealing with: the grouping of items (clusters),
the semantic representation of individual modes (verbal, visual) and of the
multimodal text (COSMOROE and logico-semantic relations) and any

inferential meanings (explicatures, implicatures). Given the number of


potential relationships in the semantic representation of each multimodal
text and the relevant visual-verbal relations, readers might find it useful to
refer to the relational scheme in Fig. 4.5 and, in general, to the definitions
provided in Sect. 4.2.

5.1.3  Coding System
As discussed in Chap. 4, the wealth of information related to the organisa-
tion of a multimodal text needs to be organised in a systematic way in
order to be accessible. For this reason, Table 4.7 integrates the model’s
theoretical structure with Baldry and Thibault’s multimodal transcription
system (2005). While the table divides and organises the three analytical
dimensions and their sub-sections, individual items such as textual
resources and their groupings or individual visual-verbal relations still
need to be clearly labelled for ease of identification using a consistent cod-
ing system. This section is perhaps more relevant for scholars who intend
to make active use of the model to analyse multimodal texts, and less so
for readers who want to gain an understanding of multimodal translation
issues—in the latter case, the reader might want to skip ahead to Sect. 5.2,
where the actual analysis of multimodal texts begins.
Starting with groupings of items, individual clusters within each text
are labelled with a code that identifies them and assigned a number (e.g.
CL1, CL2). Each cluster is shown as an image in the cluster column to
clearly identify its boundaries. Within each cluster, the labels applied to
each textual resource reflect the mode the element belongs to; different
elements from the same mode may be distinguished by number, but it is
important to note that the numbering system does not reflect priority or
relative status between elements. Thus, two verbal elements in the same
cluster/text will be labelled respectively ‘1VER’ and ‘2VER’, whereas two
visual elements will be ‘1VIS’ and ‘2VIS’.
In order to show the link with the textual resources involved, visual-­
verbal relationships are named after them. Thus, a relationship between
1VER and 2VIS is labelled 1VER-2VIS and tagged according to the rele-
vant category/subcategory, both for COSMOROE and logico-semantic
relationships (e.g. 1VER-2VIS: complementarity—apposition; extension).
If an element enjoys a one-to-many relationship with two or more other ele-
ments, all elements need to be listed in the relationship label, which is tagged
according to the relevant relationship categories as above. The element
establishing the multiple relationship is listed first, and other elements follow
after a dash, divided by a comma (e.g. 1VER-1VIS,2VIS). If the multiple
relationship is established by an element with all the others appearing in the
text, ‘ALL’ appears after the dash (e.g. 1VER-ALL) instead of a lengthy list.
As explicatures and implicatures are based on the semantic representa-
tion, these are also identified by the same codes as the other elements.
Thus, an explicature of the element 1VER is labelled ‘1VER’; an implica-
ture derived on the basis of 1VER and 1VIS is labelled ‘1VER-1VIS’. An
implicature triggered by all textual elements together is labelled ‘ALL’.
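To make the convention concrete, the labelling rules above can be expressed as two small functions. This Python sketch is only an illustration of the coding system, not part of the published model; the function names are my own.

```python
# Hypothetical helper functions illustrating the coding system of Sect. 5.1.3.
from typing import List

def element_label(mode: str, number: int) -> str:
    """Label a textual resource by number and mode, e.g. the second
    visual element in a cluster becomes '2VIS'."""
    return f"{number}{mode.upper()[:3]}"

def relation_label(source: str, targets: List[str], all_elements: bool = False) -> str:
    """Label a visual-verbal relationship: the element establishing the
    relationship is listed first, and the others follow after a dash,
    divided by a comma; 'ALL' replaces a lengthy list when the element
    relates to every other element in the text."""
    if all_elements:
        return f"{source}-ALL"
    return f"{source}-{','.join(targets)}"

print(element_label("verbal", 1))                      # 1VER
print(relation_label("1VER", ["2VIS"]))                # 1VER-2VIS
print(relation_label("1VER", ["1VIS", "2VIS"]))        # 1VER-1VIS,2VIS
print(relation_label("1VER", [], all_elements=True))   # 1VER-ALL
```

The same labels then carry over to the inferential-meaning columns, since explicatures and implicatures are identified by the codes of the textual resources that trigger them.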
Whenever possible, a comprehensive account of the explicatures identified
in a text is transcribed in the relevant column. However, in the case of texts
including a wealth of explicatures, only the ones deemed directly relevant
to the analysis of the text for translation purposes are listed, that is, those
that are necessary to explain its interpretation in terms of optimal rele-
vance. The identification of explicatures has to be guided by the criterion
outlined by Sperber and Wilson, that is, ‘the right interpretation […] is the
one that is consistent with the principle of relevance’ (1995: 184). It is also
important to note that the number of implicatures associated with a certain
semantic representation is potentially countless and that the model is not
intended to address them all, nor is it capable of doing so. For this
reason, only implicatures that are strongly communicated are included in the
analysis. Strongly communicated implicatures are defined by Sperber and
Wilson as those ‘which must actually be supplied if the interpretation [of
the text] is to be consistent with the principle of relevance’ (1995: 199).
Lastly, it is important to point out that neither the column dedicated to
the semantic representation of the multimodal text nor the one for
contextual meanings is divided by phase/cluster, as relationships triggering
inferential meanings can exist between elements belonging to different
groupings, affecting the whole text. The application of the model’s coding
system is further clarified in the texts analysed in this chapter. The table
resulting from the model’s application acts as a ‘map’ of the multimodal
meaning, and is used to pinpoint potential translation issues as a starting
point for their resolution.

5.2   Expressive Texts


Expressive texts are defined as texts whose function is ‘[c]reative composi-
tion, an artistic shaping of the content’ (Reiss 1977/1989: 105).
Multimodal texts of this type are found in various genres, including,

among others, cartoons, comics, reviews, children’s books and other fic-
tion. Paraphrasing Lotmann (1972)1, Reiss provides the following expla-
nation of the communicative situation corresponding to expressive texts:

[T]he sender is in the foreground. The author of the text writes his topic
himself; he alone, following only his own creative will, decides on the means
of verbalization. He consciously exploits the expressive and associative pos-
sibilities of the language in order to communicate his thoughts in an artistic,
creative way. (Reiss 1977/1989: 105)

Thus, this text type accounts for a variety of communicative situations


sharing as the only necessary requirement that the corresponding text be
a creative expression of the author, who exploits language (and, in the case
of multimodal texts, textual resources in general) to communicate their
thoughts through expressive and associative meaning in an artistic fashion.
The requirement that an expressive text needs to have a close association
to the author and to be a form of personal expression means that expres-
sive multimodal texts can come in a variety of forms with very different
multimodal organisations, depending on the author’s preference. Any of
the three dimensions can be in the foreground: the author of an expressive
multimodal text may rely more heavily on a semiotic mode than another
(e.g. make more extensive use of language rather than images or vice
versa), perhaps using complex and detailed textual resources; the type of
visual-verbal relations they adopt may be key in delivering the overall com-
municative intention; in some instances, cultural knowledge might be the
central pivot of the text. The author’s idiolect, namely, the ‘ensemble of
linguistic features, belonging to a person, which are affected by geograph-
ical, educational and even physical factors including class, gender, race,
historical influences that contribute to shaping one’s ideological persona’
(Federici 2011: 7) is particularly relevant to these texts, as they are an
expression of the author’s creative mind and are intended to communicate
their thoughts artistically. Regardless of the author’s personal stylistic
choices, multimodal texts that share their authorship will be likely to share
common ground in terms of the textual resources deployed, the network
of visual-verbal relations created or the inferential meanings elicited. This
is particularly interesting from a multimodal translation viewpoint, as the
multimodal TT will be partly dependent on the translator’s ability to

1. Lotmann, J. (1972) Die Struktur literarischer Texte. München: Fink.

examine several texts in terms of their multimodal organisation to identify


the author’s ‘signature’, arguably becoming something of an expert in a
certain multimodal style rather than content.
Application of the model to expressive multimodal texts will then result
in an account of how this idiolect is ‘acted out’ by the author through the
selection of certain communicative strategies and schemes of interaction
between dimensions. Here I analyse a British political cartoon, a restau-
rant review, a page from a children’s book and a US political cartoon to
give a better sense of how multimodal textual organisation is strongly
linked with the author’s personal style.

5.2.1  The Backbone
This text is a political cartoon drawn by Steve Bell and published on the
website of The Guardian as a comment on David Cameron’s message to
the Conservative members of his party, published through Cameron’s
Twitter account in May 2013. The cartoon shows two clusters, the first
including the visual contribution and the top part of the verbal content, the
second including the verbal content on the right-hand side (Fig. 5.1).

Fig. 5.1  Steve Bell (2013) The Backbone. The Guardian, 22 May 2013

In order to understand the cartoon in depth, it is important for the


readership to be aware that this was released after Cameron posted a tweet
entitled ‘My message to conservative members, the backbone of our
party’, in which the leader was addressing Conservative members with a
message about unity within the group, claimed to be necessary in order to
achieve the party’s purpose to act in the national interest ‘to clear up
Labour’s mess and make Britain stand tall again’ (Cameron: online).
In the first cluster, the verbal content above the image is quoting
Cameron; in order for the readership to identify the quotation, recipients
will need to be aware of Cameron’s tweet (implicated premise). Below
the verbal content, the caricature of David Cameron provides the essen-
tial agent for the quotation, projecting the locution on the then leader
of the Conservative Party. Identifying the essential agent for the quotation
requires readers to contribute some contextual knowledge: if the audience
cannot access the implicated premise about Cameron’s tweet, they will
have to be acquainted with cartoons by Steve Bell in order to clearly iden-
tify the pink condom as David Cameron’s caricature, as this is the way the
cartoonist usually depicts the ex-British Prime Minister (implicated prem-
ise). As stated by the cartoonist, this is because of the ‘smoothness of [his]
complexion […]. This guy has no pores, he’s like he’s made of rubber. So
in a sense that’s why I draw him that way’ (PRI: online).
In the visual contribution Cameron is depicted as the head of a dino-
saur skeleton, connected to the body by means of a piece of string, address-
ing the backbone he is tied to. By processing visual and verbal content
together, recipients can derive several inferential meanings (implicated
conclusions) from various aspects of an elaborate metaphor meant for
meaning extension: the conservative members mentioned in the verbal
content are likened to the fossilised backbone of an extinct creature found
in a museum; Cameron does not fit well together with the rest of the
body, awkwardly replacing the original head; the relationship between
Cameron and the conservative members is not solid, as it is likened to a
bow knot (which has the potential to come undone relatively easily).
The visual component of the cartoon also builds an essential exophora
with the verbal component, as the latter refers to a message and the former
shows it in detail (extension). Thanks to the essential exophora, readers
are led to the conclusion that Cameron’s message to conservative members
is the offensive gesture shown by the image (explicature). Based on
their contextual knowledge of the real content of Cameron’s message and
the explicature, recipients can infer that Bell’s intention was to convey a

level of similarity between Cameron’s message and an offensive gesture


(implicated conclusion).
The incompatibility between Cameron’s real message and its visual rep-
resentation creates an incongruity. According to Attardo, incongruity and
the ability to resolve it are what generates humour; the word ‘resolution’
is here not meant as ‘dissolution’ of the incongruity, but rather as its ‘jus-
tification’ (1994: 143). In Attardo’s view, as readers, we are likely to find
a text funny if we can perceive, process and ‘resolve’ an incongruity in its
content, in this case understanding why Cameron’s real message and his
message as portrayed by Bell are incompatible.
Given the cartoon’s semantic representation, explicatures and implica-
tures, it can be concluded that Bell’s intention was to humorously criti-
cise Cameron’s message to the members of his party drawing a
parallelism with an offensive gesture.
The second cluster provides additional details that enhance the text,
while not being central to its interpretation. These are the name of its
author (non-essential agent, projection of the ideas expressed in the
text), its date of publication (adjunct meant for temporal enhancement)
and the address of the author’s website (adjunct, spatial enhancement).
It is interesting to note how Bell chose his criticism to be rather
‘oblique’, in that nothing of what is explicitly communicated (i.e. the ver-
bal content and the image ‘in themselves’) bears any evident criticism, and
recipients need to ‘dig deeper’ into implicit meaning to come to an under-
standing of how and why the text is relevant. Under the presumption of
optimal relevance, we need to assume that the author thought this was the
best way to communicate his intention. There might be a variety of rea-
sons for this being the case, among which social/genre-specific conven-
tions, the humorous aspect of the text requiring the establishment of an
incongruity and, indeed, Bell’s own idiolect, as discussed in Sect. 5.2.
In a translation scenario in which this text is to be reproduced in Italian
for publication in a newspaper, magazine or website dealing with political
topics, translation challenges would be mostly connected with the visual
content and the contextual knowledge it relates to, which in turn influ-
ence the readership’s ability to access inferential meanings (Table 5.1).
Table 5.1  Summary table, The Backbone

Sender’s meaning

Grouping of items:
1CL — 1VER: ‘My message to conservative members, the backbone of our party’; 1VIS: Pink condom addressing the backbone he is tied to with an offensive gesture
2CL — 2VER: © Steve Bell 2013 – 3508 – 12.5.13 – belltoons.co.uk

Semantic representation of multimodal text (COSMOROE; logico-semantic relations):
1VER-1VIS: Complementarity—essential agent; Projection—locution
1VER-1VIS: Complementarity—essential exophora; Extension
1VER-1VIS: Equivalence—metaphor; Extension
2VER-ALL: Complementarity—non-essential agent; Projection—idea
2VER-ALL: Complementarity—adjunct; Enhancement—temporal
2VER-ALL: Complementarity—adjunct; Enhancement—spatial

Inferential meanings:
Explicatures — 1VER-1VIS: Cameron’s message to the Conservative members is the offensive gesture in the image
Implicatures — 1VIS: Pink condom = Cameron; 1VER: The title quotes Cameron’s tweet; 1VER-1VIS: Conservative members are the backbone of an extinct creature, Cameron’s connection with them is not solid, Cameron does not fit well with the Conservative members; 1VER-1VIS: Cameron’s message to the conservative members is similar to an offensive gesture

As previously mentioned, the identification of the visual content as
referring to David Cameron is only possible with some contextual
knowledge deriving either from familiarity with Cameron’s tweet or with
how Steve Bell generally draws caricatures of the ex-British Prime Minister.
This level of contextual knowledge is unlikely for an Italian audience, who
might need further support to access inferential meanings. Hence, the
implicated premise triggered by the image needs to be compensated for
more explicitly, for example, by providing indications about the identity of
the speaker in the verbal content (e.g. ‘Il messaggio di Cameron ai membri
conservatori, la colonna vertebrale del suo partito’, ‘Cameron’s message to
Conservative members, the backbone of his party’). This explicitation
mentions the existence of a message sent by Cameron to members of his
party and makes it possible for recipients to identify the only visual item
that can be regarded as sending a message to somebody (thanks to the
gesture) as Cameron. In turn, this supports the essential exophora and
allows the identification of the explicatures, which would otherwise be
lost.
The visual content also poses another challenge for translation: the ges-
ture made by Cameron has no meaning for Italians and hence cannot be
perceived as offensive, weakening the basis for the reinterpretation of the
original message as an insult. If possible, a balloon containing verbal
insults could be inserted to compensate for the visual meaning that does
not travel well across cultural borders. Alternatively, a solution involving a
slight modification of the visual content could be to retouch the image of
Cameron’s hand so that only the middle finger is shown as lifted. This
would provide a cultural equivalent an Italian readership can identify as an
insulting gesture, rendering the inclusion of additional verbal content
unnecessary.

5.2.2  Latymer
This text is an extract from the Food & Drink section of the Surrey Life
website, dealing with a review of a restaurant in the Surrey area. Restaurant
reviews are a mixed text type: they perform an informative function
(providing factual information on a restaurant business) and an operative
function (advising the audience for/against trying a certain restaurant),
but also, and arguably more centrally, an expressive one (as the author is
clearly in the foreground, and the main purpose is expressing their own
subjective opinion regarding their dining experience, regardless of whether
this causes a change in the audience’s behaviour). This is only an extract
from the webpage in which the review appears, focusing specifically on the
first half of the restaurant review and disregarding links to other pages,
advertisements, banners or other content. The choice to include only half
of the review reflects a focus on the multimodal part of the text, as the
rest of the text is made of verbal content only. The text is made up of
three clusters, one on the top left (title, date and social network icons),
one on the top right (image and legend on the right) and one at the bottom
(main body of the article) (Fig. 5.2).

Fig. 5.2  Latymer restaurant review, Surrey Life
The title in the top left cluster behaves as a defining apposition for the
rest of the text (meaning extension), in that it indicates explicitly the
nature of the text itself (restaurant review) and its main subject. Its seman-
tic representation is not complete, as the title needs several adjustments to
become a full sentence, including the addition of several words. Its com-
pletion produces an explicature (‘Michael Wignall works at The Latymer
at Pennyhill Park, located in Bagshot GU19 5EU  – this is a restaurant
review for the Latymer’). The date provides additional information
(adjunct) meant to temporally enhance the text. The icons in the same
cluster provide extra information (adjunct) on how many times the article

has been shared on social networks, providing information useful to frame


the content of the article in terms of its distribution in other websites
(spatial enhancement).
The image of artfully arranged food on the top right is not referenced
directly anywhere in the text (independence-symbiosis), and can there-
fore be assumed to provide a visual preview of the topic (elaboration).
Within the same cluster, the legend connected to the image appears, again,
to be in a symbiotic relationship with the image, providing information
that is only thematically related to the content of the visual contribution
(independence-symbiosis, meaning elaboration).
The legend also appeals to the audience’s encyclopaedic knowledge of
the Michelin guide and the assignment of stars (implicated premise).
The intention of the author is stated explicitly in the title of the article
(or, more precisely, in its developed semantic representation): to provide
a restaurant review based on the author’s personal experience at The
Latymer (Table 5.2).
Table 5.2  Summary table, Latymer restaurant review

Semantic representation of individual modes
  Cluster 1CL
    Verbal content (1VER): Michael Wignall at The Latymer at Pennyhill
    Park, Bagshot GU19 5EU—restaurant review
    Verbal content (2VER): Tuesday, May 21, 2013 12:12 PM
    Visual content (1VIS): social network icons
  Cluster 2CL
    Verbal content (3VER): The Latymer is Surrey’s only two Michelin star
    restaurant
    Visual content (2VIS): image of food
  Cluster 3CL
    Verbal content (4VER): [main body of text] (–)

Semantic representation of the multimodal text (COSMOROE / logico-semantic
relations)
  1VER-ALL: Complementarity—defining apposition / Extension
  2VER-ALL: Complementarity—adjunct / Enhancement—temporal
  1VIS-ALL: Complementarity—adjunct / Enhancement—spatial
  2VIS-ALL: Independence-symbiosis / Elaboration
  3VER-2VIS: Independence-symbiosis / Elaboration

Inferential meanings
  Explicatures
    1VER: Michael Wignall works at The Latymer at Pennyhill Park, located
    in Bagshot GU19 5EU—this is a restaurant review for the Latymer
  Implicatures
    3VER: Generic knowledge on the Michelin guide and the assignment of
    stars

Translating this restaurant review for the Food & Drink section of an
Italian website dealing with foreign restaurants would mean that the
translator would have to consider carefully the relationship between the
image in the second cluster and the rest of the article; also, its legend
is not directly related to the content of the image itself. While this is
not necessarily a language-specific issue, but rather one of individual
publishing style, depending on the TT outlet and their publication policy,
the Italian translation may require a tighter bond between the article and
the image. Indeed, digital web designer Marko Prljic argues that it is
good practice for web articles to include ‘useful’ photos as far as
possible, that is, pictures that help comprehension, teach content or show
procedures, preferring them to pictures that are purely decorative (Prljic
2014). Based on best practice recommendations such as Prljic’s, a
different outlet might therefore require visual content linking directly
with the topic of the review (e.g. a picture of the restaurant or of its
owner); changes in the visual material might in turn have a direct
influence on the legend found in the same cluster. While addressing these
issues may not be part of the translator’s responsibility in some cases,
she may nonetheless propose amendments to the text, and in particular to
the content of the second cluster, in order to improve the potential level
of relevance perceived by the target audience (TA). The cluster may be
entirely omitted, or its legend modified to make it more relevant to the
image (e.g. ‘Tagliata di manzo al balsamico’, ‘Sliced beef with balsamic
vinegar’); another alternative would be to replace the image itself with
something better suited
to the legend (e.g. a picture of the restaurant) in order to tighten the
relationship between visual and verbal content in the cluster. While this
might seem at first sight an unnecessary change to the original ST visual-
verbal relations, these amendments do not affect the communicative
intention conveyed by the text, which is clearly still a restaurant review;
however, as Prljic states, people scan screens based on past experience
and expectations, and using the ‘wrong’ photos can degrade user experi-
ence (Prljic 2014). In this context, a ‘weak’ relationship such as one of
independence between visual and verbal content may not fit the expecta-
tions of the outlet’s TA, who may be used to stronger links between
images and their legends. In this case, the translator may consider such
changes in order not to distract the TA from the communicative inten-
tion of the text with textual associations hard for the audience to recon-
struct under the principle of optimal relevance. Changes of this type may
be considered in an attempt to comply with cultural/corporate conven-
tions, reducing the TA’s processing effort where this is not required to
access fundamental implicated meanings.

5.2.3  The Cat in the Hat


This text extract is the first page of the children’s book The Cat in the Hat
by Dr Seuss. This excerpt shows one cluster only, which consists of a chunk
of verbal content, in the form of rhyming lines, and a visual element, in the
form of an illustration (Fig. 5.3).
The verbal content on the page is a poem in four lines, which require
reference assignment for their deictics in order to be understood. The
references to ‘it’ and ‘we’ are provided by the visual content (i.e. respec-
tively, the weather and the children), which the reader retrieves through
an exophora, essential in order to understand the meaning of the text.
This relationship helps to complete the verbal semantic representation,
projecting the locution in the verbal content on the children in the visual
content, and resulting in an explicature (The sun did not shine. The
weather was too wet to play. So we, the children, sat in the house all the
cold, cold, wet day). These are not the only possible completions of the
verbal logical form: further developments of the verbal semantic represen-
tation might be that the sun did not shine on that day, the weather was
too wet to play outdoors and the children sat in (presumably) their own
house. These completions make the verbal meaning more explicit, but they
are not strictly necessary for the understanding of the text.2 Therefore,
these and all other potential elements of completion are excluded from
this analysis, which focuses on the explicatures without which the text
could not be processed.

Fig. 5.3  Dr Seuss (2004) The Cat in the Hat, p. 1. London: Harper Collins

2. This type of completion of the verbal logical form is called a
‘generalised conversational implicature’ in the Gricean theory, whose
claim is that these assumptions are in a sense context-independent, in
that readers will apply them by default unless these are prevented from
arising by textual/contextual factors that negate them. The status and
nature of such ‘default interpretations’ have been a longstanding
battleground in pragmatics, and currently the only consensus achieved in
the discipline is that they need to be differentiated from other types of
pragmatic contributions in that, contrary to the case of inferences
derived from specific contexts, these assumptions go through unless a
special context is present (Horn 2004: 4–5).
Relations of equivalence can be found between the visual and the
verbal content: ‘the house’ is represented both visually and verbally
(token-­token), as well as the ‘cold, cold, wet day’, of which the visual
content represents a single moment only (metonymy). No significant
new elements are added, thanks to these relations, which only state the
same content twice (elaboration). The absence of strongly communi-
cated implicatures means that the text communicates largely on an
explicit level: thus, it can be concluded that the sender’s intention is to
communicate the semantic representation of the text. Texts (utter-
ances) of this type are defined by Sperber and Wilson as ‘ordinary asser-
tions’ (1995: 183).
Relationships of equivalence for purposes of elaboration are domi-
nant in the extract. This aspect of the text seems compatible with the
type of audience it is addressed to. A very young readership is likely to
have limited access to inferential meanings, both because of their limited
knowledge of the world and because they are in the process of acquiring
basic literacy skills. Hence, texts that reiterate content visually and
verbally with the aid of basic visual-verbal interaction seem an appro-
priate choice for their level of proficiency in text interpretation
(Table 5.3).
Difficulties in reproducing the first page of The Cat in the Hat for an
Italian TA would lie mostly in the reproduction of the verbal content,
which needs to satisfy several requisites: firstly, as the original verbal con-
tent is expressed partly in rhyme, the TT should rhyme if possible in order
to mimic this characteristic of its verbal semantic representation and share
this formal feature with its ST; secondly, the verbal content blends with
the visual contribution by making reference to some of its features—as this
creates visual-verbal relationships, these need to be maintained in order for
the overall sense of the text to be preserved. Hence, an Italian TT should
rhyme and mention the same elements of the visual contribution as the
ST.  The rhyming requirement should perhaps be a lower-order priority
than the mention of certain elements of the image: indeed, although the
absence of rhyme would mean a change to one of the features of the verbal
logical form, this change does not influence the overall multimodal seman-
tic representation or the overall interpretation of the text; on the other
hand, not mentioning some of the visual features in the verbal content
would weaken the relations among textual resources, changing the way
the text is organised.
Table 5.3  Summary table, The Cat in the Hat

Semantic representation of individual modes
  Cluster 1CL
    Verbal content (1VER): ‘The sun did not shine./It was too wet to
    play./So we sat in the house/all that cold, cold, wet day’
    Visual content (1VIS): house with two children inside during a rainy
    day. The wind is shaking the tree outside the house

Semantic representation of the multimodal text (COSMOROE / logico-semantic
relations)
  1VIS-1VER: Complementarity—essential exophora / Projection—locution
  1VIS-1VER: Equivalence—token-token / Elaboration
  1VIS-1VER: Equivalence—metonymy / Elaboration

Inferential meanings
  Explicatures
    1VER-1VIS: The sun did not shine. The weather was too wet to play. So
    we, the children, sat in the house all the cold, cold, wet day
  Implicatures
    –

An Italian translation, then, could look like the following:

Il sole non c’era                  The sun was not there
Giocar non si poteva               Playing we could not
Restammo a guardare                We stayed to watch
La fredda pioggia che scendeva     The cold rain that came down

This solution mentions the house, and the pronoun ‘we’ is embedded in the verb
‘restammo’. The ‘cold, cold, wet day’ is not mentioned, but the cold rain is, com-
pensating for the lost relation of metonymical equivalence with a similar relation
of metonymical equivalence (as the rain shown in the picture is only part of the
total rain) and maintaining the multimodal semantic representation substantially
unchanged.

5.2.4  Bush Liebury
This cartoon by author Pat Bagley was published in the Salt Lake Tribune
on 24 April 2013 on the occasion of the then recent inauguration of the
George W. Bush Library. The text shows two clusters—one is formed by
the visual contribution, in which part of the verbal content is embedded;
the other is formed by the verbal contribution in the top left corner
(Fig. 5.4).
The top left of the panel, within the first cluster, shows a website
address: this provides information related to the text (adjunct) in order to
spatially enhance it with details of the location where it can be found.
Below the internet address is the author’s signature. This relates to the rest
of the text by again providing details relevant but not necessary for its
understanding (adjunct); their role is to project the ideas expressed in the
panel onto the author himself.
The rest of the verbal content of this text is embedded in the second
cluster, in a complementary relation with the image of the building it is
written upon, acting as its defining apposition: the building is defined as
the George W. Bush ‘lie-bury’. The word ‘lie-bury’ (invented joining the
words ‘lie’ and ‘bury’) can be seen as a paronym of the word ‘library’,
given that the phonemic representations of ‘lie-bury’ and ‘library’ are sim-
ilar but not identical (see Attardo 1994: 110–111 for a discussion of the
use of paronyms in relation to humour). The purpose of this complemen-
tary relationship is meaning extension: the use of the paronym suggests
that, instead of housing books as a library normally would, the building
dedicated to George W. Bush is the place where lies are buried
(explicature). Given the role of libraries as spaces dedicated to study
and the acquisition of new knowledge (implicated premise), the defining
apposition creates an incongruity (i.e. a contradiction) between the text
and the recipient’s knowledge of the world, showing the author’s humorous
intention in distorting reality. The visual content also shows a
caricature of ex-president George W. Bush that depicts him making a
gesture of approval and satisfaction. This also generates an explicature
(‘George W. Bush is happy with the lie-bury building’).

Fig. 5.4  Pat Bagley (2013) Bush Library. Salt Lake Tribune, 24 April 2013
Knowledge of the political context and the event of the inauguration
of the library are important in order to understand this cartoon.
Recipients need to be aware that the inauguration of the George W. Bush
Library took place (implicated premise) and that the former US presi-
dent had often been criticised for the policies adopted by his govern-
ment and accused of lying to US citizens on many occasions (implicated
premise).

Given these two implicated premises, recipients can get to the impli-
cated conclusion conveyed by the cartoon: the text points out that the
dedication of a ‘lie-bury’ to George W. Bush instead of a library would
have been more appropriate and would have made him happy. Based on
the implicated meanings conveyed along with the incongruity generated
for purposes of humour, it can be concluded that an interpretation of the
author’s intention consistent with the principle of relevance is that he
meant to humorously criticise the event of the dedication, putting
forward the view that a hiding place for lies would have been more
appropriate and appreciated than a library (Table 5.4).
In a translation scenario in which this text is to be reproduced in Italian
for publication in a newspaper, magazine or website dealing with political
topics, the translator would have to face problems in replicating the paro-
nyms. Indeed, this part of the verbal content is based on a wordplay
between the words ‘library’ and ‘lie-bury’ which may be difficult to repro-
duce in another language. At the same time, maintaining this pun is fun-
damental in order to preserve the defining apposition that identifies the
building and its content and the related meaning extension. Indeed, these
are instrumental in triggering the retrieval of the related implicated prem-
ise necessary to support the implicated conclusion.
The word for ‘library’ in Italian is ‘biblioteca’, and the word for ‘lie’ is
‘bugia’. Both words start with the same letter, making it possible for the
translator to merge the words ‘biblioteca’ and ‘bugia’ into ‘bugioteca’,
which indicates a place where lies are collected. This solution maintains
the pun found in the ST with one variation, since the concept of burying
lies is lost in the TT. This loss of meaning could be compensated for by
including additional verbal content that would give to the TT a sense that
the content of the ‘bugioteca’ is inaccessible (e.g. ‘bugioteca privata’, ‘pri-
vate lie-brary’), or with the inclusion of additional visual elements to the
same effect (e.g. a padlock on the door of the building). However, the
sounds of ‘bugioteca’ and ‘biblioteca’ are rather different, unlike in the ST,
in which ‘lie-bury’ is phonetically very close to ‘library’ (especially for
particular English pronunciations, like Bush’s own). Resemblance of
sound with a different spelling is very difficult to achieve in languages like
Italian, which are highly phonetic. Nevertheless, the solution proposed
achieves a good level of interpretive resemblance, maintaining intact all
the visual-verbal relations, triggering the retrieval of the related implicated
premise, and leading the readership towards the same communicative
intention.
Table 5.4  Summary table, Bush Liebury

Semantic representation of individual modes
  Cluster 1CL
    Verbal content (1VER): www.caglecartoons.com
    Verbal content (2VER): [signature of author]
  Cluster 2CL
    Verbal content (3VER): George W. Bush lie-bury
    Visual content (1VIS): library building
    Visual content (2VIS): George W. Bush making gestures of approval

Semantic representation of the multimodal text (COSMOROE / logico-semantic
relations)
  1VER-ALL: Complementarity—adjunct / Enhancement—spatial
  2VER-ALL: Complementarity—adjunct / Projection—idea
  3VER-1VIS: Complementarity—defining apposition / Extension

Inferential meanings
  Explicatures
    3VER-1VIS: The building dedicated to George W. Bush is the place where
    lies are buried
    2VIS-3VER,1VIS: George W. Bush is happy with the lie-bury building
  Implicatures
    3VER-1VIS: Libraries are places dedicated to the acquisition of
    knowledge, not to burying lies
    3VER-1VIS-2VIS: George W. Bush was dedicated a library in April 2013
    3VER: Bush was accused several times of lying to the American citizens
    3VER-1VIS-2VIS: The dedication of a ‘lie-bury’ instead of a library
    would have been more appropriate and appreciated by George W. Bush

5.3   Informative Texts


The function performed by informative texts is the ‘plain communication
of facts’ (Reiss 1977/1989: 105). Texts of this type can appear in many
forms and environments; an example of an informative text would be a
scientific report (Snell-Hornby 1997: 278), whose function is to provide
the recipient with information on a specific scientific topic or subject.
The communicative situation associated with informative texts is
described by Reiss as follows:

Here the topic itself is in the foreground of the communicative intention
and determines the choice of verbalization. In the interest of merely
transmitting information, the dominant form of language here is functional
language. The text is structured primarily on the semantic-syntactic level
[…]. If an author of such a text borrows aspects of a literary style, this
‘expressive’ feature is nevertheless only a secondary one […]. (Reiss
1977/1989: 106)

Consistent with the definition provided above, examples of informative
multimodal texts include, among others, various types of articles (e.g. for
newspapers, journals and magazines), textbooks and some types of web
pages.
Following on from Reiss’s explanation, informative multimodal texts,
whose purpose is primarily to transmit information, are likely to be organ-
ised more on the explicit (semantic-syntactic) than on the implicit level,
just like their monomodal counterparts; therefore, the visual and verbal
textual resources used (or their interaction) may be of a rather complex
nature, or be organised to communicate a wealth of detail. At the same
time, a greater level of explicitness is likely to determine a diminished use
of inferential meanings and therefore a less likely occurrence of
implicatures/explicatures.
Translation challenges in informative multimodal texts are therefore
likely to be related to the textual resources because of their complexity
(e.g. problems of specialised terminology or specialised drawing stan-
dards) and because of the way these interact with each other. Reduced
reliance on inferentiality means that the case of explicatures/implicatures
being a crucial obstacle to translation is considerably less likely for an
informative multimodal text than for other text types; problems with
leading the audience towards a certain implicature may therefore be less
frequent for informative multimodal texts in general, although informative
multimodal texts with an expressive/operative component that ‘borrow’
expressive features from other styles may still present issues concerning
inferential meanings.
In this section I analyse a page from a textbook, an extract from a web-
site, one from Wikipedia and one from an annual report.

5.3.1  Anatomy for Art Students


This illustration is a page excerpt from A Handbook of Anatomy for Art
Students by Thomson (1964), showing the back of a male human body in
detail. This excerpt of the textbook shows one cluster only (Fig. 5.5).
In this text verbal tags apply to different parts of the image: the relevant
tags extend the meaning of the image, providing the recipient with infor-
mation fundamental for the identification of each area (complementar-
ity—defining apposition). Only the names of the muscles are mentioned
in each tag; their relationship with the relevant parts of the image gener-
ates explicatures (e.g. ‘[the part of the image this tag is connected to is
the] spine of VII cervical vertebra’).
The author’s intention in putting together this text is an informative
one, and this text can be classified as an ordinary assertion, as no implica-
tures are strongly communicated. The purpose of the text is to provide
specialised information on the structure of the male human back
from a muscular point of view.
Assuming that a textbook of anatomy for art students may well be
translated into a foreign language for publication in a similar format
(book), this text extract would require translation as well (Table 5.5).

Fig. 5.5  Arthur Thomson (1964), A Handbook of Anatomy for Art Students, p. 34

Table 5.5  Summary table, Anatomy for Art Students

Semantic representation of individual modes
  Cluster 1CL
    Verbal content (1VER): [tags]
    Visual content (1VIS): image of the back of a male human body,
    sketching the shape of the various muscles

Semantic representation of the multimodal text (COSMOROE / logico-semantic
relations)
  1VER-1VIS: Complementarity—defining apposition / Extension

Inferential meanings
  Explicatures
    1VER-1VIS: The part of the image this tag is connected to is the
    [content of tags]
  Implicatures
    –

Given that all verbal elements relate to their visual counterparts aiming
at meaning extension, providing information fundamental for the
identification of all the parts of the image, it is essential for the
translation to be terminologically accurate. Other than the amount of
research required to accurately translate each anatomical term, one of the
translation challenges that can be identified in this text is that of
spatial requirements. The Italian terms may well not fit in the same space
as the original tag; even in the ST, some tags had to be moved to the top
of the image and numbered, given the impossibility of fitting them in the
space surrounding the image. Differences in word length may well determine
the necessity to move around and/or renumber the tags in the illustration.
While this would not have serious consequences for the internal coherence
of the illustration itself, any reference the rest of the book makes to
the numbered tags will have to change accordingly in order to provide
meaningful information to the recipient. Then, the translation challenge
is to
ensure that the TT uses the correct specialist terminology and that the
terminology used is consistently maintained throughout the whole text,
taking into account any changes applied to the illustration in terms of
position and number of the tags for correct referencing. Such changes do
not affect the type of relationship existing between visual and verbal
resources. Therefore, the TT will have a multimodal semantic representa-
tion identical to the ST, with changes being applied only to the verbal
content and, possibly, its position with respect to the visual contribution.
Given the potential knock-on effect triggered by changes in tag number-
ing, the translator would be wise to ensure that such changes to the verbal
content of the excerpt are kept to a minimum to promote consistency
throughout the handbook.

5.3.2  Climate Concepts
This text extract, entitled ‘Climate Concepts’, is part of the Student’s
Guide to Global Climate Change provided by the United States
Environmental Protection Agency (EPA).3 The extract is the main portion
of the relevant webpage included in the EPA website. The text shows two
clusters. The first cluster corresponds to the left-hand column, entitled
‘Climate Concepts’; the second cluster is the one on the right-hand side,
entitled ‘Weather Versus Climate’ (Fig. 5.6).
The title ‘Climate Concepts’ summarises the content of the cluster it
belongs to, behaving as a defining apposition to the rest of the information
provided, aimed at meaning extension. The verbal content continues with
a quotation from Mark Twain, which is thematically related to the rest of
the visual and verbal content in the cluster (‘weather’, ‘climate’); the quota-
tion does not have to be processed in conjunction with the other textual
resources (independence-symbiosis), and it elaborates on the rest of the
content by restating it more concisely. The main body of verbal content and
the diagram underneath are complementary in the development of the
text, given that the first sets out the premises of the argument (definition of
climate and climate change, exemplification of its effects) and the second
completes it, adding more detailed information on how climate-related
phenomena interact with each other (adjunct, meaning extension).

3. https://archive.epa.gov/climatechange/kids/basics/concepts.html

Fig. 5.6  EPA—Climate Concepts, Student Guide to Global Climate Change

The diagram itself is formed of visual and verbal elements, and could be
considered a sub-cluster. The diagram has a title, which behaves as a defin-
ing apposition (meaning extension) in relation to the other elements of
the diagram itself, like the cluster title. Each oval figure represents the
phenomenon it relates to, showing only its place in the network of rela-
tions (metonymy—meaning extension), while the arrows establish con-
nections between the various elements of the diagram, suggesting essential
exophoras that extend the meaning of the verbal contribution. Finally,
the diagram also has a relevant legend, with additional information on the
nature of the diagram itself (adjunct—meaning extension). The superim-
position of the verbal content on the oval shapes generates explicatures
(e.g. ‘(this oval represents) stronger storms’).
The second cluster also has a title, also showing the previously dis-
cussed relation of defining apposition (meaning extension) with the rest
of the verbal content in the cluster. A relation of symbiosis can also be
observed in this cluster: the image at the top of the cluster may be an
exemplification (elaboration) of the concept of weather discussed in the
verbal content, but it does not appear to provide any further information,
being only thematically related to it.
In general, the intention behind the communication of this text is to
provide information about the concepts of weather and climate, com-
paring them and explaining the difference. An example of a part of the
text in which this intention can be detected clearly is the last sentence of
the ‘weather versus climate’ cluster, which provides a practical tip to
remember the difference between the two concepts (Table 5.6).
Assuming a translation scenario in which the Student’s Guide to Global
Climate Change was to be translated for an Italian student audience to be
released in a very similar outlet to the ST, translation problems would be
mostly found in the reproduction of the first cluster, depending on the
outlet’s style guidelines. The content of the caption of the diagram would
be rather unusual in Italian if it were translated literally, since it seems
more like a statement that would fit in the main body of the text rather
than a caption. Also, some outlets' style guides require authors to make
explicit reference to diagrams or illustrations in the main body of the
verbal content to help maintain textual cohesion. The two problems
could be solved simultaneously by moving the content of the caption to
the end of the main body of the verbal content. In this way, the sentence
would introduce the diagram, and the diagram itself would be left without
a caption. This would obviously affect the organisation of the text: the
relationship of complementarity between the main body of the text and
the diagram (adjunct) would be modified into an essential exophora, given
that the verbal mode would make a reference to a diagram, which would
be 'resolved' by the visual mode. This would make the visual-verbal
interaction tighter, by transforming a non-essential relationship into an
essential one. The change would not impact negatively on the text, since
the information provided by the caption would simply change location and
make the relationship between verbal content and diagram more easily
accessible to the audience.

Table 5.6  Summary table, Climate Concepts
Sender's meaning, organised by grouping of items (cluster), semantic
representation of individual modes (verbal/visual content), semantic
representation of the multimodal text (COSMOROE, logico-semantic
relations) and inferential meanings (explicatures, implicatures).

Cluster 1CL (Climate concepts)
- Verbal content: 1VER: Climate concepts; 2VER: [quotation]; 3VER: [main body]; 4VER: Climate connections; 5VER: [tags]; 6VER: [legend]
- Visual content: 1VIS: [oval shapes]; 2VIS: [arrows]
- COSMOROE: 1VER-ALL (1CL): Complementarity—defining apposition; 2VER-ALL (1CL): Independence—symbiosis; 3VER-4VER, 5VER, 1VIS, 2VIS: Complementarity—adjunct; 4VER-5VER, 6VER, 1VIS, 2VIS: Complementarity—defining apposition; 5VER-1VIS: Equivalence—metonymy; 2VIS-1VIS, 5VER: Complementarity—essential exophora; 6VER-4VER, 5VER, 1VIS, 2VIS: Complementarity—adjunct
- Logico-semantic relations: 1VER-ALL (1CL): Extension; 2VER-ALL (1CL): Elaboration; 3VER-4VER, 5VER, 1VIS, 2VIS: Extension; 4VER-5VER, 6VER, 1VIS, 2VIS: Extension; 5VER-1VIS: Extension; 2VIS-1VIS, 5VER: Extension; 6VER-4VER, 5VER, 1VIS, 2VIS: Extension
- Explicatures: 5VER-1VIS: This shape represents [content of tags]
- Implicatures: –

Cluster 2CL (Weather versus climate)
- Verbal content: 7VER: Weather versus climate; 8VER: [main body of cluster]
- Visual content: 3VIS: Image of city with snow
- COSMOROE: 7VER-8VER, 3VIS: Complementarity—defining apposition; 8VER-3VIS: Independence—symbiosis
- Logico-semantic relations: 7VER-8VER, 3VIS: Extension; 8VER-3VIS: Elaboration
- Explicatures: –
- Implicatures: –
This text is a good example of how textual resources are deployed dif-
ferently in different cultures, and of how translation issues can arise in
some cases purely because a rearrangement of visual/verbal content is
advisable in order to comply with the standards expected by the TA. While
the issue of how textual resources are deployed and organised in a text is
certainly relevant to ‘monomodal’ translation, it is all the more important
for translations involving more than one type of textual resource, as it
affects the multimodal semantic representation of the text and changes the
way the resources relate to each other, potentially confusing the TA and
making them less likely to process the text in its entirety.

5.3.3  Yalta Conference
This text is an excerpt from the Wikipedia entry on the 1945 Yalta
Conference—it consists of the very top of the webpage including the entry
title, a brief general description of the entry, a table of contents and an
image with a legend. The text shows three clusters: the first—on the left-hand side—contains mostly verbal content (also including the coordinates
provided at the top), the second consists of the image and its caption on
the right-hand side, and the third displays the table of contents. The analy-
sis does not include remarks on the latter, considering that the table of
contents relates mostly to the following parts of the Wikipedia entry
(Fig. 5.7).
Fig. 5.7  Yalta Conference entry, from Wikipedia

The title of the entry relates to the other elements of the webpage by
being complementary to them (defining apposition, meaning extension).
The tag 'from Wikipedia, the free encyclopaedia' provides information on
the source of the text (agent) in order to extend the meaning of all the
other elements; however, given that the entry is included in the website of
Wikipedia, this information can be derived by the audience from other
sources, and is anyway not essential for recipients to understand the rest
of the content of the entry. The main body of the text (part
of cluster 1) provides general information on the Yalta Conference, sum-
marising its circumstances and its purpose. In the second cluster, on the
right-hand side, the image represents the Yalta Conference through a pic-
ture taken during the conference itself; this captures a single moment of
the conference and hence is in a metonymical relation (equivalence) with
the title and the main body of the text, aiming to visually enhance the
spatial aspects of the content described verbally. A similar role is played by
the coordinates provided at the top of the image, which spatially enhance
the other textual resources by providing further information (adjunct) on
the location of the conference. In this second cluster, the caption at the
bottom of the image provides additional information (adjunct) that ver-
bally describes the visual content, in order to extend its meaning. It is
important to point out that the recipient is here expected to partly com-
plete the information provided by the caption, which is not exhaustive; the
names of the main personalities included in the picture are provided with
no reference to the order in which they appear, leaving the recipient to
complete the semantic representation (explicature) ‘reading’ the image
left to right (i.e. the standard reading order for English speakers), if they
cannot independently identify the faces of the personalities from the
image alone.
By the means described, the excerpt intends to provide a general
account of the Yalta Conference, communicating to the recipient relevant
information through visual and verbal resources (Table 5.7).
Given that Wikipedia presents itself as a multilingual online encyclopae-
dia, its content is often translated from and into different languages, for
internal as well as external use—this particular entry might be realistically
translated for a website about history. The text does not appear to show
particular translation challenges related to its multimodal semantic repre-
sentation. Its partial reliance on the retrieval of an explicature, however,
needs to be considered by the translator: if the TA cannot be expected to
be acquainted with the faces of the important personalities in the picture,
the content that is left implicit in the ST may need to be made explicit in
the TT, and recipients may have to be directed more closely in order to be
able to identify the state representatives referenced in the legend under-
neath the picture. Nevertheless, a general Italian TA can be reasonably
expected to possess the required knowledge, and does not need additional
support.
Table 5.7  Summary table, Yalta Conference
Sender's meaning, organised by grouping of items (cluster), semantic
representation of individual modes (verbal/visual content), semantic
representation of the multimodal text (COSMOROE, logico-semantic
relations) and inferential meanings (explicatures, implicatures).

Cluster 1CL
- Verbal content: 1VER: Yalta Conference; 2VER: From Wikipedia, the free encyclopaedia; 3VER: [main body of text]; 4VER: [coordinates]
- Visual content: –
- COSMOROE: 1VER-ALL: Complementarity—defining apposition; 2VER-ALL: Complementarity—non-essential agent; 1VIS-1VER, 3VER: Equivalence—metonymy; 4VER-1VER, 3VER, 5VER, 1VIS: Complementarity—adjunct
- Logico-semantic relations: 1VER-ALL: Extension; 2VER-ALL: Extension; 1VIS-1VER, 3VER: Enhancement—temporal and spatial; 4VER-1VER, 3VER, 5VER, 1VIS: Enhancement—spatial
- Explicatures: –
- Implicatures: –

Cluster 2CL
- Verbal content: 5VER: The "Big three" at the Yalta Conference: Winston Churchill, Franklin D. Roosevelt and Joseph Stalin. […]
- Visual content: 1VIS: Picture related to the Yalta Conference
- COSMOROE: 5VER-1VIS: Complementarity—adjunct
- Logico-semantic relations: 5VER-1VIS: Extension
- Explicatures: 5VER-1VIS: Order of personalities in the picture is from left to right
- Implicatures: –

On the other hand, the legend assigned to the image is rather long.
Cultural or editorial norms governing the expected length of a legend may
require sacrificing parts of the information. For example, the
Italian Wikipedia entry on the Yalta Conference (which may or may not be
based on the text analysed here) shows a caption limited to the names of
the main personalities and their location, without providing any informa-
tion on the other (perhaps less known) historical figures or their order in
the picture. This may be due to different cultural expectations. If a transla-
tor had used the English version as their ST, in order to maintain all the
information provided by the original they would have had to move parts
of the legend to the other cluster, perhaps merging them with the main
body of the text. This would have resulted in a slightly different multi-
modal organisation, in which exophoras are created between clusters. The
quantity and type of information would not have changed substantially,
with the informative intention being still clear and with the TT presenting
a relocation of some textual resources.

5.3.4  Save the Children
This text is an extract from the 2012 annual report by Save the Children,
an international non-governmental organisation aiming to improve the
life conditions of children around the world. Reports by charitable organ-
isations usually have a twofold function, that is, to inform people of a
certain reality and also to obtain their support (financial or otherwise) in
changing things for the better. In the current case, the whole annual
report includes several articles related to charity work for children. The
persuasive function of the report is mostly evident in the opening letters
from the association’s president and chair at the beginning of the report,
in which more or less direct appeals are made to obtain the readership’s
contribution to the cause (e.g. ‘As we strive to accelerate the progress we
have made and build a world where no child dies needlessly, we will call on
your invaluable support once again’, ‘If you are already with us, thank you
for your help and support. If not, then I would encourage you to join us’).
The articles included in the report, on the other hand, reference a wealth
of factual information on a variety of issues, and their objective is to inform
the audience of individual issues and of the progress that has been made in
tackling them, without making use of clearly persuasive elements.
Therefore, although the report in its entirety is certainly a text with a
strong persuasive function, the article of which this text is an extract can
be considered as predominantly informative (Fig. 5.8).
Fig. 5.8  Save the Children

The text in Fig. 5.8 shows three clusters: the title at the top with the
small image of the United Kingdom on the left-hand side; an image with
caption and acknowledgement of authorship in the middle; and the main
body of the verbal content at the bottom.
The verbal content in the top cluster is the title of the article. In con-
trast to other titles previously considered, in this instance, the title does
not define the content of the rest of the text, but rather, it proposes a
judgement of value on the topic expounded by the article (adjunct—
meaning extension). The verbal content of the title is to be considered
incomplete, given that knowledge of the context is required to assign ref-
erence to the deictics it includes. While the image on the left-hand side
provides a reference for the deictic ‘here’ (essential exophora—meaning
extension), showing a map of the United Kingdom with the image of a
person superimposed on Wales, the pronoun ‘it’ finds no clear referent in
the same cluster, and requires clarification.
The second cluster includes a picture of a child. The legend provides
information on the subject of the picture, adding details about the living
conditions of the boy’s family, his health and the place where they live
(adjunct—meaning extension and spatial enhancement). On the top
left of the image, it is also possible to find information about the author of
the image, meant for meaning extension, but not fundamental for the
general understanding of the text (non-essential agent).
Finally, the third cluster gives detailed information on the main topic of
the article, that is, poverty in the United Kingdom, helping recipients to
assign a clear referent to the pronoun ‘it’ in the title (explicature). The
content of the second cluster is not referred to in any way by the main
body of the verbal content, which is about child poverty in general and
does not concern the subject of the picture or his family. Therefore, the
information included in the second cluster is only thematically related to
the rest of the text (symbiosis—meaning elaboration by exemplifica-
tion). The third cluster also generates explicatures, most notably those
due to the use of deictics such as ‘we’ and ‘our’ in the first column.
Reference assignment for these elements is achieved through context,
since the article is included in the annual report by Save the Children.
The intention in publishing this article in the Save the Children 2012
annual report is to inform supporters and potential supporters of the
living conditions of some children in the United Kingdom and of the
actions taken by Save the Children in order to improve them
(Table 5.8).
Table 5.8  Summary table, Save the Children
Sender's meaning, organised by grouping of items (cluster), semantic
representation of individual modes (verbal/visual content), semantic
representation of the multimodal text (COSMOROE, logico-semantic
relations) and inferential meanings (explicatures, implicatures).

Cluster 1CL
- Verbal content: 1VER: It shouldn't happen here
- Visual content: 1VIS: Image of United Kingdom with person on Wales
- COSMOROE: 1VER-ALL: Complementarity—adjunct; 1VER-1VIS: Complementarity—essential exophora
- Logico-semantic relations: 1VER-ALL: Extension; 1VER-1VIS: Extension
- Explicatures: –
- Implicatures: –

Cluster 2CL
- Verbal content: 2VER: [legend]; 3VER: Photo: Abbey Trayler-Smith/Save the Children
- Visual content: 2VIS: Image of child
- COSMOROE: 2VER-2VIS: Complementarity—adjunct; 3VER-2VIS: Complementarity—non-essential agent
- Logico-semantic relations: 2VER-2VIS: Extension—enhancement (spatial); 3VER-2VIS: Extension
- Explicatures: –
- Implicatures: –

Cluster 3CL
- Verbal content: 4VER: [main body of text]
- Visual content: –
- COSMOROE: 4VER-2VIS, 2VER: Independence—symbiosis
- Logico-semantic relations: 4VER-2VIS, 2VER: Elaboration
- Explicatures: 4VER-1VER, 1VIS: Poverty should not happen in the United Kingdom; 4VER: We = Save the Children, Our = Save the Children's
- Implicatures: –

In consideration of the international status of Save the Children, it is
possible that part of the informative material that is distributed in the
United Kingdom may well be translated into other languages for different
TAs.
Assuming that this article was to be translated into Italian, translation
problems connected with its multimodal nature would arise mainly in
relation to its title. The translator would have to decide whether to leave
the deictic ‘here’ unchanged in the TT, using its Italian equivalent (qui),
or whether to modify it somehow. In the first case, the image of the
United Kingdom in the same cluster may have to be changed: using the
same image may confuse recipients, who would try to match the deictic
with their own reality (Italy) and would not understand why the text
refers to the United Kingdom as ‘here’. The image could then be
replaced with the one of a reality incorporating both Italy and the United
Kingdom—for example, an image of Europe—to help readers assign the
reference correctly, albeit less precisely. Another possibility would be to
omit the image altogether to eliminate confusion, if this is not problem-
atic in terms of consistency with the surrounding co-text. Using this
strategy, the essential exophora would be eliminated and recipients
would have to compensate for its absence with their knowledge of the
world. Interpretation of the concept of ‘here’ would then be less strongly
guided, and in the light of the verbal content of the article, the audience
may assign the reference to Europe or perhaps to developed countries
more in general.
If the image is omitted, the translator may decide to compensate by
making the content of the deictic explicit: 'here' could be replaced with
Regno Unito (United Kingdom), Europa (Europe) or Paesi sviluppati
(developed countries) depending on the emphasis required by the
TT. The application of this strategy would remove the relation of essen-
tial exophora. The explicit reference to a country (or continent, or group
of countries) other than Italy would lose the direct connection with the
recipient’s own space, making the problem of poverty as described by
the text one that is not necessarily related with the environment imme-
diately surrounding the TA.  Losing this connection would not cause
serious problems for the recipients’ understanding and it would not hin-
der the text’s informative function, but it may result in a weaker emo-
tional connection between the TA and the article, due to a reduced level
of perceived relevance.

5.4   Operative Texts


Operative texts are defined as texts whose function is ‘the inducing of
behavioural responses’, and they ‘can be conceived as stimuli to action or
reaction on the part of the reader’ (Reiss 1977/1989: 105). The commu-
nicative situation corresponding to operative texts is described by Reiss as
follows:

Here the form of verbalization is mainly determined by the (addressed)
receiver by virtue of his being addressable, open to verbal influence on his
behaviour. (Reiss 1977/1989: 105)

Multimodal texts of this type can be found in a variety of genres. An
example of an operative multimodal text is the advertisement (Snell-Hornby
1997: 278), whose function is to persuade the recipient to buy a
product/service or embrace an idea. Based on the definition above, other
types of operative multimodal texts include, among others, some catego-
ries of leaflets, political websites and some types of web banners.
Since operative texts aim to generate a response from the readership
and determine a change in their behaviour, this type of text needs to be
memorable and to be of relatively straightforward interpretation. A written
text that can be easily remembered needs to make limited use of
textual resources, both for reasons related to spatial constraints and quick
delivery of the message to the audience. These should not normally be
resources requiring special types of literacy, especially if the text addresses
a wide audience. Hence, it appears consistent with the nature of operative
texts that their multimodal subtype would often rely heavily on the syn-
ergy between visual and verbal textual resources to bring about a new
interpretation of visual and verbal elements that are otherwise not com-
plex to understand. Also due to their nature, operative multimodal texts
will be likely to appeal to the recipients’ social and emotional world as
leverage to influence the behaviour of the addressees; thus, contextual and
cultural knowledge (i.e. pragmatic contributions to meaning) are likely to
play a key role in the formation and delivery of the message, influencing
the choice on which textual resources are to be deployed and how.
In this section, I analyse a poster from a charity campaign, an internet
banner for environmental publicity, a leaflet for blood donations and a
commercial advertisement, to show how their organisation is consistent
with the general description of the multimodal text type they belong to
and what this means for their translation.

5.4.1  UNICEF Water Campaign


This poster, released by UNICEF in 2007, is part of UNICEF’s awareness
campaign about the consequences of water pollution. The poster shows
one cluster only. Although the text has a strong informative function, it
was classified as mainly operative given the ultimate purpose of getting the
public not only to be aware of the problem but also to act in order to help
resolve it (Fig. 5.9).
In this poster, the verbal content claims that ‘1.5 million children die
every year from drinking polluted water’. As the poster means to convey
this verbal content as objective data, its full logical form is ‘According to
data, 1.5 million children die every year from drinking polluted water’
(explicature). This verbal message is superimposed on the visual content,
which represents water in the shape of an atomic mushroom. This would
be defined by Forceville (1996) as a pictorial metaphor: water is likened to
an atomic bomb by merging visual aspects of the two elements (explica-
ture). The shape of the water in the image and its recognition as an atomic
bomb triggers the retrieval of the recipients’ knowledge on atomic bombs
(implicated premise).

Fig. 5.9  UNICEF (2007) Water Campaign


The verbal content therefore provides details on the visual metaphor,
explaining on what level water can be considered similar to an atomic
explosion, and behaving as a defining apposition to the image: water is
defined as polluted and its deadly effects are described. By providing
details on the visual content, the verbal contribution extends the meaning
provided by the image, guiding recipients to form an implicated conclu-
sion about water being likened to an atomic explosion in terms of the
number of casualties it causes, with specific reference to children.
The UNICEF name and logo (which represents another visual ele-
ment) positioned below the verbal content provide information on the
source of the message, projecting the ideas contained in the message onto
the sender. This relation of agency is essential to the full understanding of
the text: through the recognition of UNICEF’s name and logo, recipients
will be reminded of their encyclopaedic knowledge of UNICEF as a chari-
table organisation committed to helping children worldwide (implicated
premise) and hence of the charitable purpose of the communication.
Furthermore, an internet address is provided, also as part of the verbal
content: this address adds information that is not essential for the under-
standing of the main message, behaving as an adjunct to the rest of the
multimodal text. By adding new details about an internet location where
more information can be found, this part of verbal content spatially
enhances the meaning provided by the other textual resources. This
address, along with the implicated premise on the role of UNICEF and
the information on the effects of polluted water provided by the poster,
plays a role in the operative function of the text, actively contributing to
the formation of an implicated conclusion coinciding with the author’s
intention: recipients are encouraged to help UNICEF in their chari-
table goal of fighting the problem of polluted water, a disaster com-
parable to the effects of an atomic bomb that endangers the lives of
millions of children, and to learn how to do so, they should visit
UNICEF’s website (Table 5.9).
Table 5.9  Summary table, UNICEF Water Campaign
Sender's meaning, organised by grouping of items (cluster), semantic
representation of individual modes (verbal/visual content), semantic
representation of the multimodal text (COSMOROE, logico-semantic
relations) and inferential meanings (explicatures, implicatures).

Cluster 1CL
- Verbal content: 1VER: 1.5 million children die every year from drinking polluted water; 2VER: www.UNICEF.de; 3VER: UNICEF
- Visual content: 1VIS: water in the shape of atomic explosion; 2VIS: UNICEF logo
- COSMOROE: 1VER-1VIS: Defining apposition; 3VER, 2VIS-ALL: Essential agent; 2VER-ALL: Adjunct
- Logico-semantic relations: 1VER-1VIS: Extension; 3VER, 2VIS-ALL: Projection—idea; 2VER-ALL: Enhancement—spatial
- Explicatures: 1VER: According to data, 1.5 million children die every year from drinking polluted water; 2VER: Water is like an atomic bomb
- Implicatures: 1VIS: General knowledge on atomic bombs; 1VER-1VIS: Polluted water has similar effects to atomic bombs in terms of its casualties; 3VER, 2VIS: UNICEF is a charitable organisation committed to helping children worldwide; ALL: The TA should help UNICEF fight the problem of polluted water, a disaster comparable to the launch of an atomic bomb, and to learn how to do so they should visit UNICEF's website

Given UNICEF's role as an international organisation linked to the
United Nations, it is easy to imagine that its campaign materials may very
often require translation into various languages and be published in many
different outlets for a variety of audiences. Assuming a translation scenario
in which this poster is used in school campaigns to raise young people's
awareness about the problem of polluted water in the world, some
modifications may be required for the TT to be accessible for the TA. In
this case, the translation of the verbal content does not seem particularly
complex: the concepts expressed in the verbal content are not
culture-specific; the internet address would require modification according
to the target locale to direct the audience to a website they can access and
consult, but this does not appear to constitute a problem either; UNICEF's
name has official translations into different languages that can be easily
accessed by the translator.
A potentially more problematic area of this multimodal text is repre-
sented by the visual contribution. Will the readership’s expected encyclo-
paedic knowledge allow them to identify the atomic mushroom,
understanding the pictorial metaphor? If so, does the TA possess the gen-
eral knowledge on atomic explosions necessary to reach the implicated
premise activated by the visual content? Missing out on the identification
of the first implicated premise would in turn hinder the understanding of
the implicated conclusions, given that these are built partly on the impli-
cated premise. If the readership cannot identify correctly the image of the
atomic bomb, they will miss out on part of the intended message, being
unable to identify the role played by the image.
In the case of a school TA, then, the translator may choose to support
the readership’s processing effort by adding details to the verbal contribu-
tion in order to reinforce the semantic representation of the TT. The ver-
bal contribution, for example, could mention atomic bombs, and be
translated as ‘1 milione e mezzo di bambini muore ogni anno dopo aver
bevuto acqua inquinata. Solo una bomba atomica ne ucciderebbe altret-
tanti.’ (‘1 million and a half children die every year after having drunk
polluted water. Only an atomic bomb would kill as many’). This would
allow the TA to achieve a fuller understanding of the message; children
might still not be able to access their own encyclopaedic knowledge about
atomic bombs, but this gap can be filled by teachers through a discussion
of the topic. The resulting TT would be more accessible for the intended
TA, taking into account their developing skills and knowledge of the
world.

5.4.2  WWF's Earth Hour
This internet banner was released by WWF to support their 2012 Earth
Hour campaign.4 The Earth Hour ‘is a global annual event where millions
of people switch off their lights for one hour to show they care about our
planet' (Earth Hour 2016). The text shows two clusters: the first cluster is
made of visual content (sketched image of the Earth in space), which is
also the background of two parts of verbal content ('Our world is
brilliant – help keep it that way' and 'WWF's Earth Hour – 8.30PM 31
March'). The second cluster is WWF's Earth Hour 2012 'signature'
cluster in the top right corner on a white background (Fig. 5.10).

Fig. 5.10  WWF (2012) Earth Hour

4  http://earthday2017.today/wp-content/uploads/2017/03/6.png.
In the first cluster, the image of the Earth, depicted in brilliant hues of
green and blue to show the outline of the continents (most notably
Africa at the centre of the planet), is in a complementary relationship
with the top chunk of verbal content, which calls our world ‘brilliant’.
This polysemous adjective is commonly used in English with the literal
meaning of ‘shining’, but it is also common to use the same adjective
with the meaning of 'superb, wonderful'. Recipients are aware that the
planet is not literally shining and would thus retrieve the second meaning
based on the principle of optimal relevance. However, the image leads the
recipient's process of sense selection back towards the first option. Both
interpretations are therefore activated in the context of this banner, reinforcing
what is expressed by the visual contribution (elaboration) while at the
same time adding new elements of positive judgement towards the planet
(extension). The image of the Earth was produced with a photographic
technique called ‘light painting’, in which light is used to ‘draw’ on a
dark background (in this case, the night sky) and the image is captured
by the camera, thanks to a long exposure time. The technique used for
the production of the visual content is then also connected with the cen-
tral theme of the WWF’s Earth Hour. If the audience is aware of the
content of the initiative (implicated premise triggered by the name of
the initiative) and of the means by which the image was produced, they
will understand that the meta-information carried by the visual component
is relevant to the campaign, and that the Earth in the text is 'made
of light’ (elaboration). Nevertheless, knowledge of this particular tech-
nique should not be assumed for a non-specialist audience, and recipients
can still access the message without the meta-information and the
relevant elaboration.
Again in the first cluster, the bottom part of the verbal content provides
the readership with temporal details on the initiative (adjunct), enhanc-
ing the meaning of the cluster and, as a consequence, of the message as a
whole. As the semantic representation of the verbal content is incomplete,
this produces an explicature (‘our world is brilliant, so help keep it that
way by taking part in WWF’s Earth Hour at 8.30pm on 31 March 2012’).
The role played by the second cluster is very similar to the role played
by the UNICEF name and logo in the UNICEF text: The WWF Earth
Hour name and logo provide information on the message sender, project-
ing the ideas contained in the text onto the author of the message. Since
the first cluster already states the identity of the agent, the relationship
producing the projection is non-essential to the full understanding of the
text. The presence of the name and logo, however, leads the readership
strongly towards the implicated premise about the nature of WWF as a
charitable organisation concerned with the preservation of the planet and
its species.
The full implicated conclusion, coinciding with the author’s inten-
tion and derived from the interaction of all elements, is that WWF, a
charitable environmental organisation, believes that our world is
brilliant, and encourages the audience to help keep it that way by tak-
ing part in the Earth Hour initiative, to be held at 8.30pm on 31
March 2012 (Table 5.10).
The main challenge for translation of this poster into Italian for a simi-
lar campaign by WWF would arguably be the reproduction of the relations
between the top part of verbal content and the other textual resources.
Word choice is crucial in this sense, as the polysemous adjective ‘brilliant’
is the verbal element that allows the establishment of the relationship with
Table 5.10  Summary table, WWF Earth Hour campaign

Sender's meaning

Cluster 1CL
  Verbal content:
    1VER: Our world is brilliant / help keep it that way
    2VER: WWF's Earth Hour / 8.30PM 31 March
  Visual content:
    1VIS: Earth seen from space, depicted in green and blue brilliant hues using light painting techniques
  COSMOROE relations:
    3VER, 2VIS-ALL: Complementarity—non-essential agent
    1VER-1VIS: Complementarity—sense selection
    1VIS-1VER, 2VER: Independence—meta-information
    2VER-ALL: Complementarity—adjunct (temporal)
  Logico-semantic relations:
    3VER, 2VIS-ALL: Projection—idea
    1VER-1VIS: Elaboration, extension
    1VIS-1VER, 2VER: Elaboration
    2VER-ALL: Enhancement
  Explicatures:
    1VER-2VER: Our world is brilliant, so help keep it that way by taking part in WWF's Earth Hour initiative at 8.30pm on 31 March 2012
  Implicatures:
    2VER: The Earth Hour initiative involves showing support to the preservation of the planet by turning lights off for an hour
    3VER-2VIS: WWF, a charity focusing on environmental issues, is the message sender
    ALL: WWF, a charitable environmental organisation, believes that our world is brilliant, and encourages the audience to help keep it that way by taking part in the Earth Hour initiative, to be held at 8.30pm on 31 March 2012

Cluster 2CL
  Verbal content:
    3VER: WWF / Earth Hour / 2012
  Visual content:
    2VIS: WWF symbol
the image of the planet. The word chosen in the TL should ideally offer
the same polysemy. However, as this may not be possible in some lan-
guages, the translator should consider which of the two meanings to
favour in the TT. The relationship of sense selection established between
the verbal content and the image of the world contributes both to mean-
ing elaboration (re-stating the visual content through the use of ‘brilliant’
with the meaning of ‘shining’) and meaning extension (adding a positive
element of judgement through the use of ‘brilliant’ with the meaning of
‘wonderful’). If a choice needed to be made, the translator may either
choose to favour the elaboration (brilliant = shining), since it makes an
explicit reference to the content of the campaign, or the extension (bril-
liant = wonderful), as this adds new elements that enrich the text and does
not merely restate information provided by the other mode.
In a language like Italian, the two options would result in translations
such as:

'Il nostro mondo è brillante – aiuta a mantenerlo com'è'
('Our world is shining – help keep it that way')

'Il nostro mondo è stupendo – aiuta a mantenerlo com'è'
('Our world is wonderful – help keep it that way')

Either solution is potentially acceptable, depending on the aspect the
translator prefers to emphasise. However, it should be noted that the
word brillante is used in Italian with a second sense of 'clever' and is more
often associated with people and ideas than objects. Therefore, its use in
this context may be regarded as slightly unidiomatic. To support the trans-
lation solution using the word ‘stupendo’, it could also be argued that a
meaning extension should be favoured, as it adds new elements and makes
the text richer from the informative point of view; in some cases, however,
an elaboration by reiteration could help engage the audience more effec-
tively, which is an important feature in an operative text. In either case, the
relationship between the textual resources involved will not be one of
sense selection, given the impossibility of replicating in the TT the same
polysemy found in the ST. The verbal content will act as a defining apposi-
tion to the image of the planet; as this relationship still falls in the category
of essential relationships of complementarity, the multimodal semantic
representation of the TT will not show significant variations. This change
has no noticeable effects in terms of overall textual interpretation, as the
intended meaning is anyway clear; its most important outcome lies in the
deletion of a logico-semantic relationship (elaboration or extension) and
therefore in a somewhat reduced multimodal semantic representation for
the TT as compared to its ST, although the two texts are still interpretively
similar.
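The 'reduced multimodal semantic representation' just described can be made explicit by listing the relations each version establishes and comparing the two sets. The sketch below is purely illustrative: the relation labels come from the analysis above (a simplified subset of Table 5.10), the model itself prescribes no implementation, and the TT set assumes the 'stupendo' solution, which replaces the sense-selection pun with a defining apposition.

```python
# Illustrative only: a simplified subset of the relations identified in
# the WWF banner analysis, encoded as (resource, resource, relation) triples.

st_relations = {
    ("1VER", "1VIS", "complementarity: sense selection"),
    ("1VER", "1VIS", "elaboration"),
    ("1VER", "1VIS", "extension"),
    ("2VER", "ALL", "enhancement (temporal adjunct)"),
}

# A TT based on 'stupendo' keeps the extension but replaces the pun-based
# sense selection with a defining apposition, losing the elaboration.
tt_relations = {
    ("1VER", "1VIS", "complementarity: defining apposition"),
    ("1VER", "1VIS", "extension"),
    ("2VER", "ALL", "enhancement (temporal adjunct)"),
}

lost = st_relations - tt_relations
gained = tt_relations - st_relations
print("lost:", sorted(lost))
print("gained:", sorted(gained))
```

The set difference surfaces exactly which relations the TT drops (the sense selection and the elaboration it supported) and which it introduces, which is the comparison a translator performs informally when weighing the two candidate renderings.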

5.4.3  American Red Cross Leaflet


This text is a reproduction of a leaflet by the American Red Cross, meant
to publicise the shortage of type O blood in their supply in order to get people
to donate. It shows two clusters: the top cluster is formed by two verbal
components, while the bottom cluster consists of the American Red Cross
name and logo. This text differs from previous ones since features of its
layout are crucial for meaning interpretation (Fig. 5.11).
The projection of the ideas carried out by means of a complementary
relation of non-essential agency between the bottom and the top cluster
is very much like the one described in the other operative texts analysed
(the relationship is not essential as the agent is mentioned in the top clus-
ter, and it does not strictly need to be reiterated for the text to be under-
standable). Also, as in the other operative texts, the retrieval of an
implicated premise about the nature of the American Red Cross as a
humanitarian association is triggered.
In the top cluster, the first part of the verbal content is an elliptical sen-
tence (often used in spoken language in the context of auctions). This is
not meaningful until it is completed based on the second part of the verbal
content, the main body of the text, which informs the readership that the
thing which is ‘going, going, gone’ is blood of type-0. The second part of
the verbal content is then useful for completing the information provided
by the first part, producing an explicature. The second part of verbal
content is also the one that carries the majority of the text's operative
function, asking the readership explicitly to donate blood and provid-
ing them with information on the reason for the request and on how
to comply with it (intention).
What is important to note in the first part of verbal content is that some
of the information is carried by the absence of a character in the words
‘going, going, gone’, from which all the letter Os have been deleted. As
the subject of the campaign also states the lack of Os (‘We need O’s’), the
two parts are in an equivalent relationship of metaphor, aimed at mean-
ing extension: indeed, in both parts of the verbal content, the Os are
missing, but these are of a different nature in the two verbal contributions
G ING,
G ING,
G NE.
We need O’s.
Without enough type O
blood on the shelves, we
risk being unable to meet
the needs of patients
here in our community.

On any given day our


blood supply is fragile,
especially our inventory
of Type O which is so
critical in fulfilling our
life-saving mission. Only
7% of people have type O-
negative blood, the
universal type that is
safe for all patients. 30%
of people have type O-
positive, which can be
transfused to 80% of all
patients.

Please give blood!

Call 1-800-GIVE-LIFE to
make an appointment at
one of our upcoming
community blood drives
or at a local Red Cross
blood donor center.

Fig. 5.11  Reproduction of American Red Cross leaflet


(one being a typographical character and the other one being a blood
type). The text treats them as one and the same thing, projecting some of
the qualities of one onto the other. By means of this metaphor, recipients
are led to the implicated conclusion that for the American Red Cross, the
need for type O blood is similar to the need for Os in writing. This rein-
forces the explicit request by the American Red Cross that recipients
donate blood, by metaphorically qualifying its importance. In this instance,
then, the absence of an expected component in the message carries mean-
ing. This is not a new concept in pragmatics, in which studies on silence
(meant as the absence of an expected utterance or part of it) as a source of
pragmatic meaning are well known: Jaworski, for example, claims that
'[s]ilence definitely belongs to the nonverbal component of communication'
(1993: 85) (Table 5.11).
The challenge for the reproduction of this campaign for the Italian Red
Cross would lie, somewhat paradoxically, in the reproduction of the absent
character. The missing letter Os are, indeed, at the basis of the establish-
ment of the equivalent relationship between the two parts of verbal con-
tent in the top cluster, and they are responsible for the meaning extension
deriving from this relationship. A translator would need to find TL equiva-
lents for the words ‘going’ and ‘gone’ containing the letter O to be able
to reproduce this relationship successfully. In Italian, this could be achieved
with the use of two gerunds and a past participle of the verb finire (finish,
run out), with a solution such as 'STA FINEND , STA FINEND , È FINIT '
(it is running out, it is running out, it has run out). The forms of the verb
used for this solution would normally all end in ‘O’ (finendo, finito), and
the character has been omitted to replicate the strategy used in the ST. The
position of the missing letter at the end of the word (as opposed to the
middle, as in the ST) is likely to mean that the main body of the text will
have to be moved to the right if the alignment of the main body with the
missing character is to be achieved. In turn, moving the main body of the
text to the right would determine the relocation of the second cluster
either to the centre or to the left, as the main body occupies the space all
the way to the bottom of the leaflet, leaving no room for the symbol of the
American Red Cross underneath it.
Given that the ultimate purpose of the equivalent relationship between
the two parts of verbal content in the top cluster is only to add to the
strength of a request explicitly formulated in the main body, failure to
reproduce the metaphor would not mean a collapse of the multimodal
organisation. The request would, however, lose some support coming
Table 5.11  Summary table, American Red Cross leaflet

Sender's meaning

Cluster 1CL
  Verbal content:
    1VER: G ing, g ing, g ne
    2VER: [main body of text]
  Visual content: –
  COSMOROE relations:
    3VER, 1VIS-1VER, 2VER: Complementarity—non-essential agent
    1VER-2VER: Equivalence—metaphor
  Logico-semantic relations:
    3VER, 1VIS-1VER, 2VER: Projection—ideas
    1VER-2VER: Extension
  Explicatures:
    1VER-2VER: Type O blood is going, going, gone
  Implicatures:
    3VER, 1VIS: The American Red Cross is a humanitarian charity
    1VER-2VER: The need of having type O blood for the American Red Cross is equal to having Os in writing

Cluster 2CL
  Verbal content:
    3VER: American Red Cross
  Visual content:
    1VIS: Symbol of the Red Cross
from contextual meanings, and possibly the establishment of a new relationship
between textual resources in the top cluster would be needed to
maintain cohesion. Given the operative purpose of the text, it is important
that the message comes across as strongly persuasive; therefore, if
translation into a different language did not allow for the reproduction of
the metaphor, compensation strategies may have to be used to replace it
either with a different metaphor or another relationship capable of leading
the reader towards a similar implicated conclusion.

5.4.4  Coldwater Creek
This advertisement was used by Coldwater Creek, a US-based clothing
company and retailer, to publicise their products on their own website. It
shows one cluster only, containing visual and verbal content. The linguis-
tic contribution is divided into two sentences (Fig. 5.12).
The first sentence at the top is an expression of military origin that
refers to having braved adverse circumstances and obtained recognition
for this (implicated premise). While the content suggested verbally seems
unrelated to the topic of fashion, the visual content on the left offers a clue

Fig. 5.12  Coldwater Creek advertisement


that makes the verbal contribution relevant to a fashion reader: the stripes
mentioned verbally are shown visually, building an essential exophora
that leads the recipient towards a reinterpretation of the verbal content,
producing a meaning extension.
The second sentence again builds an essential exophora with the visual
content, referring explicitly to the image (‘a vibrant pairing like this’). This
relationship is meant to produce a meaning elaboration through
exemplification.
Both parts of verbal content build an extratextual relationship with the
readership, addressing recipients directly. However, they do so in two dif-
ferent ways: the top part simply addresses the recipient as ‘you’, whereas
the second part builds the extratextual connection through the pronoun
‘us’, ‘embracing’ the recipient and including them in a group. Both pro-
nouns need reference assignment, as well as the deictic ‘this’, which refers
to the outfit in the image (explicatures).
The operative function relies heavily on all the elements of the message:
the first sentence conveys that the message sender believes that the reader-
ship deserve stripes because they have earned them. This suggests through
an implicated premise that stripes are reserved for a select group of peo-
ple who are ready to do something to obtain such desirable items.
The second sentence conveys explicit information about the current
fashion, drawing the recipient’s attention both to the general trend and to
how they can use stripes to attract people’s attention as desired by wearing
‘a vibrant pairing like this’.
Together, the various elements of the message aim to suggest the
implicated conclusion that recipients should acquire clothing with stripes
such as the one in the picture because they are desirable and fashionable,
and the recipients deserve them. The author’s intention is to convince
the readership of the implicated conclusion (Table 5.12).
If Coldwater Creek were to open shops in Italy and required its adver-
tisements to be translated, the first sentence would probably represent a
translation issue for an Italian TT. Indeed, Italian does not possess a widely
known cultural equivalent for the idiomatic expression that could be used
replicating the same play on words. However, using an idiomatic expres-
sion with the same meaning is not strictly necessary to the overall func-
tioning of the text, provided that a reference to stripes is maintained in the
TT (as this is necessary to maintain the essential exophora and the related
meaning extension). The TT also needs to suggest to the recipient, implic-
itly or explicitly, that items of clothing with stripes are desirable and
Table 5.12  Summary table, Coldwater Creek

Sender's meaning

Cluster 1CL
  Verbal content:
    1VER: You've earned your stripes
    2VER: Stripes are everywhere this season, and a vibrant pairing like this draws attention upward, where most of us prefer it
  Visual content:
    1VIS: Image of model wearing top with stripes
  COSMOROE relations:
    1VER-1VIS: Complementarity—essential exophora
    2VER-1VIS: Complementarity—essential exophora
  Logico-semantic relations:
    1VER-1VIS: Extension
    2VER-1VIS: Elaboration
  Explicatures:
    1VER: you = the recipient
    2VER: this = the one in the picture; us = the recipient and the message sender(s)
  Implicatures:
    1VER: Earning one's stripes = demonstrating to deserve something
    ALL: The audience should acquire clothing with stripes like the one in the picture because they are desirable and fashionable and recipients deserve them
reserved for an elite who are ready to earn them (in order to preserve the
content of the implicated premise). This will support the implicated con-
clusion, preserving as much of the multimodal semantic representation of
the ST as possible in spite of the change in the verbal content.
A possible solution for an Italian version of the advertisement, then,
could be based on a translation of ‘You’ve earned your stripes’ such as
‘Mettersi in riga a volte è fantastico’ (‘lining up sometimes is great’). The
Italian word for ‘line’ also translates ‘stripe’. Just like its ST counterpart,
the Italian set phrase ‘mettersi in riga’ derives from military slang and
indicates the act of soldiers lining up. In common usage, it signifies the act
of making an effort to go back to duty after a period of undisciplined
behaviour. Therefore, the Italian translation for the first sentence suggests
that, contrary to what is generally believed, there are circumstances in
which ‘mettersi in riga’ can be highly desirable (i.e. getting into clothes
with stripes). This solution would maintain the relation of essential
exophora with the picture, and it would still produce a meaning extension,
mimicking the multimodal semantic representation of the ST.

5.5   Conclusion on the Analysis


Based on the discussion of the examples in Sects. 5.2, 5.3 and 5.4, it is
possible to draw some conclusions concerning the application of the
model. During the analysis of the multimodal texts examined, two main
general patterns emerged.
Firstly, through the use of the model, it was consistently possible to
ascribe a translation problem to one of the three dimensions of the
model—semantic representation of individual modes, semantic
representation of the multimodal text and inferential meanings—estab-
lishing the requirements that needed to be met by a specific TT resource
in order to achieve interpretive resemblance with the ST. The ability to
ascribe translation problems to a specific dimension of the model can raise
awareness of the nature of the issue and help in selecting specific transla-
tion strategies as necessary in order to support interpretive resemblance.
The ability of the model to ascribe a translation issue to its relevant dimen-
sion in a consistent way for all the texts analysed points to a certain level of
comprehensiveness. Of course, the texts analysed in the course of this
journey through multimodal meaning are not representative of all text
categories. For example, they do not include examples of dynamic texts
and they are all analysed for the same language pair, albeit accompanied by
some more general observations. Further applications of the model will
shed light on whether modifications are required for it to realise its full
potential as an evaluative framework, as an analytical method for research-
ers, as a contribution to the training of translators, and for practising trans-
lators as a basis for reflecting on intuitive decisions.
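The three dimensions referred to throughout this discussion can also be pictured as a record type: one entry per cluster, with a field for each analytical dimension. The sketch below is a hypothetical encoding (the class and field names are illustrative, not part of the published model), populated with the Coldwater Creek analysis summarised in Table 5.12.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterAnalysis:
    """One cluster of a multimodal ST, recorded along the model's three
    dimensions. A hypothetical encoding for illustration only."""
    # Dimension 1: semantic representation of individual modes
    verbal: dict
    visual: dict
    # Dimension 2: semantic representation of the multimodal text
    cosmoroe: list
    logico_semantic: list
    # Dimension 3: inferential meanings
    explicatures: list = field(default_factory=list)
    implicatures: list = field(default_factory=list)

# The Coldwater Creek analysis (cf. Table 5.12) in this shape:
coldwater = ClusterAnalysis(
    verbal={"1VER": "You've earned your stripes",
            "2VER": "Stripes are everywhere this season..."},
    visual={"1VIS": "Image of model wearing top with stripes"},
    cosmoroe=["1VER-1VIS: complementarity (essential exophora)",
              "2VER-1VIS: complementarity (essential exophora)"],
    logico_semantic=["1VER-1VIS: extension", "2VER-1VIS: elaboration"],
    explicatures=["you = the recipient",
                  "this = the outfit in the picture",
                  "us = the recipient and the message sender(s)"],
    implicatures=["earning one's stripes = deserving something desirable",
                  "ALL: the audience should acquire striped clothing "
                  "like the one in the picture"],
)
print(len(coldwater.cosmoroe), "intermodal relations,",
      len(coldwater.implicatures), "implicatures")
```

Framing the summary table as a data structure also makes the 'spill-over' between dimensions easy to see: a change to the verbal field typically forces updates to the relation lists and to the inferential fields as well.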
The second important trend identified is the tendency of translation
issues ascribed to a certain dimension of the model to ‘spill over’ into
other dimensions. Indeed, while it is possible to affirm that a certain trans-
lation issue originates from one of the three analytical dimensions, this
does not mean that applying changes will have an effect on that dimension
only. For example, problems generated by difficulties in reproducing the
ST textual resources might mean that changes are required to the resources
employed in the TT; nevertheless, any change in the area of individual
textual resources is likely to have an impact on the accessibility of the prag-
matic meaning required to get to the sender’s intention. An example of
this phenomenon can be found in the text in Sect. 5.2.4, in which the paronym
‘lie-bury’ allows the ST audience to access a set of pragmatic meanings
connected to President Bush that help them get to the intended message.
While the translation of the paronym can constitute a problem in itself, in
this specific case, it also reflects on the implicit meaning the TT is capable
of suggesting to its audience by reminding them of the political context in
which the cartoon was published. In other cases, translation issues con-
nected with a certain textual resource impact on the relationships this
establishes with other textual resources; this phenomenon can be exempli-
fied by the text on WWF’s Earth Hour (Sect. 5.4.2), in which difficulty in
reproducing a pun—‘brilliant’—may not allow the establishment of a rela-
tion of complementarity between textual resources that is important for
their interplay in the multimodal text. ‘Chain reactions’ involving all tex-
tual dimensions are not uncommon either: in the Coldwater Creek text
(Sect. 5.4.4), the main translation issue lies in the rendering of the verbal
content, not because of the complexity of its internal structure or a highly
specialised terminology but because the translator requires a textual
resource evoking the same (or a similar) implicature (i.e. clothes with
stripes are desirable) while at the same time maintaining the same (or a
similar) relationship between the linguistic and visual content.
The strong likelihood that translation issues arising in one dimension
will influence one or more of the other dimensions suggests that the analy-
sis of these three ways of conveying meaning cannot be fully compartmen-
talised and that resolving a potential issue belonging to one dimension
does not necessarily mean that the solution found will not in turn create a
challenge in another.
In fact, it is important to highlight again that, even though information
in the analytical table is displayed left to right from specific to general
(individual textual resources, interaction of textual resources, inferential
meanings), the analytical process is not, and should never be seen as,
progressive (i.e. starting with the study of individual resources and ending up
with inferential meanings), as confirmed by the discussion of individual
examples. Indeed, as shown in the analysis of the texts, explicit and implicit
meanings interact and modify each other, leading the analyst (or, indeed,
the recipient) of a multimodal text through several reinterpretations of the
three dimensions that then result in what the audience sees as the intended
sender’s meaning, without set ‘start’ and ‘end’ points for the analytical
process. Thus, the second general trend identified in the analysis of multi-
modal texts takes us back once more to Marsh and White’s view of a mul-
timodal message as a wedding of interacting components, adding to their
view one further reason why understanding this interaction should be of
concern to all communicators.
To conclude this book, Chap. 6 offers a few reflections on the current
form of the model, its contribution to the advancement of literature on
multimodality, potential modifications for its improvement and its future
applicability within and outside of translation studies.

References
Attardo, S. (1994). Linguistic Theories of Humor. New York: Mouton de Gruyter.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Earth Hour. (2016). What Is Earth Hour? [online]. Available at: http://earthhour.wwf.org.uk/earth-hour/. Last accessed 15 May 2016.
Federici, F. (2011). Introduction: Dialects, Idiolects, Sociolects: Translation
Problems or Creative Stimuli? In F.  Federici (Ed.), Translating Dialects and
Languages of Minorities: Challenges and Solutions (pp.  1–20). Oxford: Peter
Lang.
Forceville, C. (1996). Pictorial Metaphor in Advertising. London: Routledge.
Horn, L. R. (2004). Implicature. In L. R. Horn & G. Ward (Eds.), The Handbook
of Pragmatics (pp. 3–28). Oxford: Blackwell.
Jaworski, A. (1993). The Power of Silence: Social and Pragmatic Perspectives.
Newbury Park: Sage.
Prljic, M. (2014). The Web Designers' Guide to Photo Selection [blog post]. Available at: https://webdesign.tutsplus.com/tutorials/the-web-designers-guide-to-photo-selection--cms-21592. Last accessed 19 June 2017.
Reiss, K. (1977/1989). Text Types, Translation Types and Translation Assessment.
In A. Chesterman (Trans. & Ed.), Readings in Translation Theory (pp. 105–115).
Helsinki: Oy Finn Lectura Ab.
Snell-Hornby, M. (1997). Written to Be Spoken: The Audio-medial Text in
Translation. In A. Trosborg (Ed.), Text Typology and Translation (pp. 277–290).
Amsterdam/Philadelphia: Benjamins.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and
Cognition (2nd ed.). Oxford: Blackwell.
Thomson, A. (1964). A Handbook of Anatomy for Art Students (p. 34). New York:
Dover Publications.
CHAPTER 6

Multimodal ST Analysis: Current Status,
Opportunities, Ways Forward

Abstract  This chapter provides an assessment of the extent to which the
model has met its initial purposes. At the start of this journey, it was
claimed that if a mapping of multimodal source texts could be achieved,
multimodal message formation could be better understood, and transla-
tion issues/solutions could be discussed based on the knowledge acquired.
The model resulting from this study is an analytical tool that could be used
to investigate particular types of translation (e.g. AVT, localisation), indi-
vidual case studies, language-pair-specific issues or specific source text
genres. The model’s descriptive capabilities also make it into a valuable
tool for other uses. Indeed, the model could be applied to other types of
textual analysis in any field of research in which multimodality is pervasive
(e.g. literature, art, advertising).

Keywords  ST analysis • Multimodal translation • AVT • Localisation • Text analysis

This final chapter reflects upon several aspects of the model proposed in
this study, aiming to assess to what extent it has met its initial objectives.
The main goal set at the beginning of this work was to develop a model
for the textual analysis of multimodal STs for translation purposes. The
model was developed with an approach bringing together aspects of

© The Author(s) 2018 159


S. Dicerto, Multimodal Pragmatics and Translation,
Palgrave Studies in Translating and Interpreting,
https://doi.org/10.1007/978-3-319-69344-6_6
pragmatics, studies on multimodality, translation studies and semiotics;
thanks to its tripartite structure, this new model for ST analysis is capable
of recording aspects related to individual modes as well as to their interac-
tion and to the pragmatic meaning the text prompts the readership to
contribute. The adoption of the principle of optimal relevance as the
‘communicative glue’ holding together multimodal texts and as the most
general principle to which translation needs to respond (in the form of
interpretive resemblance) has resulted in a working model that finds its
main root in Pragmatics and in particular in Relevance Theory as applied
in translation studies by Gutt (2000).
The claim made at the start of this journey was that if a mapping of the
multimodal organisation of STs could be achieved, multimodal message
formation could be better understood, and translation issues (as well as
the proposed solutions) could be discussed in the light of the new knowl-
edge acquired. As seen in Sect. 2.4, in Gutt’s view, a translation is an
‘utterance’ interpretively resembling another ‘utterance’ in a different lan-
guage, namely, a text sharing with its ST analytic and contextual features.
It was argued that these analytic and contextual features correspond respec-
tively to explicit and implicit meanings and, therefore, the model was built
to account for such meanings as distinguished by Sperber and Wilson in
their conception of RT. The present study has aimed to explain in some
detail how interpretive resemblance can manifest itself in realistic transla-
tion scenarios, providing an overview of aspects of a text that may influ-
ence the level of interpretive resemblance of a TT to its ST. This aspect of
the model adds to Gutt’s work, tackling Tirkkonen-Condit’s criticism that
sees Gutt’s view on translation as valuable, but at the same time too vague
to explain translation in any detail (Tirkkonen-Condit 1992: 238).
As discussed in detail in Chap. 4, the model studies the explicit and
implicit meanings conveyed by a multimodal ST through different dimen-
sions of analysis: the semantic representation of individual modes, the
semantic representation of the multimodal text (based on visual-verbal
relations) and the inferential meanings to which the text ‘leads’ its recipi-
ents. This tripartite structure has enabled a detailed description of rela-
tively complex units of meaning such as those analysed in Chap. 5, which
are described in terms of their visual and verbal information, the interac-
tion of textual resources and their associated inferential meanings. The
table used to summarise the results of the analysis for each ST, obtained
from the integration of the theoretical constructs of the model with the
multimodal transcription system proposed by Baldry and Thibault (2005),
has proved useful for organising the results of analysis for each of the three
dimensions of the model and their connection with the (combinations of)
textual resources involved in conveying meaning, offering a mapping of a
multimodal text ‘at a glance’.
The information on the STs obtained by applying the model has been
useful for discussing translation strategies for the selected texts in terms of
interpretive resemblance, detailing the practical challenges of producing a
TT interpretively resembling its ST.  The ST analysis has helped to pin-
point potential issues, ascribing them to a particular dimension (as dis-
cussed in Sect. 5.5) and supporting a discussion of how these could be
resolved in the final TT.  This discussion focused mainly on the conse-
quences of certain potential solutions in terms of their impact on the over-
all TT organisation, with particular reference to interpretive resemblance.
In some cases, the solutions proposed did not reproduce the same visual-verbal
relations or the same exact inferential meanings detected in the ST
(supporting Gutt’s view that total interpretive resemblance may not
always be possible in translation); consequently, partial solutions were
looked at in terms of the changes they generated in the organisation of the
multimodal TT in order to investigate interpretive compatibility with the
ST and ascertain whether any compensation was desirable or required for
the multimodal TT to interpretively resemble its ST. The application of
the model to a range of multimodal texts has demonstrated that it can be
used for conscious reflection on the translation challenges posed by mul-
timodal texts based on a higher or lower degree of detected interpretive
resemblance. In this sense, the model has achieved its primary goal as an
analytical framework to improve our understanding of the meaningful
organisation of multimodal STs for translation purposes, as stated in
Chap. 1.
The analysis of the texts included in this study was carried out from the
point of view of an individual, involving an interpretation of the text in a
limited knowledge environment that does not allow direct access to the
sender’s intentions, as described in Chap. 3. Other users of the model may
disagree on the meaning assigned to those texts based on their personal
interpretation, identifying different visual-verbal relations or contextual
implications in the same ST. While this may at first seem a limitation of the model, these are the conditions under which translators operate the world over: they can rely only on their own interpretation of a text and must try to guide their audience towards a similar understanding. Application of the
model by other users should be encouraged as a means of testing its
descriptive power further; the input of new users may be fundamental to making the model more nuanced and comprehensive.
A possible amendment that would increase the model’s descriptive power would be to extend the reach of the analysis to embrace the relationship between the text and the recipient, which cannot currently be accommodated, even when this relationship is important in terms of textual relevance (and, hence, for translation). In its current form, the model offers ways of analysing deictics meant to
address recipients (e.g. ‘you’, ‘your’, ‘we’, ‘our’) or to refer to their envi-
ronment (e.g. ‘here’), as well as to examine the explicatures these deictics
trigger. However, the model offers no means to evaluate the role played
by explicit textual references to the readership and their environment in
terms of their influence on perceived relevance. For example, the Lord
Kitchener text (Fig. 4.1) makes an appeal for recipients to join the army.
Here it is essential that the relevant group of recipients identify themselves
as ‘you’, feeling ‘called’ by their government, for the sender’s persuasive
intention to get across to the readership. The extratextual connection to
the recipient is paramount in this case: if for some reason the expected
audience does not identify themselves as ‘you’, they will not feel that the
text is directed at them, and the message will lose relevance. As a result, the intended audience may not deem the text worth processing in its entirety; if the message is not fully processed, the sender’s communicative intention will be lost, and the text will have failed altogether to convey its persuasive intention to the readership. This does not negate the
implicated conclusion suggested by the text or change its nature in any
way; however, it makes the text irrelevant to the intended audience, and,
thus, its implicated conclusion has no force.
This example shows the potential importance of the recipient’s self-identification as the addressee; however, this is not an on/off ‘relevance switch’: in some cases, failure to recognise the extratextual connection may not have such dramatic effects. In the text from Save the
Children’s annual report (Sect. 5.3.4), there is an explicit reference to the
intended audience’s environment, but this is included in the message
mostly in order to trigger a stronger emotional response towards a reality
the readership feels close to. A reader of the report would still be likely to
fully process the text without that reference, perhaps because of its rele-
vance to their personal interest in the topic regardless of the ‘emotional’
connection. Therefore, when it comes to translation, the extratextual con-
nection needs to be evaluated on a case-by-case basis.
A possible way to account for this currently unaccounted-for layer of meaning would be to treat extratextual connections to the recipient as a special type of explicature, one requiring an assessment of its expected influence on the level of textual relevance perceived by the
TA. Indeed, this indirectly affects the meanings conveyed by a text and in
particular implicit meanings, as the level of perceived relevance determines
the cognitive effort the audience is willing to make to process a text fully.
A low level of perceived relevance is likely to mean that the explicit and
implicit meanings conveyed by a text will not be fully (or partly) processed
by the readership and that the text will lose all or part of its force—being,
in effect, inadequate to convey the intention behind it.
With or without this integration, researchers in translation studies
could use the model to investigate, among other topics, particular types of
translation. As noted in Chap. 4, the model could find applicability in
research on AVT, localisation or other areas of translation studies dealing
with multimodal meaning. Its applicability as an analytical tool can also
include individual case studies, the analysis of language-pair-specific issues
or systematic investigations of specific ST genres. Extensive application of
the model to individual ST types/genres could be useful to detect trends
specific to certain groups of texts, allowing researchers to produce gener-
alisations regarding the way some multimodal STs are normally organised.
Such generalisations, on top of expanding our current knowledge of mul-
timodal matters, could be useful for informing translation theory with a
more detailed account of the challenges that can be reasonably expected
from each category of texts and, hence, what to ‘look out for’ in the trans-
lation activity. For example, the model’s application to operative and
informative texts would result in a more detailed picture of how these are
respectively centred on their receiver and on their topic. Within the lim-
ited scope of the sets of examples analysed here, operative texts have
already shown a rather heavy reliance on the receiver to contribute to the
text through the elaboration of inferential meanings, while informative
texts have appeared as rather more heavily invested in their semantic
representation.
Topics of authorship and creativity could also be investigated by means
of the model—as anticipated at the beginning of Sect. 5.2, expressive texts
are comparatively free-form, and their shape mostly depends on the
author’s personal idiolect and aesthetic choices. Therefore, the works of
individual authors could be analysed and compared, both in order to iden-
tify common patterns and to see if/how an author’s style develops over
time. The application of the model to expressive texts may help elaborate
on Reiss’s definition by providing a detailed account of the features form-
ing the ‘multimodal signature’ of a certain author, going beyond the gen-
eral description of expressive texts as author-dependent with an analysis of
the communicative strategies and schemes of interaction between dimen-
sions that are consistently used by an author. Application of the model to
different language combinations could also help to differentiate between
generalisations applicable to a specific language pair and more highly gen-
eralizable trends (or ‘norms’) in the translation of multimodal STs.
The model could also be directly useful for the training of new transla-
tors. The type of analysis supported by the model is likely to encourage trainee translators to come to terms with the need to move beyond the common starting point of ‘literal’ translation in order to serve the higher purpose of interpretive resemblance. This realisation is in turn likely to increase their awareness of potential translation issues and of the strategies available to overcome them, within a framework that helps them reflect critically on the strengths and weaknesses of their own choices.
Nevertheless, it could be argued that, as it stands, the model is too
complex for use by translation students. Indeed, some training is required
to understand how to map out texts. Use of the model requires an under-
standing of Relevance Theory and of visual-verbal relations that should
be delivered to students prior to any potential application of the model.
Also, the analytical process can be lengthy, particularly on first use, making in-class application potentially impracticable. While it seems hardly
possible to produce a simple model accounting in detail for a complex
and multifaceted reality such as multimodality, didactic difficulties need
to be acknowledged. For these reasons, especially at the beginning,
trainee translators may be presented with a ‘reduced’ version of the
model, in which contextual meanings are addressed in less depth (e.g.
replacing explicatures and implicatures in Table 4.7 with a single column
where contextual factors are listed in connection with textual resources)
and visual-verbal relations are divided into the four broad categories of
equivalence, contradiction, independence and symbiosis without any fur-
ther specification. The tool could find a didactic application in a multimo-
dality module, and be used with tutor guidance in this reduced version.
This would partly resolve the issue of complexity, leaving trainees with the
freedom to study visual-verbal relations in more detail in their own time,
if they have a specialist interest in multimodal meaning. Alternatively, and
perhaps more realistically given its complexity, the model could be used
mainly as a background resource to raise student awareness of the organ-
isation of multimodal STs and its potential impact on translation.
While the model has been developed with translation in mind, its descriptive value also makes it a valuable tool for purposes other than translation. The possibility of mapping a multimodal text according to the
three dimensions of the model could indeed be useful for other types of
textual analysis in entirely different fields of research. Multimodality is a
pervasive phenomenon, and relevant studies (e.g. on literature, art, adver-
tising, language for special purposes and many more) could use the blue-
print proposed in this book to produce versions of the model corresponding
to their specific analytical needs. These may require the replacement of the
semiotic modes used for this study with different ones, the addition of
further modes, or maybe a stronger emphasis on one of the dimensions,
and are likely to be unconnected to the original idea of supporting inter-
pretive resemblance between a TT and its ST.
The model’s multidisciplinary nature inherently means that its poten-
tial applications are also multidisciplinary. It is now time for this study to
return the concepts borrowed from each field of study to their respective
owners, hopefully with some interest gained through interdisciplinary
interaction; in the course of this journey through multimodal meaning,
this interaction, which represents a point of contact between the seem-
ingly parallel lines of independent disciplines, has proven to have not
merely additive but also multiplicative effects—just like the interaction
between modes in a multimodal text.

References
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Tirkkonen-Condit, S. (1992). A Theoretical Account of Translation—Without
Translation Theory? Target, 4(2), 237–245.
References

Aguiar, D., & Queiroz, J.  (2010). Modeling Intersemiotic Translation: Notes
Toward a Peircean Approach [online]. Available at: http://french.chass.utoronto.ca/as-sa/ASSA-No24/Article6en.htm. Last accessed 26 June 2017.
Anstey, M., & Bull, G. (2010). Helping Teachers to Explore Multimodal Texts.
Curriculum Leadership [online]. Available at: http://cmslive.curriculum.edu.
au/leader/default.asp?id=31522&issueID=12141. Last accessed 26 June 2017.
Attardo, S. (1994). Linguistic Theories of Humor. New York: Mouton de Gruyter.
Bach, K. (1994). Conversational Impliciture. Mind & Language, 9(2), 124–162.
Bagley, P. (2013, April 24). Bush Library. Salt Lake Tribune.
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Bateman, J.  A. (2014). Text and Image: A Critical Introduction to the Visual/
Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information
in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Bell, S. (2013, May 22). The Backbone. The Guardian.
Berinstein, P. (1997). Moving Multimedia: The Information Value in Images.
Searcher, 5(8), 40–49.
Bernardo, A.  M. (2010). Translation as Text Transfer—Pragmatic Implications.
Estudos Linguisticos / Linguistic Studies, 5, 107–115.

© The Author(s) 2018 167


S. Dicerto, Multimodal Pragmatics and Translation,
Palgrave Studies in Translating and Interpreting,
https://doi.org/10.1007/978-3-319-69344-6
Blakemore, D. (1992). Understanding Utterances: An Introduction to Pragmatics. Oxford: Blackwell.
Braun, S. (2016). The Importance of Being Relevant? A Cognitive-Pragmatic
Framework for Conceptualising Audiovisual Translation. Target, 28(2),
302–313.
Bublitz, W. (1999). Introduction: Views of Coherence. In W. Bublitz, U. Lenk, &
E. Ventola (Eds.), Coherence in Spoken and Written Discourse. How to Create It
and How to Describe It (pp. 1–7). Amsterdam/Philadelphia: Benjamins.
Carston, R. (2004). Relevance Theory and the Saying/Implicating Distinction. In
L.  Horn & G.  Ward (Eds.), The Handbook of Pragmatics (pp.  633–656).
Oxford: Blackwell.
Cattrysse, P. (2001). Multimedia & Translation: Methodological Considerations.
In Y.  Gambier & H.  Gottlieb (Eds.), (Multi) Media Translation: Concepts,
Practices and Research (pp. 1–12). Amsterdam/Philadelphia: Benjamins.
Chandler, D. (2007). Semiotics: The Basics. New York: Routledge.
Chaume, F. (2004). Cine y Traducción. Madrid: Ediciones Cátedra.
Chesterman, A. (1997). Memes of Translation. Amsterdam/Philadelphia:
Benjamins.
Chiaro, D., Heiss, C., & Bucaria, C. (Eds.). (2008). Between Text and Image:
Updating Research in Screen Translation. Amsterdam/Philadelphia: Benjamins.
Clark, B. (2011, September). Relevance and Multimodality. Paper Presented at
Analysing Multimodality: Systemic Functional Linguistics Meets Pragmatics,
Loughborough University.
Cohn, N. (2007). A Visual Lexicon. The Public Journal of Semiotics, 1(1), 35–56.
Cooren, F. (2008). Between Semiotics and Pragmatics: Opening Language Studies
to Textual Agency. Journal of Pragmatics, 40, 1–16.
Crystal, D. (2010). The Cambridge Encyclopedia of Language. Cambridge: Cambridge
University Press.
Desilla, L. (2014). Reading Between the Lines, Seeing Beyond the Images: An
Empirical Study on the Comprehension of Implicit Film Dialogue Meaning
Across Cultures. The Translator, 20(2), 194–214.
Desjardins, R. (2017). Translation and Social Media in Theory, in Training and in
Professional Practice. London: Palgrave Macmillan.
Díaz-Cintas, J., & Remael, A. (2007). Audiovisual Translation: Subtitling.
Manchester: St Jerome.
Díaz-Cintas, J.  (2004). In Search of a Theoretical Framework for the Study of
Audiovisual Translation. In P. Orero (Ed.), Topics in Audiovisual Translation
(pp. 21–34). Amsterdam/Philadelphia: Benjamins.
Dr Seuss. (2004). The Cat in the Hat (p. 1). London: Harper Collins.
Durant, A., & Lambrou, M. (2009). Language and Media: A Resource Book for
Students. New York: Routledge.
Earth Hour. (2016). What Is Earth Hour? [online]. Available at: http://earth-
hour.wwf.org.uk/earth-hour/. Last accessed 15 May 2016.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.


Federici, F. (2011). Introduction: Dialects, Idiolects, Sociolects: Translation
Problems or Creative Stimuli? In F.  Federici (Ed.), Translating Dialects and
Languages of Minorities: Challenges and Solutions (pp.  1–20). Oxford: Peter
Lang.
Forceville, C. (1996). Pictorial Metaphor in Advertising. London: Routledge.
Forceville, C. (1999). Educating the Eye? Kress and Van Leeuwen’s Reading
Images: The Grammar of Visual Design (1996). Language and Literature, 8,
163–178.
Forceville, C., & Clark, B. (2014). Can Pictures Have Explicatures? Linguagem
em (Dis)curso, 14(3), 451–472.
Gambier, Y., & Gottlieb, H. (Eds.). (2001). (Multi) Media Translation: Concepts,
Practices and Research. Amsterdam/Philadelphia: Benjamins.
Gregory, M. (2002). Phasal Analysis Within Communication Linguistics: Two
Contrastive Discourses. In P.  H. Fries, M.  Cummings, D.  Lockwood, &
W.  Sprueill (Eds.), Relations and Functions in Language and Discourse
(pp. 316–345). London: Continuum.
Grice, H.  P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard
University Press.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Hagan, M.  S. (2007). Visual/Verbal Collaboration in Print: Complementary
Differences, Necessary Ties, and Untapped Rhetorical Opportunity. Written
Communication, 24(1), 49–83.
Halliday, M.  A. K. (1994). An Introduction to Functional Grammar (2nd ed.).
London: Edward Arnold.
Halliday, M. A. K., & Webster, J. J. (2003). On Language and Linguistics. London:
Continuum.
Hatim, B., & Mason, I. (1990). Discourse and the Translator. London: Longman.
Hermans, T. (1999). Translation in Systems: Descriptive and System-Oriented
Approaches Explained. Manchester: St. Jerome.
Hickey, L. (1998). The Pragmatics of Translation. Clevedon: Multilingual Matters.
Horn, L. R. (2004). Implicature. In L. R. Horn & G. Ward (Eds.), The Handbook
of Pragmatics (pp. 3–28). Oxford: Blackwell.
House, J. (1997). Translation Quality Assessment: A Model Revisited. Tübingen:
Gunter Narr.
Hughes, M., Salway, A., Jones, G., & O’Connor, N. (2007). Analysing Image-­
Text Relations for Semantic Media Adaptation and Personalisation. In Second
International Workshop on Semantic Media Adaptation and Personalization,
Uxbridge.
Hyland, K. (2005). Metadiscourse: Exploring Interaction in Writing. London:
Continuum.
Jakobson, R. (1959). On Linguistic Aspects of Translation. In L. Venuti (Ed.), The Translation Studies Reader (2nd ed.). London: Routledge.
Jakobson, R. (1968). Language in Relation to Other Communication Systems. In
R. Jakobson (1971) Selected Writings. The Hague: Mouton.
Jaworski, A. (1993). The Power of Silence: Social and Pragmatic Perspectives.
Newbury Park: Sage.
Jucker, A. H., & Smith, S. W. (1995). Explicit and Implicit Ways of Enhancing
Common Ground in Conversations. Pragmatics, 6(1), 1–18.
Kagan, N. (2014). Why Content Goes Viral: What Analyzing 100 Million Articles
Taught Us [blog post]. Available at: http://www.huffingtonpost.com/noah-
kagan/why-content-goes-viral-wh_b_5492767.html. Last accessed 19 June
2017.
Kaindl, K. (2004). Multimodality in the Translation of Humour in Comics. In E. Ventola, C. Charles, & M. Kaltenbacher (Eds.), Perspectives on Multimodality
(pp. 173–192). Amsterdam/Philadelphia: Benjamins.
Kitis, E. (2009). The Pragmatic Infrastructure of Translation. Tradução &
Comunicação, 18, 63–85.
Koster, C. (2011). Comparative Approaches to Translation. In Y.  Gambier &
L. Van Doorslaer (Eds.), Handbook of Translation Studies (Vol. 2). Amsterdam/
Philadelphia: Benjamins.
Kress, G. (2000). Multimodality: Challenges to Thinking About Language.
TESOL Quarterly, 34(2), 337–340.
Kress, G., & Van Leeuwen, T. (1996/2006). Reading Images: The Grammar of
Visual Design. London: Routledge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and
Media of Contemporary Communication. London: Hodder Arnold.
Leech, G. (1983). Principles of Pragmatics. London: Longman.
Lemke, J.  L. (1998). Multiplying Meaning: Visual and Verbal Semiotics in
Scientific Text. In J. Martin & R. Veel (Eds.), Reading Science: Critical and
Functional Perspectives on Discourses of Science (pp.  87–113). London:
Routledge.
Levin, J. R., & Mayer, R. E. (1993). Understanding Illustrations in Text. In B. K.
Britton, A. Woodward, & M. Binkley (Eds.), Learning from Textbooks: Theory
and Practice. Hillsdale: Lawrence Erlbaum Associates.
Levinson, S. C. (1983). Pragmatics. Cambridge: Cambridge University Press.
Levinson, S.  C. (2000). Presumptive Meanings: The Theory of Generalized
Conversational Implicature. Cambridge: MIT Press.
Liu, Y., & O’Halloran, K. L. (2009). Intersemiotic Texture: Analyzing Cohesive
Devices Between Language and Images. Social Semiotics, 19(4), 367–388.
Machin, D. (2007). Introduction to Multimodal Analysis. New  York: Oxford
University Press.
Marsh, E.  E., & White, M.  D. (2003). A Taxonomy of Relationships Between
Images and Text. Journal of Documentation, 59(6), 647–672.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New
(and Old) Media. Visual Communication, 4(3), 337–371.
McCloud, S. (1994). Understanding Comics: The Invisible Art. New  York:
HarperPerennial.
Mubenga, K.  S. (2009). Towards a Multimodal Pragmatic Analysis of Film
Discourse in Audiovisual Translation. Meta: Translators’ Journal, 54(3), 466–484.
Multimodal Analysis Company. (2013). Concept [online]. Available at: http://
multimodal-analysis.com/about/concept/. Last accessed 16 June 2017.
Munday, J.  (2004). Advertising: Some Challenges to Translation Theory. The
Translator, 10, 199–219.
Munday, J.  (2008). Introducing Translation Studies: Theories and Applications
(2nd ed.). New York: Routledge.
Munday, J.  (2012). Introducing Translation Studies: Theories and Applications
(4th ed.). Oxon: Routledge.
Newmark, P. (1981). Approaches to Translation. Oxford: Pergamon.
Nida, E. (1964). Toward a Science of Translating. Leiden: E.J. Brill.
Nikolajeva, M., & Scott, C. (2000). The Dynamics of Picturebook Communication.
Children’s Literature in Education, 3(3), 225–239.
Nord, C. (2005). Text Analysis in Translation: Theory, Methodology and Didactic
Application of a Model for Translation-Oriented Text Analysis (2nd ed.).
Amsterdam/New York: Rodopi.
Norris, S. (2004). Analysing Multimodal Interaction: A Methodological Framework.
London: Routledge.
O’Sullivan, C. (2013). Introduction: Multimodality as Challenge and Resource
for Translation. In C.  O’Sullivan & C.  Jeffcote (Eds.), Special Issue on
Translating Multimodalities, JoSTrans (Vol. 20, pp. 2–14).
O’Sullivan, C., & Jeffcote, C. (Eds.). (2013). Special Issue on Translating
Multimodalities, JoSTrans (Vol. 20).
Olson, D.  R. (1994). The World on Paper: The Conceptual and Cognitive
Implications of Writing and Reading. Cambridge: Cambridge University Press.
Orero, P. (Ed.). (2004). Topics in Audiovisual Translation. Amsterdam/
Philadelphia: Benjamins.
Orlebar, J. (2009). Understanding Media Language. MediaEdu [online]. Available
at: http://media.edusites.co.uk/article/understanding-media-language/.
Last accessed 26 June 2017.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for
Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Pedersen, J. (2008). High Felicity: A Speech Act Approach to Quality Assessment
in Subtitling. In D. Chiaro, C. Heiss, & C. Bucaria (Eds.), Between Text and
Image: Updating Research in Screen Translation. Amsterdam/Philadelphia:
Benjamins.
Pedersen, J. (2011). Subtitling Norms for Television. Amsterdam/Philadelphia: Benjamins.
Peirce, C. S. (1960). Collected Writings. Cambridge: Harvard University Press.
Pérez-González, L. (2014). Audiovisual Translation: Theories, Methods and Issues.
London: Routledge.
Pfister, J. (2010). Is There a Need for a Maxim of Politeness? Journal of Pragmatics,
42, 1266–1282.
Plato (370 BC). (1925). Phaedrus (H.  N. Fowler, Trans.). In Plato in Twelve
Volumes (Vol. 9). London: William Heinemann.
Prljic, M. (2014). The Web Designers’ Guide to Photo Selection [blog post]. Available
at: https://webdesign.tutsplus.com/tutorials/the-web-designers-guide-to-
photo-selection--cms-21592. Last accessed 19 June 2017.
Pym, A. (2010). Exploring Translation Theories. London: Routledge.
Reiss, K. (1977/1989). Text Types, Translation Types and Translation Assessment.
In A.  Chesterman (Trans. & Ed.), Readings in Translation Theory
(pp. 105–115). Helsinki: Oy Finn Lectura Ab.
Reiss, K. (1981/2004). Type, Kind and Individuality of Text: Decision Making in
Translation. In S. Kitron (Trans.) & L. Venuti (Ed.), The Translation Studies
Reader (pp. 160–171). London: Routledge.
Remael, A. (2001). Some Thoughts on the Study of Multimodal and Multimedia
Translation. In Y. Gambier & H. Gottlieb (Eds.), (Multi) Media Translation:
Concepts, Practices and Research (pp.  13–22). Amsterdam/Philadelphia:
Benjamins.
Sacks, H., & Schegloff, E.  A. (1979). Two Preferences in the Organization of
Reference to Persons in Conversation and Their Interaction. In G.  Psathas
(Ed.), Everyday Language: Studies in Ethnomethodology (pp. 15–21). New York:
Irvington.
Salway, A., & Martinec, R. (2005). Some Ideas for Modelling Image-Text
Combinations (Department of Computing Technical Report CS-05-02).
Guildford: University of Surrey.
Saussure, F. (1916/1983). Course in General Linguistics (R.  Harris, Trans.).
London: Duckworth.
Searle, J. (1969). Speech Acts. Cambridge: Cambridge University Press.
Snell-Hornby, M. (1995). Translation Studies: An Integrated Approach.
Amsterdam/Philadelphia: Benjamins.
Snell-Hornby, M. (1997). Written to Be Spoken: The Audio-medial Text in
Translation. In A. Trosborg (Ed.), Text Typology and Translation (pp. 277–290).
Amsterdam/Philadelphia: Benjamins.
Snell-Hornby, M. (2006). The Turns of Translation Studies: New Paradigms or
Shifting Viewpoints? Amsterdam/Philadelphia: Benjamins.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and
Cognition (2nd ed.). Oxford: Blackwell.
Stöckl, H. (2004). In Between Modes: Language and Image in Printed Media. In E. Ventola, C. Charles, & M. Kaltenbacher (Eds.), Perspectives on Multimodality (pp. 9–30). Amsterdam/Philadelphia: Benjamins.
Tanaka, K. (1999). Advertising Language: A Pragmatic Approach to Advertisements
in Britain and Japan. London: Routledge.
Taylor, C. J. (2003). Multimodal Transcription in the Analysis, Translation and
Subtitling of Italian Films. The Translator, 9(2), 191–205.
Thompson, S. (1978). Modern English from a Typological Point of View: Some
Implications of the Function of Word Order. Linguistische Berichte, 54, 19–35.
Thomson, A. (1964). A Handbook of Anatomy for Art Students (p. 34). New York:
Dover Publications.
Tirkkonen-Condit, S. (1992). A Theoretical Account of Translation—Without
Translation Theory? Target, 4(2), 237–245.
Toury, G. (1995). Descriptive Translation Studies—and Beyond. Amsterdam/
Philadelphia: Benjamins.
United Nations. (2015). Inequality Matters: Report on the World Social Situation
2013. New York: United Nations Department of Economic and Social Affairs.
Available at: http://www.un.org/esa/socdev/documents/reports/InequalityMatters.pdf. Last accessed 19 June 2017.
Van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.
Van Leeuwen, T. (2006). Typographic Meaning. Visual Communication, 4, 137–143.
Ventola, E., Charles, C., & Kaltenbacher, M. (Eds.). (2004). Perspectives on
Multimodality. Amsterdam/Philadelphia: Benjamins.
Vermeer, H. (1996). A Skopos Theory of Translation. Heidelberg: TEXTconTEXT.
Williams, J., & Chesterman, A. (2014). The Map: A Beginner’s Guide to Doing
Research in Translation Studies. Oxon: Routledge.
Wilson, D., & Sperber, D. (1988). Representation and Relevance. In R.  M.
Kempson (Ed.), Mental Representations: The Interface Between Language and
Reality. Cambridge: Cambridge University Press.
Yus Ramos, F. (1998). Relevance Theory and Media Discourse: A Verbal-Visual
Model of Communication. Poetics, 25, 293–309.
Zanettin, F. (2004). Comics in Translation Studies. An Overview and Suggestions
for Research. In Tradução e interculturalismo. VII Seminário de Tradução
Científica e Técnica em Língua Portuguesa (pp. 93–98). Lisboa: União Latina.
Index

Note: Page numbers followed by “n” refer to notes.

B
Baldry, A., 22, 23, 26–28, 63, 89–92, 102, 161
See also Multimodal transcription

C
Clusters, 27, 63, 81, 90–96, 101–103, 105–108, 110–113, 116, 117, 120, 122, 124, 125, 127–129, 131–133, 135–137, 139, 141, 143–145, 147, 149–151, 153
See also Baldry, A.; Multimodal transcription; Thibault, P. J.
Cooperativeness, 50–57
See also Grice, H. P.
COSMOROE, see Cross-media interaction relations
Cross-media interaction relations (COSMOROE), 78, 79, 81, 82, 84, 89, 93, 95, 101, 102, 108, 112, 116, 120, 124, 128, 132, 136, 141, 145, 150, 153

D
Dimensions, 22, 23, 26, 57, 61–64, 66, 85, 88, 89, 101, 102, 104, 105, 154–156, 160, 161, 164, 165

E
Equivalence, 4, 55, 79–83, 93, 95, 108, 115–117, 128, 131, 132, 150, 164
Explicatures, 49, 50, 66, 68, 70–72, 84, 85, 89, 92, 94, 95, 102, 103, 106–110, 112, 113, 115, 116, 118, 120–122, 124, 127, 128, 131, 132, 135, 136, 139, 141, 144, 145, 147, 150, 152, 153, 162–164
See also Implicatures; Logical form; Semantic representation
176   INDEX

G
Grammar, 24, 26, 29–31, 48, 65–67, 72, 78, 82, 84, 86–88
  visual, 24, 65, 86–88 (see also Kress, G.; Van Leeuwen, T.)
Grice, H. P., 9, 42–51
  See also Cooperativeness; Pragmatics
Gutt, E. A., 22, 52, 54–57, 62, 71, 72, 160, 161
  See also Interpretive resemblance; Optimal resemblance; Relevance theory (RT)

H
Halliday, M. A. K., 16, 28–32, 74–78, 81, 82, 84, 86
  See also Logico-semantic relations

I
Implicated
  conclusions, 70, 71, 106, 107, 119, 140, 142, 144, 149, 151, 152, 154, 162 (see also implicatures)
  premises, 70, 71, 106, 109, 111, 118, 119, 139, 140, 142, 144, 147, 151, 152, 154 (see also implicatures)
Implications
  analytic, 54, 55
  contextual, 54, 55, 70–72, 84, 85, 161
  See also Gutt, E. A.; Interpretive resemblance
Implicatures, 46, 48–51, 66, 70–72, 84, 85, 89, 92, 94, 95, 102, 103, 107, 108, 112, 114n2, 115, 116, 120–122, 124, 128, 132, 136, 141, 145, 150, 153, 155, 164
  See also Explicatures; Logical form; Semantic representation
Interactions, 3, 4, 6, 9, 16, 21–29, 33, 39, 50, 57, 62, 63, 65, 67, 70, 71, 77, 78, 82, 89, 94, 105, 115, 121, 129, 144, 156, 160, 164, 165
Interpretive resemblance, 55–57, 72, 84, 85, 96, 101, 119, 154, 160, 161, 164, 165
  See also Gutt, E. A.; Implications; Optimal resemblance

K
Kress, G., 22–26, 28–31, 33, 67, 86–88
  See also Pragmatics; Strata; Visual grammar; Visual mode

L
Logical form, 50, 66–68, 72, 77, 92, 113, 114n2, 115, 139
  See also Explicatures; Implicatures; Semantic representation
Logico-semantic relations, 28, 73, 75–78, 84, 89, 101, 102, 147
  See also Cross-media interaction relations (COSMOROE); Halliday, M. A. K.

M
Mode, 18, 31
  aural, 31, 33, 88
  semiotic, 6, 9, 16–18, 20, 23–33, 41, 42, 50, 56, 90, 94, 101, 104, 165
  verbal, 31, 62, 65, 127–129 (see also Halliday, M. A. K.)
  visual, 31, 32, 86–88, 129 (see also Kress, G.; Van Leeuwen, T.)
Multimodal text
  dynamic, 3, 10, 91–94
  static, 3, 10, 18, 71, 90, 91, 94, 95, 99

Multimodal transcription, 26, 90, 102, 160
  See also Baldry, A.; Thibault, P. J.

O
Operative text, 10, 100, 109, 122, 138–154, 163
Optimal relevance, 55, 64–67, 70, 72, 73, 82, 84, 88, 89, 103, 107, 113, 143, 160
  presumption of, 55, 64–67, 70, 72, 73, 82, 84, 88, 89, 107
  See also Gutt, E. A.; Sperber, D.; Wilson, D.
Optimal resemblance, 55
  See also Implications; Optimal relevance

P
Pastra, K., 9, 10, 18, 62, 73, 78–80, 82, 83
  See also Cross-media interaction relations (COSMOROE); Logico-semantic relations
Phases, 27, 63, 90, 103
  See also Baldry, A.; Multimodal transcription; Thibault, P. J.
Pragmatics, 8, 9, 16, 22, 34, 37–57, 62–73, 78, 82, 85–89, 114n2, 138, 149, 155, 160
  See also Cooperativeness; Grice, H. P.; Kress, G.; Relevance theory (RT); Van Leeuwen, T.

R
Reiss, K., 5, 10, 100, 101, 103, 104, 121, 138, 164
  See also Expressive, Informative, Operative text
Relevance theory (RT), 42–45, 47–49, 51, 52, 54–57, 62, 64, 66, 67, 70, 78, 88, 160, 164
  See also Gutt, E. A.; Sperber, D.; Wilson, D.
RT, see Relevance theory

S
Semantic representation, 50, 66–68, 70–85, 87–89, 92–96, 101–103, 107, 108, 110–113, 115–117, 120, 124, 125, 128, 129, 131, 132, 136, 141, 142, 144–147, 150, 153, 154, 160, 163
  See also Explicatures; Implicatures; Logical form
Shared cognitive environment, 38–42
Social semiotics, 8, 9, 16, 23
  See also Kress, G.; Van Leeuwen, T.
Sperber, D., 9, 38, 42–50, 54, 56, 62, 64–68, 70, 71, 103, 115, 160
  See also Gutt, E. A.; Relevance theory (RT)
Strata, 24, 25
  See also Kress, G.; Van Leeuwen, T.

T
Text, 5
  expressive, 10, 100, 101, 103–119, 122, 163, 164 (see also Reiss, K.)
  informative, 10, 100, 109, 121–137, 163 (see also Reiss, K.)
  operative, 10, 100, 109, 122, 138–154, 163 (see also Reiss, K.)
Textual resources, 2–5, 9, 51, 57, 62, 63, 76, 85, 91–96, 102, 104, 115, 121, 125, 129, 131, 133, 138, 140, 144, 146, 151, 155, 156, 160, 161, 164
Thibault, P. J., 22, 23, 26, 28, 63, 89–92, 102, 161
  See also Multimodal transcription

Translation, 1–10, 15–18, 22, 23, 27–29, 38, 40, 50–52, 61–96, 99, 101–104, 107, 109, 111, 117, 119, 121–125, 127, 129, 131, 137, 138, 140–142, 144, 146, 151, 152, 154–156, 159–165

V
Van Leeuwen, T., 9, 23, 24, 28–33, 67, 86–89
  See also Pragmatics; Strata; Visual grammar; Visual mode
Visual grammar, 24, 65, 86–88
Visual mode, 31, 32, 86–88, 129
Visual-verbal relations, 10, 62, 74, 76, 83–85, 88, 102, 104, 113, 119, 160, 161, 164
  See also Cross-media interaction relations (COSMOROE); Logico-semantic relations

W
Wilson, D., 9, 38, 42–50, 54, 56, 62, 64–68, 70, 71, 103, 115, 160
  See also Gutt, E. A.; Relevance theory (RT)
