MULTIMODAL
PRAGMATICS AND
TRANSLATION
A New Model for
Source Text Analysis
Sara Dicerto
Palgrave Studies in Translating and Interpreting
Series editor
Margaret Rogers
Department of Languages and Translation
University of Surrey
Guildford, UK
This series examines the crucial role which translation and interpreting in
their myriad forms play at all levels of communication in today’s world,
from the local to the global. Whilst this role is being increasingly recog-
nised in some quarters (for example, through European Union legisla-
tion), in others it remains controversial for economic, political and social
reasons. The rapidly changing landscape of translation and interpreting
practice is accompanied by equally challenging developments in their aca-
demic study, often in an interdisciplinary framework and increasingly
reflecting commonalities between what were once considered to be sepa-
rate disciplines. The books in this series address specific issues in both
translation and interpreting with the aim not only of charting but also of
shaping the discipline with respect to contemporary practice and research.
Sara Dicerto
King’s College London
London, UK
If there is one thing I have learnt from writing this book, it is that life is
multimodal. Meaning comes to us in all forms at all times, and getting the
best out of it depends entirely on our ability to make sense of our acquain-
tances, readings, experiences and circumstances.
As the product of a multimodal life, a book on multimodality (or on
any other topic, for that matter) is hardly the result of a person’s effort,
but rather it is the outcome of a community’s work. For this reason, I
would like to thank the following people for their special contributions:
Ark Globe Academy and King’s College for granting me the time and the
resources to work on this publication;
Prof Sabine Braun and Dr Dimitris Asimakoulas for their continued help
and support when this work was in its infancy;
Prof Margaret Rogers for her extensive feedback and lots of food for
thought;
Giacinto Palmieri for being my academic sparring partner;
CHAPTER 1
Scholars in translation studies have debated for decades what the inform-
ing principle of the activity of translation should be. The roots of this
debate, however, date back to long before the advent of the discipline
itself.
even visit your article’ (Kagan 2014). As the use of non-verbal sources of
meaning in a variety of texts for all sorts of purposes (e.g. technical texts,
illustrated books, comics, websites) is ubiquitous in today’s world, it is
worth considering carefully how these resources (i.e. images and sounds)
interact with the verbally communicated message (written or spoken),
sometimes even changing its meaning drastically, as will be shown. More
specifically, different textual resources can be said to influence each other
and create a multimodal message, the interpretation of which requires dif-
ferent types of literacy and the ability to combine them. Examples of these
types of text come from all domains, and the influence of the multimodal
phenomenon on translation is pervasive: medical texts, promotional
material, catalogues, webpages, advertisements, newspaper articles, comics
and user manuals are all translatable materials, and they are just a few
examples of potential source texts (STs) likely to include elements of
multimodality that the translator needs to take into account.
In line with the continuous effort in translation studies to develop rel-
evant frameworks in order to support developments in the discipline with
adequate theoretical tools, this book intends to offer a model for multi-
modal ST analysis that can be used as a tool to improve our understanding
of how multimodal texts are organised to convey meaning and of what this
means when it comes to rendering them into a target text (TT). Therefore,
the central focus of this work is ST analysis—this is defined by Williams
and Chesterman as the area of translation studies that consists of a careful
analysis of the text’s potentially problematic aspects as a step in the prepa-
ration for translation (2014: 9).
The multimodal focus of this book distinguishes it from other work in
the same area, shifting the spotlight from language to a detailed analysis of
how a variety of multimodal text types convey meaning. This work includes
theoretical provision for both static and dynamic multimodal texts (i.e.
respectively, texts including images and written language, and texts which
also make use of moving images, spoken language and/or sound sources),
irrespective of their genre, following Munday’s suggestion that concepts
from research on visual and multimodal communication need to be incor-
porated into the study of all types of translation (2004: 199). The more
detailed discussion in later chapters is, however, focused on the analysis of
verbal-visual interactions in static multimodal texts, for reasons discussed
later in this chapter.
light of the translation’s purpose. This is often the case with texts includ-
ing visual content; for example, translators of comics generally have to fit
their work around the visual content of the original cartoon, and subtitlers
of audiovisual texts by and large can only intervene in the verbal content
they need to translate, as the visual component cannot be easily adapted.
Among the translation models that explicitly mention multimodal
meaning, Reiss’ research on text types (1977/1989), a precursor to
Skopos theory, is worthy of note. However, Reiss claims that in multi-
modal texts, the verbal content is somehow supported by the presence of
other textual resources. This view seems to have limitations: the role of
other semiotic resources is not only to support the verbal content (or vice
versa), but rather to merge with it to produce a multimodal message.
Although Reiss at first claimed that there were four text types (informa-
tive, operative, expressive and the ‘multimedial’), she later modified her
position, claiming that multimedial texts are actually a ‘hyper-type’, a
‘super-structure for the three basic types’ that ‘possesses its own regulari-
ties, which ought to be taken into account when translating, besides – and
above – the regularities of the three basic forms of written communica-
tion’ (1981/2004: 164). However, these regularities are not investigated
in any detail by Reiss, who in her work was more concerned with the
analysis of the three basic text types she identified than with issues regard-
ing multimodality.
The approach proposed by Snell-Hornby (1995), one that aims to inte-
grate approaches from linguistics and translation, has as its main focus the
linguistic aspect of texts as well; Snell-Hornby, however, acknowledges the
importance of investigating what she terms ‘audiomedial’ texts in later
work, in which she mentions how this ‘might well prove to be a topic
worth resurrecting’ (1997: 288). She later goes on to discuss a few aspects
related to multimodality (2006: 84–90), albeit briefly, pointing out how
virtually no research on multimodal aspects of translation was carried out
until the 1980s; in the same context, Snell-Hornby (2006) proposes a
classification of texts that depend on non-verbal elements (which, using
her terminology, are divided into ‘multimedial’, ‘multimodal’, ‘multisemi-
otic’ and ‘audiomedial’) and reviews studies that deal with translation
challenges closely connected to specific genres of such texts (e.g. the rhet-
oric and speakability of texts that are ‘written to be spoken’). However,
her focus is mainly on the audiomedial category, that is, texts that are
essentially language-based and whose multimodal component lies in the
fact that they are written to be performed (e.g. theatre play scripts). Such
Simply looking at how translation theory has dealt so far with multi-
modal meaning would produce a rather incomplete result; indeed, the
problem might also be looked at from an alternative perspective, that is,
reviewing research focusing on multimodality that has also dealt with
translation issues. However, existing research on multimodality seems
mostly to address the way visual and verbal content create mutual connec-
tions, without any mention of translation (e.g. Lemke 1998; Marsh and
White 2003; Baldry and Thibault 2005; Martinec and Salway 2005;
Salway and Martinec 2005; Hughes et al. 2007; Pastra 2008; Liu and
O’Halloran 2009; Bateman 2014). In some cases, these studies have only
addressed the topic of visual/verbal relations; in other cases, they have
started a discussion on how these relations are connected with meaning
production. In particular, the work by Bateman (2014) represents a com-
prehensive resource on the various approaches to visual/verbal relations
and the production of multimodal narratives. However, to my knowledge,
at the time of writing, no extensive study on multimodality has taken one
further step, linking visual/verbal relations not only to issues connected to
meaning but also to translation matters.
Research trying to build this connection in relation to written texts still
seems to be in its infancy. In some literature on multimodal translation,
scholars have expressed the view that the larger picture of multimodal
semiotic research on sign systems sets out the first theoretical foundations
of this book. Indeed, a general understanding of the role of different types
of signs in communication is an important first step towards grasping the
general multimodal picture. Major studies in the area, such as those
carried out by Peirce (1960), Jakobson (1968), Eco (1976) and Barthes
(1977), are used to provide the semiotic basis for the study of meaning as
produced in the different semiotic systems.
Literature on social semiotics is acknowledged for its fundamental
role in influencing current views of multimodality and of the interaction
between signs from different semiotic modes. Social semiotics has
engaged with how semiotically different modes can produce meaning in
cooperation with one another, why this happens and what the possible
implications in terms of text comprehension can be. Studies on this topic
include Van Leeuwen (1999), Kress and Van Leeuwen (2001, 2006),
Norris (2004), Machin (2007) and Liu and O’Halloran (2009). Studies
and reviews on multimodality with specific reference to visual/verbal
relations and their connection to meaning (mainly Marsh and White
2003; Martinec and Salway 2005; Pastra 2008; Bateman 2014) are also
reviewed here.
However, it will be argued that defining multimodal text comprehen-
sion as the comprehension of the message carried by the single modes and
by their interaction would mean taking too narrow a view of how multi-
modal texts convey meaning. Understanding individual textual resources
does not explain fully how meaning is derived from a text, and nor does
understanding their interaction. Indeed, the same message can be com-
municated with different intentions under different circumstances, and
the context in which a message is communicated influences its meaning.
Part of the literature review is therefore dedicated to discussing contextual
influence as explained in pragmatics (Chap. 3), through the work of schol-
ars like Grice (1989) and Sperber and Wilson (1995).
It is well known that pragmatics has already been applied to translation,
but the relevant literature directly addressing multimodal translation is
relatively scarce and fragmented. Pragmatics has influenced some models
in translation studies (see House 1997; Baker 2011 for notable examples),
but in spite of this, the only full-fledged attempt towards a translation
theory in which pragmatics occupies a central role comes from Gutt
(2000). The insights pragmatics can offer into the activity of translation
are analysed to establish to what extent this discipline can influence the
current study on multimodal translation (Chap. 3).
References
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Bateman, J. A. (2014). Text and Image: A Critical Introduction to the Visual/
Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information
in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Cattrysse, P. (2001). Multimedia & Translation: Methodological Considerations.
In Y. Gambier & H. Gottlieb (Eds.), (Multi) Media Translation: Concepts,
Practices and Research (pp. 1–12). Amsterdam/Philadelphia: Benjamins.
Chaume, F. (2004). Cine y Traducción. Madrid: Ediciones Cátedra.
Chesterman, A. (1997). Memes of Translation. Amsterdam/Philadelphia:
Benjamins.
Chiaro, D., Heiss, C., & Bucaria, C. (Eds.). (2008). Between Text and Image:
Updating Research in Screen Translation. Amsterdam/Philadelphia: Benjamins.
Clark, B. (2011, September). Relevance and Multimodality. Paper Presented at
Analysing Multimodality: Systemic Functional Linguistics Meets Pragmatics,
Loughborough University.
Desilla, L. (2014). Reading Between the Lines, Seeing Beyond the Images: An
Empirical Study on the Comprehension of Implicit Film Dialogue Meaning
Across Cultures. The Translator, 20(2), 194–214.
Desjardins, R. (2017). Translation and Social Media in Theory, in Training and in
Professional Practice. London: Palgrave Macmillan.
Díaz-Cintas, J., & Remael, A. (2007). Audiovisual Translation: Subtitling.
Manchester: St Jerome.
Díaz-Cintas, J. (2004). In Search of a Theoretical Framework for the Study of
Audiovisual Translation. In P. Orero (Ed.), Topics in Audiovisual Translation
(pp. 21–34). Amsterdam/Philadelphia: Benjamins.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Forceville, C., & Clark, B. (2014). Can Pictures Have Explicatures? Linguagem
em (Dis)curso, 14(3), 451–472.
Gambier, Y., & Gottlieb, H. (Eds.). (2001). (Multi) Media Translation: Concepts,
Practices and Research. Amsterdam/Philadelphia: Benjamins.
Grice, H. P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard
University Press.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
–– Expression:
The production of a text will then start with the individuation of its
discourse of reference: for example, a text on fashion would draw on the
social knowledge about fashion and contribute to the debate on that
aspect of our reality. Once the appropriate discourse is identified, there will
be a choice of what semiotic resources to use to convey the message; this
choice may be partly influenced by the genre(s) that are typically associ-
ated with the specific discourse of fashion. The choice of semiotic resources
is then enacted during the process of design, when the most adequate
channel(s) and/or material(s) to convey the message are selected. In this
example, the channel could be a fashion magazine and the materials may
include pictures of models and related items. The actual production of the
message will then happen in various steps (e.g. writing, editing, printing),
and the text so produced will be distributed. The aspect of distribution
addresses the circulation of the message, selecting at what time the mes-
sage should be sent out, to whom, how often it will be recirculated (if
ever) and other distributional details. Kress and Van Leeuwen argue that
each stratum contributes meaning, and notably, that each semiotic mode
can be more or less ‘in the spotlight’ at each stage.
In Kress and Van Leeuwen’s work, these concepts are considered in
multimodal terms to show that the simultaneous presence of different
semiotic resources influences meaning in the four strata of the multimodal
message; at the same time, their discussion of meaning production through
strata remains quite generic, as it does not go into the details of how
exactly different semiotic resources interact in the various domains of
practice and how the analysis of multimodal texts can be practically carried
out using their theoretical framework. The authors move directly from a
generic discussion of the strata to their practical application using a selec-
tion of texts. The practical angle of analysis, albeit interesting and novel,
produces findings that are applicable to the texts under consideration but
are difficult to generalise further in the absence of a detailed supporting
framework explaining how modes interact. The level of interaction
between semiotic resources remains under-researched in Kress and Van
Leeuwen’s work; this aspect of their theoretical proposal finds its justifica-
tion in the inadequacy of existing theoretical frameworks for semiotic
modes other than language:
Kress and Van Leeuwen’s work does good service providing evidence
that an integrated multimodal analysis is possible, but at the same time, it
does not venture into a systematic description of the common principles
of multimodality; their work supports indirectly the existence of regulari-
ties in multimodal communication, but by and large, it does not set out
generalisable principles of multimodal interaction in relation to meaning.
Their main contribution seems to be to indicate that the simultaneous use
of specific semiotic systems can be analysed in the context of particular
dimensions of practice, in which genre-based differences can be realised.
This approach certainly offers a way of looking at the multimodal phe-
nomenon as the product of ‘layers’ of meaning. At the same time, it does
not provide the systematic guidance on how to understand the interaction
between different modes which was laid out as a prerequisite for building
a model for multimodal analysis in Sect. 2.1.
that two or more signs from different modes form part of the same unit of
meaning due to their proximity and are therefore to be analysed together.
The second is the concept of ‘phase’—namely, a time-based grouping of
items ‘codeployed in a consistent way over a given stretch of text’ (Baldry
and Thibault 2005: 47). The concept of ‘phase’ is used to signal that two
or more signs from different modes are to be analysed in conjunction due
to their simultaneous or near-simultaneous use. This idea is mostly appli-
cable to dynamic texts, as these show a development over time. The idea
of ‘phase’ is taken from Gregory’s work on phasal analysis, in which phases
are described as characterising ‘stretches within discourse […] exhibiting
their own significant and distinctive consistency and congruity’ (2002:
321).
While spatial proximity and chronological co-deployment of the signs
certainly contribute to the meaning of a multimodal text, these relationships
explain only a small part of the meaningful interaction between semiotic
modes. For example, Baldry and Thibault’s multimodal transcription
scheme has been applied to the translation of multimodal texts by Taylor
(2003); however, in Taylor’s work the practical support offered by multi-
modal transcription to two subtitling case studies is limited to choices
concerning the timing of the subtitles (‘spotting’, tied to the concept of
phase) and to suggestions on how to edit down text in subtitles to erase
information already provided by other modes, tied to the concept of clus-
ter. Baldry and Thibault’s model offers little guidance in terms of the
contextual challenges that could be faced by the subtitler, and it only
partly explains the interaction between signs from different modes.
Although this framework of multimodal transcription and text analysis
represents a contribution to studies in multimodality, it seems to fall short
of its ambitious goal to reveal the multimodal basis of a text’s meaning for
two main reasons. Firstly, Baldry and Thibault hint at the influence con-
text exerts on the meaning of any text, but the idea of multimodal mean-
ing being influenced by the circumstances in which the text is produced is
not pursued in their framework, leaving that area of multimodal meaning
unaccounted for. Secondly, the relationship between semiotic modes is
somewhat underdeveloped, as Baldry and Thibault’s work only offers
tools to analyse the spatial proximity and simultaneity of deployment of
resources from different semiotic systems. Confining the analysis of the
interaction between signs from different semiotic modes to an analysis of
their co-deployment, either in space or in time, appears to overlook much of
the full range of relationships between signs. For example, it is true that
2.2.3 Taking Stock
The problem of individuating general principles explaining how meaning
is conveyed by multimodal texts has not found a fully satisfactory solution
to date through ‘top-down’ approaches such as the ones described above.
As discussed, Kress and Van Leeuwen, as well as Baldry and Thibault,
realise the necessity of using an approach that analyses various layers of
meaning-making in multimodal texts, albeit proposing different views on
how this approach should be systematised. As argued here, Kress and Van
Leeuwen do not engage with principles of interaction between semiotic
modes, and Baldry and Thibault do so purely from a point of view of spa-
tial and chronological co-deployment. Such studies are important steps
towards a more complete view of multimodality, but it is important to
acknowledge that their limitations do not allow them to build a compre-
hensive picture of how multimodal meaning is conveyed and therefore
how to approach the analysis of multimodal texts for translation.
Approaching the problem ‘bottom-up’ by looking at specific types of
multimodal interaction to identify possible homogeneous traits across
modes that can be further generalised does not seem particularly fruitful
either. In such studies, researchers have so far had to face the issue of
diversity in potential for analysis among the various modes; as noted by
Kress and Van Leeuwen, non-linguistic modes have received far less atten-
tion than language in academic and professional discussion fora. This has
led to analytical resources being widely available for language and less so
for other modes, resulting in comparisons across modes being difficult to
conduct due to the lack of analytical tools (see, e.g. the work by Norris
(2004) on the topic of multimodal interaction and by Machin (2007) on
the interplay between images and language). This issue means that a ‘bot-
tom-up’ approach based on current knowledge might well produce unbal-
anced analytical frameworks focused on the analysis of one semiotic system,
which might work to explain particular types of multimodal interaction
without grasping the overarching communicative principle of multimodal
texts.
Nevertheless, studies on multimodality are growing in number, and
research on the regularities of the single modes is growing, too. Before
Kress and Van Leeuwen’s proposal on multimodal discourse, analytical
frameworks for non-linguistic semiotic resources (some of which are out-
lined below) had already been gaining ground. The dominance of lan-
guage over all the other semiotic modes has arguably started to fade in the
last decades, during which studies on aural and visual meaning have started
attracting the academic spotlight (e.g. Kress and Van Leeuwen 2006 on
visual meaning and Van Leeuwen 1999 on aural meaning). This perhaps
follows the realisation that modern means of communication, that is,
‘new’ media such as the internet, nowadays allow a growing number of
users to create messages that contain much more than ‘just’ words.
If a form of communication exists, the assumption is that it is likely to
involve the existence of a shared code allowing users to communicate with
each other; different forms of communication are thus likely to require
different forms of ‘literacy’. On this basis, research has been carried out to
establish if some sort of ‘grammar’ for semiotic modes other than lan-
guage exists, given their growing importance in a number of modern
genres (and, consequently, their translation). The word ‘grammar’ is used
to suggest a parallel between the formal mechanisms of language and
those of other semiotic systems, without this implying a similarity in these
regularities or the assumption that an analysis of these regularities should
be carried out by similar means.
Following this assumption, Kress and Van Leeuwen (1996/2006)
studied and proposed what they call a ‘grammar of images’, following
Halliday’s definition of grammar as ‘a means of representing patterns of
experience’ (1985: 101). In order not to create confusion between their
References
Aguiar, D., & Queiroz, J. (2010). Modeling Intersemiotic Translation: Notes
Toward a Peircean Approach [online]. Available at:
http://french.chass.utoronto.ca/as-sa/ASSA-No24/Article6en.htm. Last accessed 26 June 2017.
Anstey, M., & Bull, G. (2010). Helping Teachers to Explore Multimodal Texts.
Curriculum Leadership [online]. Available at:
http://cmslive.curriculum.edu.au/leader/default.asp?id=31522&issueID=12141.
Last accessed 26 June 2017.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Bateman, J. A. (2014). Text and Image: A Critical Introduction to the Visual/
Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information
in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Bublitz, W. (1999). Introduction: Views of Coherence. In W. Bublitz, U. Lenk, &
E. Ventola (Eds.), Coherence in Spoken and Written Discourse. How to Create It
and How to Describe It (pp. 1–7). Amsterdam/Philadelphia: Benjamins.
Chandler, D. (2007). Semiotics: The Basics. New York: Routledge.
Díaz-Cintas, J., & Remael, A. (2007). Audiovisual Translation: Subtitling.
Manchester: St Jerome.
Durant, A., & Lambrou, M. (2009). Language and Media: A Resource Book for
Students. New York: Routledge.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Gregory, M. (2002). Phasal Analysis Within Communication Linguistics: Two
Contrastive Discourses. In P. H. Fries, M. Cummings, D. Lockwood, &
W. Sprueill (Eds.), Relations and Functions in Language and Discourse
(pp. 316–345). London: Continuum.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Hagan, M. S. (2007). Visual/Verbal Collaboration in Print: Complementary
Differences, Necessary Ties, and Untapped Rhetorical Opportunity. Written
Communication, 24(1), 49–83.
Halliday, M. A. K. (1985/1994). An Introduction to Functional Grammar (2nd
ed.). London: Edward Arnold.
Halliday, M. A. K., & Webster, J. J. (2003). On Language and Linguistics. London:
Continuum.
Jakobson, R. (1968). Language in Relation to Other Communication Systems. In
R. Jakobson (1971) Selected Writings. The Hague: Mouton.
Kress, G. (2000). Multimodality: Challenges to Thinking About Language.
TESOL Quarterly, 34(2), 337–340.
Kress, G., & Van Leeuwen, T. (1996/2006). Reading Images: The Grammar of
Visual Design. London: Routledge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and
Media of Contemporary Communication. London: Hodder Arnold.
Machin, D. (2007). Introduction to Multimodal Analysis. New York: Oxford
University Press.
Marsh, E. E., & White, M. D. (2003). A Taxonomy of Relationships Between
Images and Text. Journal of Documentation, 59(6), 647–672.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New
(and Old) Media. Visual Communication, 4(3), 337–371.
Multimodal Analysis Company. (2013). Concept [online]. Available at:
http://multimodal-analysis.com/about/concept/. Last accessed 16 June 2017.
Norris, S. (2004). Analysing Multimodal Interaction: A Methodological Framework.
London: Routledge.
Olson, D. R. (1994). The World on Paper: The Conceptual and Cognitive
Implications of Writing and Reading. Cambridge: Cambridge University Press.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for
Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Peirce, C. S. (1960). Collected Writings. Cambridge: Harvard University Press.
Saussure, F. (1916/1983). Course in General Linguistics (R. Harris, Trans.).
London: Duckworth.
Stöckl, H. (2004). In Between Modes: Language and Image in Printed Media. In
E. Ventola, C. Cassily, & M. Kaltenbacher (Eds.), Perspectives on Multimodality
(pp. 9–30). Amsterdam/Philadelphia: Benjamins.
Taylor, C. J. (2003). Multimodal Transcription in the Analysis, Translation and
Subtitling of Italian Films. The Translator, 9(2), 191–205.
Van Leeuwen, T. (1999). Speech, Music, Sound. London: Macmillan.
Van Leeuwen, T. (2006). Typographic Meaning. Visual Communication, 4,
137–143.
CHAPTER 3
[t]he differences of structure and use between spoken and written language
are inevitable, because they are the product of radically different kinds of
communicative situation. Speech is time-bound, dynamic, transient – part of
an interaction in which, typically, both participants are present, and the
speaker has a specific addressee (or group of addressees) in mind. Writing is
space-bound, static, permanent – the result of a situation in which, typically,
the producer is distant from the recipient – and, often, may not even know
who the recipient is (as with most literature). Writing can only occasionally
be thought of as an ‘interaction’ in the same way as speech […].
for texts in languages that do not normally attribute agency to the subjects
of certain verbs). Through textual agency, pragmatics could be used to
analyse a text based on the text-reader shared cognitive environment
rather than the writer-reader shared cognitive environment. This would
bring ‘speaker’ and reader together in the same place, at the same time
eliminating de facto some of the aforementioned problems relating to
temporal/spatial mismatch between writer and reader.
From a practical point of view, the notion of textual agency intends to
bring text and reader close together. However, the text still only makes
sense in relation to its author, the context in which it was produced, the
purpose for which it was produced and the audience addressed by the
writer, and these are elements that cannot be ignored in a textual analysis.
Therefore, the notion of textual agency does not bridge the shared envi-
ronment gap for analytical purposes, and is not upheld by this study. As
Hyland (2005) states, writers compose their messages with a reader-oriented
focus, aiming to guide readers towards a certain action or understanding;
the text is therefore not an independent entity, but rather a tool
created by its writer as the carrier of that intention. Also, assigning agency
to texts clashes with the view of coherence supported by this study, which
sees coherence as only text-based (as opposed to text-inherent); the source
of coherence is the user who interprets the text in connection with ‘the
linguistic context, the socio-cultural environment, the valid communica-
tive principles and maxims and the interpreter’s encyclopedic knowledge’
(Bublitz 1999: 2), rather than the text itself. This is all the more important
in a study on translation, where differences between the context of the ST
and that of the TT are a major concern.
The point of view proposed in this study, then, is that there is no need
for the kind of notional ‘patch’ that Cooren has argued for in order to
allow pragmatics to be applied to written texts: these texts consist of
verbal content designed
to carry a ‘writer’s meaning’ whose interpretation by the reader is possible
and is partly dependent on usage in context. Thus, pragmatics can and
should be applied to written texts. However, suggesting such an approach
means coming to terms with its inherent limitations: for the reasons previ-
ously mentioned, establishing the shared cognitive environment between
writer and reader is far more problematic than for speaker and hearer, and
this represents a difficulty that needs to be acknowledged. From the point
of view of an external observer of the writer-reader communication, the
readership of a certain text is made up of a group of people whose indi-
vidual cognitive environments are unknown for the most part and not
MULTIMODAL MEANING IN CONTEXT: PRAGMATICS 41
Writing, Phaedrus, has this strange quality, and is very like painting; for the
creatures of painting stand like living beings, but if one asks them a ques-
tion, they preserve a solemn silence. And so it is with written words; you
might think they spoke as if they had intelligence, but if you question them,
wishing to know about their sayings, they always say only one and the same
thing. And every word, when once it is written, is bandied about, alike
among those who understand and those who have no interest in it, and it
knows not to whom to speak or not to speak; when ill-treated or unjustly
reviled it always needs its father to help it; for it has no power to protect or
help itself. (Plato, Phaedrus: 275d–e)
Given that writer and reader often do not communicate in any way
external to the written words of a message, a writer will write knowing
that the knowledge available to the audience who will process the message
is limited. This influences the composition of the message itself in the
form of a conscious effort to make it understandable ‘as is’ (referred to as
‘recipient design’ by Sacks and Schegloff 1979). As written texts may be
the sole form of communication between writer and reader (e.g. advertise-
ments are often a one-way channel between a company and potential cus-
tomers), multimodal content can be advantageous in their composition: as
seen previously in this chapter, semiotic modes other than the verbal can
provide information that language would convey perhaps less efficiently or
with more difficulty (e.g. the way an object looks), helping the message
sender guide potential recipients towards their intended interpretation
more quickly and precisely. Therefore, in my view, the tendency of some
42 S. DICERTO
and usage in context (2007: 8), exactly as observed by Sperber and Wilson
for language. Therefore, it can be concluded that the same (de)coding and
inferential processes apply to linguistic as well as other types of sign and
that in this sense the way we communicate through language is no different
from the way we communicate through other modes. As Sperber and Wilson’s theory was developed
based on coding and inferentiality as the two ‘gears’ of communication,
RT is a good candidate for application to the interpretation of texts includ-
ing a variety of types of signs.
It must also be noted that, in a multimodal text, each mode finds its
most immediate context of reference in the other mode(s), and this influ-
ences the usage of each semiotic system. Since information can be drawn
from different sources, in multimodal texts, the message communicated
by a single mode is incomplete without the remaining information, as a
mode can rely on the other(s) to express what it has left unexpressed or to
enhance its meaning: usage in context—and hence pragmatics—becomes
then a key factor in multimodality, possibly even more so than in ‘mono-
modal’ texts.
Nevertheless, RT is not the only pragmatic approach available, and sug-
gesting an application of pragmatics to multimodality means having to
face a choice about which theoretical framework should be applied. The
two major pragmatic approaches to meaning explanation are the Gricean
model on cooperativeness (developed in 1967, and subsequently elabo-
rated by Levinson 1983, among others) and the already mentioned RT, a
relevance-based framework first published by Sperber and Wilson in 1986.
While differing in the view they take towards the principle that informs
the inferential system, the two approaches show an array of commonali-
ties; arguably, this still does not allow for the identification of enough
common ground to proceed to an analysis that uses either approach inter-
changeably without significant variations. Whether communication is ana-
lysed in terms of cooperativeness or in terms of relevance, both frameworks
are meant to cover the meaning of utterances in context as a whole, and
from a strictly practical point of view, they differ mainly in what they
ascribe to explicit or implicit meaning production. However, the focus
here is not on the pragmatic question of whether certain meanings are
explicitly or implicitly communicated, but rather on the overall meaning
assigned by the audience to a text and how this can be dealt with by a
translator operating in yet another cognitive environment.
According to both Grice and Sperber and Wilson, inferentiality is the
principle that informs communication itself and is at the basis of any
1. Maxim of Quality
Try to make your contribution one that is true
2. Maxim of Quantity
Make your contribution as informative as is required
3. Maxim of Relation
Be relevant
4. Maxim of Manner
Be perspicuous.
(Grice 1989: 26–27)
leave the topic aside as far as possible. Regardless of B’s rationale for his
communicative choice, A finds herself in a position where her question has
received what at first seems like an irrelevant answer; however, instead of
perhaps choosing to reiterate the question to obtain the desired informa-
tion, A may try to reconcile the violation of the maxim of relation with an
assumption that B is trying to be cooperative and that therefore his answer
is designed to be relevant to the communicative exchange—this will allow A to
get to B’s intention to communicate by means of an implicature, that is, an
additional meaning communicated by an utterance in light of its context.
Grice’s view on communication hints at a clear-cut division between
the roles of semantics and pragmatics in language comprehension: when
the semantic content of an utterance does not obey the conversational
maxims, an inferential analysis aimed at reconciling the semantic meaning
with the cooperative principle is needed. During this inferential analysis,
the hearer may be able to identify additional meanings (implicatures)
which will help them understand the speaker’s meaning.
Grice’s ‘division of labour’ has been attacked and criticised on many an
occasion, even by scholars supporting a neo-Gricean approach. It is nowadays
widely accepted that Grice’s theory of communication contains a few
inner contradictions, most notably what Levinson calls ‘Grice’s circle’
(2000: 186). Levinson sets out to demonstrate that Grice’s claim about
the division between semantics and pragmatics is not tenable: inferential
processes of the same type as the ones that determine implicatures are
involved in the determination of ‘what is said’, which Grice says semantics
should account for. For example, pronouns (and, in general, deictics) are
part of ‘what is said’, but the determination of their reference is context-
bound and inferential, in the same way as implicatures are. Levinson
attempts to resolve this inner contradiction of the Gricean theory by sug-
gesting two levels of intervention of inferential processes in meaning
detection, as summarised by Fig. 3.1.
This line of thought still sees meaning interpretation as a step-by-step
process starting with the semantic representation of an utterance and end-
ing with the determination of the speaker’s meaning by inferential means.
The division between the role played by semantics and pragmatics is no
longer clear-cut, as inferential processes intervene in the semantic inter-
pretation, but meaning analysis as a whole is still seen as progressive and
made up of sequential steps.
A different view is offered by Sperber and Wilson. As anticipated, they
oppose the Gricean and neo-Gricean view that communication is ruled by
a principle of cooperativeness and a set of maxims that accompany it, claiming
have put it in the form we thought was optimal for the recipient to process
in order to obtain information about what we wanted to communicate
with it. In RT terms, our message will make use of stimuli (i.e. utterances
in the case of a linguistic message), which communicate at an explicit level;
information that is not explicitly communicated by the stimuli (like B’s
unwillingness to discuss his relationship with Emma in the example above)
is regarded as implicit and termed an implicature. However, in contrast to
Levinson’s viewpoint, the information derived from deictics and pronouns
is not considered as implicitly communicated, as these elements of the
message still form part of the stimulus and do not provide information
different from the stimulus itself. The information they convey is nevertheless
the result of an inferential process, and therefore they are considered
as a separate type of explicit meaning called explicatures.
To sum up the differences between the two major theories in pragmat-
ics, the Gricean model suggests a division of labour in which ‘what is said’
prepares the ground for ‘what is implicated’. In this view, ‘what is said’
roughly corresponds to the semantic meaning of an utterance, while impli-
catures are its inferential components. Modern developments of Grice’s
theory, such as Levinson’s work, have conceded that inferential processes
intervene in the semantic interpretation (e.g. when we need to assign a
reference to a pronoun) and that the information that can be derived from
these processes is not to be considered as part of the explicit content of the
utterance, that is, it is implicit. In Sperber and Wilson’s view, although the
retrieval of the information connected to processes such as reference
assignment happens through inferential means, this information
has no other function than to ‘flesh out’ the blueprint delivered by
grammar (Blakemore 1992: 59). For this reason, RT sees these processes
as forming part of what is explicitly communicated by an utterance, called
by Blakemore the ‘proposition expressed’ (1992: 65–90). In RT, implica-
tures only account for additional meanings not stated in the semantic
content.1
1 It must be acknowledged that the division between explicit and implicated meaning is not
agreed upon by all pragmaticians: Bach’s contribution to this debate was to propose the idea
of a third category of meaning falling between the two sides of the ‘pragmatic fence’, namely,
the category of impliciture (1994). In Bach’s view, developments of the logical form of an
utterance are not to be considered as part of either the explicit or the implicated meaning,
falling somewhere in between: these elements of meaning are not uttered (and hence not said
in a Gricean view), but at the same time, they do not contribute any additional proposition
either, thus not qualifying fully as either explicit or implicit meaning.
Grice’s cooperative principle and its maxims seem to be quite well suited
for the analysis of stimuli that come in the form of utterances. However,
when talking about static texts, and of multimodal texts as a sub-category
of the static type, it is difficult to imagine how a principle of cooperative-
ness would work between a sender and a recipient who do not have a
chance to interact, for the reasons of spatial and temporal mismatch con-
sidered previously. How could a sender be cooperative with an audience
with whom they cannot have a contemporaneous dialogic interaction, and
who is likely to approach the text from different backgrounds, with differ-
ent cognitive environments, and perhaps at different historical moments?
This is even more so the case when translation is added to the mix.
Also, as considered in Sect. 2.2, not all semiotic modes have the same
levels of organisation, which means that in some cases, there might not be
a clear code, a standard against which a stimulus can be compared in order
to individuate possible communicative violations. The types of stimuli at
play in a multimodal text are many more than those in single utterances or
talk exchanges, and a Gricean model may therefore not be easily applicable
to the type of analysis attempted here. The information non-linguistic
signs provide might be difficult to quantify and/or qualify in terms of its
truth value or clarity. It could, however, be relevant or irrelevant to each
message recipient depending on their personal interpretation.
In Sperber and Wilson’s work, it is possible to find an enlightening
example in this sense:
For example, Peter asks Mary, […] How are you feeling today?
Mary responds by pulling a bottle of aspirin out of her bag and showing
it to him. Her behaviour is not coded: there is no rule or convention which
says that displaying a bottle of aspirin means that one is not feeling well.
(Sperber and Wilson 1995: 25, my emphasis)
2 Leech (1983), for example, discusses maxims related to tact and politeness: debate about
these additional maxims is still open, as the much more recent article by Pfister (2010)
demonstrates.
often read […] that in recent theories of translation linguistics is left behind.
This may be right if by the term ‘linguistics’ one understands the more for-
mal or syntactic oriented theories […]. However, it cannot be more untrue
if one considers pragmatics […] as that component of linguistics that can
greatly inform, and has indeed informed, various approaches in translation
studies. (2009: 64)
reflecting the same cognitive world as the original (2009: 70) is hardly ten-
able, even just because of the temporal and/or spatial mismatch between
the ST and the TT (see also Sect. 3.1). Nevertheless, Kitis’s view of texts as
instances of communication embedded in a series of contextual and cultural
layers supports the idea that translation should be seen mainly from a prag-
matic point of view, although her comments are confined to language.
Both Bernardo and Hatim and Mason argue in favour of seeing transla-
tion as a pragmatic process. Guidelines on the scope and boundaries of
such a pragmatic approach are provided by Hickey, who maintains that a
thorough pragmatic framework for translation should
[T]he following important points can be drawn out. Firstly, the translator
must be seen and must see himself clearly as a communicator addressing the
receptor language audience: whatever his view of translation […], he always
has an informative intention which the translated text is to convey to the
receptor language audience. […] Secondly, […] whatever he does in his
that leads translators in the selection of the analytic and contextual attri-
butes a TT needs to feature in order to interpretively resemble its ST.
The strategies employed by the translator for the production of a mul-
timodal TT and any required reorganisation of the text will therefore
depend on their ability to achieve optimal interpretive resemblance. Any
constraints that might hinder this ability need to be taken into account,
and they might result in the need to reorganise the TT’s explicit and
implicit content. Therefore, the notion of interpretive resemblance will
constitute the frame of reference for how the translator manages the indi-
vidual textual resources and their interaction.
Moving into the next chapter, Gutt’s notion of interpretive resem-
blance is used as the cornerstone to develop the model for multimodal text
analysis along three different dimensions, reflecting the need for the trans-
lator to address the analytic and contextual attributes of the text in her
work. As the pragmatic view adopted in this work is central to the model
hereby proposed, the first section further develops the idea introduced in
Sect. 3.1 of using RT to account for multimodal communication and sets
the boundaries and scope of the pragmatic dimension of the model. Then,
existing frameworks aiming to explain the logico-semantic relationships
that can be created between visual and verbal content are discussed as a
second step towards the full development of the model, to support trans-
lators in understanding such relationships in the ST and modelling them
in the TT. Lastly, the considerations made so far concerning the qualities
of the different individual modes and on communication through sign
systems are integrated into the model to discuss the use of individual tex-
tual resources according to different communicative purposes.
References
Bach, K. (1994). Conversational Impliciture. Mind & Language, 9(2), 124–162.
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Bernardo, A. M. (2010). Translation as Text Transfer—Pragmatic Implications.
Estudos Linguisticos / Linguistic Studies, 5, 107–115.
Blakemore, D. (1992). Understanding Utterances: An Introduction to Pragmatics.
Oxford: Blackwell.
Braun, S. (2016). The Importance of Being Relevant? A Cognitive-Pragmatic
Framework for Conceptualising Audiovisual Translation. Target, 28(2),
302–313.
Sacks, H., & Schegloff, E. A. (1979). Two Preferences in the Organization of
Reference to Persons in Conversation and Their Interaction. In G. Psathas
(Ed.), Everyday Language: Studies in Ethnomethodology (pp. 15–21). New York:
Irvington.
Searle, J. (1969). Speech Acts. Cambridge: Cambridge University Press.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and
Cognition (2nd ed.). Oxford: Blackwell.
Tanaka, K. (1999). Advertising Language: A Pragmatic Approach to Advertisements
in Britain and Japan. London: Routledge.
Thompson, S. (1978). Modern English from a Typological Point of View: Some
Implications of the Function of Word Order. Linguistische Berichte, 54, 19–35.
Wilson, D., & Sperber, D. (1988). Representation and Relevance. In R. M.
Kempson (Ed.), Mental Representations: The Interface Between Language and
Reality. Cambridge: Cambridge University Press.
Yus Ramos, F. (1998). Relevance Theory and Media Discourse: A Verbal-Visual
Model of Communication. Poetics, 25, 293–309.
CHAPTER 4
Abstract The approach used in this chapter first sets the most general
dimension of analysis, and subsequently identifies more specific dimensions,
looking in more detail at the various components of an ST and
their interaction. A pragmatic perspective is adopted as the most general
dimension of multimodal analysis; indeed, the ultimate purpose of a mul-
timodal message recipient is to recognise the sender’s intention.
Multimodal meaning is therefore analysed according to the distinction
between explicit and implicit meaning in Relevance Theory. The second
dimension of analysis concerns the visual-verbal relations and how these
contribute to multimodal message formation. Finally, the third dimension
of analysis regards the meaning of individual modes.
is derived both from code analysis (e.g. grammar) and inferential activity.
The existence of a grammar is what provides the end receiver with guid-
ance on how the individual words of a linguistic message are to be inter-
preted when put together and it informs the inferential activity. On the
other hand, as previously argued, multimodal messages are the product of
the interaction between elements from different modes, some of which
might not have a grammar (see Sect. 2.2 regarding the visual and verbal
modes).
If these messages were dealt with in the same way as monomodal ones,
each mode would be dealt with separately in terms of its identifiable regu-
larities (regardless of whether these can be defined as amounting to a
‘grammar’), its usage in context and the inferential information that can
be derived from it (as if it were a monomodal text in its own right). As
already observed in Sect. 2.2, however, a separate analysis of the different
modes in a multimodal text does not satisfactorily account for the message
conveyed, as this is to be interpreted as a single entity. The presence of
multiple modes influences the meaning which each suggests to the receiver,
and the information they provide separately is often partial, different or
even irrelevant to the audience if the multimodal interaction is not anal-
ysed. These elements can be compared to cogs in a clock: their presence is
as important as their interaction in terms of the overall functioning of the
clock, and analysing each of them separately is unlikely to determine
whether the clock works and what time it marks.
However, if it is true that no set of rules connects items belonging to
different modes, it is legitimate to wonder why end receivers consider the
different components of a multimodal message as part of a single textual
unit, how the unit’s internal coherence is perceived and analysed and what
types of connections exist between signs from different modes. It is my
claim that what makes the end receiver analyse the content of the various
modes as part of a single, multimodal structure, is indeed Sperber and
Wilson’s presumption of optimal relevance: the presumption that the multi-
modal form is considered by the sender as the optimal form of communi-
cation of a particular message in a particular context, indeed, entails that
the different modes are to be considered as conveying a single message,
and that they have to be processed as interrelated components of a single
textual unit.
If what binds together the different modes is this pragmatic presump-
tion of optimal relevance, the interaction of the modes does not happen,
thanks to a grammar, but rather, it has an inferential basis. Far from pro-
viding strict combinatory rules, the presumption of optimal relevance sug-
gests that connections between the different modes must exist and that
they are to be identified in order to lead us to the recognition of inten-
tions. It is then to the presumption of optimal relevance that a multimodal
analysis has to make reference in order to identify the sender’s intention
and the textual organisation chosen by the sender as optimally relevant to
communicate the intention itself. If the ‘communicative glue’ of multi-
modal texts is the presumption of optimal relevance, then relevance is the
general principle all the other dimensions of analysis have to refer to and
comply with, and multimodal texts can be profitably analysed from a
relevance-theoretic perspective.
As discussed in Chap. 3, RT supports the view that, other than through
the semantic meaning derived from linguistic encoding/decoding, recipi-
ents identify the sender’s intentions also through explicatures and impli-
catures, namely, inferential meanings brought about by the use of language
in a certain context. These inferential meanings are retrieved on the basis
of the semantic representation of utterances among other factors; they
are so important in communication that their existence can dramatically
change the meaning that would normally be associated with the semantic
representation.
As seen in Chap. 3, RT was developed mostly on a linguistic basis—
therefore, the concepts of ‘semantic representation’, ‘explicature’ and
‘implicature’ were elaborated for language only. It is thus important to
assess how adaptable to multimodal communication such concepts are.
Do multimodal messages have a semantic representation? Are they capable
of suggesting explicatures and implicatures? In what follows, we start from
the presence of a multimodal semantic representation and move on to
discussing the ability of multimodal texts to generate explicatures and
implicatures.
The semantic representation of an utterance is also called its ‘logical
form’ in Sperber and Wilson’s work, and it is defined as follows:
This last point introduces the next section. While the presumption of
optimal relevance governs the reconstruction of the semantic representa-
tion in multimodal texts, what this means in practical terms still needs to
be explained. Section 4.2 therefore analyses some frameworks meant to
explain the relationships occurring between visual and verbal content to
gain an insight into multimodal textual organisation. It shows how the
presumption of optimal relevance is the key to the identification of the
logico-semantic relationships between visual and verbal content that play
a role in shaping the semantic representation of a multimodal text.
Fig. 4.2 Relationships of status, after Martinec and Salway (2003: 351)
ANALYSING MULTIMODAL SOURCE TEXTS FOR TRANSLATION: A PROPOSAL 75
–– Elaboration: the clause restates the main clause in other words, re-
exposing or exemplifying it;
–– Extension: the expansion is carried out with the addition of new ele-
ments, alternatives or exceptions;
–– Enhancement: the clause embellishes its main clause by adding circumstantial
features like time, space and causal relations.
Martinec and Salway argue that Halliday’s model of clause relationships
can also account for the logico-semantic contributions images and lan-
guage can add to each other. Halliday’s clause relationships, summarised
in the following diagram, are applied by Martinec and Salway to visual-
verbal relationships as in Fig. 4.3.
Martinec and Salway’s claim is that the two systems (status and logico-
semantic relationships) complement each other in describing the types of
relationship existing between images and language; therefore, in their
view, a visual-verbal relation should be identified in terms not only of the
relative status of signs but also of their logico-semantic connection, select-
ing a relationship from each of the two systems. Halliday originally pro-
posed his taxonomy to account for both paratactic and hypotactic relations,
and therefore, his taxonomy can apply equally well to anchorage, illustra-
tion and relay as the relevant status; indeed, these are the multimodal
equivalents to paratactic (relay) and hypotactic (anchorage and illustra-
tion) clause relations, due to the difference in status between the contribu-
tions made by the different modes to the message.
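The idea of selecting one relationship from each of the two systems can be pictured as a pair of independent choices. The following is only a minimal sketch of that idea, using the labels given in the passage above; the class names and the `clause_equivalent` helper are hypothetical illustrations and do not come from Martinec and Salway or from Halliday.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Relative status of image and text (labels taken from the passage above)."""
    ANCHORAGE = "anchorage"
    ILLUSTRATION = "illustration"
    RELAY = "relay"


class LogicoSemantic(Enum):
    """Halliday's expansion types as applied to visual-verbal relations."""
    ELABORATION = "elaboration"
    EXTENSION = "extension"
    ENHANCEMENT = "enhancement"


@dataclass(frozen=True)
class VisualVerbalRelation:
    """One selection from each system (hypothetical container for illustration)."""
    status: Status
    logico_semantic: LogicoSemantic

    def clause_equivalent(self) -> str:
        # Relay is described as the multimodal equivalent of parataxis;
        # anchorage and illustration as the equivalents of hypotaxis.
        return "paratactic" if self.status is Status.RELAY else "hypotactic"


def all_relations() -> list:
    """Enumerate every pairing available when both systems are selected from at once."""
    return [VisualVerbalRelation(s, l) for s in Status for l in LogicoSemantic]


if __name__ == "__main__":
    for relation in all_relations():
        print(relation.status.value,
              relation.logico_semantic.value,
              relation.clause_equivalent())
```

Treating the two systems as independent axes is an assumption of this sketch; it simply makes explicit what simultaneous selection would amount to in practice.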
However, it is worth asking whether selecting from the two systems
simultaneously, as Martinec and Salway suggest, might be useful for
describing multimodal relations for the purpose of translation analysis. If
texts are thought of not as the result of an addition, but as the result of a
multiplication of semiotically different meanings, it seems unlikely that
there is a well-demarcated relationship of equal or subordinate status
between textual resources in multimodal texts. While the ‘quantity’ of
information provided by each mode, assuming that this could be mea-
sured in some way, could be a criterion to determine status, there can be
texts in which a single sign from one mode can influence to a great extent
the meaning of an even relatively large set of signs from another mode; for
example, the meaning of a page full of verbal instructions can be entirely
reframed, or at least drastically altered, if the page is crossed out in red
pen. In this example, the outnumbered single sign could, from a certain
perspective, be claimed to be ‘dominant’ because of the influence it exerts
This visual grammar must have a lexicon of elements that can be chosen to
create meaning in combinations [and] there must be a finite system of rules
for combination of elements. In the first case we have found that it is hard to
identify visual elements in themselves. […] In the second case we have not
found an arbitrary code that is the first layer of meaning. (Machin 2007: 176)
Table 4.3 Tripartite structure of the model for the analysis of multimodal STs
Sender’s meaning
The concept of multimodal text development over space and time is cen-
tral to Baldry and Thibault’s work on multimodal transcription and text
analysis. These two forms of multimodal text development are based on
the concepts of cluster and phase, respectively, as discussed in Chap. 2.
We can recall that a cluster is a local grouping of items. The concept of
cluster is particularly relevant to the analysis of static multimodal texts
(Baldry and Thibault 2005: 31), which only develop over space, as these
tend to show more or less well-defined groupings of items useful for guid-
ing the message recipient through the various textual ‘areas’. Elements
belonging to different clusters in the same text are normally separated
visually, for example, with a line demarcating the top part of a leaflet from
the bottom, indicating that the two parts are to be considered as different
‘spatial units’. Without the use of clear markers of separation, a cluster may
be identified on the basis that all its elements are found in close proximity
to each other, and further removed from others. These elements can
be from different semiotic modes, and a cluster may contain images and
verbal content.
Leaflets, for example, show clusters marked by paper foldings; within
the same cluster, there may be further subclusters, for example, at the top,
centre and bottom of each cluster. Some genres within static multimodal
texts are more reliant on a cluster-based organisation than others. We can
think of comics as clear examples of multimodal texts in which clusters are
used to help the reader to follow the development of the narrative effec-
tively through frames. The heavy use of clusters in comics is so embedded
in the genre that comic readers expect to receive this type of organisational
guidance. The use of this technique in comics also gives the impression of
text development over time, as cluster sequences are strongly associated
with the passing of time, and exceptions (e.g. flashbacks) may need to be
signalled to the reader.
This very specific use of clusters makes comics into one of the few genres
of static multimodal texts in which spatial and temporal development are
claimed to coincide (McCloud 1994: 100), although it has been success-
fully argued that panel sequences are not always used to communicate a
temporal development (Cohn 2007). However, in the context of static
multimodal texts, development over time does not really happen—we may
only have the impression of it. Indeed, the analytical framework for static
Table 4.4 Transcription table for static multimodal texts, after Baldry and Thibault (2005: 29)
Cluster | Textual resources used in the clusters
Image of a cluster within the text | Information about the textual resources contained in the cluster, for example, wordings, visual images, spatial disposition of the items, size of the cluster
Table 4.5 Transcription table for dynamic multimodal texts, after Baldry and Thibault (2005)
Phase | Cluster | Textual resources used in the clusters
Grouping of items | Semantic representation of individual modes | Semantic representation of multimodal text | Inferential meanings
References
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Berinstein, P. (1997). Moving Multimedia: The Information Value in Images.
Searcher, 5(8), 40–49.
Cohn, N. (2007). A Visual Lexicon. The Public Journal of Semiotics, 1(1), 35–56.
Abstract This chapter applies the model for multimodal ST analysis to inves-
tigate the organisation of some real multimodal texts for translation purposes.
Given the general scope of the book, several multimodal texts are used, selected
from the different types outlined by Reiss (Text Types, Translation
Types and Translation Assessment. In A. Chesterman (Trans. & Ed.),
Readings in Translation Theory (pp. 105–115). Helsinki: Oy Finn Lectura
Ab, 1977/1989). Based on this choice, texts are organised into expressive,
informative and operative. At the beginning of each textual analysis, some
contextual information on the circumstances of publication of the message
and its author is included. Then, the model is applied to investigate the mul-
timodal organisation of the text, in order to identify potentially problematic
areas for its translation. Finally, potential solutions to the individual transla-
tion challenges are discussed in terms of how the applicable strategies may
affect the level of interpretive resemblance between source and target text.
In order to explore the model proposed in Chap. 4 for the analysis of static
multimodal texts for translation purposes, this chapter gives a demonstra-
tion of its application to a set of multimodal texts. The chapter starts with
5.1.1 Selection of Material
We begin the description of the selection process by outlining the selec-
tion criteria for the texts that comprise the sample used to explore the
application of the model. Firstly, as the model is intended to be applicable
to a broad range of texts, in terms of genres and purposes, the initial crite-
ria for selection are text types as outlined by Reiss (1977/1989). The
reason for choosing Reiss’s categorisation is that her taxonomy is based on
textual functions, and choosing to analyse texts from each of her three
categories—expressive, informative and operative—ensures a discussion of
STs with different goals, conveying messages whose content is organised
artistically, to communicate knowledge or information or to persuade
readers of a certain viewpoint or action. This will help readers gain an
understanding of what the multimodal organisation of different types of
text might look like, and how this organisation might be linked with the
function(s) the text performs.
However, texts are rarely of a ‘pure’ type, as Reiss herself explicitly
acknowledges: for example, a political leaflet will often convey information
while at the same time being an expression of the author’s point of view
on political matters and being ultimately intended to persuade the reader-
ship to vote for a certain party, therefore being informative, expressive and
operative at the same time with an arguable prevalence of this latter func-
tion. It is common for texts not to be entirely expressive, informative or
operative, and as the example shows, it is likely for them to be positioned
somewhere within the ‘triangle’ made of the three functions.
MULTIMODAL ST ANALYSIS: THE MODEL APPLIED 101
5.1.2 Analytical Procedure
At the beginning of each textual analysis, some contextual information on
the setting of the message and its author is provided as appropriate. The
model is then applied in order to investigate the multimodal organisation
of the text in each analytical dimension, before the results of the analysis
are reported in the relevant summary table, with a view to identifying
potentially problematic areas for the translation into Italian (these are
underlined and in bold in the summary table). Finally, I propose solutions
to the translation challenges identified, commenting on what changes
these entail in terms of the TT’s semantic representation and inferential
meanings if interpretive resemblance to the ST is to be maintained.
Each analysis follows the model outlined in Chap. 4 and the relevant
analytical table (Table 4.7), dealing with: the grouping of items (clusters),
the semantic representation of individual modes (verbal, visual) and of the
multimodal text (COSMOROE and logico-semantic relations) and any
5.1.3 Coding System
As discussed in Chap. 4, the wealth of information related to the organisa-
tion of a multimodal text needs to be organised in a systematic way in
order to be accessible. For this reason, Table 4.7 integrates the model’s
theoretical structure with Baldry and Thibault’s multimodal transcription
system (2005). While the table divides and organises the three analytical
dimensions and their sub-sections, individual items such as textual
resources and their groupings or individual visual-verbal relations still
need to be clearly labelled for ease of identification using a consistent cod-
ing system. This section is perhaps more relevant for scholars who intend
to make active use of the model to analyse multimodal texts, and less so
for readers who want to gain an understanding of multimodal translation
issues—in the latter case, the reader might want to skip ahead to Sect. 5.2,
where the actual analysis of multimodal texts begins.
Starting with groupings of items, individual clusters within each text
are labelled with a code that identifies them and assigned a number (e.g.
CL1, CL2). Each cluster is shown as an image in the cluster column to
clearly identify its boundaries. Within each cluster, the labels applied to
each textual resource reflect the mode the element belongs to; different
elements from the same mode may be distinguished by number, but it is
important to note that the numbering system does not reflect priority or
relative status between elements. Thus, two verbal elements in the same
cluster/text will be labelled respectively ‘1VER’ and ‘2VER’, whereas two
visual elements will be ‘1VIS’ and ‘2VIS’.
In order to show the link with the textual resources involved, visual-
verbal relationships are named after them. Thus, a relationship between
1VER and 2VIS is labelled 1VER-2VIS and tagged according to the rele-
vant category/subcategory, both for COSMOROE and logico-semantic
relationships (e.g. 1VER-2VIS: complementarity—apposition; extension).
If an element enjoys a one-to-many relationship with two or more other ele-
ments, all elements need to be listed in the relationship label, which is tagged
according to the relevant relationship categories as above. The element
establishing the multiple relationship is listed first, and other elements follow
after a dash, divided by a comma (e.g. 1VER-1VIS,2VIS). If the multiple
relationship is established by an element with all the others appearing in the
text, ‘ALL’ appears after the dash (e.g. 1VER-ALL) instead of a lengthy list.
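The labelling conventions described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the model itself: the function names (`cluster_label`, `resource_label`, `relation_label`) and the `TaggedRelation` class are hypothetical, and the only label shapes assumed are the ones given in the text (CL1, 1VER, 2VIS, 1VER-2VIS, 1VER-1VIS,2VIS, 1VER-ALL).

```python
from dataclasses import dataclass
from typing import Sequence

# The two modes labelled in the text: verbal (VER) and visual (VIS).
MODES = {"VER", "VIS"}


def cluster_label(number: int) -> str:
    """Label a cluster, e.g. 1 -> 'CL1'."""
    return f"CL{number}"


def resource_label(number: int, mode: str) -> str:
    """Label a textual resource, e.g. (1, 'VER') -> '1VER'.

    The number only distinguishes elements of the same mode;
    it does not express priority or relative status.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return f"{number}{mode}"


def relation_label(source: str, targets: Sequence[str] = (),
                   all_others: bool = False) -> str:
    """Name a visual-verbal relation after the resources involved.

    The element establishing the relation comes first; the others
    follow after a dash, separated by commas, or are replaced by 'ALL'.
    """
    if all_others:
        return f"{source}-ALL"
    if not targets:
        raise ValueError("a relation needs at least one target")
    return f"{source}-{','.join(targets)}"


@dataclass(frozen=True)
class TaggedRelation:
    """A relation label with its COSMOROE and logico-semantic tags (hypothetical container)."""
    label: str
    cosmoroe_category: str
    cosmoroe_subcategory: str
    logico_semantic: str


# The example relation given in the text.
example = TaggedRelation(
    label=relation_label(resource_label(1, "VER"), [resource_label(2, "VIS")]),
    cosmoroe_category="complementarity",
    cosmoroe_subcategory="apposition",
    logico_semantic="extension",
)
```

Keeping the tags as separate fields rather than one formatted string is a design choice of this sketch; it makes it easier to filter relations by category when comparing ST and TT.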
As explicatures and implicatures are based on the semantic representa-
tion, these are also identified by the same codes as the other elements.
Thus, an explicature of the element 1VER is labelled ‘1VER’; an implica-
ture derived on the basis of 1VER and 1VIS is labelled ‘1VER-1VIS’. An
implicature triggered by all textual elements together is labelled ‘ALL’.
Whenever possible, a comprehensive account of the explicatures identified
in a text is transcribed in the relevant column. However, in the case of texts
including a wealth of explicatures, only the ones deemed directly relevant
to the analysis of the text for translation purposes are listed, that is, those
that are necessary to explain its interpretation in terms of optimal rele-
vance. The identification of explicatures has to be guided by the criterion
outlined by Sperber and Wilson, that is, ‘the right interpretation […] is the
one that is consistent with the principle of relevance’ (1995: 184). It is also
important to note that the number of implicatures associated with a certain
semantic representation is potentially countless and that the model is not
intended to address them all, nor is it capable of doing so. For this reason,
only implicatures that are strongly communicated are included in the
analysis. Strongly communicated implicatures are defined by Sperber and
Wilson as those ‘which must actually be supplied if the interpretation [of
the text] is to be consistent with the principle of relevance’ (1995: 199).
Lastly, it is important to point out that neither the column dedicated to
the semantic representation of the multimodal text nor the one for contextual
meanings is divided by phase/cluster, as relationships triggering
inferential meanings can exist between elements belonging to different
groupings, affecting the whole text. The application of the model’s coding
system is further clarified in the texts analysed in this chapter. The table
resulting from the model’s application acts as a ‘map’ of the multimodal
meaning, and is used to pinpoint potential translation issues as a starting
point for their resolution.
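For readers who wish to keep the summary table in machine-readable form, one possible arrangement is sketched below in Python. It is an assumption of this sketch, not a feature of the model, that the table is stored as a `SummaryTable` object with per-cluster resources and text-wide columns for relations, explicatures and implicatures; the class names, the `unknown_references` check and the placeholder strings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One grouping of items, e.g. 'CL1', with its labelled resources."""
    code: str
    resources: dict[str, str] = field(default_factory=dict)  # '1VER' -> content


@dataclass
class SummaryTable:
    """Clusters are listed separately; the remaining columns span the whole text."""
    clusters: list[Cluster] = field(default_factory=list)
    relations: dict[str, str] = field(default_factory=dict)     # '1VER-1VIS' -> tags
    explicatures: dict[str, str] = field(default_factory=dict)  # '1VER' -> explicature
    implicatures: dict[str, str] = field(default_factory=dict)  # 'ALL' -> implicature

    def resource_labels(self) -> set[str]:
        """All resource labels declared in any cluster."""
        return {label for c in self.clusters for label in c.resources}

    def unknown_references(self) -> list[str]:
        """Resource labels used in a text-wide column but declared in no cluster."""
        known = self.resource_labels()
        unknown: list[str] = []
        for column in (self.relations, self.explicatures, self.implicatures):
            for label in column:
                # '1VER-1VIS,2VIS' -> ['1VER', '1VIS', '2VIS']; 'ALL' needs no check.
                for part in label.replace("-", ",").split(","):
                    if part != "ALL" and part not in known and part not in unknown:
                        unknown.append(part)
        return unknown


# Placeholder content only; no real analysis is reproduced here.
table = SummaryTable(
    clusters=[Cluster("CL1", {"1VER": "placeholder wording",
                              "1VIS": "placeholder image description"})],
    relations={"1VER-1VIS": "placeholder tags"},
    explicatures={"1VER": "placeholder explicature"},
    implicatures={"ALL": "placeholder implicature",
                  "1VER-2VIS": "placeholder implicature"},
)
```

In this placeholder table the implicature labelled 1VER-2VIS refers to a resource that no cluster declares, so `table.unknown_references()` flags `2VIS`; a check of this kind may help keep the coding consistent across a long analysis.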
among others, cartoons, comics, reviews, children’s books and other fic-
tion. Paraphrasing Lotmann (1972)1, Reiss provides the following expla-
nation of the communicative situation corresponding to expressive texts:
[T]he sender is in the foreground. The author of the text writes his topic
himself; he alone, following only his own creative will, decides on the means
of verbalization. He consciously exploits the expressive and associative pos-
sibilities of the language in order to communicate his thoughts in an artistic,
creative way. (Reiss 1977/1989: 105)
1
Lotmann, J. (1972) Die Struktur literarischer Texte. München: Fink.
5.2.1 The Backbone
This text is a political cartoon drawn by Steve Bell and published on the
website of The Guardian as a comment on David Cameron’s message to
the Conservative members of his party, published through Cameron’s
Twitter account in May 2013. The cartoon shows two clusters, the first
including the visual contribution and the top part of verbal content, the
second including the verbal content on the right-hand side (Fig. 5.1).
Fig. 5.1 Steve Bell (2013) The Backbone. The Guardian, 22 May 2013
Cluster | Verbal content | Visual content | COSMOROE | Logico-semantic relations | Explicatures | Implicatures
5.2.2 Latymer
This text is an extract from the Food & Drink section of the Surrey Life
website, dealing with a review of a restaurant in the Surrey area. Restaurant
reviews are a mixed text type, as they perform an informative function
(providing factual information on a restaurant business), an operative
function (advising the audience for/against trying a certain restaurant)
but also, and arguably more centrally, one of expression (as the author is
clearly in the foreground, and the main purpose is expressing their own
subjective opinion regarding their dining experience, regardless of whether
this causes a change in the audience’s behaviour). This is only an extract
from the webpage in which the review appears, focusing specifically on the
first half of the restaurant review and disregarding links to other pages,
advertisements, banners or other content. The choice to include only half
of the review reflects a focus on the multimodal part of the text, as the rest
of the text is made of verbal content only. The text is made up of three
clusters, one on the top left (title, date and social network icons), one on
the top right (image and legend on the right) and one at the bottom
(main body of the article) (Fig. 5.2).
The title in the top left cluster behaves as a defining apposition for the
rest of the text (meaning extension), in that it indicates explicitly the
nature of the text itself (restaurant review) and its main subject. Its seman-
tic representation is not complete, as the title needs several adjustments to
become a full sentence, including the addition of several words. Its com-
pletion produces an explicature (‘Michael Wignall works at The Latymer
at Pennyhill Park, located in Bagshot GU19 5EU – this is a restaurant
review for the Latymer’). The date provides additional information
(adjunct) meant to temporally enhance the text. The icons in the same
cluster provide extra information (adjunct) on how many times the article
native would be to replace the image itself with something better suited
to the legend (e.g. a picture of the restaurant) in order to tighten the
relationship between visual and verbal content in the cluster. While this
might seem at first sight an unnecessary change to the original ST visual-
verbal relations, these amendments do not affect the communicative
intention conveyed by the text, which is clearly still a restaurant review;
however, as Prljic states, people scan screens based on past experience
and expectations, and using the ‘wrong’ photos can degrade user experi-
ence (Prljic 2014). In this context, a ‘weak’ relationship such as one of
independence between visual and verbal content may not fit the expecta-
tions of the outlet’s TA, who may be used to stronger links between
images and their legends. In this case, the translator may consider such
changes in order not to distract the TA from the communicative inten-
tion of the text with textual associations hard for the audience to recon-
struct under the principle of optimal relevance. Changes of this type may
be considered in an attempt to comply with cultural/corporate conven-
tions, reducing the TA’s processing effort where this is not required to
access fundamental implicated meanings.
Fig. 5.3 Dr Seuss (2004) The Cat in the Hat, p. 1. London: Harper Collins
they are not strictly necessary for the understanding of the text.2 Therefore,
these and all other potential elements of completion are excluded from
2
This type of completion of the verbal logical form is called a ‘generalised conversational
implicature’ in the Gricean theory, whose claim is that these assumptions are in a sense con-
text-independent, in that readers will apply them by default unless these are prevented from
arising by textual/contextual factors that negate them. The status and nature of such ‘default
interpretations’ have been a longstanding battleground in pragmatics, and currently the only
consensus achieved in the discipline is that they need to be differentiated from other types of
pragmatic contributions in that, contrary to the case of inferences derived from specific con-
texts, these assumptions go through unless a special context is present (Horn 2004: 4–5).
this analysis, which focuses on the explicatures without which the text
could not be processed.
Relations of equivalence can be found between the visual and the
verbal content: ‘the house’ is represented both visually and verbally
(token-token), as well as the ‘cold, cold, wet day’, of which the visual
content represents a single moment only (metonymy). No significant
new elements are added, thanks to these relations, which only state the
same content twice (elaboration). The absence of strongly communi-
cated implicatures means that the text communicates largely on an
explicit level: thus, it can be concluded that the sender’s intention is to
communicate the semantic representation of the text. Texts (utter-
ances) of this type are defined by Sperber and Wilson as ‘ordinary asser-
tions’ (1995: 183).
Relationships of equivalence for purposes of elaboration are domi-
nant in the extract. This aspect of the text seems compatible with the
type of audience it is addressed to. A very young readership is likely to
have limited access to inferential meanings, both because of their restricted
knowledge of the world and because they are in the process of acquiring
basic literacy skills. Hence, texts that reiterate content visually and
verbally with the aid of basic visual-verbal interaction seem an appro-
priate choice for their level of proficiency in text interpretation
(Table 5.3).
Difficulties in reproducing the first page of The Cat in the Hat for an
Italian TA would lie mostly in the reproduction of the verbal content,
which needs to satisfy several requisites: firstly, as the original verbal con-
tent is expressed partly in rhyme, the TT should rhyme if possible in order
to mimic this characteristic of its verbal semantic representation and share
this formal feature with its ST; secondly, the verbal content blends with
the visual contribution by making reference to some of its features—as this
creates visual-verbal relationships, these need to be maintained in order for
the overall sense of the text to be preserved. Hence, an Italian TT should
rhyme and mention the same elements of the visual contribution as the
ST. The rhyming requirement should perhaps be a lower-order priority
than the mention of certain elements of the image: indeed, although the
absence of rhyme would mean a change to one of the features of the verbal
logical form, this change does not influence the overall multimodal seman-
tic representation or the overall interpretation of the text; on the other
hand, not mentioning some of the visual features in the verbal content
would weaken the relations among textual resources, changing the way
the text is organised.
Sender’s meaning
This solution mentions the house, and the pronoun ‘we’ is embedded in the verb
‘restammo’. The ‘cold, cold, wet day’ is not mentioned, but the cold rain is, com-
pensating for the lost relation of metonymical equivalence with a similar relation
of metonymical equivalence (as the rain shown in the picture is only part of the
total rain) and maintaining the multimodal semantic representation substantially
unchanged.
5.2.4 Bush Liebury
This cartoon by author Pat Bagley was published in the Salt Lake Tribune
on 24 April 2013 on the occasion of the then recent inauguration of the
George W. Bush Library. The text shows two clusters—one is formed by
the visual contribution, in which part of the verbal content is embedded;
the other is formed by the verbal contribution in the top left corner
(Fig. 5.4).
The top left of the panel, within the first cluster, shows a website
address: this provides information related to the text (adjunct) in order to
spatially enhance it with details of the location where it can be found.
Below the internet address is the author’s signature. This relates to the rest
of the text by again providing details relevant but not necessary for its
understanding (adjunct); their role is to project the ideas expressed in the
panel onto the author himself.
The rest of the verbal content of this text is embedded in the second
cluster, in a complementary relation with the image of the building it is
written upon, acting as its defining apposition: the building is defined as
the George W. Bush ‘lie-bury’. The word ‘lie-bury’ (coined by joining the
words ‘lie’ and ‘bury’) can be seen as a paronym of the word ‘library’,
given that the phonemic representations of ‘lie-bury’ and ‘library’ are sim-
ilar but not identical (see Attardo 1994: 110–111 for a discussion of the
use of paronyms in relation to humour). The purpose of this complemen-
tary relationship is meaning extension: the use of the paronym suggests
that, instead of housing books as a library normally would, the building
Fig. 5.4 Pat Bagley (2013) Bush Library. Salt Lake Tribune, 24 April 2013
dedicated to George W. Bush is the place where lies are buried (explica-
ture). Given the role of libraries as spaces dedicated to study and the
acquisition of new knowledge (implicated premise), the defining apposi-
tion creates an incongruity (i.e. a contradiction) between the text and the
recipient’s knowledge of the world, showing the author’s humorous
intention in distorting reality. The visual content also shows a caricature of
ex-president George W. Bush that depicts him making a gesture of approval
and satisfaction. This also generates an explicature (‘George W. Bush is
happy with the lie-bury building’).
Knowledge of the political context and the event of the inauguration
of the library are important in order to understand this cartoon.
Recipients need to be aware that the inauguration of the George W. Bush
Library took place (implicated premise) and that the former US presi-
dent had often been criticised for the policies adopted by his govern-
ment and accused of lying to US citizens on many occasions (implicated
premise).
Given these two implicated premises, recipients can arrive at the implicated
conclusion conveyed by the cartoon: the text points out that the
dedication of a ‘lie-bury’ to George W. Bush instead of a library would
have been more appropriate and would have made him happy. Based on
the implicated meanings conveyed along with the incongruity generated
for purposes of humour, it can be concluded that an interpretation of the
author’s intention consistent with the principle of relevance is that he
meant to humorously criticise the event of the dedication, putting
forward the view that a hiding place for lies would have been more
appropriate and appreciated than a library (Table 5.4).
In a translation scenario in which this text is to be reproduced in Italian
for publication in a newspaper, magazine or website dealing with political
topics, the translator would have to face problems in replicating the paro-
nyms. Indeed, this part of the verbal content is based on a wordplay
between the words ‘library’ and ‘lie-bury’ which may be difficult to repro-
duce in another language. At the same time, maintaining this pun is fun-
damental in order to preserve the defining apposition that identifies the
building and its content and the related meaning extension. Indeed, these
are instrumental in triggering the retrieval of the related implicated prem-
ise necessary to support the implicated conclusion.
The word for ‘library’ in Italian is ‘biblioteca’, and the word for ‘lie’ is
‘bugia’. Both words start with the same letter, making it possible for the
translator to merge the words ‘biblioteca’ and ‘bugia’ into ‘bugioteca’,
which indicates a place where lies are collected. This solution maintains
the pun found in the ST with one variation, since the concept of burying
lies is lost in the TT. This loss of meaning could be compensated for by
including additional verbal content that would give the TT a sense that
the content of the ‘bugioteca’ is inaccessible (e.g. ‘bugioteca privata’, ‘pri-
vate lie-brary’), or with the inclusion of additional visual elements to the
same effect (e.g. a padlock on the door of the building). However, the
sounds of ‘bugioteca’ and ‘biblioteca’ are rather different, unlike in the ST,
in which ‘lie-bury’ is phonetically very close to ‘library’ (especially for
particular English pronunciations, like Bush’s own). Resemblance of
sound with a different spelling is very difficult to achieve in languages like
Italian, which are highly phonetic. Nevertheless, the solution proposed
achieves a good level of interpretive resemblance, maintaining intact all
the visual-verbal relations, triggering the retrieval of the related implicated
premise, and leading the readership towards the same communicative
intention.
Table 5.4 Summary table, Bush Liebury
Sender’s meaning
Grouping of items | Semantic representation of individual modes | Semantic representation of multimodal text | Inferential meanings
Fig. 5.5 Arthur Thomson (1964), A Handbook of Anatomy for Art Students, p.34
5.3.2 Climate Concepts
This text extract, entitled ‘Climate Concepts’, is part of the Student’s
Guide to Global Climate Change provided by the United States
Environmental Protection Agency (EPA).3 The extract is the main portion
of the relevant webpage included in the EPA website. The text shows two
clusters. The first cluster corresponds to the left-hand column, entitled
‘Climate Concepts’; the second cluster is the one on the right-hand side,
entitled ‘Weather Versus Climate’ (Fig. 5.6).
The title ‘Climate Concepts’ summarises the content of the cluster it
belongs to, behaving as a defining apposition to the rest of the information
provided, aimed at meaning extension. The verbal content continues with
a quotation from Mark Twain, which is thematically related to the rest of
the visual and verbal content in the cluster (‘weather’, ‘climate’); the quota-
tion does not have to be processed in conjunction with the other textual
resources (independence-symbiosis), and it elaborates on the rest of the
content by restating it more concisely. The main body of verbal content and
the diagram underneath are complementary in the development of the
text, given that the first sets out the premises of the argument (definition of
climate and climate change, exemplification of its effects) and the second
completes it, adding more detailed information on how climate-related
phenomena interact with each other (adjunct, meaning extension).
3
https://archive.epa.gov/climatechange/kids/basics/concepts.html.
The diagram itself is formed of visual and verbal elements, and could be
considered a sub-cluster. The diagram has a title, which behaves as a defin-
ing apposition (meaning extension) in relation to the other elements of
the diagram itself, like the cluster title. Each oval figure represents the
phenomenon it relates to, showing only its place in the network of rela-
tions (metonymy—meaning extension), while the arrows establish con-
nections between the various elements of the diagram, suggesting essential
exophoras that extend the meaning of the verbal contribution. Finally,
the diagram also has a relevant legend, with additional information on the
nature of the diagram itself (adjunct—meaning extension). The superim-
position of the verbal content on the oval shapes generates explicatures
(e.g. ‘(this oval represents) stronger storms’).
The second cluster also has a title, also showing the previously dis-
cussed relation of defining apposition (meaning extension) with the rest
of the verbal content in the cluster. A relation of symbiosis can also be
observed in this cluster: the image at the top of the cluster may be an
exemplification (elaboration) of the concept of weather discussed in the
verbal content, but it does not appear to provide any further information,
being only thematically related to it.
In general, the intention behind the communication of this text is to
provide information about the concepts of weather and climate, com-
paring them and explaining the difference. An example of a part of the
text in which this intention can be detected clearly is the last sentence of
the ‘weather versus climate’ cluster, which provides a practical tip to
remember the difference between the two concepts (Table 5.6).
Assuming a translation scenario in which the Student’s Guide to Global
Climate Change were to be translated for an Italian student audience and
released in an outlet very similar to that of the ST, translation problems would be
mostly found in the reproduction of the first cluster, depending on the
outlet’s style guidelines. The content of the caption of the diagram would
be rather unusual in Italian if it was translated literally, since this seems
more like a statement that would fit in the main body of the text rather
than a caption. Also, some outlets require authors to make explicit refer-
ence to diagrams or illustrations in the main body of the verbal content in
their style guides to help maintain textual cohesion. The two problems
could be solved simultaneously by moving the content of the caption to
the end of the main body of the verbal content. In this way, the sentence
would introduce the diagram, and the diagram itself would be left without
a caption. This would obviously affect the organisation of the text: the
Table 5.6 Summary table, Climate Concepts
Sender’s meaning
Cluster | Verbal content | Visual content | COSMOROE | Logico-semantic relations | Explicatures | Implicatures
5.3.3 Yalta Conference
This text is an excerpt from the Wikipedia entry on the 1945 Yalta
Conference—it consists of the very top of the webpage including the entry
title, a brief general description of the entry, a table of contents and an
image with a legend. The text shows three clusters: the first—on the left-
hand side—contains mostly verbal content (also including the coordinates
provided at the top), the second consists of the image and its caption on
the right-hand side, and the third displays the table of contents. The analy-
sis does not include remarks on the latter, considering that the table of
contents relates mostly to the following parts of the Wikipedia entry
(Fig. 5.7).
The title of the entry relates to the other elements of the webpage
being complementary to them (defining apposition, meaning exten-
sion). The tag ‘from Wikipedia, the free encyclopaedia’ provides informa-
tion on the source of the text (agent) in order to extend the meaning of
all the other elements; however, given that the entry is included in the
website of Wikipedia, this information can be derived by the audience
from other sources, and is anyway not essential for recipients to understand
the rest of the content of the entry. The main body of the text (part
of cluster 1) provides general information on the Yalta Conference, sum-
marising its circumstances and its purpose. In the second cluster, on the
right-hand side, the image represents the Yalta Conference through a pic-
ture taken during the conference itself; this captures a single moment of
the conference and hence is in a metonymical relation (equivalence) with
the title and the main body of the text, aiming to visually enhance the
spatial aspects of the content described verbally. A similar role is played by
the coordinates provided at the top of the image, which spatially enhance
the other textual resources by providing further information (adjunct) on
the location of the conference. In this second cluster, the caption at the
bottom of the image provides additional information (adjunct) that ver-
bally describes the visual content, in order to extend its meaning. It is
important to point out that the recipient is here expected to partly com-
plete the information provided by the caption, which is not exhaustive; the
names of the main personalities included in the picture are provided with
no reference to the order in which they appear, leaving the recipient to
complete the semantic representation (explicature) by ‘reading’ the image
left to right (i.e. the standard reading order for English speakers), if they
cannot independently identify the personalities from the image alone.
By the means described, the excerpt intends to provide a general
account of the Yalta Conference, communicating to the recipient relevant
information through visual and verbal resources (Table 5.7).
Given that Wikipedia presents itself as a multilingual online encyclopae-
dia, its content is often translated from and into different languages, for
internal as well as external use—this particular entry might be realistically
translated for a website about history. The text does not appear to show
particular translation challenges related to its multimodal semantic repre-
sentation. Its partial reliance on the retrieval of an explicature, however,
needs to be considered by the translator: if the TA cannot be expected to
be acquainted with the faces of the important personalities in the picture,
the content that is left implicit in the ST may need to be made explicit in
the TT, and recipients may have to be directed more closely in order to be
able to identify the state representatives referenced in the legend under-
neath the picture. Nevertheless, a general Italian TA can be reasonably
expected to possess the required knowledge, and does not need additional
support.
On the other hand, the legend assigned to the image is rather long.
Cultural or editorial norms governing the expected length of a legend may
require the possible sacrifice of parts of the information. For example, the
Italian Wikipedia entry on the Yalta Conference (which may or may not be
based on the text analysed here) shows a caption limited to the names of
the main personalities and their location, without providing any informa-
tion on the other (perhaps less known) historical figures or their order in
the picture. This may be due to different cultural expectations. If a transla-
tor had used the English version as their ST, in order to maintain all the
information provided by the original they would have had to move parts
of the legend to the other cluster, perhaps merging them with the main
body of the text. This would have resulted in a slightly different multi-
modal organisation, in which exophoras are created between clusters. The
quantity and type of information would not have changed substantially,
with the informative intention being still clear and with the TT presenting
a relocation of some textual resources.
5.3.4 Save the Children
This text is an extract from the 2012 annual report by Save the Children,
an international non-governmental organisation aiming to improve the
life conditions of children around the world. Reports by charitable organ-
isations usually have a twofold function, that is, to inform people of a
certain reality and also to obtain their support (financial or otherwise) in
changing things for the better. In the current case, the whole annual
report includes several articles related to charity work for children. The
persuasive function of the report is mostly evident in the opening letters
from the association’s president and chair at the beginning of the report,
in which more or less direct appeals are made to obtain the readership’s
contribution to the cause (e.g. ‘As we strive to accelerate the progress we
have made and build a world where no child dies needlessly, we will call on
your invaluable support once again’, ‘If you are already with us, thank you
for your help and support. If not, then I would encourage you to join us’).
The articles included in the report, on the other hand, reference a wealth
of factual information on a variety of issues, and their objective is to inform
the audience of individual issues and of the progress that has been made in
tackling them, without making use of clearly persuasive elements.
Therefore, although the report in its entirety is certainly a text with a
strong persuasive function, the article of which this text is an extract can
be considered as predominantly informative (Fig. 5.8).
The text in Fig. 5.8 shows three clusters: the title at the top with the
small image of the United Kingdom on the left-hand side; an image with
caption and acknowledgement of authorship in the middle; and the main
body of the verbal content at the bottom.
The verbal content in the top cluster is the title of the article. In con-
trast to other titles previously considered, in this instance, the title does
not define the content of the rest of the text, but rather, it proposes a
judgement of value on the topic expounded by the article (adjunct—
meaning extension). The verbal content of the title is to be considered
incomplete, given that knowledge of the context is required to assign ref-
erence to the deictics it includes. While the image on the left-hand side
provides a reference for the deictic ‘here’ (essential exophora—meaning
extension), showing a map of the United Kingdom with the image of a
person superimposed on Wales, the pronoun ‘it’ finds no clear referent in
the same cluster, and requires clarification.
The second cluster includes a picture of a child. The legend provides
information on the subject of the picture, adding details about the living
conditions of the boy’s family, his health and the place where they live
(adjunct—meaning extension and spatial enhancement). On the top
left of the image, it is also possible to find information about the author of
the image, meant for meaning extension, but not fundamental for the
general understanding of the text (non-essential agent).
Finally, the third cluster gives detailed information on the main topic of
the article, that is, poverty in the United Kingdom, helping recipients to
assign a clear referent to the pronoun ‘it’ in the title (explicature). The
content of the second cluster is not referred to in any way by the main
body of the verbal content, which is about child poverty in general and
does not concern the subject of the picture or his family. Therefore, the
information included in the second cluster is only thematically related to
the rest of the text (symbiosis—meaning elaboration by exemplifica-
tion). The third cluster also generates explicatures, most notably those
due to the use of deictics such as ‘we’ and ‘our’ in the first column.
Reference assignment for these elements is achieved through context,
since the article is included in the annual report by Save the Children.
The intention in publishing this article in the Save the Children 2012
annual report is to inform supporters and potential supporters of the
living conditions of some children in the United Kingdom and of the
actions taken by Save the Children in order to improve them
(Table 5.8).
[Table columns: Grouping of items | Semantic representation of individual modes | Semantic representation of multimodal text | Inferential meanings]
plex: the concepts expressed in the verbal content are not culture-specific;
the internet address would require modification according to the target
locale to direct the audience to a website they can access and consult, but
this does not appear to constitute a problem either; UNICEF’s name has
official translations into different languages that can be easily accessed by
the translator.
A potentially more problematic area of this multimodal text is repre-
sented by the visual contribution. Will the readership’s expected encyclo-
paedic knowledge allow them to identify the atomic mushroom,
understanding the pictorial metaphor? If so, does the TA possess the gen-
eral knowledge on atomic explosions necessary to reach the implicated
premise activated by the visual content? Missing out on the identification
of the first implicated premise would in turn hinder the understanding of
the implicated conclusions, given that these are built partly on the impli-
cated premise. If the readership cannot identify correctly the image of the
atomic bomb, they will miss out on part of the intended message, being
unable to identify the role played by the image.
In the case of a school TA, then, the translator may choose to support
the readership’s processing effort by adding details to the verbal contribu-
tion in order to reinforce the semantic representation of the TT. The ver-
bal contribution, for example, could mention atomic bombs, and be
translated as ‘1 milione e mezzo di bambini muore ogni anno dopo aver
bevuto acqua inquinata. Solo una bomba atomica ne ucciderebbe altret-
tanti.’ (‘1 million and a half children die every year after having drunk
polluted water. Only an atomic bomb would kill as many’). This would
allow the TA to achieve a fuller understanding of the message; children
might still not be able to access their own encyclopaedic knowledge about
atomic bombs, but this gap can be filled by teachers through a discussion
of the topic. The resulting TT would be more accessible for the intended
TA, taking into account their developing skills and knowledge of the
world.
5.4.2 WWF’s Earth Hour
This internet banner was released by WWF to support their 2012 Earth
Hour campaign.⁴ The Earth Hour ‘is a global annual event where millions
of people switch off their lights for one hour to show they care about our
planet’ (Earth Hour 2016). The text shows two clusters: the first cluster is
made of visual content (sketched image of the Earth in space), which is
also the background of two parts of verbal content (‘Our world is
brilliant – help keep it that way’ and ‘WWF’s Earth Hour – 8.30PM 31
March’). The second cluster is WWF’s Earth Hour 2012 ‘signature’
cluster in the top right corner on a white background (Fig. 5.10).
⁴ http://earthday2017.today/wp-content/uploads/2017/03/6.png.
In the first cluster, the image of the Earth, depicted in brilliant hues of
green and blue to show the outline of the continents (most notably
Africa at the centre of the planet), is in a complementary relationship
with the top chunk of verbal content, which calls our world ‘brilliant’.
This polysemous adjective is commonly used in English with the literal
meaning of ‘shining’, but it is also common to use the same adjective
with the meaning of ‘superb, wonderful’. Recipients are aware that the
planet is not shining and thus retrieve the second meaning based on the
principle of optimal relevance. However, the image leads the recipient’s
process of sense selection in favour of the first option. Both interpretations
are therefore activated in the context of this banner, reinforcing
what is expressed by the visual contribution (elaboration) while at the
same time adding new elements of positive judgement towards the planet
(extension). The image of the Earth was produced with a photographic
technique called ‘light painting’, in which light is used to ‘draw’ on a
dark background (in this case, the night sky) and the image is captured
by the camera, thanks to a long exposure time. The technique used for
the production of the visual content is then also connected with the cen-
tral theme of the WWF’s Earth Hour. If the audience is aware of the
content of the initiative (implicated premise triggered by the name of
the initiative) and of the means by which the image was produced, they
will understand that the meta-information carried by the visual compo-
nent is relevant to the campaign, and that the Earth in the text is ‘made
of light’ (elaboration). Nevertheless, knowledge of this particular tech-
nique should not be assumed for a non-specialist audience, and recipi-
ents can still access the message without the meta-information and the
relevant elaboration.
Again in the first cluster, the bottom part of the verbal content provides
the readership with temporal details on the initiative (adjunct), enhanc-
ing the meaning of the cluster and, as a consequence, of the message as a
whole. As the semantic representation of the verbal content is incomplete,
this produces an explicature (‘our world is brilliant, so help keep it that
way by taking part in WWF’s Earth Hour at 8.30pm on 31 March 2012’).
The role played by the second cluster is very similar to the role played
by the UNICEF name and logo in the UNICEF text: the WWF Earth
Hour name and logo provide information on the message sender, project-
ing the ideas contained in the text onto the author of the message. Since
the first cluster already states the identity of the agent, the relationship
producing the projection is non-essential to the full understanding of the
text. The presence of the name and logo, however, leads the readership
strongly towards the implicated premise about the nature of WWF as a
charitable organisation concerned with the preservation of the planet and
its species.
The full implicated conclusion, coinciding with the author’s inten-
tion and derived from the interaction of all elements, is that WWF, a
charitable environmental organisation, believes that our world is
brilliant, and encourages the audience to help keep it that way by tak-
ing part in the Earth Hour initiative, to be held at 8.30pm on 31
March 2012 (Table 5.10).
Table 5.10 Summary table, WWF Earth Hour campaign
The main challenge for translation of this poster into Italian for a similar
campaign by WWF would arguably be the reproduction of the relations
between the top part of verbal content and the other textual resources.
Word choice is crucial in this sense, as the polysemous adjective ‘brilliant’
is the verbal element that allows the establishment of the relationship with
the image of the planet. The word chosen in the TL should ideally offer
the same polysemy. However, as this may not be possible in some lan-
guages, the translator should consider which of the two meanings to
favour in the TT. The relationship of sense selection established between
the verbal content and the image of the world contributes both to mean-
ing elaboration (re-stating the visual content through the use of ‘brilliant’
with the meaning of ‘shining’) and meaning extension (adding a positive
element of judgement through the use of ‘brilliant’ with the meaning of
‘wonderful’). If a choice needed to be made, the translator may either
choose to favour the elaboration (brilliant = shining), since it makes an
explicit reference to the content of the campaign, or the extension (bril-
liant = wonderful), as this adds new elements that enrich the text and does
not merely restate information provided by the other mode.
In a language like Italian, the two options would result in translations
such as:
[American Red Cross leaflet text: ‘G ING, G ING, G NE. We need O’s. Without enough type O blood on the shelves, we risk being unable to meet the needs of patients here in our community. Call 1-800-GIVE-LIFE to make an appointment at one of our upcoming community blood drives or at a local Red Cross blood donor center.’]
(one being a typographical character and the other one being a blood
type). The text treats them as one and the same thing, projecting some of
the qualities of one onto the other. By means of this metaphor, recipients
are led to the implicated conclusion that for the American Red Cross, the
need for type-O blood is similar to the need for Os in writing. This reinforces
the explicit request by the American Red Cross that recipients
donate blood, by metaphorically qualifying its importance. In this instance,
then, the absence of an expected component in the message carries mean-
ing. This is not a new concept in pragmatics, in which studies on silence
(meant as the absence of an expected utterance or part of it) as a source of
pragmatic meaning are well known: Jaworski, for example, claims that ‘[s]
ilence definitely belongs to the nonverbal component of communication’
(1993: 85) (Table 5.11).
The challenge for the reproduction of this campaign for the Italian Red
Cross would lie, somewhat paradoxically, in the reproduction of the absent
character. The missing letter Os are, indeed, at the basis of the establish-
ment of the equivalent relationship between the two parts of verbal con-
tent in the top cluster, and they are responsible for the meaning extension
deriving from this relationship. A translator would need to find TL equiva-
lents for the words ‘going’ and ‘gone’ containing the letter O to be able
to reproduce this relationship successfully. In Italian, this could be achieved
with the use of two gerunds and a past participle of the verb finire (finish,
run out), with a solution such as ‘STA FINEND , STA FINEND , È FINIT
’ (it is running out, it is running out, it has run out). The forms of the verb
used for this solution would normally all end in ‘O’ (finendo, finito), and
the character has been omitted to replicate the strategy used in the ST. The
position of the missing letter at the end of the word (as opposed to the
middle, as in the ST) is likely to mean that the main body of the text will
have to be moved to the right if the alignment of the main body with the
missing character is to be achieved. In turn, moving the main body of the
text to the right would determine the relocation of the second cluster
either to the centre or to the left, as the main body occupies the space all
the way to the bottom of the leaflet, leaving no room for the symbol of the
American Red Cross underneath it.
Given that the ultimate purpose of the equivalent relationship between
the two parts of verbal content in the top cluster is only to add to the
strength of a request explicitly formulated in the main body, failure to
reproduce the metaphor would not mean a collapse of the multimodal
organisation. The request would, however, lose some support coming
5.4.4 Coldwater Creek
This advertisement was used by Coldwater Creek, a US-based clothing
company and retailer, to publicise their products on their own website. It
shows one cluster only, containing visual and verbal content. The linguis-
tic contribution is divided into two sentences (Fig. 5.12).
The first sentence at the top is an expression of military origin that
refers to having braved adverse circumstances and obtained recognition
for this (implicated premise). While the content suggested verbally seems
unrelated to the topic of fashion, the visual content on the left offers a clue
that makes the verbal contribution relevant to a fashion reader: the stripes
mentioned verbally are shown visually, building an essential exophora
that leads the recipient towards a reinterpretation of the verbal content,
producing a meaning extension.
The second sentence again builds an essential exophora with the visual
content, referring explicitly to the image (‘a vibrant pairing like this’). This
relationship is meant to produce a meaning elaboration through
exemplification.
Both parts of verbal content build an extratextual relationship with the
readership, addressing recipients directly. However, they do so in two dif-
ferent ways: the top part simply addresses the recipient as ‘you’, whereas
the second part builds the extratextual connection through the pronoun
‘us’, ‘embracing’ the recipient and including them in a group. Both pro-
nouns need reference assignment, as well as the deictic ‘this’, which refers
to the outfit in the image (explicatures).
The operative function relies heavily on all the elements of the message:
the first sentence conveys that the message sender believes that the reader-
ship deserve stripes because they have earned them. This suggests through
an implicated premise that stripes are reserved for a select group of peo-
ple who are ready to do something to obtain such desirable items.
The second sentence conveys explicit information about the current
fashion, drawing the recipient’s attention both to the general trend and to
how they can use stripes to attract people’s attention as desired by wearing
‘a vibrant pairing like this’.
Together, the various elements of the message aim to suggest the
implicated conclusion that recipients should acquire clothing with stripes
such as the one in the picture because they are desirable and fashionable,
and the recipients deserve them. The author’s intention is to convince
the readership of the implicated conclusion (Table 5.12).
Table 5.12 Summary table, Coldwater Creek
If Coldwater Creek were to open shops in Italy and required its
advertisements to be translated, the first sentence would probably represent a
translation issue for an Italian TT. Indeed, Italian does not possess a widely
known cultural equivalent for the idiomatic expression that could be used
to replicate the same play on words. However, using an idiomatic expression
with the same meaning is not strictly necessary to the overall functioning
of the text, provided that a reference to stripes is maintained in the
TT (as this is necessary to maintain the essential exophora and the related
meaning extension). The TT also needs to suggest to the recipient, implicitly
or explicitly, that items of clothing with stripes are desirable and
reserved for an elite who are ready to earn them (in order to preserve the
content of the implicated premise). This will support the implicated con-
clusion, preserving as much of the multimodal semantic representation of
the ST as possible in spite of the change in the verbal content.
A possible solution for an Italian version of the advertisement, then,
could be based on a translation of ‘You’ve earned your stripes’ such as
‘Mettersi in riga a volte è fantastico’ (‘lining up sometimes is great’). The
Italian word for ‘line’ also translates ‘stripe’. Just like its ST counterpart,
the Italian set phrase ‘mettersi in riga’ derives from military slang and
indicates the act of soldiers lining up. In common usage, it signifies the act
of making an effort to go back to duty after a period of undisciplined
behaviour. Therefore, the Italian translation for the first sentence suggests
that, contrary to what is generally believed, there are circumstances in
which ‘mettersi in riga’ can be highly desirable (i.e. getting into clothes
with stripes). This solution would maintain the relation of essential
exophora with the picture, and it would still produce a meaning extension,
mimicking the multimodal semantic representation of the ST.
shed light on whether modifications are required for it to realise its full
potential as an evaluative framework, as an analytical method for research-
ers, as a contribution to the training of translators, and for practising trans-
lators as a basis for reflecting on intuitive decisions.
The second important trend identified is the tendency of translation
issues ascribed to a certain dimension of the model to ‘spill over’ into
other dimensions. Indeed, while it is possible to affirm that a certain trans-
lation issue originates from one of the three analytical dimensions, this
does not mean that applying changes will have an effect on that dimension
only. For example, problems generated by difficulties in reproducing the
ST textual resources might mean that changes are required to the resources
employed in the TT; nevertheless, any change in the area of individual
textual resources is likely to have an impact on the accessibility of the prag-
matic meaning required to get to the sender’s intention. An example of
this phenomenon can be found in the text in Sect. 5.2.4, in which the paronym
‘lie-bury’ allows the ST audience to access a set of pragmatic meanings
connected to President Bush that help them get to the intended message.
While the translation of the paronym can constitute a problem in itself, in
this specific case, it also affects the implicit meaning the TT is capable
of suggesting to its audience by reminding them of the political context in
which the cartoon was published. In other cases, translation issues con-
nected with a certain textual resource impact on the relationships this
establishes with other textual resources; this phenomenon can be exempli-
fied by the text on WWF’s Earth Hour (Sect. 5.4.2), in which difficulty in
reproducing a pun—‘brilliant’—may not allow the establishment of a rela-
tion of complementarity between textual resources that is important for
their interplay in the multimodal text. ‘Chain reactions’ involving all tex-
tual dimensions are not uncommon either: in the Coldwater Creek text
(Sect. 5.4.4), the main translation issue lies in the rendering of the verbal
content, not because of the complexity of its internal structure or a highly
specialised terminology but because the translator requires a textual
resource evoking the same (or a similar) implicature (i.e. clothes with
stripes are desirable) while at the same time maintaining the same (or a
similar) relationship between the linguistic and visual content.
The strong likelihood that translation issues arising in one dimension
will influence one or more of the other dimensions suggests that the analy-
sis of these three ways of conveying meaning cannot be fully compartmen-
talised and that resolving a potential issue belonging to one dimension
does not necessarily mean that the solution found will not in turn create a
challenge in another.
References
Attardo, S. (1994). Linguistic Theories of Humor. New York: Mouton de Gruyter.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Earth Hour. (2016). What Is Earth Hour? [online]. Available at:
http://earthhour.wwf.org.uk/earth-hour/. Last accessed 15 May 2016.
Federici, F. (2011). Introduction: Dialects, Idiolects, Sociolects: Translation
Problems or Creative Stimuli? In F. Federici (Ed.), Translating Dialects and
Languages of Minorities: Challenges and Solutions (pp. 1–20). Oxford: Peter
Lang.
Forceville, C. (1996). Pictorial Metaphor in Advertising. London: Routledge.
Horn, L. R. (2004). Implicature. In L. R. Horn & G. Ward (Eds.), The Handbook
of Pragmatics (pp. 3–28). Oxford: Blackwell.
Jaworski, A. (1993). The Power of Silence: Social and Pragmatic Perspectives.
Newbury Park: Sage.
Prljic, M. (2014). The Web Designers’ Guide to Photo Selection [blog post]. Available
at: https://webdesign.tutsplus.com/tutorials/the-web-designers-guide-to-photo-selection--cms-21592.
Last accessed 19 June 2017.
This final chapter reflects upon several aspects of the model proposed in
this study, aiming to assess to what extent it has met its initial objectives.
The main goal set at the beginning of this work was to develop a model
for the textual analysis of multimodal STs for translation purposes. The
model was developed with an approach bringing together aspects of
has proved useful for organising the results of analysis for each of the three
dimensions of the model and their connection with the (combinations of)
textual resources involved in conveying meaning, offering a mapping of a
multimodal text ‘at a glance’.
The information on the STs obtained by applying the model has been
useful for discussing translation strategies for the selected texts in terms of
interpretive resemblance, detailing the practical challenges of producing a
TT interpretively resembling its ST. The ST analysis has helped to pin-
point potential issues, ascribing them to a particular dimension (as dis-
cussed in Sect. 5.5) and supporting a discussion of how these could be
resolved in the final TT. This discussion focused mainly on the conse-
quences of certain potential solutions in terms of their impact on the over-
all TT organisation, with particular reference to interpretive resemblance.
In some cases, the solutions proposed did not reproduce the same visual-
verbal relations or the same exact inferential meanings detected in the ST
(supporting Gutt’s view that total interpretive resemblance may not
always be possible in translation); consequently, partial solutions were
looked at in terms of the changes they generated in the organisation of the
multimodal TT in order to investigate interpretive compatibility with the
ST and ascertain whether any compensation was desirable or required for
the multimodal TT to interpretively resemble its ST. The application of
the model to a range of multimodal texts has demonstrated that it can be
used for conscious reflection on the translation challenges posed by mul-
timodal texts based on a higher or lower degree of detected interpretive
resemblance. In this sense, the model has achieved its primary goal as an
analytical framework to improve our understanding of the meaningful
organisation of multimodal STs for translation purposes, as stated in
Chap. 1.
The analysis of the texts included in this study was carried out from the
point of view of an individual, involving an interpretation of the text in a
limited knowledge environment that does not allow direct access to the
sender’s intentions, as described in Chap. 3. Other users of the model may
disagree on the meaning assigned to those texts based on their personal
interpretation, identifying different visual-verbal relations or contextual
implications in the same ST. While this may seem at first a limitation of the
model, these are the conditions under which translators operate the world
over, being able to rely only on their own interpretation of a text and trying to
guide their audience towards a similar understanding. Application of the
model by other users should be encouraged as a means of testing its
A possible way to account for this part of meaning that is at the moment
unaccounted for in the model would be to treat extratextual connections
to the recipient as a special type of explicature that requires an assessment
of the expected influence on the level of textual relevance perceived by the
TA. Indeed, this indirectly affects the meanings conveyed by a text and in
particular implicit meanings, as the level of perceived relevance determines
the cognitive effort the audience is willing to make to process a text fully.
A low level of perceived relevance is likely to mean that the explicit and
implicit meanings conveyed by a text will not be fully (or partly) processed
by the readership and that the text will lose all or part of its force—being,
in effect, inadequate to convey the intention behind it.
With or without this integration, researchers in translation studies
could use the model to investigate, among other topics, particular types of
translation. As noted in Chap. 4, the model could find applicability in
research on AVT, localisation or other areas of translation studies dealing
with multimodal meaning. Its applicability as an analytical tool can also
include individual case studies, the analysis of language-pair-specific issues
or systematic investigations of specific ST genres. Extensive application of
the model to individual ST types/genres could be useful to detect trends
specific to certain groups of texts, allowing researchers to produce gener-
alisations regarding the way some multimodal STs are normally organised.
Such generalisations, on top of expanding our current knowledge of mul-
timodal matters, could be useful for informing translation theory with a
more detailed account of the challenges that can be reasonably expected
from each category of texts and, hence, what to ‘look out for’ in the trans-
lation activity. For example, the model’s application to operative and
informative texts would result in a more detailed picture of how these are
respectively centred on their receiver and on their topic. Within the lim-
ited scope of the sets of examples analysed here, operative texts have
already shown a rather heavy reliance on the receiver to contribute to the
text through the elaboration of inferential meanings, while informative
texts have appeared as rather more heavily invested in their semantic
representation.
Topics of authorship and creativity could also be investigated by means
of the model—as anticipated at the beginning of Sect. 5.2, expressive texts
are comparatively free-form, and their shape mostly depends on the
author’s personal idiolect and aesthetic choices. Therefore, the works of
individual authors could be analysed and compared, both in order to iden-
tify common patterns and to see if/how an author’s style develops over
time. The application of the model to expressive texts may help elaborate
on Reiss’s definition by providing a detailed account of the features form-
ing the ‘multimodal signature’ of a certain author, going beyond the gen-
eral description of expressive texts as author-dependent with an analysis of
the communicative strategies and schemes of interaction between dimen-
sions that are consistently used by an author. Application of the model to
different language combinations could also help to differentiate between
generalisations applicable to a specific language pair and more highly
generalisable trends (or ‘norms’) in the translation of multimodal STs.
The model could also be directly useful for the training of new transla-
tors. The type of analysis supported by the model is likely to encourage
trainee translators to come to terms with the necessity to steer away from
the common starting point of ‘literal’ translation in order to serve the
higher purpose of interpretive resemblance; this realisation is in turn likely
to increase their awareness of potential translation issues and of the strate-
gies that can be applied to overcome them based on a framework that can
help them reflect critically on the strengths and weaknesses of their own
choices.
Nevertheless, it could be argued that, as it stands, the model is too
complex for use by translation students. Indeed, some training is required
to understand how to map out texts. Use of the model requires an under-
standing of Relevance Theory and of visual-verbal relations that should
be delivered to students prior to any potential application of the model.
Also, the analytical process can be lengthy, particularly when first used,
making its in-class use potentially impracticable. While it seems hardly
possible to produce a simple model accounting in detail for a complex
and multifaceted reality such as multimodality, didactic difficulties need
to be acknowledged. For these reasons, especially at the beginning,
trainee translators may be presented with a ‘reduced’ version of the
model, in which contextual meanings are addressed in less depth (e.g.
replacing explicatures and implicatures in Table 4.7 with a single column
where contextual factors are listed in connection with textual resources)
and visual-verbal relations are divided into the four broad categories of
equivalence, contradiction, independence and symbiosis without any fur-
ther specification. The tool could find a didactic application in a multimo-
dality module, and be used with tutor guidance in this reduced version.
This would partly resolve the issue of complexity, leaving trainees with the
freedom to study visual-verbal relations in more detail in their own time,
if they have a specialist interest in multimodal meaning. Alternatively, and
perhaps more realistically given its complexity, the model could be used
mainly as a background resource to raise student awareness of the organ-
isation of multimodal STs and its potential impact on translation.
While the model has been developed with translation in mind, its
descriptive value also makes it a valuable tool for uses other than
translation. The possibility of mapping a multimodal text according to the
three dimensions of the model could indeed be useful for other types of
textual analysis in entirely different fields of research. Multimodality is a
pervasive phenomenon, and relevant studies (e.g. on literature, art, adver-
tising, language for special purposes and many more) could use the blue-
print proposed in this book to produce versions of the model corresponding
to their specific analytical needs. These may require the replacement of the
semiotic modes used for this study with different ones, the addition of
further modes, or maybe a stronger emphasis on one of the dimensions,
and are likely to be unconnected to the original idea of supporting inter-
pretive resemblance between a TT and its ST.
The model’s multidisciplinary nature inherently means that its potential
applications are also multidisciplinary. It is now time for this study to
return the concepts borrowed from each field of study to their respective
owners, hopefully with some interest gained through interdisciplinary
interaction; in the course of this journey through multimodal meaning,
this interaction, which represents a point of contact between the seem-
ingly parallel lines of independent disciplines, has proven to have not
merely additive but also multiplicative effects—just like the interaction
between modes in a multimodal text.
References
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Gutt, E. A. (2000). Translation and Relevance: Cognition and Context (2nd ed.).
Manchester: St. Jerome.
Tirkkonen-Condit, S. (1992). A Theoretical Account of Translation—Without
Translation Theory? Target, 4(2), 237–245.
References
Aguiar, D., & Queiroz, J. (2010). Modeling Intersemiotic Translation: Notes
Toward a Peircean Approach [online]. Available at: http://french.chass.uto-
ronto.ca/as-sa/ASSA-No24/Article6en.htm. Last accessed 26 June 2017.
Anstey, M., & Bull, G. (2010). Helping Teachers to Explore Multimodal Texts.
Curriculum Leadership [online]. Available at: http://cmslive.curriculum.edu.
au/leader/default.asp?id=31522&issueID=12141. Last accessed 26 June 2017.
Attardo, S. (1994). Linguistic Theories of Humor. New York: Mouton de Gruyter.
Bach, K. (1994). Conversational Impliciture. Mind & Language, 9(2), 124–162.
Bagley, P. (2013, April 24). Bush Library. Salt Lake Tribune.
Baker, M. (2011). In Other Words: A Coursebook on Translation (2nd ed.).
New York: Routledge.
Baldry, A., & Thibault, P. J. (2005). Multimodal Transcription and Text Analysis.
London: Equinox.
Barthes, R. (1977). Rhetoric of the Image. In R. Barthes (Ed.), Image–Music–Text
(pp. 32–51). London: Fontana.
Bateman, J. A. (2014). Text and Image: A Critical Introduction to the Visual/
Verbal Divide. Oxon: Routledge.
Baumgarten, N. (2008). Yeah, That’s It!: Verbal Reference to Visual Information
in Film Texts and Film Translations. Meta, LIII(1), 6–24.
Bell, S. (2013, May 22). The Backbone. The Guardian.
Berinstein, P. (1997). Moving Multimedia: The Information Value in Images.
Searcher, 5(8), 40–49.
Bernardo, A. M. (2010). Translation as Text Transfer—Pragmatic Implications.
Estudos Linguisticos / Linguistic Studies, 5, 107–115.
Martinec, R., & Salway, A. (2005). A System for Image-text Relations in New
(and Old) Media. Visual Communication, 4(3), 337–371.
McCloud, S. (1994). Understanding Comics: The Invisible Art. New York:
HarperPerennial.
Mubenga, K. S. (2009). Towards a Multimodal Pragmatic Analysis of Film
Discourse in Audiovisual Translation. Translator’ Journal, 54(3), 466–484.
Multimodal Analysis Company. (2013). Concept [online]. Available at: http://
multimodal-analysis.com/about/concept/. Last accessed 16 June 2017.
Munday, J. (2004). Advertising: Some Challenges to Translation Theory. The
Translator, 10, 199–219.
Munday, J. (2008). Introducing Translation Studies: Theories and Applications
(2nd ed.). New York: Routledge.
Munday, J. (2012). Introducing Translation Studies: Theories and Applications
(4th ed.). Oxon: Routledge.
Newmark, P. (1981). Approaches to Translation. Oxford: Pergamon.
Nida, E. (1964). Toward a Science of Translating. Leiden: E.J. Brill.
Nikolajeva, M., & Scott, C. (2000). The Dynamics of Picturebook Communication.
Children’s Literature in Education, 3(3), 225–239.
Nord, C. (2005). Text Analysis in Translation: Theory, Methodology and Didactic
Application of a Model for Translation-Oriented Text Analysis (2nd ed.).
Amsterdam/New York: Rodopi.
Norris, S. (2004). Analysing Multimodal Interaction: A Methodological Framework.
London: Routledge.
O’Sullivan, C. (2013). Introduction: Multimodality as Challenge and Resource
for Translation. In C. O’Sullivan & C. Jeffcote (Eds.), Special Issue on
Translating Multimodalities, JoSTrans (Vol. 20, pp. 2–14).
O’Sullivan, C., & Jeffcote, C. (Eds.). (2013). Special Issue on Translating
Multimodalities, JoSTrans (Vol. 20).
Olson, D. R. (1994). The World on Paper: The Conceptual and Cognitive
Implications of Writing and Reading. Cambridge: Cambridge University Press.
Orero, P. (Ed.). (2004). Topics in Audiovisual Translation. Amsterdam/
Philadelphia: Benjamins.
Orlebar, J. (2009). Understanding Media Language. MediaEdu [online]. Available
at: http://media.edusites.co.uk/article/understanding-media-language/.
Last accessed 26 June 2017.
Pastra, K. (2008). COSMOROE: A Cross-Media Relations Framework for
Modelling Multimedia Dialectics. Multimedia Systems, 14, 299–323.
Pedersen, J. (2008). High Felicity: A Speech Act Approach to Quality Assessment
in Subtitling. In D. Chiaro, C. Heiss, & C. Bucaria (Eds.), Between Text and
Image: Updating Research in Screen Translation. Amsterdam/Philadelphia:
Benjamins.