
Machine Translation and Post-Editing: Is All That Glitters Gold?

Universidade NOVA de Lisboa. Lisboa, 25th February 2019


Index
1. Introduction.
2. Definition of MT & PE.
3. Differences between MT and CAT tools.
4. Types of MT engines.
5. The market of MT in Spain.
6. Hands-on.
7. Types of mistakes.
8. Conclusions.
9. References.

1. Introduction
• Translation workflow (I)

• Translation workflow (II)

2. Definition of MT & PE
• MT:
  • Wikipedia: “Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another”.
  • Merriam-Webster: “automatic translation from one language to another”.
  • ISO 18587:2017, Translation services — Post-editing of machine translation output — Requirements:
    • Automatic translation of text from one natural language to another using a computer application.
• PE:
  • ISO 18587:2017, Translation services — Post-editing of machine translation output — Requirements:
    • Post-edit: edit and correct machine translation output.
    • Types:
      • Full post-editing: process of post-editing to obtain a product comparable to a product obtained by human translation.
      • Light post-editing: process of post-editing to obtain a merely comprehensible text without any attempt to produce a product comparable to a product obtained by human translation.

Pre-editing → Input → Translation → Output → Post-editing

3. Differences between MT & CAT tools
Bowker & Fisher (2010: 60):
Computer-aided translation (CAT) is the use of computer software to assist a human translator in the translation process. The term applies to translation that remains primarily the responsibility of a person, but involves software that can facilitate certain aspects of it.
This contrasts with machine translation (MT), which refers to translation that is carried out principally by computer but may involve some human intervention, such as pre- or post-editing. Indeed, it is helpful to conceive of CAT as part of a continuum of translation possibilities, where various degrees of machine or human assistance are possible.

• Computer-assisted translation (CAT) tools automate certain facets of human translation processing in order to enhance overall translator productivity (adapted from Folaron, 2010).
• CAT tools are used to support the translator by eliminating repetitive work, automating terminology lookup activities, and recycling previously translated texts (Esselink, 2000: 359).
• Related concepts:
  • TMS: Translation Memory Systems.
  • TENT: Translation Environment Tools.
• CAT tools (Bowker, 2002):
  • Corpora and corpus analysis tools.
  • TM systems.
  • Terminology Management Systems.

4. Types of MT engines

• Forcada (2010: 218-220):
  • Rule-based machine translation (RBMT).
  • Corpus-based machine translation (CBMT):
    • Example-based machine translation (EBMT).
    • Statistical machine translation (SMT).
  • Neural Machine Translation (NMT).

• Rule-based machine translation (RBMT) → transfer systems:
  Analysis → Transfer → Generation
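
The three transfer stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Babylon or any production RBMT engine works: the four-word lexicon, the part-of-speech tags and the single reordering rule (NOUN ADJ becomes ADJ NOUN) are all hypothetical.

```python
# Toy transfer-based RBMT for PT>EN: analysis -> transfer -> generation.
# LEXICON and the reordering rule are illustrative assumptions.

# Hypothetical bilingual lexicon: PT word -> (EN word, part of speech)
LEXICON = {
    "produtos": ("products", "NOUN"),
    "locais": ("local", "ADJ"),
    "carnes": ("meats", "NOUN"),
    "melhores": ("best", "ADJ"),
}


def analyse(sentence):
    """Analysis: tokenise the source text and tag each word via the lexicon."""
    tokens = sentence.lower().split()
    return [(tok, LEXICON.get(tok, (tok, "UNK"))[1]) for tok in tokens]


def transfer(tagged):
    """Transfer: replace words by their target equivalents, then reorder."""
    out = [(LEXICON.get(tok, (tok, tag))[0], tag) for tok, tag in tagged]
    i = 0
    while i < len(out) - 1:
        # Structural rule: Portuguese NOUN ADJ -> English ADJ NOUN
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(transferred):
    """Generation: produce the target-language string."""
    return " ".join(word for word, _ in transferred)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("produtos locais"))
print(translate("melhores carnes"))
```

Words missing from the lexicon pass through unchanged, which is one way the "Untranslated" error type used later in the hands-on session can arise.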


• Example-based machine translation (EBMT):
  Matching → Alignment → Recombination
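
A minimal Python sketch of the example-based idea follows. It assumes the alignment step has already been done offline, so the hypothetical `EXAMPLE_BASE` stores source fragments together with their aligned target fragments. The code then shows matching (longest stored fragment first) and recombination (joining the retrieved fragments). The fragments are invented for illustration.

```python
# Toy EBMT for PT>EN. EXAMPLE_BASE is a hypothetical, pre-aligned example base.
EXAMPLE_BASE = {
    ("peso", "líquido"): "net weight",
    ("unidades", "por", "caixa"): "units per box",
    ("por", "caixa"): "per box",
    ("peso",): "weight",
}


def match(tokens, start):
    """Matching: return the longest stored fragment beginning at `start`."""
    for end in range(len(tokens), start, -1):
        fragment = tuple(tokens[start:end])
        if fragment in EXAMPLE_BASE:
            return fragment
    return None


def translate(sentence):
    tokens = sentence.lower().split()
    output, i = [], 0
    while i < len(tokens):
        fragment = match(tokens, i)
        if fragment is None:
            # No example covers this word: it is left untranslated.
            output.append(tokens[i])
            i += 1
        else:
            # Alignment lookup: fetch the target side of the matched example.
            output.append(EXAMPLE_BASE[fragment])
            i += len(fragment)
    # Recombination: join the retrieved target fragments.
    return " ".join(output)


print(translate("Peso líquido por caixa"))
```

Preferring the longest match means "peso líquido" is retrieved as a unit rather than word by word, which is the main advantage of reusing examples over a plain dictionary lookup.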


• Statistical machine translation (SMT):
  • Any source-language (SL) sentence and target-language (TL) sentence can be translations of each other with a certain probability, given by a probability model inferred from a bilingual corpus.
    • Lexical probabilities.
    • Alignment probabilities.
• MT depends strongly on the quality of the raw
machine-translated text.
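
To make "a probability model inferred from the bilingual corpus" concrete, the sketch below estimates lexical probabilities from a three-sentence toy corpus. The slides do not name a specific model, so IBM Model 1 trained with expectation-maximisation is used here as one well-known option. It treats all alignments as equally likely, so only the lexical probabilities are learned. The corpus and the function name `train_ibm1` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical PT>EN toy corpus: (source tokens, target tokens)
CORPUS = [
    (["a", "casa"], ["the", "house"]),
    (["a", "carne"], ["the", "meat"]),
    (["carne"], ["meat"]),
]


def train_ibm1(corpus, iterations=10):
    """Estimate t(target word | source word) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {s for src, _ in corpus for s in src}
    tgt_vocab = {t for _, tgt in corpus for t in tgt}
    # Start from a uniform distribution over target words.
    t = {(tw, sw): 1.0 / len(tgt_vocab) for sw in src_vocab for tw in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected co-occurrence counts.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the counts into probabilities.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t


probs = train_ibm1(CORPUS)
for sw in ("a", "casa", "carne"):
    best = max(("the", "house", "meat"), key=lambda tw: probs[(tw, sw)])
    print(sw, "->", best, round(probs[(best, sw)], 3))
```

Nothing in the corpus says directly that "casa" means "house". The model infers it because "carne" is pinned to "meat" by the one-word sentence pair, which in turn pushes "a" towards "the".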

• Neural Machine Translation (NMT):
  • It is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation (Bahdanau et al., 2014: 1).
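
The encoder-decoder description above can be written compactly. The slides give no formulas, so the notation below is an assumption following common usage in the NMT literature: $x_1, \dots, x_T$ is the source sentence, $h_t$ the encoder state at position $t$, $c$ the fixed-length vector, and $y_1, \dots, y_{T'}$ the translation.

```latex
\begin{align}
h_t &= f(x_t, h_{t-1}), \qquad t = 1, \dots, T \\
c &= q(h_1, \dots, h_T) \\
p(y_1, \dots, y_{T'}) &= \prod_{t=1}^{T'} p\bigl(y_t \mid y_1, \dots, y_{t-1}, c\bigr)
\end{align}
```

Here $f$ and $q$ are nonlinear functions (for example recurrent units). All their parameters are trained jointly on the bilingual corpus, which is the contrast with SMT drawn in the passage.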

• Neural Machine Translation (NMT):
  • Example with “dog” (EN>PT):
    1. It creates an image in its “brain”.
    2. The neural network can detect that “cão” is masculine.
    3. If any element in the sentence indicates feminine gender, it will automatically use “cadela”.

5. The market of MT in Spain

• Current situation:
  • 700 Translation Service Providers (TSP):
    • ≤ 20 employees.
    • ≤ 1 million euros.
  • 20 Translation Service Providers (LSP):
    • ≥ 20 employees.
    • ≥ 1 million euros.
  • 4,000 professional freelance translators.


• MT and TSP
  • ISO 9001 Quality management (any company).
  • European standard EN 15038:2006: a quality standard developed especially for translation service providers, which defines the following requirements:
    • Basic requirements for the human resources and process used in the provision of translation services.
    • Client – TSP relationship.
    • Procedures for translation services.

• MT and TSP
  • ISO 18587:2017
    • It provides requirements for the process of full, human post-editing of machine translation output and post-editors' competences.
    • It is intended to be used by TSPs, their clients, and post-editors.
    • It is only applicable to content processed by MT systems.
    • NOTE: For translation services in general, see ISO 17100.
• MT and TSP
  • ISO 18587:2017 Translation services - Post-editing of machine translation output - Requirements
    • The use of machine translation (MT) systems to meet the needs of an increasingly demanding translation and localization industry has been gaining ground. Many translation service providers (TSPs) and clients have come to realize that the use of such systems is a viable solution for translating projects that need to be completed within a very tight time frame and/or with a reduced budget. When an MT system is used, clients can have material translated that can otherwise not be translated; translation costs can be decreased and the launch of products on specific markets, as well as the flow of information, can be accelerated. On the other hand, TSPs are able to:
      • a) improve translation productivity;
      • b) improve turn-around times;
      • c) remain competitive in an environment where clients show an increasing demand for using MT in translation.
    • However, there is no MT system with an output which can be qualified as equal to the output of human translation and, therefore, the final quality of the translation output still depends on human translators and, for this purpose, their competence in post-editing.
    • The rate at which MT systems are changing renders it impractical to produce an overarching International Standard on these systems, which could stifle innovation or be ignored by the translation technology development industry.
    • This document therefore restricts its provisions to that part of the process that begins upon the delivery of the MT output and the beginning of the human process that is known as post-editing.
• MT, prices and TSP
  • Great variety of prices.
  • Different translation qualities:
    • Simple translation.
    • PT: Translation with final QA.
    • TEP: Translation + proofreading + final QA.
    • MT: Machine translation.
    • PMT: Post-edited MT.
  • In budget planning, the type of translation must be carefully detailed.
• Use of MT by TSP (Torres-Hostench et al., 2016).
[Figures from Torres-Hostench et al. (2016) not reproduced.]

• Use of MT by freelance translators → International survey carried out by Zaretskaya et al. (2015, 2016, 2018).
  • 1304 translators completed the survey.


1st: neural MT
2nd: rule-based MT
3rd: statistical MT

6. Hands-on

• Text to be translated from PT into EN using:
  • Babylon: https://traductor.babylon-software.com/ (RBMT)
  • SYSTRANet: http://www.systranet.com/translate (SMT)
  • DeepL: https://www.deepl.com/translator (NMT)
• Text available at:
  • http://www.consigna.uva.es/16471
  • Password: lisboa


• Tasks:
  1. Detect mistakes in the English output using these parameters (adapted from MQM).

Babylon SYSTRANet DeepL
Terminology
Mistranslation
Omission
Addition
Untranslated
Spelling
Typography
Grammar
Unintelligible

7. Types of mistakes
PT:
PRISCA uma Marca com História
A tradição, o bem fazer e a reputação dos produtos da Prisca, são o resultado do amor à gastronomia e aos produtos da terra, enraizados na nossa família desde há décadas. Esta atividade está presente ininterruptamente desde 1917, transmitido de pais para filhos e de filhos para netos, ao longo de quatro gerações, os ensinamentos da filosofia e o prazer de transformar as melhores carnes, frutas e legumes.

Babylon:
PRISCA a brand with a history the tradition, well done and the reputation of the products of Prisca, are the result of love to the gastronomy and the products of the land, rooted in our family since decades ago. This activity is present continuously since 1917, has passed from parents to children and grandchildren, children for over four generations, the teaching of philosophy and the pleasure of transforming the finest meats, fruits and vegetables.

SYSTRANet:
PRIMEVAL one Marks with History
The tradition, the good to make and the reputation of the products of the Primeval one, they are the result of the love to the gastronomy and the products of the land, taken root in our family since decades. This activity is present uninterruptedly since 1917, transmitted of parents for children and children for grandsons, along four generations, the teachings of the philosophy and the pleasure of changing the better meats, fruits and vegetables.

DeepL:
PRISCA A Brand with History
The tradition, the good work and the reputation of Prisca products are the result of the love of gastronomy and local products, rooted in our family for decades. This activity has been present without interruption since 1917, passed on from parents to children and from children to grandchildren, over four generations, the teachings of philosophy and the pleasure of transforming the best meats, fruits and vegetables.

PT:
Chouriça de Porco Bísaro
A Casa da Prisca, quis ainda melhorar a qualidade do seu produto, através da produção própria da carne 100% raça Bísara.
Ingredientes
Carne de Porco bísaro(83%), Vinho, especiarias, água, sal, extrato vegetal, alho, açúcar, dextrose.

Babylon:
Chouriça Bisaro pork THE HOUSE OF Prisca, wanted to further improve the quality of their product, by means of its own production of meat 100% bisaro. Bisaro pork meat Ingredients (83%), wine, spices, water, salt, vegetable extract, Garlic, sugar, dextrose.

SYSTRANet:
Chouriça of Bísaro Pig
The House of the Primeval one, still wanted to improve the quality of its product, through the proper production of the meat 100% Bísararace
Ingredients
Pork bísaro (83%), Wine, spices, water, salt,vegetal extract, garlic, sugar, dextrose.

DeepL:
Bisaro Pork Sausage
Casa da Prisca, also wanted to improve the quality of its product, through the production of its own meat 100% Bísara breed.
Ingredients
Bisaro Pork (83%), wine, spices, water, salt, vegetal extract, garlic, sugar, dextrose.

PT:
Características nutricionais
Valor energético (Kj/ Kcal): 1113/267
Lípidos (g): 26.8
Dos quais saturados (g): 10.05
Hidratos de Carbono (g): 1.97
Dos quais açúcares (g): 0.9
Fibras Alimentares (g): 0.81
Proteínas (g): 24.4
Sal (g): 2.8

Babylon:
Nutritional characteristics
Energy value (Kj/ Kcal): 1113/267
Lipids (g): 26.8
Of which saturated (g): 10.05
Carbohydrates (g): 1.97
Of which sugars (g): 0.9
Dietary Fibers (g): 0.81
proteins (g): 24.4
Salt (g): 2.8

SYSTRANet:
Nutritional characteristics
Energetic value (Kj/Kcal): 1113/267 ·
Lipid(g): 26.8 ·
Of which saturated (g): 10.05·
Carbohydrates (g): 1.97
Of which sugars (g): 0.9
Alimentary staple fibres(g): 0.81 ·
Proteins (g): 24.4 ·
Salt (g):2.8

DeepL:
Nutritional characteristics
Energy value (Kj/ Kcal): 1113/267
- Lipids (g): 26.8
- Of which saturated (g): 10.05
- Carbohydrates (g): 1.97
- Of which sugars (g): 0.9
- Food fibres (g): 0.81
- Proteins (g): 24.4
- Salt (g): 2.8

PT:
Caracteristicas microbiológicas (Níveis estabelecidos pela legislação)
Salmonella: Ausência em 25g
Listeria: Ausência em 25g
Staphylococcus coagulase +: <5x10^2
Contagem de Microrganismos a 30ºC: <5x10^6
Contagem de Clostridium sulfito-redutores: <=1x10^3

Babylon:
Microbiological characteristics (levels established by law)
•*Salmonella: absence in 25g
•*Listeria: absence in 25g
•*coagulase +: <5x10^2
•*Counting of microorganisms to 30ºC: <5x10^6
•*counting of sulphite-reducing Clostridium: <=1x10^3

SYSTRANet:
Microbiological characteristics (Levels established for legislation) ·
Salmonella: Absence in 25g ·
Listeria: Absence in 25g·
Staphylococcus coagulase +: <5x10 2·
Counting of Micro-organisms 30ºC:<5x10 6 ·
Counting of Clostridium sulfite-reducing: <=1x10 3

DeepL:
Microbiological characteristics (Levels established by legislation)
- Salmonella: Absent in 25g
- Listeria: Absent in 25g
- Staphylococcus coagulase +: <5x10^2
- Microorganisms count at 30ºC: <5x10^6
- Clostridium sulphite-reducing count: <=1x10^3

PT:
Validade: Validade (dias): 120
Modo de Conservação: Conservar em local seco e fresco.
Embalagem
Peso Liquido: 180
EAN: 5605466203306
Unidades por caixa: 15
Consumidores de risco: Grupo de pessoas intolerantes a sulfitos

Babylon:
Validity: Validity (days): 120
Conservation Mode: Keep in a cool and dry place.
Packaging
•*Net Weight: 180
•*EAN: 5605466203306
•*Units per box: 15
Consumers of risk: a group of people who are intolerant to sulphites

SYSTRANet:
Validity: Validity (days): 120
Way of Conservation: To conserve in local dry and fresh
Package ·
Weight I eliminate: 180·
EAN: 5605466203306 ·
Units forbox: 15
Consumers of risk: Group of intolerant people the sulfitos

DeepL:
Validity: Validity (days): 120
Method of Conservation: Store in a cool and dry place.
Packaging
- Net Weight: 180
- EAN: 5605466203306
- Units per box: 15
Consumers at risk: Group of people intolerant to sulphites

Babylon SYSTRANet DeepL
Terminology
Mistranslation
Omission
Addition
Untranslated
Spelling
Typography
Grammar
Unintelligible
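
One way to fill in the grid above is to record each detected mistake as a small annotation and let a script do the counting. The sketch below assumes the nine categories and three engines listed in the grid. The six annotations are one reader's illustrative reading of the outputs quoted in this section, not results reported by the author, and the names `ANNOTATIONS`, `tally` and `format_table` are hypothetical.

```python
from collections import Counter

CATEGORIES = ["Terminology", "Mistranslation", "Omission", "Addition",
              "Untranslated", "Spelling", "Typography", "Grammar",
              "Unintelligible"]
ENGINES = ["Babylon", "SYSTRANet", "DeepL"]

# Illustrative annotations: (engine, category, PT source, EN output).
# These are examples for the exercise, not an authoritative error analysis.
ANNOTATIONS = [
    ("SYSTRANet", "Mistranslation", "Peso Liquido", "Weight I eliminate"),
    ("SYSTRANet", "Mistranslation", "PRISCA", "PRIMEVAL one"),
    ("SYSTRANet", "Untranslated", "sulfitos", "sulfitos"),
    ("SYSTRANet", "Typography", "Unidades por caixa", "Units forbox"),
    ("Babylon", "Omission", "Staphylococcus coagulase +", "coagulase +"),
    ("Babylon", "Untranslated", "Chouriça", "Chouriça"),
]


def tally(annotations):
    """Count annotations per engine and category."""
    counts = {engine: Counter() for engine in ENGINES}
    for engine, category, _source, _output in annotations:
        if engine not in counts or category not in CATEGORIES:
            raise ValueError(f"unknown engine or category: {engine}, {category}")
        counts[engine][category] += 1
    return counts


def format_table(counts):
    """Render the counts as a plain-text grid, one row per category."""
    rows = [f"{'':<16}" + "".join(f"{e:>11}" for e in ENGINES)]
    for category in CATEGORIES:
        cells = "".join(f"{counts[e][category]:>11}" for e in ENGINES)
        rows.append(f"{category:<16}{cells}")
    return "\n".join(rows)


print(format_table(tally(ANNOTATIONS)))
```

Keeping the source and output fragments alongside each label makes the counts auditable, since two annotators will often disagree on whether an item is, say, Terminology or Mistranslation.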

8. Conclusions

9. References
Bahdanau, D., Cho, K. & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. https://arxiv.org/abs/1409.0473
Bowker, L. (2002). Computer-Aided Translation Technology. Ottawa: University of Ottawa Press.
Bowker, L. & Fisher, D. (2010). “Computer-aided translation”. In Y. Gambier & L. van Doorslaer (Eds.), Handbook of Translation Studies. Vol. 1. Amsterdam / Philadelphia: John Benjamins, pp. 60-65.
Esselink, B. (2000). A Practical Guide to Localization. Amsterdam / Philadelphia: John Benjamins.
Folaron, D. (2010). “Translation tools”. In Y. Gambier & L. van Doorslaer (Eds.), Handbook of Translation Studies. Vol. 1. Amsterdam / Philadelphia: John Benjamins.
Forcada, M. L. (2010). “Machine Translation Today”. In Y. Gambier & L. van Doorslaer (Eds.), Handbook of Translation Studies. Vol. 1. Amsterdam / Philadelphia: John Benjamins, pp. 215-223.
ISO (2015). ISO 17100, Translation services — Requirements for translation services.
ISO (2017). ISO 18587:2017, Translation services — Post-editing of machine translation output — Requirements. Available at https://www.iso.org/obp/ui/#iso:std:iso:18587:ed-1:v1:en
Torres-Hostench, O., Presas, M. & Cid-Leal, P. (Coords.) (2016). El uso de traducción automática y posedición en las empresas de servicios lingüísticos españolas: Informe de investigación ProjecTA 2015. Bellaterra: UAB.
UNE (2006). EN 15038:2006. Available at http://qualitystandard.bs.en-15038.com/
Zaretskaya, A., Corpas Pastor, G. & Seghiri, M. (2015). “Translators' requirements for translation technologies: a user survey”. In G. Corpas Pastor, M. Seghiri, R. Gutiérrez & M. Urbano (Eds.), Nuevos horizontes en los Estudios de Traducción e Interpretación (Trabajos completos). Geneva: Tradulex, pp. 247-254.
Zaretskaya, A., Corpas Pastor, G. & Seghiri, M. (2016). “Corpora in computer assisted translation: a users' view”. In G. Corpas Pastor & M. Seghiri (Eds.), Corpus-based Approaches to Translation and Interpreting: from theory to applications. Frankfurt: Peter Lang, pp. 253-276.
Zaretskaya, A., Corpas Pastor, G. & Seghiri, M. (2018). “User Perspective on Translation Tools: Findings of a User Survey”. In G. Corpas & I. Durán (Eds.), Trends in e-tools and resources for translators and interpreters. Leiden / Boston: Brill, pp. 37-56.
