Beruflich Dokumente
Kultur Dokumente
Lei Chen, Miao Li, Jian Zhang, Zede Zhu, and Zhenxin Yang
1 Introduction
X. Shi and Y. Chen (Eds.): CWMT 2014, CCIS 493, pp. 49–60, 2014.
c Springer-Verlag Berlin Heidelberg 2014
50 L. Chen et al.
words choice and word order are two major errors of SMT systems. Therefore,
how to reduce the differences of word order and morphology between Chinese
and these minority languages is an important issue in SMT.
Taking Mongolian, one of Chinese minority languages, as an example, the pa-
per proposes a method which incorporates syntactic information of the source-
side and morphological information of the target-side. First, according to the
word alignment and the phrase structure trees of source language, reordering
rules are extracted automatically to adjust the word order at source side. And
then at target side, a morphological segmentation method is adopted to obtain
the morphological information of target language. The sentences consisting of
Mongolian words are used to construct Hidden Markov Model (HMM) asso-
ciated with the affixes. Based on this model, the affixes corresponding to the
Mongolian words to be segmented are found to obtain the stems. This method
transforms the segmentation problem into sequence labelling problem which is
able to reduce the segmentation ambiguity. Taking advantage of the reordered
source language and the segmented target language, a morpheme-level SMT sys-
tem can be constructed. The experimental results show that the morpheme-level
Chinese-Mongolian SMT system achieves 2.1 BLEU points increment over the
standard phrase-based machine translation system.
2 Related Works