
Neural-Based Machine Translation System Outperforming Statistical Phrase-Based Machine Translation for Low-Resource Languages
Muskaan Singh, Ravinder Kumar, Inderveer Chana
Language Engineering and Machine Learning Research Lab, CSED
Thapar Institute of Engineering and Technology (TIET)
Patiala, India
muskaan singh@thapar.edu, ravinder@thapar.edu, inderveer@thapar.edu

Abstract—Natural Language Processing (NLP) involves the development of computational models that aid in the development of automated tools for processing and generating natural language. Humans developing these computational models require deep insight into linguistic knowledge, and it is a time-consuming process. Hence, to automate this process and accelerate computational science, we use data-driven approaches, i.e., statistical learning and Deep Learning. For developing and sharing information in one natural language and making it accessible in other natural languages, Machine Translation (MT), an application of NLP, is entailed. Sanskrit, being the 'father of informatics' [1], was considered the "lingua franca" of world intellectuals [2]. It is also an important language in the Indo-European family and is considered the true "donor" language of India. It has vast knowledge reserves in different disciplines of study such as Ayurveda, astronomy and literature. MT makes this rich language available to others with the help of the computer. We propose and present a prominent Deep Neural-based MT system for translation of Sanskrit to Hindi. We also present a comparison showing Neural MT outperforming a statistical baseline system for this language pair.

Index Terms—Neural Machine Translation, Statistical Machine Translation, Keras, LSTM, MT-Hub, Sanskrit, Hindi
I. INTRODUCTION

Natural Language Processing (NLP) involves the development of computational models that aid in the development of automated tools for processing and generating natural language. Humans developing these computational models require deep insight into linguistic knowledge, and it is a time-consuming process. Hence, to automate this process and accelerate computational science, we use data-driven approaches, i.e., statistical learning and Deep Learning. NLP carries the technological motivation of developing intelligent computer systems such as machine translation, understanding of printed and handwritten text, text analysis, speech understanding systems, text understanding systems and natural language interfaces to databases. There is also a linguistic and cognitive motivation for the development of this approach. It has various applications such as Machine Translation (MT), text summarization, information extraction, etc. Machine Translation is one such application of NLP. It aims to translate one natural language, i.e., the source text, into another natural language, i.e., the target text. Various approaches can be followed to automate this process of translation, from linguistically rich approaches, i.e., Rule-based [3], involving the Transfer-based mechanism [4] and the Interlingua mechanism [5], to corpus-based approaches that are entirely based on corpora, i.e., Statistical Phrase-based [6] and Neural-based [7]. There are also some approaches that are less used nowadays, i.e., the Example-based approach [8] and the Knowledge-based approach [9].

According to the 2018 census of Indian States and Union Territories, Sanskrit is the mother tongue of 24,821 people and Hindi of 52,83,47,193, i.e., 43% of the population of India. Sanskrit is considered the mother or 'donor' of all Indian languages, holding a rich grammar codified by Panini nearly 2,500 years ago in 3,949 rules, which were extended later on. Sanskrit has a strong, simple and non-ambiguous grammar. The reasons for choosing Sanskrit for translation purposes are the richness of its scientific literature, with extensive and comprehensive analysis, its structured approach and its traditional grammar. Many people have attempted to write a grammar for the Sanskrit language using the Paninian framework and have used it to develop translation systems [10]. Panini focused on decoding the information contained in the language string of a given input language by Karaka (syntactic-semantic) relations, not thematic roles [2]. The framework also highlights the importance of case markers, postpositions and word order. The central element of the semantic model in the Paninian framework is that every verbal root (dhaatu) denotes an action consisting of an activity (vyaapara) and a result (phala). The result is a state which is reached after completion of an action. An activity consists of steps that are performed by the different participants, or Karakas, involved in the action. The concept of the Karaka relation is the central theme of Paninian grammar. These Karaka relations are referred to as syntactic-semantic relations: on the surface level they highlight syntactic information, and at a deeper level they also capture semantic information. Sanskrit grammar is termed the 'Father of Informatics' as it builds a relationship between the speech and utterance of the speaker and the meaning derived by the listener. Hence, the primary objective of Paninian Grammar is to form a theory of human natural language communication.



Sanskrit and Hindi belong to the same Indo-Aryan family. They have structural and lexical similarity, as Hindi inherits from Sanskrit. Sanskrit has a rich and structured grammar in the form of Panini's Astadhayayi, whereas for Hindi such a parallel grammar does not exist. Therefore, it becomes difficult to map the divergence between these two languages. The non-existence of such a grammar leads to exceptional cases which are not covered by linguistic generalizations, such as vibhakti in Hindi. The cases where vibhakti in Sanskrit and Hindi diverge are Optional, Exceptional, Differential, Alternative, Non-Karaka, Verb and Complex-predicate divergence. Despite its rich grammar, choosing Sanskrit as a source language is difficult because parsing fails due to its synthetic nature, in which a single word can run up to 32 pages. A lot of work is required to explore the potential of this language and open perspectives in the computational linguistics domain. There are some systems, but they are confined to specific domains and to short sentences and phrases. Due to the morphological richness of Sanskrit, building an MTS for it is a challenging task. There are some limited systems for English to Sanskrit translation; however, more work in this field is highly desirable.

A. Contribution

The contributions of this paper include:

• A novel Deep learning framework has been developed to train the MT system for the Sanskrit to Hindi language pair.
• A parallel corpus of Sanskrit-Hindi with 102,760 parallel sentences was developed.
• Training the translation system on statistical phrase-based platforms, i.e., Moses and MT-Hub.
• Empirically proving the performance of the proposed approach based on an automatic evaluation measure, i.e., BLEU.
• Comparative analysis of Neural MT outperforming the traditional baseline statistical phrase-based system. We have also formulated a case study of previous corpus-based approaches to Sanskrit-Hindi MT and extended their approach.

B. Paper Organization

The remainder of the paper is organized as follows. Section II presents the background of work performed for processing the Sanskrit language into other languages and for Sanskrit-Hindi translation. The proposed methodology, with statistical phrase-based training and a deep Recurrent Neural Network (RNN), is elaborated in Section III, followed by result analysis in Section IV; finally, Section V concludes the paper along with future scope.

II. BACKGROUND

India is a country of vast language diversity, with more than 1.3 billion people. There are 29 major languages written in 13 scripts, Sanskrit being the origin of almost all Indian languages. Thus, there is a need to provide translation of the Sanskrit language into other languages. One of the primary requirements in the Sanskrit translation domain is to translate the life-transforming stories (epics), Vedas, etc., to make them accessible in other languages. A major issue which arises in the implementation of a Sanskrit-based MTS is the approach used to develop the MTS. Currently, Sanskrit-based systems use dictionary or rule-based approaches, which are not easily extendable and are time-consuming. The challenging task in the translation field for Sanskrit is its morphologically rich features, which require various modules, resources and tools; thus, it is a complex system. Sanskrit has a richness of scientific literature with in-depth analysis for cognitive knowledge description, and it has extendability to other languages. The phonetic basis of the Sanskrit language is useful for speech analysis as well [11]. Research in the field of Sanskrit machine translation is in its early stages. Some of the systems developed to provide computation and translation of the Sanskrit language include a Sanskrit Translator (2009), in which [12] uses an example-based technique to translate from English to Sanskrit using various different modules. A system merging a rule-based approach with an Artificial Neural Network (2010) for translating English to Sanskrit was developed by [13]. An English to Sanskrit Machine Translation and Synthesizer System (2010), using a dictionary-based approach which includes a speech synthesizer in the translation process, was developed by [14]. An English to Sanskrit machine translator system was formed (2010) by [15] using a rule-based approach and modules such as a lexical parser, semantic mapper, translator and composer. [16] formed a complete English to Sanskrit speech translation system; it was later enhanced (in 2012) by [17], merging a rule-based approach. The E-Trans system (2012) by [18] suggests using a rule-based approach from English to Sanskrit using a synchronous context-free grammar; the results are quite promising for small and large sentences. A system for translating Sanskrit to English, i.e., TranSish (2014) by [19], uses a rule-based approach but only for the present tense; this system was later extended to sentences in other tenses. These research works were useful for sharing the worldwide knowledge of Sanskrit with traditional translation approaches. Recent approaches to Sanskrit language translation were performed using a statistical approach, such as the English to Sanskrit Machine Translator with Ubiquitous Application (2012) by [20]. It uses features like phrase translation probability, inverse phrase translation probability, lexical weighting probability, phrase penalty, language model probability, a distance-based distortion model and word penalty to train the system. All of this research work was based on processing the Sanskrit language to English or vice versa. There was also an ample amount of work on Sanskrit to Hindi translation, including a rule-based system and a statistical-based system.

The work on Sanskrit to Hindi MT systems includes [21] in 2015, for translating children's stories (e-learning content, multimedia for kids) using a rule-based approach; it is to be extended to the yoga and Ayurveda domains, though this is not yet available as it is under progress. Another system of translation and computational tools for Sanskrit, i.e., Samsaadhani (2009) by
[22] [23] [24], for translating Sanskrit to Hindi, also uses a rule-based linguistic approach. Recent attempts using a statistical approach for translating Sanskrit to Hindi have been proposed: the system is trained on the MT-Hub platform [25] and on Moses [26], resulting in a significant improvement in Sanskrit language computation. In our work, we have extended this previous work on the statistical platforms of Moses and MT-Hub using more parallel data. We have also implemented a Neural MT system using Keras on the TensorFlow platform; details are included in the following sections.

III. PROPOSED METHODOLOGY

A. Dataset

Initially, parallel and monolingual corpora were created from different domains; the details are given in Table I and Table II. Some of the parallel corpus was collected from the Indian Language Corpora Initiative (ILCI) [26] and the Department of Public Health Relations [27]. The remaining parallel data was manually created from the Srimad Bhagavad Geeta. We have trained the Machine Translation system across 3 platforms, i.e., MT-Hub [28] and Moses [29] for statistical phrase-based machine translation, and Keras [30] with TensorFlow [31] for Neural Machine Translation.

TABLE I
PARALLEL AND MONOLINGUAL DATASET FROM DIFFERENT DOMAINS

Domain           Parallel Corpus   Monolingual Corpus
News             25,000            202,269
Health care      NA                5,000
Tourism          NA                15,395
Literature       28,760            50,000
Wikipedia        NA                259,305
Judicial domain  NA                152,776
General domain   49,000            36,000

TABLE II
ADDITIONAL MONOLINGUAL DATASET

# Lines        221,528
# Words        2,849,514
# Characters   38,413,350
# Total        2.8 Million

B. Experimental Setup

After gathering the dataset, the system was trained for phrase-based statistical machine translation on the Moses setup as well as on the MT-Hub platform. The model was tuned using an additional tuning corpus. Phrase tables were generated by training on the parallel corpora for statistical machine translation; the generated phrase table is merged with the phrase tables from the tuning data and the test data. The Neural-based Machine Translation system is trained on Keras [30] with TensorFlow [31], a system for large-scale machine learning, at its backend. The proposed neural model is processed on a highly configured multi-core machine with 32 GB of RAM to achieve a high throughput of approximately 2,500 words per second. This speed is not possible on normal systems, on which one epoch would take approximately two hours to run. So we use a highly configured machine with an NVIDIA GeForce GTX 980 GPU, and we trained the system for 20 passes over the data, termed 'epochs', with mini-batches of 64 sentences each time.

C. Statistical Phrase-Based Model for Sanskrit to Hindi Translation on Moses and MT-Hub

Statistical MT has been researched since the 1980s, though phrase-based translation later exhibited better performance [32]. Phrase-based models were based on the phrase alignment template [33], and [34] suggested that imposing syntactic structure on phrases was not effective. The phrase translation model is based on Bayes' noisy channel, using a language model and a translation model that yield a translation probability, given a source sentence in the Sanskrit (s) language and a target sentence in the Hindi (h) language:

h_best = argmax_h p(h|s) = argmax_h p(s|h) p(h)    (1)

where p(h) is the n-gram language model trained on the monolingual Hindi corpus, and p(s|h) is the translation model trained on the parallel Sanskrit-Hindi corpus, calculated as

p(s̄_1^I | h̄_1^I) = ∏_{i=1}^{I} φ(s̄_i | h̄_i) d(start_i − end_{i−1} − 1)    (2)

where d(·) is the distortion probability, whose arguments are start_i, the starting position of the i-th Hindi phrase, and end_{i−1}, the ending position of the (i−1)-th Hindi phrase, relative to the Sanskrit phrases, and φ is the phrase translation probability. From the training corpora, the most probable phrases are chosen as the output translation, handling the ambiguity that occurs at the translation level. In order to optimize the translated output, a word cost Ω is added for each generated Hindi word, in addition to the language model; this factor biases toward longer output and is usually larger than 1 in order to optimize performance.
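To make the noisy-channel decomposition concrete, the following Python sketch scores one candidate phrase segmentation with Equations (1) and (2); the phrase table, distortion decay and segmentation are hypothetical toy values, not the parameters actually learned by Moses or MT-Hub.

```python
import math

# Hypothetical phrase translation probabilities phi(s_bar | h_bar); toy values,
# not the entries actually learned from the Sanskrit-Hindi training corpus.
phrase_table = {
    ("s1", "h1"): 0.7,
    ("s2 s3", "h2"): 0.5,
}

def distortion(start_i, end_prev, alpha=0.8):
    """d(start_i - end_{i-1} - 1): exponentially decaying reordering penalty."""
    return alpha ** abs(start_i - end_prev - 1)

def translation_model(segmentation):
    """Eq. (2): product of phi(s_bar|h_bar) * d(...) over the phrase pairs.
    segmentation: (sanskrit_phrase, hindi_phrase, start, end) tuples, where
    start/end are the source-side positions covered by each phrase."""
    score, end_prev = 1.0, 0
    for s_bar, h_bar, start, end in segmentation:
        score *= phrase_table.get((s_bar, h_bar), 1e-9) * distortion(start, end_prev)
        end_prev = end
    return score

def noisy_channel(segmentation, lm_logprob):
    """Eq. (1): p(s|h) * p(h); the language-model term p(h) is passed as a log-prob."""
    return translation_model(segmentation) * math.exp(lm_logprob)

# One candidate segmentation of a 3-word source sentence into two phrases.
candidate = [("s1", "h1", 1, 1), ("s2 s3", "h2", 2, 3)]
print(noisy_channel(candidate, lm_logprob=-2.1))
```

In decoding, this score would be computed for every candidate segmentation and the highest-scoring hypothesis kept.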
1) Training on Moses: The training is performed on the Moses platform [29], which provides a complete toolkit for academic research. It consists of all the components needed to develop a translation system, such as pre-processing of data, building the language model and the translation model, tuning using minimum error rate training, and evaluation of the translated output. It is also compatible with external tools such as GIZA++, SRILM and IRSTLM, is easy to customize and requires little setup time. We performed the experiment for Sanskrit-Hindi. The input sentence is fragmented into phrases; assuming uniform probability over all phrases, each phrase is translated into the Hindi language and reordered later. The reordering is performed by the distortion joint probability distribution on Moses, which follows a sequence of steps. After the creation and gathering of data, it was pre-processed by a 3-step process, i.e., tokenization, true-casing and cleaning:

• Tokenization: Meaningful tokens were created by adding spaces between words and punctuation.
• True-casing: Conversion of initial words to their most probable casing. The data converted into weight matrices becomes less sparse on applying true-casing.
• Cleaning: Empty, very long, too short and misaligned sentences were removed from the training data.
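As an illustration of the cleaning step, a minimal sketch is given below; the length limits and ratio threshold are hypothetical choices and not the exact settings of the Moses cleaning script.

```python
def clean_parallel_corpus(pairs, min_len=1, max_len=80, max_ratio=3.0):
    """Keep only (Sanskrit, Hindi) pairs that are non-empty, not too long or
    too short, and not badly length-mismatched (a sign of misalignment)."""
    kept = []
    for src, tgt in pairs:
        src_tok, tgt_tok = src.split(), tgt.split()
        if not (min_len <= len(src_tok) <= max_len and
                min_len <= len(tgt_tok) <= max_len):
            continue
        ratio = len(src_tok) / len(tgt_tok)
        if ratio > max_ratio or ratio < 1.0 / max_ratio:
            continue  # misaligned pair
        kept.append((src, tgt))
    return kept

pairs = [
    ("rāmaḥ vanam gacchati", "राम वन जाता है"),  # well-formed pair: kept
    ("", "खाली वाक्य"),                           # empty source side: removed
]
print(len(clean_parallel_corpus(pairs)))  # -> 1
```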
The prepared data is converted into a format suitable for GIZA++ [35]. It generates two vocabulary files containing words, integer word identifiers and word counts, and converts the parallel corpus into numbers. Sentence alignment is performed by running GIZA++ in parallel, as it is a time-consuming process. The monolingual corpus is loaded for building the language model using IRSTLM [36]; it improves the fluency of the translated output. On the other hand, the parallel corpus is loaded to build the translation model. This involves alignment of words, lexical translation, extraction of phrases, scoring of phrases, the reordering model, the generation model and configuring the model. The created model is stored in the system and used by the decoder. There are a few problems with the translation model: 1. it is very slow to load; 2. the weights used by the different models are not optimized. In order to overcome these problems we perform tuning of the model, training the weights on a separate parallel data set. We can then test the system by giving it any Sanskrit sentence, obtaining the translated Hindi output and running the BLEU script, which gave a translation accuracy (BLEU) of 51.

2) Training on MT-Hub: We trained the statistical phrase-based system on Microsoft Translator Hub [37], a platform for building statistical machine translation systems. It involved a sequence of steps, such as loading the corpus and dictionaries and then starting the training, which outputs the BLEU score [38]. It provides a hub of translation through the cloud. After training, we are able to deploy, evaluate and re-train the system. The platform also provides the option of review and corrections for improving the translation.

D. Neural Machine Translation for Sanskrit to Hindi

The proposed model uses Long Short Term Memory units [39] containing an explicit memory state with input, forget and output gates. It learns long-term dependencies better and forms a better model. A vector of LSTM cells forms an LSTM layer. The cell input is formed by combining the weight matrix W^s applied to the current input s_t with the weight matrix W^h applied to the previous time-step value of the hidden layer h_{t−1}:

cell_input = f(W^s s_t + W^h h_{t−1})    (3)

LSTMs are trained in the same way, by unrolling the recurrent neural network (the memory cell keeps a derivative equal to 1) and using back-propagation through time. Hence, we compute the update function and, with respect to all parameters of the model, compute the gradient of the objective function.
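As a concrete illustration of such a unit, the NumPy sketch below unrolls a single LSTM layer over a toy sequence; the sizes and weights are random placeholders rather than the trained parameters of the proposed model.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4                      # toy input and hidden sizes
# Random placeholder weights for the input, forget and output gates and the cell input.
Wi, Wf, Wo, Ws = (rng.normal(size=(n, d)) for _ in range(4))
Ui, Uf, Uo, Uh = (rng.normal(size=(n, n)) for _ in range(4))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_step(s_t, h_prev, c_prev):
    """One LSTM step with an explicit memory state c and three gates."""
    i = sigmoid(Wi @ s_t + Ui @ h_prev)   # input gate
    f = sigmoid(Wf @ s_t + Uf @ h_prev)   # forget gate
    o = sigmoid(Wo @ s_t + Uo @ h_prev)   # output gate
    g = np.tanh(Ws @ s_t + Uh @ h_prev)   # cell input, cf. Eq. (3) with f = tanh
    c = f * c_prev + i * g                # memory state passes gradients with derivative 1
    return o * np.tanh(c), c

h, c = np.zeros(n), np.zeros(n)
for s_t in rng.normal(size=(5, d)):       # unroll over a 5-step toy sequence
    h, c = lstm_step(s_t, h, c)
print(h.round(3))
```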
• Encoder: The model is given a source sentence of Sanskrit words i = 1, ..., n, coded into vectors as input,

  S = (s_1, s_2, ..., s_n), s_i ∈ R^{K_s}    (4)

  and a translated Hindi output sentence containing words i = 1, ..., m, coded as output vectors,

  H = (h_1, h_2, ..., h_m), h_i ∈ R^{K_h}    (5)

  Here, K_s is the vocabulary size of the source language and K_h that of the target language; n and m are the sentence lengths. The forward states of the bidirectional RNN are computed using update, reset and combination gates:

  →ug_i = φ(→W_ug Ē s_i + →U_ug →h_{i−1})    (6)
  →rg_i = φ(→W_rg Ē s_i + →U_rg →h_{i−1})    (7)
  →h̄_i = tanh(→W Ē s_i + →U [→rg_i ⊙ →h_{i−1}])    (8)

  The next state is computed for all i word vectors of a sentence,

  →h_i = (1 − →ug_i) ⊙ →h_{i−1} + →ug_i ⊙ →h̄_i    (9)

  Here, φ is the logistic sigmoid function and Ē ∈ R^{d×K_s} is the word embedding matrix, where d is the word embedding dimension and n is the number of hidden units; →W, →W_ug, →W_rg ∈ R^{n×d}, and →U, →U_ug, →U_rg ∈ R^{n×n}. The backward states are computed similarly using update, reset and combination gates:

  ←ug_i = φ(←W_ug Ē s_i + ←U_ug ←h_{i+1})    (10)
  ←rg_i = φ(←W_rg Ē s_i + ←U_rg ←h_{i+1})    (11)
  ←h̄_i = tanh(←W Ē s_i + ←U [←rg_i ⊙ ←h_{i+1}])    (12)

  The next state is computed for all i word vectors of a sentence,

  ←h_i = (1 − ←ug_i) ⊙ ←h_{i+1} + ←ug_i ⊙ ←h̄_i    (13)

  The forward and backward hidden computations are concatenated, h_i = [→h_i; ←h_i], to form the annotation vectors (h_1, h_2, ..., h_{T_s}).

• Alignment Model: The annotation vectors h_j from the encoder for the source sentence are given for the computation of the decoder hidden state dh_i. The context vector c_i computed by the alignment model is required for the computation of the decoder hidden state:

  e_ij = z_a^T tanh(W_a dh_{i−1} + U_a h_j)    (14)

  where z_a ∈ R^{n′}, W_a ∈ R^{n′×n} and U_a ∈ R^{n′×2n} are weight matrices,

  α_ij = exp(e_ij) / Σ_{k=1}^{T_x} exp(e_ik)    (15)

  c_i = Σ_{j=1}^{T_x} α_ij h_j    (16)
• Decoder: The next hidden state of the decoder is computed from the context vector c_i, the previously generated word h_{i−1} and the previous decoder hidden state dh_{i−1}:

  ug_i = φ(W_ug E h_{i−1} + U_ug dh_{i−1} + C_ug c_i)    (17)
  rg_i = φ(W_rg E h_{i−1} + U_rg dh_{i−1} + C_rg c_i)    (18)
  d̄h_i = tanh(W E h_{i−1} + U [rg_i ⊙ dh_{i−1}] + C c_i)    (19)
  dh_i = (1 − ug_i) ⊙ dh_{i−1} + ug_i ⊙ d̄h_i    (20)

  Here, E is the word embedding matrix of the target language; W, W_rg, W_ug ∈ R^{n×d} are weight matrices with n the number of hidden nodes and d the word embedding dimension, and the context matrices are C, C_ug, C_rg ∈ R^{n×2n}.

  The probability of a Hindi word is defined as

  t_i = [max(t̄_{i,2j−1}, t̄_{i,2j})]^T_{j=1,...,v}    (21)

  prob(h_i | dh_i, h_{i−1}, c_i) ∝ exp(h_i^T W_o t_i)    (22)

  The vector t̄_i is computed with W_o ∈ R^{K_h×v} the output weight embedding matrix, U_o ∈ R^{2v×n} the output weight matrix and C_o ∈ R^{2v×2n}, where v is the output word embedding dimension and n is the number of hidden units. These computations are formed as a maxout hidden layer [40] and a deep output [41]:

  t̄_i = U_o dh_{i−1} + V_o E h_{i−1} + C_o c_i    (23)
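The NumPy sketch below walks through one decoder step, combining the alignment model of Equations (14)-(16) with the gated update of Equations (17)-(20); all dimensions and weights are toy placeholders, not the trained values of Table III.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, Ts = 4, 3, 5                  # toy hidden size, embedding size, source length
H = rng.normal(size=(Ts, 2 * n))    # encoder annotations h_j (forward;backward)

# Random placeholder weights; in the trained model these are learned parameters.
z_a, W_a, U_a = rng.normal(size=n), rng.normal(size=(n, n)), rng.normal(size=(n, 2 * n))
W, W_ug, W_rg = (rng.normal(size=(n, d)) for _ in range(3))
U, U_ug, U_rg = (rng.normal(size=(n, n)) for _ in range(3))
C, C_ug, C_rg = (rng.normal(size=(n, 2 * n)) for _ in range(3))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def decoder_step(dh_prev, Eh_prev):
    # Eqs. (14)-(15): alignment scores and attention weights over source positions.
    e = np.array([z_a @ np.tanh(W_a @ dh_prev + U_a @ h_j) for h_j in H])
    alpha = np.exp(e) / np.exp(e).sum()
    # Eq. (16): context vector as the weighted sum of the annotations.
    c = alpha @ H
    # Eqs. (17)-(18): update and reset gates of the decoder.
    ug = sigmoid(W_ug @ Eh_prev + U_ug @ dh_prev + C_ug @ c)
    rg = sigmoid(W_rg @ Eh_prev + U_rg @ dh_prev + C_rg @ c)
    # Eqs. (19)-(20): candidate state and gated interpolation.
    dh_tilde = np.tanh(W @ Eh_prev + U @ (rg * dh_prev) + C @ c)
    return (1 - ug) * dh_prev + ug * dh_tilde, alpha

dh, alpha = decoder_step(np.zeros(n), rng.normal(size=d))
print(dh.shape, alpha.round(2))
```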
Finally, we trained our data set for NMT using Keras with TensorFlow at its backend. We have used the approach proposed by [7]. For training we require a parallel dataset; it is prepared by cleaning and splitting the dataset. This removes all extra white-space, punctuation, stray Unicode characters and numerals, converts uppercase to lowercase, etc. The tokens are then encoded into integers and padded to the maximum phrase length. We convert the output sequence into a one-hot encoding, as the model predicts a probability for each word in the vocabulary. We used an encoder-decoder LSTM [42] model for predicting translations: the encoder embeds the input sequence in the front-end model, and at the back-end the model decodes word by word. The model takes the various parameters stated in Table III, such as the maximum input and output lengths, vocabulary sizes and number of phrases, as well as the memory units.

TABLE III
DIMENSIONALITY OF THE PARAMETERS IN THE MODEL

Parameter                           Dimensionality
x (word embedding)                  1000
n (LSTM hidden state)               1000
d (dimension of word embedding)     620
v (output maxout hidden layer)      500
n′ (alignment model hidden units)   1000
W_s                                 32 × 400
W_h                                 32 × 1
W_ug                                64 × 400
W_rg                                65 × 2

The different weight matrices used are orthogonal. The matrices used for alignment are initialized from a Gaussian distribution with mean = 0 and variance = 0.01. Stochastic gradient descent training with mini-batches of 64 sentences using Adam [43] is performed to minimize the categorical loss and automatically adapt the learning rate of each parameter (ρ = 0.95 and ε = 10^{−6}). We normalized the gradient of the cost function to a threshold of 1 [44]. After this, the model was fit to this multi-class classification problem. We trained the model for 20 epochs, with the update length equal to the longest sentence in the mini-batch. The training data was shuffled after every update in order to minimize the error (normalization of the layer). This trained model was evaluated by running the BLEU script on the translated output sequences and the reference translations. All three experiments carried out were compared based on the BLEU score, and the results are shown in Section IV.
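A minimal Keras sketch of the data preparation and encoder-decoder training setup described above is given below; the sentence pairs, vocabulary sizes and layer widths are illustrative stand-ins, and the attention mechanism and the exact dimensionalities of Table III are omitted for brevity.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Data preparation: tokenize, encode to integers, pad to the maximum phrase length.
src_texts = ["sa gacchati", "sa pathati"]          # illustrative Sanskrit stand-ins
tgt_texts = ["vah jata hai", "vah padhta hai"]     # illustrative Hindi stand-ins
src_tok, tgt_tok = Tokenizer(), Tokenizer()
src_tok.fit_on_texts(src_texts)
tgt_tok.fit_on_texts(tgt_texts)
src_len, tgt_len = 10, 10                          # illustrative maximum lengths
X = pad_sequences(src_tok.texts_to_sequences(src_texts), maxlen=src_len, padding="post")
y = pad_sequences(tgt_tok.texts_to_sequences(tgt_texts), maxlen=tgt_len, padding="post")
tgt_vocab = len(tgt_tok.word_index) + 1
Y = np.eye(tgt_vocab)[y]   # one-hot targets: one probability per vocabulary word

# Encoder-decoder LSTM (no attention; sizes are illustrative, not those of Table III).
model = Sequential([
    Embedding(len(src_tok.word_index) + 1, 256, mask_zero=True),
    LSTM(256),                         # encoder embeds the source sequence
    RepeatVector(tgt_len),             # bridge to the decoder time steps
    LSTM(256, return_sequences=True),  # decoder generates word by word
    TimeDistributed(Dense(tgt_vocab, activation="softmax")),
])

# Adam with gradient-norm clipping at 1, categorical loss, mini-batches of 64,
# 20 epochs, shuffling the training data between updates.
model.compile(optimizer=Adam(clipnorm=1.0), loss="categorical_crossentropy")
model.fit(X, Y, batch_size=64, epochs=20, shuffle=True)
```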
IV. RESULT ANALYSIS

The performance of the proposed corpus-based approach on the different platforms, i.e., Moses, MT-Hub and Keras, is evaluated based on the BLEU score, as shown in Table IV. The proposed system is also compared with the previously developed system sahit [25] [26], as shown in Fig. 1.

TABLE IV
BLEU SCORES OF THE TRAINED SYSTEMS

Platform   Parallel Corpus   Monolingual Corpus   BLEU(%)
Moses      102,760           942,273              58%
MT-Hub     102,760           942,273              45%
NMT        145,34,215        NA                   60.4%

Fig. 1. Comparison of results for "sahit" and proposed system (bar chart of the BLEU scores of MT-Hub, Moses and NMT for both systems).

The BLEU score is an important metric for calculating the accuracy of translated sentences compared to human-generated reference translations. It is not good for shorter translations, but it provides accurate results for longer sentences. Normally the BLEU score lies between 0 and 1; simply multiplying it by 100 gives its percentage. It is observed that the higher the BLEU score, the more accurate the model. The formula of the BLEU score is as follows:

BLEU = min(1, output_length / reference_length) · ∏_{i=1}^{4} precision_i    (24)
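A Python sketch of Equation (24) on a hypothetical sentence pair is shown below; unlike the official BLEU script [38], it uses a single reference, no smoothing, and the plain product of precisions exactly as written above (standard BLEU takes the geometric mean, i.e., the 1/4 power).

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Eq. (24): length term times the product of modified n-gram precisions."""
    cand, ref = candidate.split(), reference.split()
    score = min(1.0, len(cand) / len(ref))   # output/reference length term
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(cand, n) & ngrams(ref, n)).values())  # clipped matches
        score *= overlap / max(sum(ngrams(cand, n).values()), 1)
    return score

# Hypothetical candidate/reference pair; the real evaluation used the BLEU script [38].
print(bleu("vah ghar jata hai", "vah ghar jata hai ab"))  # -> 0.8
```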
We also performed a linguistic error analysis and found that the system was not able to handle translation cases at the compound, complex, Sandhi and karaka levels. In almost all remaining cases the trained systems produce fluent output.

V. CONCLUSION

The proposed system is based on the corpus approach, i.e., training on data so as to give the maximum-probability translation. We trained the system with the traditional statistical phrase-based approach on different platforms, i.e., Moses and MT-Hub, and with the new prominent Deep-learning-based Neural Machine Translation using Keras on TensorFlow. For the proposed corpus-based approach a huge amount of corpus was collected across different domains. The proposed system gives fluent translations. The results exhibit that the Neural Network, with an accuracy of 60.4%, outperforms the statistical phrase-based approach, which achieves 58% accuracy on Moses and 45% on MT-Hub. In the future, we would extend our system for translation by merging the statistical phrase-based learning approach with a deep recurrent network. Also, linguistic feature extraction from rule-based systems to train a deep recurrent neural network, forming a hybrid system with more optimized performance, would be developed and presented.

REFERENCES
[1] R. R. Nair and L. S. Devi, Sanskrit Informatics: Informatics for Sanskrit studies and research. Centre for Informatics Research and Development, 2011.
[2] A. Bharati and A. Kulkarni, "Sanskrit and computational linguistics," in First International Sanskrit Computational Symposium. Department of Sanskrit Studies, University of Hyderabad, 2007.
[3] M. L. Forcada, M. Ginestí-Rosell, J. Nordfalk, J. O'Regan, S. Ortiz-Rojas, J. A. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez, and F. M. Tyers, "Apertium: a free/open-source platform for rule-based machine translation," Machine Translation, vol. 25, no. 2, pp. 127-144, 2011.
[4] G. Noone, "Machine translation: a transfer approach," Computer Science, Linguistics and a Language (CSLL) Department, University of Dublin, Trinity College, Final Rep., 2003.
[5] S. Dave, J. Parikh, and P. Bhattacharyya, "Interlingua-based English-Hindi machine translation and language divergence," Machine Translation, vol. 16, no. 4, pp. 251-304, 2001.
[6] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer, "The mathematics of statistical machine translation: Parameter estimation," Computational Linguistics, vol. 19, no. 2, pp. 263-311, 1993.
[7] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[8] H. Somers, "Example-based machine translation," Machine Translation, vol. 14, no. 2, pp. 113-157, 1999.
[9] S. Nirenburg et al., "KBMT-89 project report," Center for Machine Translation, Carnegie Mellon University, Pittsburgh, p. 286, 1989.
[10] A. Bharati, V. Chaitanya, and R. Sangal, "Paninian framework and its application to Anusaraka," Sadhana, vol. 19, no. 1, pp. 113-127, 1994.
[11] P. Ramanujan, "Computer processing of Sanskrit," Computer Processing Of Asian Languages CALP-2, IIT Kanpur, 1992.
[12] V. Mishra and R. Mishra, "Divergence patterns between English and Sanskrit machine translation," INFOCOMP, vol. 8, no. 3, pp. 62-71, 2009.
[13] ——, "ANN and rule based model for English to Sanskrit machine translation," INFOCOMP, vol. 9, no. 1, pp. 80-89, 2009.
[14] D. Mane, P. Devale, and S. Suryawanshi, "A design towards English to Sanskrit machine translation and synthesizer system using rule base approach," 2010.
[15] V. Barkade and P. R. Devale, "English to Sanskrit machine translation semantic mapper," International Journal of Engineering Science and Technology, vol. 2, no. 10, pp. 5313-5318, 2010.
[16] P. Shukla and A. Shukla, "English speech to Sanskrit speech (ESSS) using rule based translation," International Journal of Computer Applications, vol. 92, no. 10, 2014.
[17] S. G. Rathod and S. Sondur, "English to Sanskrit translator and synthesizer (ETSTS)," International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 12, 2012.
[18] P. Bahadur, A. Jain, and D. Chauhan, "Etrans: a complete framework for English to Sanskrit machine translation," in International Journal of Advanced Computer Science and Applications (IJACSA) from International Conference and Workshop on Emerging Trends in Technology. Citeseer, 2012.
[19] P. Upadhyay, U. C. Jaiswal, and K. Ashish, "Transish: Translator from Sanskrit to English, a rule based machine translation," International Journal of Current Engineering and Technology, E-ISSN, pp. 2277-4106, 2014.
[20] S. R. Warhade, S. H. Patil, and P. R. Devale, "English-to-Sanskrit statistical machine translation with ubiquitous application," International Journal of Computer Applications, vol. 51, no. 1, 2012.
[21] G. N. Jha, "The TDIL program and the Indian Language Corpora Initiative (ILCI)," in LREC, 2010.
[22] S. S. Nair and A. Kulkarni, "The knowledge structure in Amarakośa," in Sanskrit Computational Linguistics. Springer, 2010, pp. 173-189.
[23] A. Kulkarni, "Samsaadhani, a Sanskrit computational toolkit," 2002.
[24] A. Kumar, V. Mittal, and A. Kulkarni, "Sanskrit compound processor," in Sanskrit Computational Linguistics. Springer, 2010, pp. 57-69.
[25] R. K. Pandey and G. N. Jha, "Error analysis of sahit: a statistical Sanskrit-Hindi translator," Procedia Computer Science, vol. 96, pp. 495-501, 2016.
[26] R. Pandey, A. K. Ojha, and G. N. Jha, "Demo of Sanskrit-Hindi statistical machine translation system," arXiv preprint arXiv:1804.06716, 2018.
[27] Sanskrit News, "Department of Public Health Relations," http://mpinfo.org/News/SanskritNews.aspx, accessed 13-1-2019.
[28] D. Kenny and S. Doherty, "Statistical machine translation in the translation curriculum: overcoming obstacles and empowering translators," The Interpreter and Translator Trainer, vol. 8, no. 2, pp. 276-294, 2014.
[29] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens et al., "Moses: Open source toolkit for statistical machine translation," in Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, 2007, pp. 177-180.
[30] F. Chollet et al., "Keras: Deep learning library for Theano and TensorFlow," 2015.
[31] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: a system for large-scale machine learning," in OSDI, vol. 16, 2016, pp. 265-283.
[32] P. Koehn, F. J. Och, and D. Marcu, "Statistical phrase-based translation," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1. Association for Computational Linguistics, 2003, pp. 48-54.
[33] F. J. Och, C. Tillmann, and H. Ney, "Improved alignment models for statistical machine translation," in 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
[34] K. Yamada and K. Knight, "A syntax-based statistical translation model," in Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2001, pp. 523-530.
[35] L. Tian, F. Wong, and S. Chao, "Word alignment using GIZA++ on Windows," Machine Translation, 2011.
[36] M. Federico, N. Bertoldi, and M. Cettolo, "IRSTLM: an open source toolkit for handling large scale language models," in Ninth Annual Conference of the International Speech Communication Association, 2008.
[37] Microsoft, "Microsoft Translator Hub," https://hub.microsofttranslator.com/, accessed 13-1-2019.
[38] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002, pp. 311-318.
[39] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[40] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," arXiv preprint arXiv:1302.4389, 2013.
[41] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," arXiv preprint arXiv:1312.6026, 2013.
[42] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[43] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[44] Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, "Advances in optimizing recurrent networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 8624-8628.
