Meera M
College of Engineering, Cherthala, Managed by IHRD
Established by the Government of Kerala
Email: mmeera18@gmail.com

Sony P
College of Engineering, Cherthala, Managed by IHRD
Established by the Government of Kerala
Email: spsony@gmail.com
V. PROPOSED SYSTEM
The proposed system is a rule-based direct machine translation system that performs unidirectional translation from English to Malayalam and Hindi with the help of a bilingual dictionary. The source language is English and the target languages are Malayalam and Hindi. Since it works with more than two languages, it is a multilingual translation system. Unlike previous related work in machine translation, the proposed system focuses on word sense disambiguation [11] and first order predicate logic (FOPL) based semantic checking.

The proposed system introduces verb classification based FOPL rule for semantic checking, parse tree node numbering

Fig. 3: Architecture

A. Sentence Analysis Phase

English sentences have twelve tense forms. The input to the system may be a statement, a Wh-question or a yes/no question, and belongs to one of the twelve tense forms. An accurate translation can be generated by analyzing the nature of the input. For that purpose, in this phase the English input undergoes tokenization and part-of-speech tagging [7]. Using the obtained tags, the category of the sentence is determined for further processing.
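The category analysis described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the paper derives the category from part-of-speech tags [7], whereas this sketch looks only at the first token, and the word lists `WH_WORDS` and `AUXILIARIES` as well as the function names are hypothetical and not exhaustive.

```python
import re

# Assumed, non-exhaustive word lists (not taken from the paper).
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}
AUXILIARIES = {
    "am", "is", "are", "was", "were", "do", "does", "did", "have", "has", "had",
    "will", "shall", "would", "should", "can", "could", "may", "might", "must",
}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_category(sentence):
    """Return 'wh-question', 'yes/no-question' or 'statement'.

    Simplification: only the first token is inspected, so imperatives that
    begin with an auxiliary (e.g. "Do it") would be misclassified.
    """
    tokens = tokenize(sentence)
    if not tokens:
        raise ValueError("empty input")
    first = tokens[0]
    if first in WH_WORDS:
        return "wh-question"
    if first in AUXILIARIES:
        return "yes/no-question"
    return "statement"
```

A fuller version would replace the first-token test with rules over the tag sequence, which would also allow the tense form to be identified.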
parser is converted into FOPL format as: $\forall x, y:\ \mathrm{subject}(x) \land \mathrm{verb}(y) \rightarrow \mathrm{cando}(x, y)$. The predicate evaluates to true only if x and y belong to a valid combination of subject and verb classes. A true value of the predicate indicates that the sentence is semantically correct.
For example, if the input sentences are i) Monkey ate a banana and ii) Banana ate a monkey, the dependency trees obtained are shown in Fig. 7.

Fig. 7: Dependency trees for i) Monkey ate a banana and ii) Banana ate a monkey
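A minimal sketch of how such a class-based check might behave on the two example sentences is given here. The class names, the lookup tables and the set of valid combinations are hypothetical placeholders chosen for illustration; the paper's actual subject and verb classes are not reproduced in this section.

```python
# Hypothetical class tables, for illustration only.
SUBJECT_CLASS = {"monkey": "animate", "banana": "inanimate"}
VERB_CLASS = {"ate": "consumption"}

# Assumed valid (subject class, verb class) pairs, i.e. where cando holds.
VALID_COMBINATIONS = {("animate", "consumption")}


def cando(subject, verb):
    """Return True only if the subject and verb classes form a valid pair.

    Words missing from the tables map to None and therefore fail the check.
    """
    subject_class = SUBJECT_CLASS.get(subject.lower())
    verb_class = VERB_CLASS.get(verb.lower())
    return (subject_class, verb_class) in VALID_COMBINATIONS


# i) Monkey ate a banana: subject "Monkey", verb "ate"
print(cando("Monkey", "ate"))
# ii) Banana ate a monkey: subject "Banana", verb "ate"
print(cando("Banana", "ate"))
```

With these assumed tables the first sentence passes the check and the second fails it, which mirrors the intended outcome of the semantic checking step.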
previously identified sentence structure and tenses. Since Hindi follows a gender matching rule, English to Hindi translation uses an extra gender processing step, which adds a postposition based on the gender of the next word.

1) Disambiguation: Disambiguation [11] is an open problem in natural language processing. It is the process of identifying which sense of a word or preposition is used in a sentence when that word or preposition has multiple meanings.

In this work the Weka tool is used for the disambiguation of prepositions and words [11]. Each training sentence is converted into a feature vector containing four fields: previous tag, word, next tag and sense. The NNge (Non-Nested Generalized Exemplars) classifier in Weka is used to classify these vectors based on the sense field. The result from the classifier is converted into rule format for further processing in the subsequent steps.

2) Stemming: In this step each word is converted into its root form by deleting affixes such as ed, s, es and ing. English spelling rules are applied in reverse to perform stemming on each word. This step helps morphology generation and dictionary lookup. The split affixes are stored separately and used at the time of morpheme generation.

3) Dictionary Lookup: This step finds the correct translation of a single word in the source sentence into its corresponding target word. To decrease the dictionary search time, the contents are organized in 26 files based on their starting letter.

4) Morphology generation: The tense forms and prepositions in the sentence are analyzed to produce the morphological variations. Morphology generation is performed using Unicode processing [4]. Adjacent words are compared to find the Unicode at the beginning and the end of the morphemes. Using this Unicode combination rule, new morphemes are generated.

VI. OBSERVATION AND RESULT

Since this is a rule-based translation approach, the accuracy of translation mainly depends on the correctness of the defined rules. For disambiguation, the Weka tool is used to create the rules. Disambiguation can be done more accurately with a sufficient number of training sentences. Sometimes the structure rearrangement phase produces erroneous results because of the free word order of Malayalam and Hindi. Unlike the statistical approach, this rule-based method can give a correct translation for most input sentences. The dictionary content is another factor that affects the correctness of translations. With more words in the dictionary, the number of translated sentences can be increased.

For the evaluation, precision and recall values are used. The formulas used for calculating precision and recall are given below, where the candidate solution is generated by the proposed machine translation system and the reference solution is generated by human translation. The results obtained for both translations with different corpora are plotted in the graph shown in Fig. 9.

Precision = Number of words in the candidate solution correctly aligned with the reference solution / Number of words in the candidate solution

Recall = Number of words in the candidate solution correctly aligned with the reference solution / Number of words in the reference solution

Fig. 9: Observation Result

VII. CONCLUSION

This work introduces an effective methodology for English to Malayalam and Hindi translation based on the rule-based approach. The proposed translation system works for almost all simple sentences in their twelve tense forms, including their negative and question forms. Unlike other translation systems, it takes semantics and disambiguation into account and handles both successfully. As a result of combining the newly introduced methods with machine translation, the evaluation shows an accuracy of 74% with a harmonic mean (Fmean) of 0.74. Since languages like Malayalam and Hindi are morphologically rich and agglutinative, the performance can be further improved by adding more morphological inflections to the system.

REFERENCES

[1] Antony P. J, Machine Translation Approaches and Survey for Indian Languages, Computational Linguistics and Chinese Language Processing, Vol. 18, No. 1, March 2013, pp. 47-78
[2] Mary Priya Sebastian, K. Sheena Kurian, G. Santhosh Kumar, Alignment Model and Training Technique in SMT from English to Malayalam, Springer-Verlag Berlin Heidelberg 2010, IC3 2010, Part I, CCIS 94, p.305-315
[3] Mallamma V. Reddy, M. Hanumanthappa, Indic Language Machine Translation Tool: English to Kannada/Telugu, Proceedings of Multimedia Processing, Communication and Computing Applications, Springer India 2013, p.200-213
[4] Karin Kipper, Anna Korhonen, Morphological Analyzer for Malayalam Using Machine Learning, Language Resources and Evaluation, Volume 42, Issue 1, p.21-40
[5] Karin Kipper, Anna Korhonen, A large-scale classification of English verbs, Language Resources and Evaluation, Issue 1, Volume 42, p.21-40
[6] Nisheeth Joshi, Hemant Darbari, Human and Automatic Evaluation of English to Hindi Machine Translation Systems, Proceedings of the Second International Conference on Computer Science Engineering and Applications, Springer-Verlag Berlin Heidelberg 2010, Volume 166, 2012, AISC 106, p.423-432
[7] Remya Rajan, Remya Sivan, Remya Ravindran, K.P Soman. Rule based machine translation from english to malayalam. In Conference Proceedings on International Conference on Advances in Computing, Control, and Telecommunication Technologies, pages 439-441, 2009
[8] Mary Priya Sebastian, G Santhosh Kumar, English to malayalam translation: a statistical approach. In Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, page 64. ACM, 2010.
[9] Nishtha Jaiswal, Renu Balyan, and Anuradha Sharma. A step towards human-machine unification using translation memory and machine translation system. In International Conference on Languages, Literature and Linguistics, pages 64-68. 2011.
[10] Raghavendra Udupa U, Tanveer A. Faruquie, An English-Hindi Statistical Machine Translation System, First International Joint Conference 2004, Hainan Island, China, p.315-325
[11] Roberto Navigli, Word Sense Disambiguation: A Survey, ACM Computing Surveys (CSUR) 2009, Volume 41, Issue 2, Article No. 10
[12] Jignashu Parikh, Pushpak Bhattacharyya, Interlingua-based English-Hindi Machine Translation and Language Divergence, Machine Translation, Volume 16, Issue 4, p.251-304.