
Sanskrit and Natural Language Processing

Dr. Srinivasa Varakhedi
Center for Advanced Studies and Research in Shabdabodha and NLP

RASHTRIYA SANSKRIT VIDYAPEETHA (DEEMED UNIVERSITY), Tirupati (A.P.)

Dream of a bee..

रात्रिर्गमिष्यति भविष्यति सुप्रभातं भास्वानुदेष्यति हसिष्यति पङ्कजश्रीः ।
इत्थं विचिन्तयति कोशगते द्विरेफे हा हन्त हन्त नलिनीं गज उज्जहार ॥

Present situation of Sanskrit


Sanskrit colleges are like a 'zoo'!
No Govt. support unless we are productive
Humanities and languages are being neglected
How long will this support continue?
The great tradition of learning is being lost
No scope for novel research

Innovation is the key

Sanskrit Shastras are competent enough to enter the world of science
Move out of the Humanities and merge with the sciences
Analogy: Maths, Psychology, Logic
We must find a practical approach for these Sanskrit sciences

We have lost 80%

Meemamsa: no practical approach!
Nyaya: no use in modern dialectics?
Vyakarana: no application??

What to do ?

Relevance of Sanskrit Shastras in Modern Technology

Fortunately, these shastras are found relevant in today's technology


Computing ideas in Panini
Text processing principles in Meemamsa
Formal languages in Nyaya

We lack the technology and an application area
The story of Babbage!

Message of Acharya Shankara Bhagavatpada


avidyayaa mrtyum tiirtvaa.. vidyayaa amrtamashnute.. - Ishavasya Upanishad
Sri Shankara Bhagavatpada comments on this:
avidyaa = karma ; vidyaa = knowledge

Opportunity

Emerging information technology has provided a great opportunity to survive
[Sanskrit phrase in Devanagari; not recoverable from the source]

Solve a major contemporary problem like MT on the basis of the shastras
Get new openings for Sanskritists
Open a new avenue for research

Know How

Ultimate aim: finding an appropriate place for the Sanskrit Shastras
Method: solutions to contemporary problems, adopting modern technology
Resource needed: adequate manpower to act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other

Change the scenario


Technology

Western theories
Indian theories

Opportunities missed

Industrial revolution

We missed this with some hasty decisions

IT revolution

Indians are serving at the level of coding, not at the level of design!
We should take advantage of this

Knowledge Revolution

Need of the hour

We need
to understand how technology works
to understand the contemporary problems

Then
we will be able to give solutions in the light of the shastras and show the relevance of Indian theories

History and Progress

The conference on Knowledge Representation and Sanskritam, held at Bangalore in Dec 1987, generated tremendous interest
Not much has been achieved since, except some small-scale efforts and projects here and there, and those in technical institutions
Time is running out!
What progress has been made since then?

Complexity of the problem


Different goals: the two disciplines, Technology and Shastras, developed in different contexts
Paradigm difference: modern scholars are accustomed to visual teaching methods; traditional Pandits, on the other hand, prefer the oral tradition
Language barrier: neither understands the other's language!

The tuning in of the dialogue will take time

Who would bell the cat ?

It needs long interaction between technologists and traditional Sanskrit scholars
Technical institutions are always ready for such activities
Not much interest is seen in Sanskrit institutions
It is we Sanskritists who should bell the cat

Long process like extraction of ghee from milk

No miracle happens in the initial stage
It's a big challenge; one or two persons are not enough
We need hundreds of dedicated persons to achieve even a small goal

A person can climb a small hill; a team can climb Everest

Identifying the problem

Analogy: Brahman in the Upanishads

What is Brahman?

We cannot show it, as it is imperceptible.
We cannot describe it, as it is beyond words.

Hence, we can direct you towards it by negating what we know.

[Sanskrit phrase in Devanagari; not recoverable from the source]

Possible areas

Machine Translation
Speech Processing
Summary extraction from huge texts
Indo-Wordnet as a base for IL wordnets
Developing tools for IL researchers
Knowledge Representation schemes

Machine Translation

English To Indian Languages


Word sense disambiguation
Karaka & syntax relation
Word grouping
Idiomatic expressions
Shabdasutra

MT among Indian Languages


Bilingual electronic dictionaries
Karaka & Vibhakti relation

Major MT systems

India

Angla-Bharati, IIT Kanpur
Shakti, IIIT Hyderabad
Mantra, CDAC Pune
SaHiT (Sanskrit Hindi Translator), CSS, JNU
Anusaaraka (RSV, HCU, IIIT)

Major MT systems

Outside India

UNITRAN
BabelFish, AltaVista (Systran)
ATR (bimodal, Japan)
JANUS (bimodal, US-Germany)
SLT (SRI, Cambridge)
VERBMOBIL (Germany)
DIPLOMAT (Carnegie-Mellon)

A 125-page directory of available MT systems can be found at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf

Summary Extraction

Meemamsa Principles applied to extract the summary of a text

The Upakramadi Tatparya Lingas (the Meemamsa indicators of a text's purport, beginning with upakrama) are used to extract the summary of a text at the Indian Institute of Science, Bangalore, under our consultancy.
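How such indicators might drive an extractive summary can be sketched roughly as below. This is only an illustration under stated assumptions, not the system built at IISc: it uses just two of the traditional cues, abhyasa (repetition) and upakrama-upasamhara (opening and closing), and the names `content_words`, `summarize` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would need a fuller one.
STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "it", "by"}


def content_words(sentence):
    """Lowercased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(sentences, k=2):
    """Pick k sentences, returned in their original order."""
    freq = Counter(w for s in sentences for w in content_words(s))
    scored = []
    for i, s in enumerate(sentences):
        words = content_words(s)
        # abhyasa: sentences built from often-repeated words score higher
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        # upakrama-upasamhara: bonus for the opening and closing sentence
        if i == 0 or i == len(sentences) - 1:
            score += 1.0
        scored.append((score, i))
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda t: t[1])]


if __name__ == "__main__":
    text = [
        "Dharma is the subject of this text.",
        "The weather was mild.",
        "Dharma is known by injunction.",
        "Thus dharma is established.",
    ]
    for line in summarize(text, k=2):
        print(line)
```

The remaining cues (apurvata, phala, arthavada, upapatti) would need semantic features that a frequency count cannot supply.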

Wordnet / Concept-net based on NN ontology

Wordnet is an electronic lexical reference system built on the semantic relations between words

Synonymy {Griha, nivaasa, ...}
Hypernymy {Amra, vriksha, vanaspati}
Antonymy {Shreemaan, akinchana}
Meronymy {nAsika, mukha, shariira, ...}
Gradation {Shushka, -tara, -tama}
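A minimal sketch of how such relations could be stored and queried, assuming a toy in-memory lexicon. The dictionaries and the helper names `synonyms` and `chain` are illustrative and do not come from any existing Wordnet; the entries are the examples above in simplified transliteration, reading the first synonym as griha (house).

```python
# Toy lexicon built from the slide's examples; all names are illustrative.
SYNSETS = [{"griha", "nivaasa"}]
HYPERNYM = {"amra": "vriksha", "vriksha": "vanaspati"}   # kind-of
PART_OF = {"nasika": "mukha", "mukha": "shariira"}       # meronymy
ANTONYM = {"shreemaan": "akinchana", "akinchana": "shreemaan"}


def synonyms(word):
    """Return the other members of every synset containing `word`."""
    found = set()
    for synset in SYNSETS:
        if word in synset:
            found |= synset - {word}
    return found


def chain(word, relation):
    """Follow a relation upward from `word`, stopping at the top or on a cycle."""
    path = [word]
    while path[-1] in relation and relation[path[-1]] not in path:
        path.append(relation[path[-1]])
    return path


if __name__ == "__main__":
    print(chain("amra", HYPERNYM))
    print(chain("nasika", PART_OF))
    print(synonyms("griha"))
```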

Knowledge Engineering

Representation

For data representation, several database management systems are available.
For representing and retrieving useful information, there are various worked-out methodologies.
Finally, Knowledge Representation needs special treatment, and this is where Indian knowledge systems can be applied.

Knowledge and its importance in AI


AI researchers are interested in building intelligent systems
Web technologies are looking forward to the semantic web instead of the syntactic web
Knowledge is more valuable than data and information
Data: simply the DoB. Information: the age calculated from it. Knowledge: the judgment about suitability for the job at hand, etc.
This requires a lot of inputs from various knowledge sources.
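The data / information / knowledge distinction can be made concrete with a small sketch. The function names and the thresholds (`min_age`, `min_experience`) are assumptions chosen only for illustration; a real suitability judgment would draw on many more knowledge sources.

```python
from datetime import date


def age_on(dob, today):
    """Information: the age in whole years, derived from the raw DoB."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def suitable(age, years_experience, min_age=21, min_experience=2):
    """Knowledge: a judgment combining several inputs (hypothetical rule)."""
    return age >= min_age and years_experience >= min_experience


if __name__ == "__main__":
    dob = date(1990, 6, 15)                      # data
    age = age_on(dob, date(2020, 6, 14))         # information
    print(age, suitable(age, years_experience=3))  # knowledge
```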

Computational Linguistics and Panini's Grammar

The structure of Paninian grammar is nothing but a computer program (Babbage!)
It has captured the universal principles underlying all languages
CL requires formal rules for the analysis and generation of language
Slowly, Chomsky and others are turning towards Panini

The System of Panini

Phonetic component
Phonemes
pratyahara

Rule base
Vidhi (operations)
Samjna
paribhasha (metarules)
adhikara (headings)
atidesha (extension)
niyama (restriction)

Lexicon
Dhatupaatha
Ganapaatha

Lists
Affixes
Rule-specific items
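The pratyahara device of the phonetic component can be sketched as follows. The 14 Shiva Sutras are encoded here in IAST, each as a list of sounds plus its closing marker. The name `expand` is hypothetical, and the function takes the first occurrence of the start sound and the first matching marker after it, which is a simplification because the marker Ṇ and the sound h each occur twice.

```python
# The 14 Shiva Sutras: (sounds, closing marker). Encoding is illustrative.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "Ṇ"),
    (["ṛ", "ḷ"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
    (["h", "y", "v", "r"], "Ṭ"),
    (["l"], "Ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "M"),
    (["jh", "bh"], "Ñ"),
    (["gh", "ḍh", "dh"], "Ṣ"),
    (["j", "b", "g", "ḍ", "d"], "Ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "V"),
    (["k", "p"], "Y"),
    (["ś", "ṣ", "s"], "R"),
    (["h"], "L"),
]


def expand(start, marker):
    """Sounds from `start` up to the sutra closed by `marker`, markers excluded."""
    result = []
    collecting = False
    for sounds, closing in SHIVA_SUTRAS:
        for sound in sounds:
            if not collecting and sound == start:
                collecting = True
            if collecting:
                result.append(sound)
        if collecting and closing == marker:
            return result
    raise ValueError(f"no pratyahara {start}{marker}")


if __name__ == "__main__":
    print(expand("a", "C"))   # aC: the vowels
    print(expand("h", "L"))   # haL: the consonants
```

Two syllables such as aC or haL thus name a whole class of sounds, which is what lets the rules stay short.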

Paninian Model for Sentence Analysis


Action: the central theme
Karakas: syntactico-semantic roles
Visheshana-visheshya-bhava
Concept of anabhihite in switching to a different voice
Vivakshaa: the intention of the speaker
Form and meaning
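The anabhihite idea can be illustrated with a small sketch: the karaka roles stay the same across voices, and only the role already expressed by the verb ending takes the nominative, while an unexpressed kartr takes the instrumental and an unexpressed karman the accusative. The function name `vibhakti` and the English case labels are illustrative choices, and only two karakas are modelled.

```python
# Case taken by a karaka when the verb ending does NOT express it (anabhihita).
UNEXPRESSED_CASE = {"kartr": "instrumental", "karman": "accusative"}


def vibhakti(roles, voice):
    """Map each karaka role to a case for the given voice."""
    if voice not in ("active", "passive"):
        raise ValueError("voice must be 'active' or 'passive'")
    # The active ending expresses the agent; the passive ending the object.
    expressed = "kartr" if voice == "active" else "karman"
    return {
        role: "nominative" if role == expressed else UNEXPRESSED_CASE[role]
        for role in roles
    }


if __name__ == "__main__":
    # "Rama goes to the forest": kartr = Rama, karman = forest
    print(vibhakti(["kartr", "karman"], "active"))
    print(vibhakti(["kartr", "karman"], "passive"))
```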

Navya Nyaya -> AI ?


Classify Nyaya into five parts:
1. Ontology
2. Epistemology
3. Technical Language
4. Semantics
5. Art of debate and fallacies

Ontology
Includes:
Categories: Substance, Quality, etc.
Relations: SamavAya, SvarUpa
Universals: types or classes
Ontology helps in various areas like NLP, K-Repr and K-Engg, and especially in the cognitive sciences.

Epistemology
Deals with:
Cognitive process
Cognitive structure
It helps to solve the problems of the cognitive sciences and K-Repr.

Technical Language

NNL (the Navya Nyaya Language) is a restricted language that combines the mechanical power of artificial languages with the expressive power of natural languages.

The basic ideas behind this language will be helpful in Knowledge Representation.

Semantics

The Navya Naiyayikas' way of analysing semantics has been found crucially helpful in NLP and Machine Translation

E.g.
Classification of words: rUdha, yoga
Syntactic analysis
Power of definitions
KR & NN

Semantics in MT

Lexicography

Word/concept nets based on NN ontology
Classification of padas (words)


Rudha: a word whose meaning rests on convention, i.e. names
Yaugika: a word with an etymological meaning, e.g. cook, driver
Yoga-rudha: a word with both etymology and convention, e.g. CD-driver
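In a lexicon, this three-way classification could be recorded as a tag derived from two features of each entry. The sketch below is only illustrative; `PadaType` and `classify` are hypothetical names, and deciding the two features for a real word is the hard part that the code takes as given.

```python
from enum import Enum


class PadaType(Enum):
    RUDHA = "rudha"            # convention only
    YAUGIKA = "yaugika"        # etymology only
    YOGARUDHA = "yoga-rudha"   # etymology and convention


def classify(etymological, conventional):
    """Derive the pada class from the two sources of meaning."""
    if etymological and conventional:
        return PadaType.YOGARUDHA
    if etymological:
        return PadaType.YAUGIKA
    if conventional:
        return PadaType.RUDHA
    raise ValueError("a word needs at least one source of meaning")


if __name__ == "__main__":
    # "cook" = one who cooks: the meaning follows from the derivation alone
    print(classify(etymological=True, conventional=False).value)
    # a proper name: the meaning is fixed by convention alone
    print(classify(etymological=False, conventional=True).value)
```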

WSD using different techniques

Definitions of Karaka relation without any overlap


Kartrtvam = kriyA-anukUla-kritimattvam
Karmatvam = para-samaveta-kriyA-janya-phala-Ashrayatvam


Going: Rama and the forest. Who is going where?
The result (contact) is possible in Rama too.
This definition is useful for avoiding such overlap.

Refinement of karaka Relations

Classification of Karma

Karma: reachable, understandable, and so on.

Analysis of root semantics

Leave: He left the place / left from the place
Rats killed cats

Analysis of expectancy (AkAnkshA)

To-infinitive relation

I stand up to speak
I want to speak
He goes to London to study law
He wants to study law in London
To walk in the mornings is good for health

Namaste!

Special thanks to the authorities of Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram