Abstract—In this paper, we describe a rule-based method for word sense disambiguation (WSD) of Marathi text. In Marathi, which is spoken in the Indian state of Maharashtra, many words are spelled the same but differ in meaning (sense). Such words lead to ambiguity when they are translated from the source language to a target language. Our method identifies the correct sense of the given text from a predefined set of possible senses using word rules and sentence rules.
Index Terms—Rule Based Method, Word Sense Disambiguation, Marathi, WSD
I. INTRODUCTION
Natural languages are used for communication. The words in every language possess their own importance. Vocabulary of a language allows
[…]
त्याला धडा पाठ आहे (He has learnt the lesson by heart)
In the above example, the word पाठ is ambiguous. In the first sentence its sense is "back" (an anatomical part of the body); in the second sentence it represents the sense of a lesson learnt by heart.
The process of identifying the appropriate sense of a word, as well as of the sentence containing it, is called word sense disambiguation (WSD). If WSD is not handled carefully, it may lead to seriously incorrect results in NLP applications such as translation. A variety of approaches have been proposed to deal with disambiguation in natural language text.
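The idea of choosing a sense from a predefined list can be illustrated with a minimal sketch. This is not the rule set used in this paper: the table `SENSE_RULES`, the function `disambiguate`, and the cue words (including दुखते, "hurts") are hypothetical and chosen only to show how a context word can select one sense of पाठ.

```python
# Toy rule table: for each ambiguous word, an ordered list of
# (sense label, cue words). The cue words are illustrative assumptions,
# not the rules described in this paper.
SENSE_RULES = {
    "पाठ": [
        ("lesson learnt by heart", {"धडा"}),  # "धडा" = lesson
        ("back (body part)", {"दुखते"}),       # "दुखते" = hurts (assumed cue)
    ],
}


def disambiguate(sentence, target):
    """Return the first sense whose cue words occur in the sentence.

    Returns None when the target word is absent or no rule fires.
    """
    tokens = sentence.split()  # whitespace tokenisation, a simplification
    if target not in tokens:
        return None
    context = set(tokens) - {target}
    for sense, cues in SENSE_RULES.get(target, []):
        if context & cues:
            return sense
    return None


if __name__ == "__main__":
    print(disambiguate("त्याला धडा पाठ आहे", "पाठ"))
```

A real system would need morphological analysis and a much larger rule base; the sketch only shows the lookup-and-match step.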
In this paper we report our solution to word sense disambiguation of Marathi text. Marathi is the official language of the state of Maharashtra in India.
The paper is organized as follows. Section I introduces word sense disambiguation, Section II surveys the efforts made worldwide to deal with the ambiguity problem in various natural languages, Section III describes our approach to WSD, and Section IV concludes the paper, followed by the references.
II. LITERATURE SURVEY
In this section we present a literature survey of the various efforts made to address WSD
[…]
Example: अपवित्र_स्थान 01 03 1223 1102 0400 01 00000006
word = अपवित्र_स्थान
pos = 01 (pos codes: 1 = noun, 2 = adj, 3 = verb, 4 = adv)
number of relations that exist for the word in all its senses = 03
relation ids = 1223 1102 0400
number of senses = 01
senses = 00000006
Structure of onto_txt file is:
(synset_id) (pos) (number of words present in synset) (synset) (number of relations lexical as well as semantic) (four digit code relation id) (synset_id for which that relation exists) (gloss) (example sentence)
pos: 1(noun), 2(adj), 3(verb), 4(adv)
Antonymy relations are represented with two four-digit codes: the first four digits give the relation type and the second four digits give the order of the words from the two synsets for which the relation holds.
{onto id} {0001 if parent exists} {parent onto id} {|} {onto description}
Example: 00000002 0001 00000001 | संज्ञा (Noun) {N उदाहरण:-गाय,दूध,मिठाई इत्यादि}
onto id = 00000002
parent exists = 0001
parent onto id = 00000001
description of onto id = संज्ञा (Noun) {N उदाहरण :- गाय,दूध,मिठाई इत्यादि}
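The two example records above can be read with a short parser. The following is a sketch under stated assumptions: the names `IndexEntry`, `OntoEntry`, `parse_index_line` and `parse_onto_line` are hypothetical, fields are assumed to be separated by whitespace, and any value other than 0001 in the second ontology field is assumed to mean "no parent", which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

# Part-of-speech codes as listed in the text.
POS_NAMES = {"01": "noun", "02": "adj", "03": "verb", "04": "adv"}


@dataclass
class IndexEntry:
    word: str
    pos: str
    relation_ids: List[str]
    sense_ids: List[str]


@dataclass
class OntoEntry:
    onto_id: str
    parent_id: Optional[str]
    description: str


def parse_index_line(line: str) -> IndexEntry:
    """Parse: word pos n_relations rel_id... n_senses sense_id..."""
    fields = line.split()
    word, pos = fields[0], fields[1]
    n_rel = int(fields[2])
    relation_ids = fields[3:3 + n_rel]
    n_senses = int(fields[3 + n_rel])
    sense_ids = fields[4 + n_rel:4 + n_rel + n_senses]
    return IndexEntry(word, POS_NAMES.get(pos, pos), relation_ids, sense_ids)


def parse_onto_line(line: str) -> OntoEntry:
    """Parse: onto_id parent_flag parent_onto_id | description"""
    head, _, description = line.partition("|")
    fields = head.split()
    # Assumption: the flag 0001 marks an existing parent; anything else
    # is treated as "no parent".
    has_parent = len(fields) > 2 and fields[1] == "0001"
    parent_id = fields[2] if has_parent else None
    return OntoEntry(fields[0], parent_id, description.strip())


if __name__ == "__main__":
    print(parse_index_line("अपवित्र_स्थान 01 03 1223 1102 0400 01 00000006"))
    print(parse_onto_line("00000002 0001 00000001 | संज्ञा (Noun)"))
```

Reading the counts first and slicing by them keeps the parser independent of how many relations or senses a word has.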
Format of data_txt file:
(synset_id) (pos) (number of words present in synset) (synset) (number of relations lexical as well as semantic) (four digit code relation id) (synset_id for which that relation exists) (gloss) (example