Kishore Kashyap


2023

GUIT-NLP’s Submission to Shared Task: Low Resource Indic Language Translation
Mazida Ahmed | Kuwali Talukdar | Parvez Boruah | Prof. Shikhar Kumar Sarma | Kishore Kashyap
Proceedings of the Eighth Conference on Machine Translation

This paper describes the submission of the GUIT-NLP team to the “Shared Task: Low Resource Indic Language Translation”, focusing on three low-resource language pairs: English-Mizo, English-Khasi, and English-Assamese. The initial phase involves an in-depth exploration of Neural Machine Translation (NMT) techniques tailored to the available data. Within this investigation, various subword tokenization approaches and model configurations (exploring different hyper-parameters) of the general NMT pipeline are tested to identify the most effective method. Subsequently, we address the challenge of low-resource languages by leveraging monolingual data through an innovative and systematic application of the Back Translation technique for English-Mizo. During model training, the monolingual data is progressively integrated into the original bilingual dataset, with each iteration yielding higher-quality back translations. This iterative approach significantly enhances the model’s performance, increasing the BLEU score by +3.65. A further improvement of +5.59 is achieved through fine-tuning on authentic parallel data.

2019

Spoken WordNet
Kishore Kashyap | Shikhar Kr Sarma | Kumari Sweta
Proceedings of the 10th Global Wordnet Conference

WordNets have been used in a wide variety of applications, including the design and development of intelligent and human-assisting systems. Although WordNet was initially developed as an online lexical database (Miller, 1995; Fellbaum, 1998), later developments have inspired the use of the WordNet database as a resource in NLP applications, in Language Technology development, and as a source of structured learned materials. This paper proposes, conceptualizes, designs, and develops a voice-enabled information retrieval system that presents WordNet knowledge in spoken format in response to a spoken query. In practice, the work converts the WordNet resource into a structured, voice-based knowledge extraction system: a spoken query is processed in a pipeline, the relevant WordNet resources are extracted and structured through another processing pipeline, and the result is presented in spoken format. The system thus provides a speech interface to the existing WordNet, and we name it “Spoken WordNet”. The system interacts with two interfaces, one designed and developed for the Web and the other an app interface for smartphones. This also amounts to restructuring WordNet into a version friendly to visually challenged users. The user can input a query string in the form of a spoken English sentence or word. Jaccard similarity is calculated between the input sentence and the synset definitions. Among the multiple available synsets, the one with the highest similarity score is taken as the synset of interest. The user is also prompted to choose a contextual synset in case of ambiguity.
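
The synset selection step described above can be illustrated with a minimal sketch. It assumes lowercase word-level tokenization and a plain dictionary of synset definitions. The function names, the synset identifiers, and the toy definitions are hypothetical and are not taken from the paper, which does not specify its tokenization or data structures.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_synsets(query: str, definitions: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each synset definition against the query and sort by descending similarity."""
    query_tokens = tokenize(query)
    scored = [
        (synset_id, jaccard(query_tokens, tokenize(definition)))
        for synset_id, definition in definitions.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, hypothetical definitions for two senses of "bank".
toy_definitions = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
}

ranking = rank_synsets("bank that accepts deposits", toy_definitions)
best_synset, best_score = ranking[0]
print(best_synset, round(best_score, 3))
```

In this toy run the query shares three tokens with the second definition and none with the first, so the second sense ranks highest. When several synsets receive equal or near-equal scores, the ranked list could be read back to the user, in line with the prompt for a contextual synset that the abstract mentions.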