Nisheeth Joshi
2024
System Description of BV-SLP for Sindhi-English Machine Translation in MultiIndic22MT 2024 Shared Task
Nisheeth Joshi
|
Pragya Katyayan
|
Palak Arora
|
Bharti Nathani
Proceedings of the Ninth Conference on Machine Translation
This paper presents the machine translation system that we developed for the WAT2024 MultiIndic MT shared task. We built our system for the Sindhi-English language pair and developed two MT systems. The first was our baseline system, in which Sindhi was translated directly into English. The second used Hindi as a pivot language for the translation. In both cases, we identified the named entities and translated them into English as a preprocessing step. We then followed the standard NMT process to train the systems and generate MT outputs for the task. The systems were tested on the hidden dataset of the shared task.
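The two configurations described in this abstract can be sketched as function composition. This is only an illustrative outline, not the authors' implementation: the names `replace_named_entities`, `direct_translate`, and `pivot_translate`, the dictionary-based entity lookup, and the callable translators are all assumptions made for the example, and a real system would plug trained NMT models in where the callables appear.

```python
from typing import Callable, Dict

# A translator is assumed to be any callable mapping a sentence to a sentence.
Translator = Callable[[str], str]


def replace_named_entities(text: str, entity_map: Dict[str, str]) -> str:
    """Replace known source-language named entities with their English forms.

    Hypothetical preprocessing step: a plain dictionary lookup stands in for
    whatever named-entity recognizer and translator a real system would use.
    """
    # Process longer entities first so a short entity is not substituted
    # inside a longer entity that contains it.
    for source in sorted(entity_map, key=len, reverse=True):
        text = text.replace(source, entity_map[source])
    return text


def direct_translate(text: str, entity_map: Dict[str, str],
                     sd_to_en: Translator) -> str:
    """Baseline: preprocess, then translate Sindhi straight into English."""
    return sd_to_en(replace_named_entities(text, entity_map))


def pivot_translate(text: str, entity_map: Dict[str, str],
                    sd_to_hi: Translator, hi_to_en: Translator) -> str:
    """Pivot: preprocess, translate Sindhi to Hindi, then Hindi to English."""
    return hi_to_en(sd_to_hi(replace_named_entities(text, entity_map)))
```

Because the translators are passed in as plain callables, the same preprocessing step is shared by both configurations and the pivot path is simply the composition of two translation steps.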
2021
Part of Speech Tagging for a Resource Poor Language : Sindhi in Devanagari Script using HMM and CRF
Bharti Nathani
|
Nisheeth Joshi
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Part-of-speech (POS) tagging is a pre-processing step in various NLP applications and is used mainly in machine translation. This research proposes two POS taggers, one HMM-based and one CRF-based. To develop these taggers, a corpus of 30,000 manually annotated sentences was prepared with the help of language experts. In this paper, we develop POS taggers for the Sindhi language (in Devanagari script), a resource-poor language, using a Hidden Markov Model (HMM) and a Conditional Random Field (CRF). Evaluation results showed accuracies of 76.60714% and 88.79% for the HMM and CRF taggers, respectively.
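For readers unfamiliar with HMM-based tagging, a minimal sketch of Viterbi decoding is given here. It is a generic textbook formulation and not the authors' tagger: the function name `viterbi`, the probability floor for unseen events, and the dictionary layout of the probability tables are assumptions for illustration, and any probabilities passed in would have to be estimated from an annotated corpus.

```python
import math
from typing import Dict, List


def viterbi(words: List[str], tags: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            floor: float = 1e-12) -> List[str]:
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []

    def lp(p: float) -> float:
        # Log-probability with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # Each column maps tag -> (best log-score ending in that tag, previous tag).
    columns = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), None)
                for t in tags}]
    for word in words[1:]:
        prev_col = columns[-1]
        col = {}
        for t in tags:
            # Pick the previous tag that maximizes the score of reaching t.
            best_prev = max(tags, key=lambda p: prev_col[p][0] + lp(trans_p[p].get(t, 0.0)))
            score = (prev_col[best_prev][0]
                     + lp(trans_p[best_prev].get(t, 0.0))
                     + lp(emit_p[t].get(word, 0.0)))
            col[t] = (score, best_prev)
        columns.append(col)

    # Backtrace from the best final tag to recover the full sequence.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]
```

A CRF tagger differs in that it scores the whole tag sequence with feature weights learned discriminatively, but decoding can use the same dynamic-programming recurrence.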