Sivaji Bandyopadhyay

Also published as: Sivaji B, Sivaji Bandopadhyay, Sivaju Bandyopadhyay


2023

Transfer learning in low-resourced MT: An empirical study
Sainik Kumar Mahata | Dipanjan Saha | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Translation systems rely on a large and good-quality parallel corpus for producing reliable translations. However, obtaining such a corpus for low-resourced languages is a challenge. New research has shown that transfer learning can mitigate this issue by augmenting low-resourced MT systems with high-resourced ones. In this work, we explore two types of transfer learning techniques, namely, cross-lingual transfer learning and multilingual training, both with information augmentation, to examine the degree of performance improvement following the augmentation. Furthermore, we use languages of the same family (Romanic, in our case), to investigate the role of the shared linguistic property, in producing dependable translations.

A comparative study of transformer and transfer learning MT models for English-Manipuri
Kshetrimayum Boynao Singh | Ningthoujam Avichandra Singh | Loitongbam Sanayai Meetei | Ningthoujam Justwant Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

In this work, we focus on the development of machine translation (MT) models of a low-resource language pair viz. English-Manipuri. Manipuri is one of the eight scheduled languages of the Indian constitution. Manipuri is currently written in two different scripts: one is its original script called Meitei Mayek and the other is the Bengali script. We evaluate the performance of English-Manipuri MT models based on transformer and transfer learning technique. Our MT models are trained using a dataset of 69,065 parallel sentences and validated on 500 sentences. Using 500 test sentences, the English to Manipuri MT models achieved a BLEU score of 19.13 and 29.05 with mT5 and OpenNMT respectively. The results demonstrate that the OpenNMT model significantly outperforms the mT5 model. Additionally, Manipuri to English MT system trained with OpenNMT model reported a BLEU score of 30.90. We also carried out a comparative analysis between the Bengali script and the transliterated Meitei Mayek script for English-Manipuri MT models. This analysis reveals that the transliterated version enhances the MT model performance resulting in a notable +2.35 improvement in the BLEU score.

NITS-CNLP Low-Resource Neural Machine Translation Systems of English-Manipuri Language Pair
Kshetrimayum Boynao Singh | Avichandra Singh Ningthoujam | Loitongbam Sanayai Meetei | Sivaji Bandyopadhyay | Thoudam Doren Singh
Proceedings of the Eighth Conference on Machine Translation

This paper describes the transformer-based Neural Machine translation (NMT) system for the Low-Resource Indic Language Translation task for the English-Manipuri language pair submitted by the Centre for Natural Language Processing in National Institute of Technology Silchar, India (NITS-CNLP) in the WMT 2023 shared task. The model attained an overall BLEU score of 22.75 and 26.92 for the English to Manipuri and Manipuri to English translations respectively. Experimental results for English to Manipuri and Manipuri to English models for character level n-gram F-score (chrF) of 48.35 and 48.64, RIBES of 0.61 and 0.65, TER of 70.02 and 67.62, as well as COMET of 0.70 and 0.66 respectively are reported.

2022

Investigation of Multilingual Neural Machine Translation for Indian Languages
Sahinur Rahman Laskar | Riyanka Manna | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 9th Workshop on Asian Translation

In the domain of natural language processing, machine translation is a well-defined task where one natural language is automatically translated to another natural language. The deep learning-based approach of machine translation, known as neural machine translation, attains remarkable translational performance. However, it requires a sufficient amount of training data, which is a critical issue for low-resource pair translation. To handle the data scarcity problem, the multilingual concept has been investigated in neural machine translation in different settings like many-to-one and one-to-many translation. WAT2022 (Workshop on Asian Translation 2022), hosted by COLING 2022, organized Indic tasks: English-to-Indic and Indic-to-English translation tasks, where we have participated as a team named CNLP-NITS-PP. Herein, we have investigated a transliteration-based approach, where Indic languages are transliterated into English script and share a sub-word level vocabulary during the training phase. We have attained BLEU scores of 2.0 (English-to-Bengali), 1.10 (English-to-Assamese), 4.50 (Bengali-to-English), and 3.50 (Assamese-to-English) translation, respectively.

English to Bengali Multimodal Neural Machine Translation using Transliteration-based Phrase Pairs Augmentation
Sahinur Rahman Laskar | Pankaj Dadure | Riyanka Manna | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 9th Workshop on Asian Translation

Automatic translation of one natural language to another is a popular task of natural language processing. Although the deep learning-based technique known as neural machine translation (NMT) is a widely accepted machine translation approach, it needs an adequate amount of training data, which is a challenging issue for low-resource pair translation. Moreover, the multimodal concept utilizes text and visual features to improve low-resource pair translation. WAT2022 (Workshop on Asian Translation 2022), hosted by COLING 2022, organized an English to Bengali multimodal translation task, where we have participated as a team named CNLP-NITS-PP in two tracks: 1) text-only and 2) multimodal translation. Herein, we have proposed a transliteration-based phrase pairs augmentation approach which shows improvement in the multimodal translation task and achieved benchmark results on the Bengali Visual Genome 1.0 dataset. We have attained the best results on the challenge and evaluation test set for English to Bengali multimodal translation with BLEU scores of 28.70, 43.90 and RIBES scores of 0.688931, 0.780669, respectively.

Investigation of English to Hindi Multimodal Neural Machine Translation using Transliteration-based Phrase Pairs Augmentation
Sahinur Rahman Laskar | Rahul Singh | Md Faizal Karim | Riyanka Manna | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 9th Workshop on Asian Translation

Machine translation translates one natural language to another, a well-defined natural language processing task. Neural machine translation (NMT) is a widely accepted machine translation approach, but it requires a sufficient amount of training data, which is a challenging issue for low-resource pair translation. Moreover, the multimodal concept utilizes text and visual features to improve low-resource pair translation. WAT2022 (Workshop on Asian Translation 2022), hosted by COLING 2022, organized an English to Hindi multimodal translation task, where we have participated as a team named CNLP-NITS-PP in two tracks: 1) text-only and 2) multimodal translation. Herein, we have proposed a transliteration-based phrase pairs augmentation approach, which shows improvement in the multimodal translation task. We have attained the second best results on the challenge test set for English to Hindi multimodal translation with a BLEU score of 39.30 and a RIBES score of 0.791468.

Image Caption Generation for Low-Resource Assamese Language
Prachurya Nath | Prottay Kumar Adhikary | Pankaj Dadure | Partha Pakray | Riyanka Manna | Sivaji Bandyopadhyay
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

Image captioning is a prominent Artificial Intelligence (AI) research area that deals with visual recognition and a linguistic description of the image. It is an interdisciplinary field concerning how computers can see and understand digital images and videos, and describe them in a language known to humans. Constructing a meaningful sentence needs both structural and semantic information of the language. This paper highlights the contribution of image caption generation for the Assamese language. The unavailability of an image caption generation system for the Assamese language is an open problem for AI-NLP researchers, and it’s just an early stage of the research. To achieve our defined objective, we have used the encoder-decoder framework, which combines the Convolutional Neural Networks and the Recurrent Neural Networks. The experiment has been tested on the Flickr30k and COCO Captions datasets, which were originally available in the English language. We have translated these datasets into the Assamese language using the state-of-the-art Machine Translation (MT) system for our designed work.

CNLP-NITS-PP at WANLP 2022 Shared Task: Propaganda Detection in Arabic using Data Augmentation and AraBERT Pre-trained Model
Sahinur Rahman Laskar | Rahul Singh | Abdullah Faiz Ur Rahman Khilji | Riyanka Manna | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

In today’s time, online users are regularly exposed to media posts that are propagandistic. Several strategies have been developed to promote safer media consumption in Arabic to combat this. However, there is a limited available multilabel annotated social media dataset. In this work, we have used a pre-trained AraBERT twitter-base model on an expanded train data via data augmentation. Our team CNLP-NITS-PP, has achieved the third rank in subtask 1 at WANLP-2022, for propaganda detection in Arabic (shared task) in terms of micro-F1 score of 0.602.

CNLP-NITS-PP at MixMT 2022: Hinglish-English Code-Mixed Machine Translation
Sahinur Rahman Laskar | Rahul Singh | Shyambabu Pandey | Riyanka Manna | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Seventh Conference on Machine Translation (WMT)

The mixing of two or more languages in speech or text is known as code-mixing. In this form of communication, users mix words and phrases from multiple languages. Code-mixing is very common in the context of Indian languages due to the presence of multilingual societies. Code-mixed sentences are likely to exist in almost all Indian languages, since English is the dominant language on social media textual communication platforms in India. We have participated in the WMT22 shared task of code-mixed machine translation with the team name: CNLP-NITS-PP. In this task, we have prepared a synthetic Hinglish–English parallel corpus using transliteration of original Hindi sentences to tackle the limitation of the parallel corpus, where we mainly considered sentences that contain named entities (proper nouns) from the available English-Hindi parallel corpus. With the addition of synthetic bi-text data to the original parallel corpus (train set), our transformer-based neural machine translation models have attained recall-oriented understudy for gisting evaluation (ROUGE-L) scores of 0.23815, 0.33729, and word error rate (WER) scores of 0.95458, 0.88451 at Sub-Task-1 (English-to-Hinglish) and Sub-Task-2 (Hinglish-to-English) for test set results respectively.

2021

Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Sivaji Bandyopadhyay | Sobha Lalitha Devi | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting
Loitongbam Sanayai Meetei | Laishram Rahul | Alok Singh | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In this paper, we report the experimental findings of building Speech-to-Text translation systems for Manipuri-English on low resource setting which is first of its kind in this language pair. For this purpose, a new dataset consisting of a Manipuri-English parallel corpus along with the corresponding audio version of the Manipuri text is built. Based on this dataset, a benchmark evaluation is reported for the Manipuri-English Speech-to-Text translation using two approaches: 1) a pipeline model consisting of ASR (Automatic Speech Recognition) and Machine translation, and 2) an end-to-end Speech-to-Text translation. Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and Time delay neural network (TDNN) Acoustic models are used to build two different pipeline systems using a shared MT system. Experimental result shows that the TDNN model outperforms GMM-HMM model significantly by a margin of 2.53% WER. However, their evaluation of Speech-to-Text translation differs by a small margin of 0.1 BLEU. Both the pipeline translation models outperform the end-to-end translation model by a margin of 2.6 BLEU score.

On the Transferability of Massively Multilingual Pretrained Models in the Pretext of the Indo-Aryan and Tibeto-Burman Languages
Salam Michael Singh | Loitongbam Sanayai Meetei | Alok Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In recent times, machine translation models can learn to perform implicit bridging between language pairs never seen explicitly during training and showing that transfer learning helps for languages with constrained resources. This work investigates the low resource machine translation via transfer learning from multilingual pre-trained models i.e. mBART-50 and mT5-base in the pretext of Indo-Aryan (Assamese and Bengali) and Tibeto-Burman (Manipuri) languages via finetuning as a downstream task. Assamese and Manipuri were absent in the pretraining of both mBART-50 and the mT5 models. However, the experimental results attest that the finetuning from these pre-trained models surpasses the multilingual model trained from scratch.

An Efficient Keyframes Selection Based Framework for Video Captioning
Alok Singh | Loitongbam Sanayai Meetei | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Describing a video is a challenging yet attractive task since it falls into the intersection of computer vision and natural language generation. The attention-based models have reported the best performance. However, all these models follow similar procedures, such as segmenting videos into chunks of frames or sampling frames at equal intervals for visual encoding. The process of segmenting video into chunks or sampling frames at equal intervals causes encoding of redundant visual information and requires additional computational cost since a video consists of a sequence of similar frames and suffers from inescapable noise such as uneven illumination, occlusion and motion effects. In this paper, a boundary-based keyframes selection approach for video description is proposed that allows the system to select a compact subset of keyframes to encode the visual information and generate a description for a video without much degradation. The proposed approach uses 3 to 4 frames per video and yields competitive performance over two benchmark datasets MSVD and MSR-VTT (in both English and Hindi).

EnKhCorp1.0: An English–Khasi Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems by Google and Microsoft cover various languages but lack support for the Khasi language, which can hence be considered low-resource. This paper overviews the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and implements baseline systems for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.

Improved English to Hindi Multimodal Neural Machine Translation
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Machine translation performs automatic translation from one natural language to another. Neural machine translation is a state-of-the-art approach in machine translation, but it requires adequate training data, which is a severe problem for low-resource language pair translation. The multimodal concept is introduced in neural machine translation (NMT) by merging textual features with visual features to improve low-resource pair translation. WAT2021 (Workshop on Asian Translation 2021) organized a shared task of multimodal translation for English to Hindi. We have participated in the same with team name CNLP-NITS-PP in two submissions: multimodal and text-only NMT. This work investigates phrase pairs injection via a data augmentation approach and attains improvement over our previous work at WAT2020 on the same task in both text-only and multimodal NMT. We have achieved second rank on the challenge test set for English to Hindi multimodal translation, with a Bilingual Evaluation Understudy (BLEU) score of 39.28, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.792097, and an Adequacy-Fluency Metrics (AMFM) score of 0.830230.

Neural Machine Translation for Tamil–Telugu Pair
Sahinur Rahman Laskar | Bishwaraj Paul | Prottay Kumar Adhikary | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Sixth Conference on Machine Translation

The neural machine translation approach has gained popularity in machine translation because of its context analysing ability and its handling of long-term dependency issues. We have participated in the WMT21 shared task of similar language translation on a Tamil-Telugu pair with the team name: CNLP-NITS. In this task, we utilized monolingual data via pre-train word embeddings in transformer model based neural machine translation to tackle the limitation of parallel corpus. Our model has achieved a bilingual evaluation understudy (BLEU) score of 4.05, rank-based intuitive bilingual evaluation score (RIBES) score of 24.80 and translation edit rate (TER) score of 97.24 for both Tamil-to-Telugu and Telugu-to-Tamil translations respectively.

Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Sentiment analysis tools and models have been developed extensively throughout the years, for European languages. In contrast, similar tools for Indian Languages are scarce. This is because, state-of-the-art pre-processing tools like POS tagger, shallow parsers, etc., are not readily available for Indian languages. Although, such working tools for Indian languages, like Hindi and Bengali, that are spoken by the majority of the population, are available, finding the same for less spoken languages like, Tamil, Telugu, and Malayalam, is difficult. Moreover, due to the advent of social media, the multi-lingual population of India, who are comfortable with both English and their regional language, prefer to communicate by mixing both languages. This gives rise to massive code-mixed content and automatically annotating them with their respective sentiment labels becomes a challenging task. In this work, we take up a similar challenge of developing a sentiment analysis model that can work with English-Tamil code-mixed data. The proposed work tries to solve this by using bi-directional LSTMs along with language tagging. Other traditional methods, based on classical machine learning algorithms have also been discussed in the literature, and they also act as the baseline systems to which we will compare our Neural Network based model. The performance of the developed algorithm, based on Neural Network architecture, garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.

Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)
Thoudam Doren Singh | Cristina España i Bonet | Sivaji Bandyopadhyay | Josef van Genabith
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Multiple Captions Embellished Multilingual Multi-Modal Neural Machine Translation
Salam Michael Singh | Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Neural machine translation based on bilingual text with limited training data suffers from lexical diversity, which lowers the rare word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase the lexical diversity aided with the cross-lingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of the visual features boosts the translation quality by a significant margin. Empirical study affirms that our proposed multimodal approach achieves substantial gain in terms of the automatic score and shows robustness in handling the rare word translation in the pretext of English to/from Hindi and Telugu translation tasks.

Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Incorporating multiple input modalities in a machine translation (MT) system is gaining popularity among MT researchers. Unlike the publicly available dataset for Multimodal Machine Translation (MMT) tasks, where the captions are short image descriptions, the news captions provide a more detailed description of the contents of the images. As a result, numerous named entities relating to specific persons, locations, etc., are found. In this paper, we acquire two monolingual news datasets reported in English and Hindi paired with the images to generate a synthetic English-Hindi parallel corpus. The parallel corpus is used to train the English-Hindi Neural Machine Translation (NMT) and an English-Hindi MMT system by incorporating the image feature paired with the corresponding parallel corpus. We also conduct a systematic analysis to evaluate the English-Hindi MT systems with 1) more synthetic data and 2) by adding back-translated data. Our finding shows improvement in terms of BLEU scores for both the NMT (+8.05) and MMT (+11.03) systems.

2020

Hindi-Marathi Cross Lingual Model
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fifth Conference on Machine Translation

Machine Translation (MT) is a vital tool for aiding communication between linguistically separate groups of people. The neural machine translation (NMT) based approaches have gained widespread acceptance because of their outstanding performance. We have participated in the WMT20 shared task of similar language translation on the Hindi-Marathi pair. The main challenge of this task is the utilization of monolingual data and similarity features of the similar language pair to overcome the limitation of available parallel data. In this work, we have implemented an NMT-based model that simultaneously learns bilingual embedding from both the source and target language pairs. Our model has achieved Hindi to Marathi bilingual evaluation understudy (BLEU) score of 11.59, rank-based intuitive bilingual evaluation score (RIBES) score of 57.76 and translation edit rate (TER) score of 79.07 and Marathi to Hindi BLEU score of 15.44, RIBES score of 61.13 and TER score of 75.96.

The NITS-CNLP System for the Unsupervised MT Task at WMT 2020
Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the Fifth Conference on Machine Translation

We describe NITS-CNLP’s submission to the WMT 2020 unsupervised machine translation shared task for German (de) to Upper Sorbian (hsb) in a constrained setting, i.e., using only the data provided by the organizers. We train our unsupervised model using monolingual data from both the languages by jointly pre-training the encoder and decoder and fine-tune using backtranslation loss. The final model uses the source side (de) monolingual data and the target side (hsb) synthetic data as pseudo-parallel data to train a pseudo-supervised system which is tuned using the provided development set (dev set).

Zero-Shot Neural Machine Translation: Russian-Hindi @LoResMT 2020
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Neural machine translation (NMT) is a widely accepted approach in the machine translation (MT) community, translating from one natural language to another natural language. Although NMT shows remarkable performance in both high and low resource languages, it needs a sufficient training corpus. The availability of a parallel corpus in low resource language pairs is one of the challenging tasks in MT. To mitigate this issue, NMT attempts to utilize a monolingual corpus to get better at translation for low resource language pairs. Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020) organized shared tasks of low resource language pair translation using zero-shot NMT. Here, the parallel corpus is not used and only monolingual corpora are allowed. We have participated in the same shared task with our team name CNLP-NITS for the Russian-Hindi language pair. We have used masked sequence to sequence pre-training for language generation (MASS) with only monolingual corpus following the unsupervised NMT architecture. The evaluated results are declared at the LoResMT 2020 shared task, which reports that our system achieves the bilingual evaluation understudy (BLEU) score of 0.59, precision score of 3.43, recall score of 5.48, F-measure score of 4.22, and rank-based intuitive bilingual evaluation score (RIBES) of 0.180147 in Russian to Hindi translation. For Hindi to Russian translation, we have achieved BLEU, precision, recall, F-measure, and RIBES scores of 1.11, 4.72, 4.41, 4.56, and 0.026842 respectively.

EnAsCorp1.0: English-Assamese Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Corpus preparation is one of the important and challenging tasks in the domain of machine translation, especially in low resource language scenarios. In a country like India, where multiple languages exist, machine translation attempts to minimize the communication gap among people with different linguistic backgrounds. Although Google Translation covers automatic translation of various languages all over the world, it lags in some languages including Assamese. In this paper, we have developed EnAsCorp1.0, a corpus for the English-Assamese low resource pair, where parallel and monolingual data are collected from various online sources. We have also implemented baseline systems with statistical machine translation and neural machine translation approaches for the same corpus.

English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay | Mihaela Vela | Josef van Genabith
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We present the first study on the post-editing (PE) effort required to build a parallel dataset for English-Manipuri and English-Mizo, in the context of a project on creating data for machine translation (MT). English source texts from a local daily newspaper are machine translated into Manipuri and Mizo using PBSMT systems built in-house. A Computer Assisted Translation (CAT) tool is used to record the time, keystroke and other indicators to measure PE effort in terms of temporal and technical effort. A positive correlation between the technical effort and the number of function words is seen for English-Manipuri and English-Mizo but a negative correlation between the technical effort and the number of noun words for English-Mizo. However, average time spent per token in PE English-Mizo text is negatively correlated with the temporal effort. The main reasons for these results are (i) English and Mizo using the same script, while Manipuri uses a different script and (ii) the agglutinative nature of Manipuri. Further, we check the impact of training an MT system in an incremental approach, by including the post-edited dataset as additional training data. The result shows an increase in HBLEU of up to 4.6 for English-Manipuri.

JUNLP@ICON2020: Low Resourced Machine Translation for Indic Languages
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task

In the current work, we present the description of the systems submitted to a machine translation shared task organized by ICON 2020: 17th International Conference on Natural Language Processing. The systems were developed to show the capability of general domain machine translation when translating into Indic languages, English-Hindi, in our case. The paper shows the training process and quantifies the performance of two state-of-the-art translation systems, viz., Statistical Machine Translation and Neural Machine Translation. While Statistical Machine Translation systems work better in a low-resource setting, Neural Machine Translation systems are able to generate sentences that are fluent in nature. Since both these systems have contrasting advantages, a hybrid system, incorporating both, was also developed to leverage all the strong points. The submitted systems garnered BLEU scores of 8.701943312, 0.6361336198, and 11.78873307 respectively and the scores of the hybrid system helped us to the fourth spot in the competition leaderboard.

Multimodal Neural Machine Translation for English to Hindi
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 7th Workshop on Asian Translation

Machine translation (MT) focuses on the automatic translation of text from one natural language to another natural language. Neural machine translation (NMT) achieves state-of-the-art results in the task of machine translation because it utilizes advanced deep learning techniques and handles issues like long-term dependency and context analysis. Nevertheless, NMT still suffers from low translation quality for low resource languages. To counter this challenge, the multi-modal concept comes in. The multi-modal concept combines textual and visual features to improve the translation quality of low resource languages. Moreover, the utilization of monolingual data in the pre-training step can improve the performance of the system for low resource language translations. Workshop on Asian Translation 2020 (WAT2020) organized a translation task for multimodal translation in English to Hindi. We have participated in the same in two-track submission, namely text-only and multi-modal translation with team name CNLP-NITS. The evaluated results are declared at the WAT2020 translation task, which reports that our multi-modal NMT system attained higher scores than our text-only NMT on both challenge and evaluation test set. For the challenge test data, our multi-modal neural machine translation system achieves Bilingual Evaluation Understudy (BLEU) score of 33.57, Rank-based Intuitive Bilingual Evaluation Score (RIBES) 0.754141, Adequacy-Fluency Metrics (AMFM) score 0.787320 and for evaluation test data, BLEU, RIBES, and AMFM scores of 40.51, 0.803208, and 0.820980 for English to Hindi translation respectively.

2019

pdf bib
Development of POS tagger for English-Bengali Code-Mixed data
Tathagata Raha | Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 16th International Conference on Natural Language Processing

Code-mixed texts are widespread nowadays due to the advent of social media. Since these texts combine two languages to formulate a sentence, they give rise to various research problems related to Natural Language Processing. In this paper, we explore one such problem, namely, Parts of Speech tagging of code-mixed texts. We have built a system that can POS tag English-Bengali code-mixed data where the Bengali words are written in Roman script. Our approach initially involves the collection and cleaning of English-Bengali code-mixed tweets. These tweets were used as a development dataset for building our system. The proposed system follows a modular approach that starts by tagging individual tokens with their respective languages and then passes them to different POS taggers, designed for different languages (English and Bengali, in our case). Tags given by the two systems are later joined together and the final result is then mapped to a universal POS tag set. Our system was evaluated on 100 manually POS tagged code-mixed sentences and it returned an accuracy of 75.29%.
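
The modular pipeline described in this abstract (language tagging, per-language POS tagging, mapping to a universal tag set) can be sketched roughly as follows. This is only an illustrative toy: the word lists, the native tags and the mapping table are hypothetical stand-ins for the real language identifier and taggers, which the abstract does not specify.

```python
# Hypothetical toy lexicons standing in for real per-language POS taggers.
EN_TAGGER = {"the": "DT", "movie": "NN", "was": "VBD"}
BN_TAGGER = {"khub": "INTF", "bhalo": "JJ"}  # romanized Bengali tokens

# Hypothetical mapping from each tagger's native tags to universal POS tags.
TO_UNIVERSAL = {"DT": "DET", "NN": "NOUN", "VBD": "VERB", "INTF": "ADV", "JJ": "ADJ"}


def identify_language(token):
    """Toy language identifier: a token is Bengali if the Bengali lexicon knows it."""
    return "bn" if token.lower() in BN_TAGGER else "en"


def tag_sentence(tokens):
    """Route each token to the tagger for its language, then map to universal tags."""
    tagged = []
    for token in tokens:
        lang = identify_language(token)
        tagger = BN_TAGGER if lang == "bn" else EN_TAGGER
        native_tag = tagger.get(token.lower(), "UNK")
        tagged.append((token, lang, TO_UNIVERSAL.get(native_tag, "X")))
    return tagged


def accuracy(predicted, gold):
    """Token-level accuracy against manually assigned universal tags."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold) if gold else 0.0


result = tag_sentence("the movie was khub bhalo".split())
predicted_tags = [tag for _, _, tag in result]
```

Keeping language identification separate from tagging means either tagger can be swapped out without touching the rest of the pipeline.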

pdf bib
English to Hindi Multi-modal Neural Machine Translation and Hindi Image Captioning
Sahinur Rahman Laskar | Rohit Pratap Singh | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

With the widespread use of Machine Translation (MT) techniques, there have been attempts to minimize the communication gap among people from diverse linguistic backgrounds. We have participated in the Workshop on Asian Translation 2019 (WAT2019) multi-modal translation task. There are three types of submission track, namely multi-modal translation, Hindi-only image captioning and text-only translation for English to Hindi translation. The main challenge is to provide a precise MT output. The multi-modal concept incorporates textual and visual features in the translation task. In this work, the multi-modal translation track relies on a pre-trained convolutional neural network (CNN), the 19-layer Visual Geometry Group network (VGG19), to extract image features and an attention-based Neural Machine Translation (NMT) system for translation. The merge-model of recurrent neural network (RNN) and CNN is used for the Hindi-only image captioning. The text-only translation track is based on the transformer model of the NMT system. The official results evaluated at the WAT2019 translation task show that our multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838, and Adequacy-Fluency Metrics (AMFM) score of 0.668260 for challenge test data, and a BLEU score of 40.55, RIBES of 0.760080, and AMFM score of 0.770860 for evaluation test data in English to Hindi multi-modal translation.

pdf bib
WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

Multimodal translation is the task of translating a source language to a target language with the help of a parallel text corpus paired with images that represent the contextual details of the text. In this paper, we carried out an extensive comparison to evaluate the benefits of using a multimodal approach for translating text in English to a low-resource language, Hindi, as a part of the WAT2019 shared task. We carried out the translation of English to Hindi in three separate tasks with both the evaluation and challenge datasets: first, by using only the parallel text corpora, then through an image caption generation approach and, finally, with the multimodal approach. Our experiments show a significant improvement in the results with the multimodal approach over the other approaches.

pdf bib
JUMT at WMT2019 News Translation Task: A Hybrid Approach to Machine Translation for Lithuanian to English
Sainik Kumar Mahata | Avishek Garain | Adityar Rayala | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In the current work, we present a description of the system submitted to WMT 2019 News Translation Shared task. The system was created to translate news text from Lithuanian to English. To accomplish the given task, our system used a Word Embedding based Neural Machine Translation model to post edit the outputs generated by a Statistical Machine Translation model. The current paper documents the architecture of our model, descriptions of the various modules and the results produced using the same. Our system garnered a BLEU score of 17.6.

pdf bib
Neural Machine Translation: Hindi-Nepali
Sahinur Rahman Laskar | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

With the extensive use of Machine Translation (MT) technology, there is growing interest in directly translating between pairs of similar languages. The main challenge is to overcome the limited availability of parallel data in order to produce a precise MT output. The current work relies on Neural Machine Translation (NMT) with an attention mechanism for the similar language translation task of the WMT19 shared task, in the context of the Hindi-Nepali pair. The NMT systems were trained on the Hindi-Nepali parallel corpus, then tested and analyzed in Hindi ⇔ Nepali translation. The official results declared at the WMT19 shared task show that our NMT system obtained a Bilingual Evaluation Understudy (BLEU) score of 24.6 for the primary configuration in Nepali to Hindi translation. We also achieved BLEU scores of 53.7 (Hindi to Nepali) and 49.1 (Nepali to Hindi) in the contrastive system type.

2018

pdf bib
WME 3.0: An Enhanced and Validated Lexicon of Medical Concepts
Anupam Mondal | Dipankar Das | Erik Cambria | Sivaji Bandyopadhyay
Proceedings of the 9th Global Wordnet Conference

Information extraction in the medical domain is laborious and time-consuming due to the insufficient number of domain-specific lexicons and lack of involvement of domain experts such as doctors and medical practitioners. Thus, in the present work, we are motivated to design a new lexicon, WME 3.0 (WordNet of Medical Events), which contains over 10,000 medical concepts along with their part of speech, gloss (descriptive explanations), polarity score, sentiment, similar sentiment words, category, affinity score and gravity score features. In addition, the manual annotators help to validate the overall as well as individual category level of medical concepts of WME 3.0 using Cohen’s Kappa agreement metric. The agreement score indicates almost correct identification of medical concepts and their assigned features in WME 3.0.

pdf bib
A Content-based Recommendation System for Medical Concepts: Disease and Symptom
Anupam Mondal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 15th International Conference on Natural Language Processing

pdf bib
SMT vs NMT: A Comparison over Hindi and Bengali Simple Sentences
Sainik Kumar Mahata | Soumil Mandal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 15th International Conference on Natural Language Processing

pdf bib
JUCBNMT at WMT2018 News Translation Task: Character Based Neural Machine Translation of Finnish to English
Sainik Kumar Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In the current work, we present a description of the system submitted to WMT 2018 News Translation Shared task. The system was created to translate news text from Finnish to English. The system used a Character Based Neural Machine Translation model to accomplish the given task. The current paper documents the preprocessing steps, the description of the submitted system and the results produced using the same. Our system garnered a BLEU score of 12.9.

2017

pdf bib
BUCC2017: A Hybrid Approach for Identifying Parallel Sentences in Comparable Corpora
Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

A Statistical Machine Translation (SMT) system is always trained using a large parallel corpus to produce effective translations. Not only is such a corpus scarce, but preparing it also involves a lot of manual labor and cost. A parallel corpus can be prepared by employing comparable corpora, where a pair of corpora is in two different languages pointing to the same domain. In the present work, we try to build a parallel corpus for the French-English language pair from a given comparable corpus. The data and the problem set are provided as part of the shared task organized by BUCC 2017. We have proposed a system that first translates the sentences by heavily relying on Moses and then groups the sentences based on sentence length similarity. Finally, the one-to-one sentence selection was done based on the Cosine Similarity algorithm.
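
The last two steps of this abstract (grouping by sentence length, then one-to-one selection by cosine similarity) can be illustrated with a small sketch. It assumes the source sentences have already been translated into the target language. The bag-of-words vectors, the greedy selection, and the `max_len_ratio` and `min_sim` thresholds are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def pair_sentences(translated, target, max_len_ratio=1.5, min_sim=0.5):
    """Greedy one-to-one pairing of translated source sentences with target sentences."""
    candidates = []
    for i, s in enumerate(translated):
        for j, t in enumerate(target):
            ls, lt = len(s.split()), len(t.split())
            if ls == 0 or lt == 0:
                continue
            # Length filter: skip pairs whose token counts differ too much.
            if max(ls, lt) / min(ls, lt) > max_len_ratio:
                continue
            sim = cosine(s, t)
            if sim >= min_sim:
                candidates.append((sim, i, j))
    # Take the most similar pairs first; each sentence is used at most once.
    candidates.sort(reverse=True)
    used_src, used_tgt, pairs = set(), set(), []
    for sim, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            used_src.add(i)
            used_tgt.add(j)
            pairs.append((i, j, sim))
    return sorted(pairs)


translated = ["the cat sat on the mat", "stock markets fell sharply today"]
target = ["markets fell sharply today", "the cat sat on a mat", "hello"]
pairs = pair_sentences(translated, target)
```

The length filter runs before the similarity computation, so clearly mismatched pairs are discarded without building their vectors.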

pdf bib
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)
Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

pdf bib
Relationship Extraction based on Category of Medical Concepts from Lexical Contexts
Anupam Mondal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

pdf bib
Retrieving Similar Lyrics for Music Recommendation System
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

pdf bib
JU_NLP at SemEval-2016 Task 6: Detecting Stance in Tweets using Support Vector Machines
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence
Niloy Mukherjee | Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
JU-USAAR: A Domain Adaptive MT System
Koushik Pahari | Alapan Kuila | Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay | Josef van Genabith
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Genetic Algorithm (GA) Implementation for Feature Selection in Manipuri POS Tagging
Kishorjit Nongmeikapam | Sivaji Bandyopadhyay
Proceedings of the 13th International Conference on Natural Language Processing

pdf bib
Statistical Natural Language Generation from Tabular Non-textual Data
Joy Mahapatra | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the 9th International Natural Language Generation conference

pdf bib
Multimodal Mood Classification - A Case Study of Differences in Hindi and Western Songs
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Music information retrieval has emerged as a mainstream research area in the past two decades. Experiments on music mood classification have been performed mainly on Western music based on audio, lyrics and a combination of both. Unfortunately, due to the scarcity of digitalized resources, Indian music fares poorly in music mood retrieval research. In this paper, we identified the mood taxonomy and prepared multimodal mood annotated datasets for Hindi and Western songs. We identified important audio and lyric features using a correlation-based feature selection technique. Finally, we developed mood classification systems using Support Vector Machines and Feed Forward Neural Networks based on the features collected from audio, lyrics, and a combination of both. The best performing multimodal systems achieved F-measures of 75.1 and 83.5 for classifying the moods of the Hindi and Western songs respectively using Feed Forward Neural Networks. A comparative analysis indicates that the selected features work well for mood classification of the Western songs and produce better results than the mood classification systems for Hindi songs.

pdf bib
WME: Sense, Polarity and Affinity based Concept Resource for Medical Events
Anupam Mondal | Dipankar Das | Erik Cambria | Sivaji Bandyopadhyay
Proceedings of the 8th Global WordNet Conference (GWC)

In order to overcome the lack of medical corpora, we have developed a WordNet for Medical Events (WME) for identifying medical terms and their sense related information using a seed list. The initial WME resource contains 1654 medical terms or concepts. In the present research, we report the enhancement of WME with 6415 medical concepts along with their conceptual features viz. Parts-of-Speech (POS), gloss, semantics, polarity, sense and affinity. Several polarity lexicons viz. SentiWordNet, SenticNet, Bing Liu’s subjectivity list and Taboada’s adjective list were introduced with WordNet synonyms and hyponyms for expansion. The semantics feature guided us to build a semantic co-reference relation based network between the related medical concepts. These features help to prepare a medical concept network for better sense relation based visualization. Finally, we evaluated the resource with respect to the Adaptive Lesk Algorithm and conducted an agreement analysis for validating the expanded WME resource.

2015

pdf bib
Mood Classification of Hindi Songs based on Lyrics
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 12th International Conference on Natural Language Processing

2014

pdf bib
Word Alignment-Based Reordering of Source Chunks in PB-SMT
Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target language ordering. Prior reordering of the source chunks is performed in the present work by following the target word order suggested by word alignment. The test set is reordered using monolingual MT trained on source and reordered source. This approach of prior reordering of the source chunks was compared with pre-ordering of source words based on word alignments and the traditional approach of prior source reordering based on language-pair specific reordering rules. The effects of these reordering approaches were studied on an English–Bengali translation task, a language pair with different word order. From the experimental results it was found that word alignment based reordering of the source chunks is more effective than the other reordering approaches, and it produces statistically significant improvements over the baseline system on BLEU. On manual inspection we found significant improvements in terms of word alignments.
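
A minimal sketch of reordering source chunks by the target word order suggested by word alignment is given below. The sort key (mean aligned target position per chunk) and the handling of unaligned chunks (they stay attached to the preceding chunk) are illustrative assumptions, not the paper's exact procedure.

```python
def reorder_chunks(chunks, alignment):
    """Reorder source chunks to follow the target-side order.

    chunks: list of chunks, each a list of source tokens.
    alignment: iterable of (source_index, target_index) pairs, where
        source_index counts tokens across the flattened chunks.
    """
    target_positions = {}
    for src, tgt in alignment:
        target_positions.setdefault(src, []).append(tgt)

    keyed = []
    offset = 0
    prev_key = -1.0  # an unaligned first chunk stays at the front
    for order, chunk in enumerate(chunks):
        positions = [
            t
            for k in range(offset, offset + len(chunk))
            for t in target_positions.get(k, [])
        ]
        # Assumption: a chunk with no aligned word inherits the previous key.
        key = sum(positions) / len(positions) if positions else prev_key
        keyed.append((key, order, chunk))
        prev_key = key
        offset += len(chunk)

    # Ties are broken by original order, so the reordering is stable.
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [chunk for _, _, chunk in keyed]


# Subject-verb-object source, subject-object-verb target (as in English to Bengali).
reordered = reorder_chunks([["I"], ["eat"], ["rice"]], {(0, 0), (1, 2), (2, 1)})
```

For the toy input, the verb chunk moves after the object chunk, which mimics the subject-object-verb order of the target side.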

pdf bib
JU_CSE: A Conditional Random Field (CRF) Based Approach to Aspect Based Sentiment Analysis
Braja Gopal Patra | Soumik Mandal | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
How Sentiment Analysis Can Help Machine Translation
Santanu Pal | Braja Gopal Patra | Dipankar Das | Sudip Kumar Naskar | Sivaji Bandyopadhyay | Josef van Genabith
Proceedings of the 11th International Conference on Natural Language Processing

pdf bib
Manipuri Chunking: An Incremental Model with POS and RMWE
Kishorjit Nongmeikapam | Thiyam Ibungomacha Singh | Ngariyanbam Mayekleima Chanu | Sivaji Bandyopadhyay
Proceedings of the 11th International Conference on Natural Language Processing

2013

pdf bib
Improving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
Rajdeep Gupta | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

pdf bib
A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation
Santanu Pal | Sudip Naskar | Sivaji Bandyopadhyay
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf bib
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Automatic Music Mood Classification of Hindi Songs
Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Event and Event Actor Alignment in Phrase Based Statistical Machine Translation
Anup Kolya | Santanu Pal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 11th Workshop on Asian Language Resources

pdf bib
On Application of Conditional Random Field in Stemming of Bengali Natural Language Text
Sandipan Sarkar | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

pdf bib
JU_CSE: A CRF Based Approach to Annotation of Temporal Expression, Event and Temporal Relations
Anup Kumar Kolya | Amitava Kundu | Rajdeep Gupta | Asif Ekbal | Sivaji Bandyopadhyay
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Construction of Emotional Lexicon Using Potts Model
Braja Gopal Patra | Hiroya Takamura | Dipankar Das | Manabu Okumura | Sivaji Bandyopadhyay
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
An Empirical Study of Combing Multiple Models in Bengali Question Classification
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
MWE Alignment in Phrase Based Statistical Machine Translation
Santanu Pal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
Emotion Co-referencing - Emotional Expression, Holder, and Topic
Dipankar Das | Sivaji Bandyopadhyay
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 1, March 2013

2012

pdf bib
A Light Weight Stemmer in Kokborok
Braja Gopal Patra | Khumbar Debbarma | Swapan Debbarma | Dipankar Das | Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

pdf bib
Bootstrapping Method for Chunk Alignment in Phrase Based SMT
Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

pdf bib
Detection and Correction of Preposition and Determiner Errors in English: HOO 2012
Pinaki Bhaskar | Aniruddha Ghosh | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

pdf bib
Bengali Question Classification: Towards Developing QA System
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Morphological Analyzer for Kokborok
Khumbar Debbarma | Braja Gopal Patra | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Manipuri Morpheme Identification
Kishorjit Nongmeikapam | Vidya Raj RK | Nirmal Y | Sivaji B
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Classification of Interviews - A Case Study on Cancer Patients
Braja Gopal Patra | Amitava Kundu | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

pdf bib
Question Classification and Answering from Procedural Text in English
Somnath Banerjee | Sivaji Bandyopadhyay
Proceedings of the Workshop on Question Answering for Complex Domains

pdf bib
Part of Speech (POS) Tagger for Kokborok
Braja Gopal Patra | Khumbar Debbarma | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of COLING 2012: Posters

pdf bib
Keyphrase Extraction in Scientific Articles: A Supervised Approach
Pinaki Bhaskar | Kishorjit Nongmeikapam | Sivaji Bandyopadhyay
Proceedings of COLING 2012: Demonstration Papers

pdf bib
JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pairs
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf bib
Dr Sentiment Knows Everything!
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the ACL-HLT 2011 System Demonstrations

pdf bib
Semantic Clustering: an Attempt to Identify Multiword Expressions in Bengali
Tanmoy Chakraborty | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

pdf bib
Identifying Event-Sentiment Association using Lexical Equivalence and Co-reference Approaches
Anup Kolya | Dipankar Das | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

pdf bib
Shared Task System Description: Measuring the Compositionality of Bigrams using Statistical Methodologies
Tanmoy Chakraborty | Santanu Pal | Tapabrata Mondal | Tanik Saikh | Sivaju Bandyopadhyay
Proceedings of the Workshop on Distributional Semantics and Compositionality

pdf bib
Developing Japanese WordNet Affect for Analyzing Emotions
Yoshimitsu Torii | Dipankar Das | Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf bib
May I check the English of your paper!!!
Pinaki Bhaskar | Aniruddha Ghosh | Santanu Pal | Sivaji Bandyopadhyay
Proceedings of the 13th European Workshop on Natural Language Generation

pdf bib
A Rule Based Approach for Analysis of Comparative or Evaluative Questions in Tourism Domain
Bidhan Chandra Pal | Pinaki Bhaskar | Sivaji Bandyopadhyay
Proceedings of the KRAQ11 workshop

pdf bib
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)
Sivaji Bandyopadhyay | Manabu Okumura
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf bib
Analyzing Emotional Statements – Roles of General and Physiological Variables
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf bib
Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Handling Multiword Expressions in Phrase-Based Statistical Machine Translation
Santanu Pal | Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
A Hybrid Approach for Event Extraction and Event Actor Identification
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf bib
Discerning Emotions of Bloggers based on Topics – a Supervised Coreference Approach in Bengali
Dipankar Das | Sivaji Bandyopadhyay
ROCLING 2010 Poster Papers

pdf bib
JU: A Supervised Approach to Identify Semantic Relations from Paired Nominals
Santanu Pal | Partha Pakray | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
JU_CSE_TEMP: A First Step towards Evaluating Events, Time Expressions and Temporal Relations
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Identifying Emotional Expressions, Intensities and Sentence Level Emotion Tags Using a Supervised Framework
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
A Supervised Machine Learning Approach for Event-Event Relation Identification
Anup Kumar Kolya | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
A Query Focused Multi Document Automatic Summarization
Pinaki Bhaskar | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Finding Emotion Holder from Bengali Blog Texts—An Unsupervised Syntactic Approach
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Towards the Global SentiWordNet
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
English to Indian Languages Machine Transliteration System at NEWS 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2010 Named Entities Workshop

pdf bib
Labeling Emotion in Bengali Blog Corpus – A Fine Grained Tagging at Sentence Level
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the Eighth Workshop on Asian Language Resources

pdf bib
SentiWordNet for Indian Languages
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the Eighth Workshop on Asian Language Resources

pdf bib
SemanticNet-Perception of Human Pragmatics
Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

pdf bib
Clause Identification and Classification in Bengali
Aniruddha Ghosh | Amitava Das | Sivaji Bandyopadhyay
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Automatic Extraction of Complex Predicates in Bengali
Dipankar Das | Santanu Pal | Tapabrata Mondal | Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Handling Named Entities and Compound Verbs in Phrase-Based Statistical Machine Translation
Santanu Pal | Sudip Kumar Naskar | Pavel Pecina | Sivaji Bandyopadhyay | Andy Way
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Identification of Reduplication in Bengali Corpus and their Semantic Analysis: A Rule Based Approach
Tanmoy Chakraborty | Sivaji Bandyopadhyay
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Manipuri-English Bidirectional Statistical Machine Translation Systems using Morphology and Dependency Relations
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
JU_CSE_GREC10: Named Entity Generation at GREC 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 6th International Natural Language Generation Conference

pdf bib
Topic-Based Bengali Opinion Summarization
Amitava Das | Sivaji Bandyopadhyay
Coling 2010: Posters

2009

pdf bib
Voted Approach for Part of Speech Tagging in Bengali
Asif Ekbal | Md. Hasanuzzaman | Sivaji Bandyopadhyay
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
Named Entity Recognition for Manipuri Using Support Vector Machine
Thoudam Doren Singh | Kishorjit Nongmeikapam | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)
Sivaji Bandyopadhyay | Pushpak Bhattacharyya | Vasudeva Varma | Sudeshna Sarkar | A Kumaran | Raghavendra Udupa
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

pdf bib
JUNLG-MSR: A Machine Learning Approach of Main Subject Reference Selection with Rule Based Improvement
Samir Gupta | Sivaji Bandopadhyay
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

pdf bib
Bengali Verb Subcategorization Frame Acquisition - A Baseline Model
Somnath Banerjee | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

pdf bib
English to Hindi Machine Transliteration System at NEWS 2009
Amitava Das | Asif Ekbal | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Voted NER System using Appropriate Unlabeled Data
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Word to Sentence Level Emotion Tagging for Bengali Blogs
Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
JU-PTBSGRE: GRE Using Prefix Tree Based Structure
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the Fifth International Natural Language Generation Conference

pdf bib
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization
Sivaji Bandyopadhyay | Thierry Poibeau | Horacio Saggion | Roman Yangarber
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization

pdf bib
Multi-Engine Approach for Named Entity Recognition in Bengali
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

pdf bib
Named Entity Recognition in Bengali: A Conditional Random Field Approach
Asif Ekbal | Rejwanul Haque | Sivaji Bandyopadhyay
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Generation of Referring Expression Using Prefix Tree Structure
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Design of a Rule-based Stemmer for Natural Language Text in Bengali
Sandipan Sarkar | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

pdf bib
Morphology Driven Manipuri POS Tagger
Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

pdf bib
Invited Talk: Multilingual Named Entity Recognition
Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
Language Independent Named Entity Recognition in Indian Languages
Asif Ekbal | Rejwanul Haque | Amitava Das | Venkateswarlu Poka | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
Bengali Named Entity Recognition Using Support Vector Machine
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf bib
A Document Graph Based Query Focused Multi-Document Summarizer
Sibabrata Paladhi | Sivaji Bandyopadhyay
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

pdf bib
Bengali, Hindi and Telugu to English Ad-hoc Bilingual Task
Sivaji Bandyopadhyay | Tapabrata Mondal | Sudip Kumar Naskar | Asif Ekbal | Rejwanul Haque | Srinivasa Rao Godavarthy
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

pdf bib
Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems
Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Language Resources

2007

pdf bib
JU-SKNSB: Extended WordNet Based WSD on the English All-Words Task at SemEval-1
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib
A Modified Joint Source-Channel Model for Transliteration
Asif Ekbal | Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Dialogue based Question Answering System in Telugu
Rami Reddy | Nandi Reddy | Sivaji Bandyopadhyay
Proceedings of the Workshop on Multilingual Question Answering - MLQA ’06

pdf bib
Handling of Prepositions in English to Bengali Machine Translation
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of the Third ACL-SIGSEM Workshop on Prepositions

2005

pdf bib
A Phrasal EBMT System for Translating English to Bengali
Sudip Kumar Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit X: Posters

The present work describes a Phrasal Example Based Machine Translation system from English to Bengali that identifies the phrases in the input through a shallow analysis, retrieves the target phrases using a Phrasal Example Base, and finally combines the target language phrases employing some heuristics based on the phrase ordering rules for Bengali. The paper focuses on the structure of the noun, verb and prepositional phrases in English and how these phrases are realized in Bengali. This study informs the design of the Phrasal Example Base and the recombination rules for the target language phrases.

pdf bib
Use of Machine Translation in India: Current Status
Sudip Naskar | Sivaji Bandyopadhyay
Proceedings of Machine Translation Summit X: Posters

A survey of the machine translation systems that have been developed in India for translation from English to Indian languages and among Indian languages reveals that the MT software is used in field testing or is available as a web translation service. These systems are also used for teaching machine translation to students and researchers. Most of these systems are in the English-Hindi or Indian language-Indian language domain. The translation domains are mostly government documents/reports and news stories. A number of other MT systems are at various phases of development and have been demonstrated at various forums. Many of these systems cover other Indian languages besides Hindi.

pdf bib
A Semantics-based English-Bengali EBMT System for Translating News Headlines
Diganta Saha | Sivaji Bandyopadhyay
Workshop on example-based machine translation

The paper reports an Example-based Machine Translation system for translating news headlines from English to Bengali. The input headline is initially searched for in the Direct Example Base. If it cannot be found, the input headline is tagged and the tagged headline is searched for in the Generalized Tagged Example Base. If a match is obtained, the tagged headline in Bengali is retrieved from the example base, and the output Bengali headline is generated by retrieving the Bengali equivalents of the English words from appropriate dictionaries and then applying relevant synthesis rules to generate the Bengali surface level words. If some named entities and acronyms are not present in the dictionary, a transliteration scheme is applied to obtain the Bengali equivalent. If a match is not found, the tagged input headline is analysed to identify the constituent phrase(s). The target translation is then generated using an English-Bengali phrasal example base, appropriate dictionaries and a set of heuristics for Bengali phrase reordering. If the headline still cannot be translated using the example base strategy, a heuristic translation strategy is applied. Any new tagged input headline, along with its translation by the user, is inserted in the Tagged Example Base after generalization.

2002

pdf bib
Teaching MT - an Indian perspective
Sivaji Bandyopadhyay
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation

2000

pdf bib
An example-based MT system in news items domain from English to Indian languages
Sivaji Bandyopadhyay
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib
Detection and Correction of Phonetic Errors with a New Orthographic Dictionary
Sivaji Bandyopadhyay
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation
